
Working in SQLAlchemy Core, I am trying to create an UPDATE statement that populates a column with an MD5 hash of the values in other columns of the same table. I don't know the column names in advance, but I want to concatenate the values in those columns and then compute the hash of the result. Here is a sense of the SET clause of the SQL I'm trying to generate...

    SET master_key = MD5(CONCAT(last_name, first_name))

This update statement could potentially include millions of rows, so I would like the work to be done in the database, rather than bringing the data set into Python, applying the new value, and then writing it back to the DB.

Here is my Python code...

    stmt = self.tbl.update().where(self.tbl.c.master_key==None).\
        values({'master_key': func.MD5(concat(key_col_names))})
    qry_engine.execute(statement=stmt)

key_col_names is a string containing a comma-separated list of column names (e.g. 'last_name, first_name').

For the MD5 call, SQLAlchemy appears to generate MD5('last_name, first_name'), so the hash value is the same on every row. How do I get it to use the column names in the query rather than the literal string I provide?
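A minimal sketch of the behavior described above, assuming a recent SQLAlchemy (1.4 or later): a plain Python string passed to a SQL function is treated as a value, not as column names, so it is sent as a bound parameter and every row hashes the same text. Compiling the expression without a database makes this visible.

```python
from sqlalchemy import func

key_col_names = "last_name, first_name"

# A plain Python string is a literal value to SQLAlchemy, not a column
# reference, so it is rendered as a bind placeholder in the SQL text.
expr = func.MD5(func.concat(key_col_names))
compiled = expr.compile()

print(compiled)         # the SQL contains a placeholder, not the column names
print(compiled.params)  # the whole string travels as one parameter value
```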

I'm writing this for MySQL right now but it would be great to do it using SQLAlchemy functions that port to other databases, rather than being MySQL-specific.

1 Answer


Look up the actual Column objects on your table and unpack them as arguments to CONCAT():

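    from sqlalchemy import func

    key_cols = [self.tbl.c[name.strip()] for name in key_col_names.split(",")]  # real Column objects
    stmt = self.tbl.update().where(self.tbl.c.master_key.is_(None)).\
        values({'master_key': func.MD5(func.concat(*key_cols))})
    with qry_engine.begin() as conn:  # Engine.execute() was removed in SQLAlchemy 2.0
        conn.execute(stmt)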
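To check the first approach above without a database, the statement can be compiled against the MySQL dialect. This is a sketch; the `people` table and its columns are hypothetical stand-ins for `self.tbl`, and no MySQL driver is needed just to compile.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, func
from sqlalchemy.dialects import mysql

# Hypothetical table standing in for self.tbl in the question.
metadata = MetaData()
people = Table(
    "people", metadata,
    Column("id", Integer, primary_key=True),
    Column("last_name", String(50)),
    Column("first_name", String(50)),
    Column("master_key", String(32)),
)

key_col_names = "last_name, first_name"
# Resolve each comma-separated name to a real Column object.
key_cols = [people.c[name.strip()] for name in key_col_names.split(",")]

stmt = (
    people.update()
    .where(people.c.master_key.is_(None))
    .values({"master_key": func.MD5(func.concat(*key_cols))})
)

# Compile for MySQL (no connection is made) to inspect the generated SQL.
sql = str(stmt.compile(dialect=mysql.dialect()))
print(sql)
```

The column names now appear in the SQL itself, so the hashing runs row by row inside the database.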

If the columns in question are of a string type, you can instead use Python's + operator, which SQLAlchemy renders as string concatenation (concat() on MySQL, || on most other databases):

    import operator
    from functools import reduce

    from sqlalchemy import func

    key_cols = [self.tbl.c[name.strip()] for name in key_col_names.split(",")]
    # Fold the columns together with "+", which SQLAlchemy renders as string
    # concatenation for String columns. The built-in sum() would start from
    # the integer 0, which would end up in the SQL expression too.
    key_col_cat = reduce(operator.add, key_cols)
    stmt = self.tbl.update().where(self.tbl.c.master_key.is_(None)).\
        values({'master_key': func.MD5(key_col_cat)})
    with qry_engine.begin() as conn:
        conn.execute(stmt)
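Here is an end-to-end sketch of the + approach above that actually runs the UPDATE, using an in-memory SQLite database and assuming SQLAlchemy 1.4 or later. SQLite has no built-in MD5(), so a Python stand-in named `md5` is registered on each connection purely for this demo; the `people` table is hypothetical. On SQLite the concatenation renders as `||`, which shows the same Core expression porting across databases.

```python
import hashlib
import operator
from functools import reduce

from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, event, func, select)

engine = create_engine("sqlite://")  # in-memory database


@event.listens_for(engine, "connect")
def register_md5(dbapi_connection, connection_record):
    # Demo-only stand-in: SQLite lacks MD5(), so register a Python version.
    dbapi_connection.create_function(
        "md5", 1,
        lambda s: None if s is None else hashlib.md5(s.encode("utf-8")).hexdigest(),
    )


metadata = MetaData()
people = Table(
    "people", metadata,
    Column("id", Integer, primary_key=True),
    Column("last_name", String(50)),
    Column("first_name", String(50)),
    Column("master_key", String(32)),
)
metadata.create_all(engine)

key_col_names = "last_name, first_name"
key_cols = [people.c[name.strip()] for name in key_col_names.split(",")]
key_col_cat = reduce(operator.add, key_cols)  # renders as last_name || first_name here

with engine.begin() as conn:
    conn.execute(people.insert(), [
        {"last_name": "Smith", "first_name": "Ann"},
        {"last_name": "Jones", "first_name": "Bob"},
    ])
    # A single UPDATE; the hashing happens inside the database.
    conn.execute(
        people.update()
        .where(people.c.master_key.is_(None))
        .values(master_key=func.MD5(key_col_cat))
    )
    rows = conn.execute(
        select(people.c.last_name, people.c.first_name, people.c.master_key)
        .order_by(people.c.id)
    ).all()

for row in rows:
    print(row)
```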

1 Comment

Perfect! Two quick related questions if you have a second. 1) Do you know if self.tbl.primary_key.columns will always return its values in the same order? (If not, I should sort key_cols so I get a consistent result.) 2) Will the top version of your answer also work for non-string columns (e.g. bool, int, numeric)? I'm going to try it myself in a bit, but it would be nice to document it here for future reference. Thanks!
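One hedged way to sidestep both points in the comment above, rather than relying on any particular column ordering or implicit conversion: sort the column names explicitly and cast every column to String before concatenating. With Integer columns, + would otherwise mean arithmetic addition. The `age` and `active` columns are hypothetical examples. For what it's worth, MySQL's CONCAT() converts numeric arguments to strings on its own, so func.concat() does not strictly need the casts there.

```python
import operator
from functools import reduce

from sqlalchemy import Boolean, Column, Integer, MetaData, String, Table, cast

# Hypothetical table with mixed column types.
metadata = MetaData()
people = Table(
    "people", metadata,
    Column("id", Integer, primary_key=True),
    Column("last_name", String(50)),
    Column("age", Integer),
    Column("active", Boolean),
    Column("master_key", String(32)),
)

key_col_names = "last_name, age, active"
# Sort the names so the concatenation order never depends on how they were listed.
names = sorted(name.strip() for name in key_col_names.split(","))
# Cast each column to String so "+" becomes concatenation, not arithmetic.
key_cols = [cast(people.c[name], String) for name in names]
key_col_cat = reduce(operator.add, key_cols)

sql = str(key_col_cat)  # default (database-agnostic) string rendering
print(sql)
```

Two caveats: in MySQL, CONCAT() returns NULL if any argument is NULL (and || behaves the same way on most databases), so rows with a NULL key column keep a NULL master_key; and the text form of a cast value, such as a boolean, may differ between databases, so the resulting hashes may not match across backends.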
