
I have to issue about 1M SQL queries of the following form:

update table1 ta join table2 tr on ta.tr_id=tr.id 
set start_date=null, end_date=null 
where title_id='X' and territory_id='AG' and code='FREE';

The SQL statements are in a text document -- I can only copy-paste them in as-is.

What would be the fastest way to do this? Are there any checks I can disable so the changes are only applied at the end? For example, something like:

start transaction;
copy/paste all sql statements here;
commit;

I tried the above approach but saw zero speed improvement on the updates. Are there any other things I can try?

  • Is there some way you can combine them into a single update? Is there any pattern to what you're doing? Commented Feb 10, 2020 at 18:31
  • @Barmar not sure, but the statements are from Excel. Commented Feb 10, 2020 at 18:31
  • Are they all testing the same columns? Make sure you have a multi-column index on them (see the sketch after these comments). Commented Feb 10, 2020 at 18:32
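
For illustration, here is what those two suggestions could look like. This is only a sketch: it assumes title_id, territory_id, and code all live on table1, and the index name idx_title_territory_code is made up, so adjust both to your actual schema.

-- Composite index covering the three lookup columns:
ALTER TABLE table1 ADD INDEX idx_title_territory_code (title_id, territory_id, code);

-- Several single-row updates collapsed into one statement
-- using a row-constructor IN list (values here are examples):
UPDATE table1 ta JOIN table2 tr ON ta.tr_id=tr.id
SET start_date=NULL, end_date=NULL
WHERE (title_id, territory_id, code) IN (('X','AG','FREE'), ('Y','US','FREE'));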

1 Answer


The performance cost is partly attributable to running 1M separate SQL statements, but it's also attributable to the cost of rewriting rows and the corresponding indexes.

What I mean is, there are several steps to executing an SQL statement, and each of them takes a non-zero amount of time:

  1. Start a transaction.
  2. Parse the SQL, validate the syntax, check your privileges to make sure you have permission to update those tables, etc.
  3. Change the values you updated in the row.
  4. Change the values you updated in each index on that table that contains the columns you changed.
  5. Commit the transaction.

In autocommit mode, the start and commit happen implicitly around every SQL statement, so that causes the maximum overhead. Using an explicit START TRANSACTION and COMMIT as you showed reduces that overhead by doing each only once.
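
If you'd rather not wrap the pasted block in START TRANSACTION ... COMMIT, a sketch of the equivalent session-level approach is:

SET autocommit = 0;
-- paste the UPDATE statements here
COMMIT;
SET autocommit = 1;

With autocommit off, MySQL opens a transaction implicitly at the first statement and keeps it open until you COMMIT.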

Caveat: I don't usually run 1M updates in a single transaction. That causes other types of overhead, because MySQL needs to keep the original rows in case you ROLLBACK. As a compromise, I would execute maybe 1000 updates, then commit and start a new transaction. That at least reduces the START/COMMIT overhead by 99.9%.
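
As a sketch, the pasted file would be broken up like this (1000 per batch is only a starting point; tune it for your workload):

START TRANSACTION;
-- updates 1 through 1000 here
COMMIT;

START TRANSACTION;
-- updates 1001 through 2000 here
COMMIT;

-- ...and so on, one transaction per batch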

In any case, the overhead of transactions isn't large. It might even be unnoticeable compared to the cost of updating indexes.

MyISAM tables have an option to DISABLE KEYS, which means they don't have to update non-unique indexes while keys are disabled. But this might not be a good optimization for you, because (a) you might need the indexes to be active, to help the performance of the lookups in your WHERE clause and the join; and (b) it doesn't work in InnoDB, which is the default storage engine, and it's a better idea to use InnoDB anyway.
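
For completeness, the MyISAM-only syntax looks like this; it defers maintenance of non-unique indexes until keys are re-enabled, and it has no effect on InnoDB tables:

ALTER TABLE table1 DISABLE KEYS;
-- run the UPDATE statements here
ALTER TABLE table1 ENABLE KEYS;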

You could also review if you have too many indexes or redundant indexes on your table. There's no sense having extra indexes you don't need, which only add cost to your updates.
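
A quick way to list what's already there (using the table name from your example):

SHOW INDEX FROM table1;

Look for indexes that share the same leading columns; those are the usual candidates for removal.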

There's also a possibility that you don't have enough indexes, and your UPDATE is slow because it's doing a table-scan for every statement. The table-scans might be so expensive that you'd be better off creating the needed indexes to optimize the lookups. You should use EXPLAIN to see if your UPDATE statement is well-optimized.
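
With your example statement, that would be the following; note that EXPLAIN on an UPDATE works in MySQL 5.6 and later (on older versions, rewrite it as an equivalent SELECT first):

EXPLAIN UPDATE table1 ta JOIN table2 tr ON ta.tr_id=tr.id
SET start_date=NULL, end_date=NULL
WHERE title_id='X' AND territory_id='AG' AND code='FREE';

If the key column is NULL or the rows estimate is large for table1, each statement is scanning rather than using an index.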

If you want me to review that, please run SHOW CREATE TABLE <tablename> for each of your tables in your update, and run EXPLAIN UPDATE ... for your example SQL statement. Add the output to your question above (please don't paste in a comment).
