
I'm working with the MySQLdb module in Python to interact with a database. I have a situation where there is a very large list (tens of thousands of elements) which I need to insert as rows into a table.

My solution right now is to generate a large INSERT statement as a string and execute it.

Is there a smarter way?
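For reference, a rough sketch of the approach described above, using MySQLdb placeholders so the driver escapes the values. The table my_table(a, b) and the connection details are purely illustrative:

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
cur = con.cursor()

# Imagine tens of thousands of these rows.
rows = [(1, "alpha"), (2, "beta")]

# Build one big multi-row INSERT; each %s is filled in and escaped by the driver.
placeholders = ", ".join(["(%s, %s)"] * len(rows))
flat = [v for row in rows for v in row]
cur.execute("INSERT INTO my_table (a, b) VALUES " + placeholders, flat)

con.commit()
con.close()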

3 Answers


There is a smarter way.

The problem with bulk insertions is that, by default, autocommit is enabled, which forces each INSERT to be flushed to stable storage before the next one can begin.

As the manual page notes:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk to make it permanent. To disable autocommit mode, use the following statement:

SET autocommit=0; 

After disabling autocommit mode by setting the autocommit variable to zero, changes to transaction-safe tables (such as those for InnoDB, BDB, or NDBCLUSTER) are not made permanent immediately. You must use COMMIT to store your changes to disk or ROLLBACK to ignore the changes.

This is a pretty common feature of RDBMSs, which presume that database integrity is paramount. It does make bulk inserts take on the order of 1 s per insert instead of 1 ms. The alternative of building one oversized INSERT statement tries to achieve that single commit, at the risk of overloading the SQL parser.
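For illustration, a minimal sketch of what this looks like with MySQLdb: disable autocommit explicitly, run the inserts, and commit once at the end. The table, columns, and connection details are placeholders, not from the original post:

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
con.autocommit(False)   # don't flush every single INSERT to disk
cur = con.cursor()

rows = [(1, "alpha"), (2, "beta")]  # stand-in for the tens of thousands of rows
for a, b in rows:
    cur.execute("INSERT INTO my_table (a, b) VALUES (%s, %s)", (a, b))

con.commit()   # one commit, one flush to stable storage, for the whole batch
con.close()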


2 Comments

Starting with 1.2.0, MySQLdb disables autocommit by default, as required by the DB-API standard (PEP-249). source: mysql-python.sourceforge.net/FAQ.html
If you feel that your inserts are still slower than they should be, make sure to also tweak your mysql server settings. In my case my innodb_buffer_pool_size was way too small for my transaction sizes and by raising it I achieved a +40% speedup for bulk inserts. See: dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool.html
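If you want to confirm what your connection and server are actually doing, both settings mentioned in these comments can be inspected from Python. A small sketch, with placeholder connection details:

import MySQLdb

con = MySQLdb.connect(host="localhost", user="user", passwd="**", db="test")
cur = con.cursor()

cur.execute("SELECT @@autocommit")            # 0 = disabled for this session
print("autocommit:", cur.fetchone()[0])

cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
print(cur.fetchone())                         # (name, size in bytes)

con.close()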

If you have to insert a very large amount of data, why are you trying to insert all of it in one single INSERT? Building that huge statement string puts unnecessary load on your memory, both while constructing it and while executing it, and it stops being a workable solution once the data gets even larger.

Why not issue one INSERT per row, loop over the rows with a for loop, and commit all the changes at the end?

import MySQLdb

con = MySQLdb.connect(
                        host="localhost",
                        user="user",
                        passwd="**",
                        db="db name"
                     )
cur = con.cursor()

for data in your_data_list:
    cur.execute("INSERT INTO your_table (your_column) VALUES ('%s')" % data)

con.commit()
con.close()

(Believe me, this is really fast, but if you are still getting slow results it means autocommit is probably on. Set it to False as msw says.)

6 Comments

If there is a statement for each insertion, wouldn't it be very slow? I don't mind using memory. It will only be megabytes so I'm not concerned.
No, it won't be slow, that is what I'm saying... as long as you are not committing inside the loop. Try both and see, if you don't believe me...
I don't know about that, as I don't use MyISAM, but if that's the case you can turn off autocommit by executing SET autocommit = 0; from the mysql shell (or, from within your Python program, use cur.execute('SET autocommit = 0'); con.commit()).
I realize this is very old, and maybe at the time it was written this wasn't the case, but the docs suggest using multi-row inserts to cut down on communication overhead. dev.mysql.com/doc/refman/5.5/en/…
Just a quick side note: adding data to the insert statement like so: for data in your_data_list: cur.execute("data you want to insert: %s" % data) opens the flood gates to SQL injection attacks. It's best to pass the values separately, e.g. cur.execute("INSERT INTO t (a, b) VALUES (%s, %s)", (item1, item2)), so the driver replaces each placeholder with the escaped item. See mkleehammer.github.io/pyodbc under 'Insert Data'. Happy coding!
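To make the suggestion in the last comment concrete, here is a small sketch reusing the connection and cursor from the answer above; table and column names are placeholders, and your_data_list is assumed to hold (col_a, col_b) tuples:

# Parameterized form: the driver escapes every value itself.
cur.execute("INSERT INTO your_table (col_a, col_b) VALUES (%s, %s)", (item1, item2))

# Or hand the whole list to executemany; MySQLdb typically rewrites an
# INSERT ... VALUES statement into one multi-row INSERT, cutting round trips.
cur.executemany(
    "INSERT INTO your_table (col_a, col_b) VALUES (%s, %s)",
    your_data_list,
)
con.commit()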

As long as you're doing it as a single INSERT and not thousands of individual ones, then yes, this is the best way to do it. Watch out that you don't exceed MySQL's maximum packet size, and adjust it if necessary. For example, this sets the server-side packet maximum to 32 MB; you need to do the same on the client too.

mysqld --max_allowed_packet=32M
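If you'd rather not raise the server setting, a common compromise is to split the list into chunks so each multi-row INSERT stays comfortably under max_allowed_packet. A sketch with placeholder names, reusing a cur/con set up as in the earlier snippets:

def insert_in_chunks(cur, rows, chunk_size=1000):
    # Each executemany call usually becomes one multi-row INSERT,
    # so chunk_size bounds the size of any single statement/packet.
    sql = "INSERT INTO my_table (a, b) VALUES (%s, %s)"
    for start in range(0, len(rows), chunk_size):
        cur.executemany(sql, rows[start:start + chunk_size])

insert_in_chunks(cur, your_data_list)
con.commit()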

2 Comments

this is a hack around the transaction mechanism which addresses the symptom and not the cause
Could you expand a bit on that? Or say what you would do instead?
