
I'm currently dealing with a 4 GB dump.sql file, so I've tried to create a database from it using the MySQL console.

These are the commands I've used in the terminal:

mysql -u username -ppassword

mysql> create database test;
mysql> use test;
mysql> source dump.sql

The process took about 3 hours to complete. After that I was able to access the created database with no problems.
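For reference, the same import can also be run non-interactively by redirecting the file into the mysql client; as far as I know this is essentially equivalent to SOURCE in speed (same placeholder credentials and database name as above):

# create the database and stream the dump into it
mysql -u username -ppassword -e "CREATE DATABASE IF NOT EXISTS test"
mysql -u username -ppassword test < dump.sql

# optionally pipe through pv (if installed) to see progress
pv dump.sql | mysql -u username -ppassword test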

Specs: 16-core Intel processor, 60 GB RAM, 120 GB SSD.

The thing is, I now have a dump file of 8 GB or more, so I'm looking for a faster way to execute the .sql script. I'm not sure the first method is optimal.

I've also tried to do it in Python:

import mysql.connector

conn = mysql.connector.connect(user='root', password='root')
cursor = conn.cursor()

cursor.execute(open('dump.sql').read(), multi=True)
conn.commit()

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-7-b5009cf1d04b> in <module>
----> 1 cursor.execute(open('dump.sql').read(), multi=True)

~/miniconda3/lib/python3.7/site-packages/mysql/connector/cursor_cext.py in execute(self, operation, params, multi)
    264             result = self._cnx.cmd_query(stmt, raw=self._raw,
    265                                          buffered=self._buffered,
--> 266                                          raw_as_string=self._raw_as_string)
    267         except MySQLInterfaceError as exc:
    268             raise errors.get_mysql_exception(msg=exc.msg, errno=exc.errno,

~/miniconda3/lib/python3.7/site-packages/mysql/connector/connection_cext.py in cmd_query(self, query, raw, buffered, raw_as_string)
    487             self._cmysql.query(query,
    488                                raw=raw, buffered=buffered,
--> 489                                raw_as_string=raw_as_string)
    490         except MySQLInterfaceError as exc:
    491             raise errors.get_mysql_exception(exc.errno, msg=exc.msg,

OverflowError: size does not fit in an int

This returned an OverflowError; presumably the entire multi-gigabyte file is being passed to the connector as one statement string, which is larger than it can handle in a single call. I couldn't find any help online for overcoming this error.

  • Where are the dumps coming from? Do you have control over that process? Commented Jan 23, 2020 at 21:42
  • @Chris I don't, actually. If I did, what options would I have? Commented Jan 23, 2020 at 21:43
  • I'm more of a PostgreSQL guy (and not anything close to a DBA), but I know that its proprietary COPY can be a lot faster than executing SQL. Just wondering if you needed to start from the file you're given or if you had the option to change it. Commented Jan 23, 2020 at 21:47
  • Take a look at dba.stackexchange.com/q/83125 Commented Jan 23, 2020 at 21:48

1 Answer


Importing a dump file produced with mysqldump is notoriously slow. It has to execute SQL statements serially in a single thread, so it doesn't matter how many cores you have on your server. Only one core will be used.

It's unlikely that you can write a python script that does the import any faster, since you are still bound to run SQL statements serially.

Also, the dump file contains some client built-in commands that your Python script doesn't implement and that the MySQL SQL parser doesn't recognize, so you can't execute them through the SQL API. See https://dev.mysql.com/doc/refman/8.0/en/mysql-commands.html

One alternative is to dump using mysqldump --tab, which dumps tab-separated data into one file per table instead of one huge .sql file for all tables.

Then import these files with mysqlimport. Internally, this uses LOAD DATA INFILE which is similar to the PostgreSQL COPY command that is mentioned in a comment above by Chris.

Optionally, use mysqlimport --use-threads=N so it imports tables in parallel. In my experience, you get diminishing returns beyond about 4 concurrent threads, even if your CPU has more cores, because you'll max out the rate at which MySQL can write data.
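A rough sketch of that workflow, assuming the database is called test and that /var/lib/mysql-files is a directory the MySQL server is allowed to write to (check the secure_file_priv setting); names, paths and credentials are placeholders:

# 1. Dump: one .sql (schema) plus one tab-separated .txt (data) file per table.
#    Create the target directory first and make sure mysqld can write to it.
mysqldump -u username -p --tab=/var/lib/mysql-files/test_dump test

# 2. Recreate the tables from the schema files.
cat /var/lib/mysql-files/test_dump/*.sql | mysql -u username -p test

# 3. Bulk-load the data files (LOAD DATA INFILE under the hood), 4 tables at a time.
mysqlimport -u username -p --use-threads=4 test /var/lib/mysql-files/test_dump/*.txt

mysqlimport maps each file name (minus its extension) to a table name, which is why the --tab output can be fed to it directly.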

But parallel loading still loads each table serially; a single table won't be split up into pieces. Therefore, if your data consists of one very large table and a set of smaller tables (a pretty typical scenario), you'll still be bound by the time it takes to load the largest individual table.

To split the largest table into pieces, you'd basically have to develop your own data-loading client that loads the pieces in parallel. How much development time are you willing to invest in this to avoid waiting 6 hours for the larger data load?
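A minimal sketch of that idea, assuming one large table big_table whose data is already in a tab-separated big_table.txt (for example from a --tab dump); the chunk size, file names and credentials are made up for illustration:

# split the data file into ~1M-line chunks
split -l 1000000 big_table.txt big_table.chunk.

# load the chunks concurrently; each LOAD DATA runs in its own client session
# (the server must have local_infile enabled for LOCAL to work)
for chunk in big_table.chunk.*; do
    mysql -u username -ppassword --local-infile=1 test \
        -e "LOAD DATA LOCAL INFILE '$chunk' INTO TABLE big_table" &
done
wait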


2 Comments

One thing to add: if the file can be compressed and transferred to the DB server, then uncompressed and executed there, that would save some time too.
You can use the --compress option of the mysql client if network transfer speed is the bottleneck, but the bottleneck is more likely to be the SQL execution time and the fact that it loads serially.
