I am using SQLAlchemy for the first time to export around 6 million records to MySQL. Following is the error I receive:

OperationalError: (mysql.connector.errors.OperationalError) 2055: Lost connection to MySQL server at '127.0.0.1:3306', system error: 10053 An established connection was aborted by the software in your host machine

Code:

import pandas as pd
import sqlalchemy

df=pd.read_excel(r"C:\Users\mazin\1-601.xlsx")

database_username = 'root'
database_password = 'aUtO1115'
database_ip       = '127.0.0.1'
database_name     = 'patenting in psis'
database_connection = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                               format(database_username, database_password, 
                                                      database_ip, database_name), pool_recycle=1, pool_timeout=30).connect()

df.to_sql(con=database_connection, name='sample', if_exists='replace')
database_connection.close()

Note: I do not get the error if I export around 100 records. After referring to similar posts, I added the pool_recycle and pool_timeout parameters, but the error persists.

  • If you're inserting 6 million rows, you are almost certainly exceeding the 30-second timeout. Have you tried inserting in chunks instead of all at once? to_sql has an optional chunksize parameter you can use for that. Commented Feb 9, 2018 at 21:20
  • @PerunSS - I got the same error even with a timeout of 57600 seconds. Also, when I use the chunksize parameter, it gives me Programming Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%(Maintenance Status (US))s, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'Ma' at line 1 (see the sketch after these comments). Commented Feb 10, 2018 at 4:01
  • @PerunSS - The use of the chunksize parameter and setting appropriate values for pool_recycle and pool_timeout made the code work. Do you want to post it as an answer? Commented Feb 23, 2018 at 15:26
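An aside on the 1064 error mentioned above: a likely culprit is column headers such as Maintenance Status (US), whose spaces and parentheses break the named placeholders (%(Maintenance Status (US))s) that mysql.connector generates. A minimal sketch of one common workaround, sanitizing the column names before calling to_sql (the regex and naming scheme here are assumptions, not something from the original post):

import re
import pandas as pd

df = pd.read_excel(r"C:\Users\mazin\1-601.xlsx")

# Assumed workaround: replace every run of non-alphanumeric characters with
# an underscore, so a header like 'Maintenance Status (US)' becomes
# 'Maintenance_Status_US' and the generated placeholder is a valid name.
df.columns = [re.sub(r'\W+', '_', str(col)).strip('_') for col in df.columns]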

1 Answer

The problem is that you're trying to import 6 million rows as a single chunk, and that takes time. With your current config, pool_recycle is set to 1 second, meaning the connection is recycled after 1 second, which is nowhere near enough time to insert 6 million rows. My suggestion is the following:

database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username,
        database_password,
        database_ip,
        database_name
    ),
    pool_recycle=3600,  # recycle connections after an hour instead of one second
    pool_size=5         # keep a pool of 5 connections
).connect()

df.to_sql(
    con=database_connection,
    name='sample',
    if_exists='replace',
    chunksize=1000      # insert 1,000 rows per batch, not all 6 million at once
)

This sets up a pool of 5 connections with a recycle time of 1 hour, and the to_sql call inserts 1,000 rows at a time instead of all the rows at once. You can experiment with the values to find the best performance.
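If you want to push further, newer pandas versions (0.24+) also accept a method argument on to_sql; method='multi' packs each chunk into a single multi-row INSERT, which can reduce round trips. A minimal sketch for timing a few chunk sizes (the candidate sizes here are arbitrary choices, not recommendations from this answer):

import time

for size in (500, 1000, 5000):
    start = time.time()
    # if_exists='replace' drops and recreates the table on each run,
    # so every timing starts from an empty table.
    df.to_sql(
        con=database_connection,
        name='sample',
        if_exists='replace',
        chunksize=size,
        method='multi',  # one multi-row INSERT per chunk
    )
    print('chunksize={0}: {1:.1f}s'.format(size, time.time() - start))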

1 Comment

This was super useful
