
I am trying to write a script to populate a MySQL database with multiple pandas DataFrames. For the sake of simplicity, I will demonstrate here the code to populate the database with a single DataFrame.

I am connecting to the db as follows:

import mysql.connector
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

# create the connection and the cursor
conn = mysql.connector.connect(
        host='localhost',
        user='root',
        password='my_password')

c = conn.cursor(buffered=True)

# Create the database
c.execute('CREATE DATABASE IF NOT EXISTS ss_json_interop')


# Connect now to the ss_json_interop database
conn = mysql.connector.connect(
            host='localhost',
            user='root',
            password='my_password', 
            database='ss_json_interop')

c = conn.cursor(buffered=True)



#### Create the table
c.execute("""CREATE TABLE IF NOT EXISTS sample_sheet_stats_json (
        ss_ID int NOT NULL AUTO_INCREMENT,
        panel text,
        run_ID text,
        sample_ID text,
        i7_index_ID text,
        i7_index_seq text,
        i5_index_ID text,
        i5_index_seq text,
        number_reads_lane1 varchar(255),
        number_reads_lane2 varchar(255),
        total_reads varchar(255),
        PRIMARY KEY (ss_ID)
        )""")


#### create the engine
# more here: https://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database
database_username = 'root'
database_password = 'my_password'
database_ip       = '127.0.0.1'
database_name     = 'ss_json_interop'
database_connection = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                               format(database_username, database_password, 
                                                      database_ip, database_name))

# define the engine
engine = create_engine("mysql+mysqldb://root:my_password@localhost/sample_sheet_stats_json")

I am trying to load my df into a table called sample_sheet_stats_json. If I do:

df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='replace')

the command works and the table in the db is correctly populated. However, if I replace if_exists='replace' with if_exists='append':

df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append')

I get a long error message like this (the error message is not complete; it continues, replicating the structure of my df):

(mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'index' in 'field list' [SQL: 'INSERT INTO sample_sheet_stats_json 

Strangely enough, I can run df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append') as long as I first run df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='replace'), i.e. if the table is already populated.

The same problem was already reported here. However, if I do:

df.to_sql('sample_sheet_stats_json', engine, if_exists='append')

I get the following error message:

(_mysql_exceptions.OperationalError) (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)") (Background on this error at: http://sqlalche.me/e/e3q8)

which does not make much sense, as I could already connect to the database with other commands, as shown above.

Does anyone know how I can fix it?

1 Answer


I have figured out what happened. The error message is saying that there is no column called index in the table: by default pandas writes the DataFrame index as an extra column, and the table created above has no such column.
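
A quick way to see this (a sketch, assuming the cursor c from the question is still open) is to list the columns MySQL actually knows about for the target table:

# list the table's columns; 'index' is not among them,
# which is why the INSERT generated on append fails
c.execute('SHOW COLUMNS FROM sample_sheet_stats_json')
for column in c.fetchall():
    print(column[0])

This also explains why append works after replace: if_exists='replace' drops the table and recreates it from the DataFrame itself, index column included.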

Therefore I simply have to pass the argument index=False to df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append'):

df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append', index=False)

And that solves the problem.
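
As a quick sanity check (reusing the database_connection engine from the question), you can read the table back and confirm the row count grows with each append:

# read the table back; the number of rows should increase after every append
check = pd.read_sql('SELECT * FROM sample_sheet_stats_json', con=database_connection)
print(len(check))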
