
Has anyone experienced this before?

I have a table with "int" and "varchar" columns - a report schedule table.

I am trying to import an Excel file with an ".xls" extension into this table using a Python program. I read the file with pandas read_excel and write it to the table with to_sql.

The imported data is 1 row, 11 columns.

The import works successfully, but afterwards I noticed that the datatypes in the original table have been altered:

        int --> bigint
        char(1) --> varchar(max)
        varchar(30) --> varchar(max)

Any idea how I can prevent this? The switch in datatypes is causing issues in downstream routines.

    import urllib
    import pandas as pd
    from sqlalchemy import create_engine

    # Read the schedule sheet and build a SQLAlchemy engine from the ODBC connection string
    df = pd.read_excel(schedule_file, sheet_name='Schedule')
    params = urllib.parse.quote_plus(r'DRIVER={SQL Server};SERVER=<<IP>>;DATABASE=<<DB>>;UID=<<UDI>>;PWD=<<PWD>>')
    conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
    engine = create_engine(conn_str)

    table_name = 'REPORT_SCHEDULE'
    df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)

TIA

2 Answers


Consider using the dtype argument of pandas.DataFrame.to_sql where you pass a dictionary of SQLAlchemy types to named columns:

import sqlalchemy
...
# Map each DataFrame column name to an explicit SQLAlchemy type
data.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
            dtype={'name_of_datefld': sqlalchemy.types.DateTime(),
                   'name_of_intfld': sqlalchemy.types.INTEGER(),
                   'name_of_strfld': sqlalchemy.types.VARCHAR(length=30),
                   'name_of_floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                   'name_of_booleanfld': sqlalchemy.types.Boolean})
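
For the table in the question, a dtype mapping along these lines should keep the original column definitions when pandas recreates the table. The column names below are placeholders; substitute the actual REPORT_SCHEDULE columns:

import sqlalchemy

# Placeholder column names -- replace them with the real REPORT_SCHEDULE columns
dtype_map = {'id_col': sqlalchemy.types.INTEGER(),             # stays int instead of bigint
             'flag_col': sqlalchemy.types.CHAR(length=1),      # stays char(1) instead of varchar(max)
             'name_col': sqlalchemy.types.VARCHAR(length=30)}  # stays varchar(30) instead of varchar(max)

df.to_sql(name=table_name, con=engine, if_exists='replace', index=False, dtype=dtype_map)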

1 Comment

I have tried this, but it does not strictly enforce the datatypes on the dataframe. I can still take another dataframe with completely different column names and datatypes and insert it into the table.

I think this has more to do with how pandas handles the table if it already exists. The "replace" value for the if_exists argument tells pandas to drop your table and recreate it. But when re-creating the table, pandas does it on its own terms, based on the data stored in that particular DataFrame.

While providing column datatypes will work, doing it for every such case might be cumbersome. So I would rather truncate the table in a separate statement and then just append data to it, like so:

Instead of:

df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)

I'd do:

with engine.connect() as con:
    con.execute("TRUNCATE TABLE %s" % table_name)

df.to_sql(name=table_name, con=engine, if_exists='append',index=False)

The TRUNCATE statement clears all rows from the table, but it's done internally by the database, so the table keeps exactly the same definition (column names, types, and lengths).
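
If you're on SQLAlchemy 2.x, raw SQL strings have to be wrapped in text() and the transaction committed explicitly; a minimal sketch of the same approach under that assumption, reusing the engine and table_name from the question:

from sqlalchemy import text

# engine.begin() opens a transaction and commits it on exit, so the TRUNCATE takes effect
with engine.begin() as con:
    con.execute(text("TRUNCATE TABLE %s" % table_name))

df.to_sql(name=table_name, con=engine, if_exists='append', index=False)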

4 Comments

OK, so if we go with this approach, what would the size of the varchar columns be? I have many string columns with values longer than 255 characters. In the past, using dataframe.Write() over JDBC ended up creating varchar(255) columns and causing me issues.
@Ak777 TRUNCATE statements tell the database to clear the table while keeping the same definition. So whatever length and type the columns had before, they stay that way. This is the beauty of it, because pandas cannot interfere with that.
Thanks for the input @Bogdan Mircea, so what happens if I use if_exists='replace' as the param value? In another approach I've experienced that replace also replaces the table schema based on the DataFrame schema, and that's where I have issues: string fields get converted to varchar(255) in Oracle while the values are longer than 255 characters. It works fine in MS SQL because it converts to varchar(MAX); it's only the Oracle side where I run into this.
@Ak777 It pretty much does what it says. It will replace the table, but this will be orchestrated by pandas, not the underlying database. Pandas will issue a DROP command and then generate a CREATE statement based on the DataFrame's definition. And, while I don't know for sure, it will most likely default to bigger datatypes and lengths than you would choose when creating the table yourself.
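
For the Oracle length issue raised in the comments above, the dtype argument from the first answer can also pin an explicit length when the table does have to be recreated; a minimal sketch with a hypothetical column name (SQLAlchemy's generic VARCHAR is rendered as VARCHAR2 on Oracle):

import sqlalchemy

# 'long_text_col' is a hypothetical column holding more than 255 characters;
# adjust the name and length to match the real table
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
          dtype={'long_text_col': sqlalchemy.types.VARCHAR(length=4000)})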
