
Has anyone experienced this before?

I have a table with "int" and "varchar" columns - a report schedule table.

I am trying to import an Excel file with an ".xls" extension into this table using a Python program. I read the file with pandas read_excel and write it to the table with to_sql.

The imported data is 1 row, 11 columns.

The import works successfully, but afterwards I noticed that the datatypes in the original table have been altered:

        int --> bigint
        char(1) --> varchar(max)
        varchar(30) --> varchar(max)

Any idea how I can prevent this? The switch in datatypes is causing issues in downstream routines.

    import urllib
    import pandas as pd
    from sqlalchemy import create_engine

    # Read the schedule sheet and build a SQLAlchemy engine from the ODBC connection string
    df = pd.read_excel(schedule_file, sheet_name='Schedule')
    params = urllib.parse.quote_plus(r'DRIVER={SQL Server};SERVER=<<IP>>;DATABASE=<<DB>>;UID=<<UDI>>;PWD=<<PWD>>')
    conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
    engine = create_engine(conn_str)

    table_name = 'REPORT_SCHEDULE'
    df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)

TIA

2 Answers


Consider using the dtype argument of pandas.DataFrame.to_sql where you pass a dictionary of SQLAlchemy types to named columns:

import sqlalchemy
...
# Map each DataFrame column name to an explicit SQLAlchemy type
data.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
            dtype={'name_of_datefld': sqlalchemy.types.DateTime(),
                   'name_of_intfld': sqlalchemy.types.INTEGER(),
                   'name_of_strfld': sqlalchemy.types.VARCHAR(length=30),
                   'name_of_floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                   'name_of_booleanfld': sqlalchemy.types.Boolean})
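
For the table in the question, a dtype mapping along these lines should keep the original column definitions when pandas recreates the table. The column names below are placeholders; substitute the actual REPORT_SCHEDULE columns:

import sqlalchemy

# Placeholder column names -- replace them with the real REPORT_SCHEDULE columns
dtype_map = {'id_col': sqlalchemy.types.INTEGER(),             # stays int instead of bigint
             'flag_col': sqlalchemy.types.CHAR(length=1),      # stays char(1) instead of varchar(max)
             'name_col': sqlalchemy.types.VARCHAR(length=30)}  # stays varchar(30) instead of varchar(max)

df.to_sql(name=table_name, con=engine, if_exists='replace', index=False, dtype=dtype_map)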

1 Comment

I have tried this, but it does not strictly enforce the datatypes on the dataframe. I can still take another dataframe with completely different column names and datatypes and insert it into the table.

I think this has more to do with how pandas handles the table if it already exists. The "replace" value for the if_exists argument tells pandas to drop your table and recreate it. But when re-creating the table, pandas does it on its own terms, based on the data stored in that particular DataFrame.

While providing column datatypes will work, doing it for every such case might be cumbersome. So I would rather truncate the table in a separate statement and then just append data to it, like so:

Instead of:

df.to_sql(name=table_name, con=engine, if_exists='replace', index=False)

I'd do:

with engine.connect() as con:
    con.execute("TRUNCATE TABLE %s" % table_name)

df.to_sql(name=table_name, con=engine, if_exists='append',index=False)

The TRUNCATE statement clears all rows from the table, but it's done internally by the database, so the table keeps exactly the same definition (column names, types, and lengths).
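
If you're on SQLAlchemy 2.x, raw SQL strings have to be wrapped in text() and the transaction committed explicitly; a minimal sketch of the same approach under that assumption, reusing the engine and table_name from the question:

from sqlalchemy import text

# engine.begin() opens a transaction and commits it on exit, so the TRUNCATE takes effect
with engine.begin() as con:
    con.execute(text("TRUNCATE TABLE %s" % table_name))

df.to_sql(name=table_name, con=engine, if_exists='append', index=False)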

4 Comments

OK, so if we go with this approach, what would the size of the varchar columns be? I have many string columns with values longer than 255 characters. In the past, using dataframe.Write() over JDBC ended up creating varchar(255) columns and causing me issues.
@Ak777 TRUNCATE statements tell the database to clear the table while keeping the same definition. So whatever length and type the columns had before, they stay that way. This is the beauty of it, because pandas cannot interfere with that.
Thanks for the input @Bogdan Mircea, so what happens if I use if_exists='replace' as the param value? In another approach I've experienced that replace also replaces the table schema based on the DataFrame schema, and that's where I have issues: string fields get converted to varchar(255) in Oracle while the values are longer than 255 characters. It works fine in MS SQL because it converts to varchar(MAX); it's only the Oracle side where I run into this.
@Ak777 It pretty much does what it says. It will replace the table, but this will be orchestrated by pandas, not the underlying database. Pandas will issue a DROP command and then generate a CREATE statement based on the DataFrame's definition. And, while I don't know for sure, it will most likely default to bigger datatypes and lengths than you would choose when creating the table yourself.
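
For the Oracle length issue raised in the comments above, the dtype argument from the first answer can also pin an explicit length when the table does have to be recreated; a minimal sketch with a hypothetical column name (SQLAlchemy's generic VARCHAR is rendered as VARCHAR2 on Oracle):

import sqlalchemy

# 'long_text_col' is a hypothetical column holding more than 255 characters;
# adjust the name and length to match the real table
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False,
          dtype={'long_text_col': sqlalchemy.types.VARCHAR(length=4000)})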
