
I loaded data (from CSV or Excel) into a pandas DataFrame in Python:

import pandas as pd
df = pd.read_csv('table1.csv')

Data looks like this:

c1,c2,c3
1,abc,1.5
2,bcd,2.53
3,agf,3.571

How can I generate an SQL CREATE TABLE statement from the data in a DataFrame? For this example it would be:

create table table1 (c1 int, c2 varchar(3), c3 float);
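A minimal sketch of one way to produce that statement, assuming MySQL-style type names and that text columns should be sized to their longest value. make_create_table is a hypothetical helper, not a pandas API, and the sample CSV is inlined so the snippet runs on its own:

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype, is_integer_dtype


def make_create_table(df: pd.DataFrame, table_name: str) -> str:
    """Build a MySQL-style CREATE TABLE statement from a DataFrame's dtypes (hypothetical helper)."""
    columns = []
    for name in df.columns:
        col = df[name]
        if is_integer_dtype(col):
            sql_type = "int"
        elif is_float_dtype(col):
            sql_type = "float"
        elif is_datetime64_any_dtype(col):
            sql_type = "datetime"
        else:
            # Size text columns to the longest value seen in the data.
            width = int(col.astype(str).str.len().max())
            sql_type = f"varchar({width})"
        columns.append(f"{name} {sql_type}")
    return f"create table {table_name} ({', '.join(columns)});"


csv_text = "c1,c2,c3\n1,abc,1.5\n2,bcd,2.53\n3,agf,3.571\n"
df = pd.read_csv(io.StringIO(csv_text))
print(make_create_table(df, "table1"))
# → create table table1 (c1 int, c2 varchar(3), c3 float);
```

Sizing varchar from the data only fits the rows seen so far; a later file with longer values would need a wider column.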

Thanks.

  • Why would you want to generate a statement to create a table when you could use df.to_sql and create the thing yourself? Commented Jul 28, 2018 at 23:36
  • @coldspeed - I need logic that goes through 1000 CSV files and, for each of them: 1) generates a CREATE TABLE statement, 2) executes that statement against a MySQL database, and 3) imports the data from the DataFrame (or directly from the CSV) into the newly created MySQL table. Commented Jul 29, 2018 at 15:53
  • And like I said: why do you need to generate SQL statements to do that? Why can't you do the normal thing everyone else opts to do? Commented Jul 29, 2018 at 17:20
  • @coldspeed - I tried df.to_sql, but I don't see any SQL coming out of it, only errors. Commented Jul 30, 2018 at 14:54
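For comparison, a minimal sketch of the df.to_sql route suggested in the comments, run once per CSV. It assumes an in-memory SQLite database as a stand-in so it runs anywhere; for MySQL, to_sql expects an SQLAlchemy engine or connection rather than a raw DB-API connection. The csv_files dictionary is a hypothetical stand-in for the files on disk:

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-ins for CSV files on disk: table name -> file contents.
csv_files = {
    "table1": "c1,c2,c3\n1,abc,1.5\n2,bcd,2.53\n3,agf,3.571\n",
    "table2": "id,name\n10,x\n20,y\n",
}

# Assumption: in-memory SQLite instead of MySQL, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

for table_name, text in csv_files.items():
    df = pd.read_csv(io.StringIO(text))
    # to_sql issues the CREATE TABLE itself (column types inferred from dtypes) and inserts the rows.
    df.to_sql(table_name, conn, index=False, if_exists="replace")

# Read back the DDL pandas generated for table1 from SQLite's catalog.
ddl = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'table1'").fetchone()[0]
print(ddl)
row_count = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(row_count)  # 3
```

With real files, the loop would iterate over something like pathlib.Path("csv_dir").glob("*.csv") (csv_dir is a placeholder) and use path.stem as the table name.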

1 Answer


You can add more elif branches for any additional data types you are dealing with. The result is a CREATE TABLE statement (in Oracle syntax) built from your column names and data types.

# Get rid of characters that are invalid in SQL column names
df.columns = [
    c.replace(" - ", "_").replace(" ", "_").replace("&", "and")  # some characters were unacceptable
    for c in df.columns
]
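A standalone check of what that renaming does, on hypothetical column names, plus a regular-expression variant (not part of the answer) that collapses any run of non-word characters into an underscore instead of listing each character:

```python
import re

import pandas as pd

# Hypothetical column names containing the characters the cleanup handles.
df = pd.DataFrame(columns=["Sales - Q1", "R&D Cost", "Region"])

# Same replace chain as in the answer.
df.columns = [c.replace(" - ", "_").replace(" ", "_").replace("&", "and") for c in df.columns]
print(list(df.columns))  # ['Sales_Q1', 'RandD_Cost', 'Region']

# Alternative: replace every run of characters outside [A-Za-z0-9_] with "_".
print(re.sub(r"\W+", "_", "Price ($) / Unit"))  # Price_Unit
```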


from pandas.api.types import is_datetime64_any_dtype, is_float_dtype, is_integer_dtype

# Build the column list of a CREATE TABLE statement from the column names and data types
dtypes = df.convert_dtypes().dtypes  # most columns load as object, so infer better dtypes (once)
Text_list = []
for Col_Name, Python in dtypes.items():
    # convert_dtypes returns nullable dtypes such as Float64 and Int64,
    # so test the kind of dtype instead of comparing with float or a name
    if is_float_dtype(Python):
        Oracle = "FLOAT"
    elif is_datetime64_any_dtype(Python):
        Oracle = "DATE"
    elif is_integer_dtype(Python):
        Oracle = "NUMBER"
    else:
        Oracle = "VARCHAR2(50)"
    Text_list.append(f"{Col_Name} {Oracle}")
Text_Block = ", ".join(Text_list)
print(Text_Block)
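To see which branch each column takes, this standalone check prints what convert_dtypes returns for the question's sample data (pandas 1.2 or later, where float columns become the nullable Float64 dtype). It shows why comparing the converted dtype with the builtin float does not match, while pandas.api.types.is_float_dtype does:

```python
import io

import pandas as pd
from pandas.api.types import is_float_dtype

csv_text = "c1,c2,c3\n1,abc,1.5\n2,bcd,2.53\n3,agf,3.571\n"
df = pd.read_csv(io.StringIO(csv_text))

dtypes = df.convert_dtypes().dtypes
print(dtypes["c1"], dtypes["c3"])  # Int64 Float64

# A nullable Float64 dtype does not compare equal to the builtin float...
print(dtypes["c3"] == float)  # False
# ...whereas a check on the kind of dtype recognises it.
print(is_float_dtype(dtypes["c3"]))  # True
```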

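# conn is assumed to be an open DB-API connection to Oracle (e.g. from cx_Oracle or python-oracledb)
cursor = conn.cursor()

drop = """
drop TABLE tableName
"""
create = """
    CREATE TABLE tableName (
    {}    )
""".format(Text_Block)

try:
    cursor.execute(drop)
except Exception:  # the first run fails with ORA-00942 because tableName does not exist yet
    pass
cursor.execute(create)
conn.commit()  # Oracle commits DDL implicitly; the DB-API commit() just makes this explicit
cursor.close()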
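The answer stops after creating the table; the asker's third step (loading the rows) could be sketched with DB-API executemany. This assumes an in-memory SQLite database as a stand-in for Oracle or MySQL and hard-codes a DDL string of the kind the code generates, so it runs on its own:

```python
import io
import sqlite3

import pandas as pd

csv_text = "c1,c2,c3\n1,abc,1.5\n2,bcd,2.53\n3,agf,3.571\n"
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: in-memory SQLite stands in for Oracle/MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# DDL of the kind the answer's code generates, hard-coded here for self-containment.
cursor.execute("CREATE TABLE tableName (c1 NUMBER, c2 VARCHAR2(50), c3 FLOAT)")

# One placeholder per column; sqlite3 uses "?" (qmark paramstyle).
placeholders = ", ".join("?" for _ in df.columns)
insert_sql = f"INSERT INTO tableName ({', '.join(df.columns)}) VALUES ({placeholders})"

# itertuples(index=False, name=None) yields each row as a plain tuple.
rows = list(df.itertuples(index=False, name=None))
cursor.executemany(insert_sql, rows)
conn.commit()

first_row = cursor.execute("SELECT * FROM tableName ORDER BY c1").fetchone()
print(first_row)  # (1, 'abc', 1.5)
cursor.close()
```

Only the placeholder syntax is driver-specific: the Oracle drivers use :1, :2, ... and the common MySQL drivers use %s; the rest of the flow stays the same.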

1 Comment

Instead of a fixed length of 50, you could try deriving the length with str(df[Col_Name].astype(str).str.len().max()) (my Python needs some practise, but this seems to work).
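A minimal sketch of that suggestion as a helper. varchar2_for is hypothetical, and the dropna() call and the minimum width are refinements not in the comment: without dropna(), missing values are counted as the three-character string "nan", and an all-missing column would otherwise yield no length at all. Note that str.len() counts characters, while Oracle sizes VARCHAR2 in bytes by default, so non-ASCII data may need extra room:

```python
import io

import pandas as pd


def varchar2_for(col: pd.Series, minimum: int = 1) -> str:
    """Size a VARCHAR2 to the longest value in the column (hypothetical helper)."""
    # dropna() keeps missing values from being counted as the string "nan".
    longest = col.dropna().astype(str).str.len().max()
    # An all-missing column gives NaN here, so fall back to the minimum width.
    width = minimum if pd.isna(longest) else max(int(longest), minimum)
    return f"VARCHAR2({width})"


csv_text = "c1,c2,c3\n1,abc,1.5\n2,bcd,2.53\n3,agf,3.571\n"
df = pd.read_csv(io.StringIO(csv_text))
print(varchar2_for(df["c2"]))  # VARCHAR2(3)
```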
