6

I'm trying to import 200 SAS XPT files to my PostgreSQL database:

engine = create_engine('postgresql://user:pwd@server:5432/dbName')
for file in listdir(dataPath):
    name, ext = file.split('.', 1)
    with open(join(dataPath, file), 'rb') as f:
        xport.to_dataframe(f).to_sql(name, engine, schema='schemaName', if_exists='replace', index=False)
    print("Successfully wrote ", file, " to database.")

However, the SQL generated has double quotation marks for all identifiers, for example: CREATE TABLE "Y2009"."ACQ_F" ("SEQN" FLOAT(53), "ACD010A" FLOAT(53));. The problem is, if the column / table / schema is created with quotation marks, every time I need to query them, I must include the quotation marks as well, at the same time use the exact capitalization.

I want to get rid of quotation marks, while I cannot write custom SQLs myself, because these files each has very different structure.

5
  • 1
    If you ensure that the schema, table, and column names are all lower-case, then you can write queries without double-quotes, regardless of how SQLAlchemy creates the table. Commented Jan 3, 2018 at 3:18
  • 2
    If you are reading SAS version 5 transport files then all of the variable (column) names will be in uppercase. You should add a step in your process to change the names to lowercase before pushing into PostgreSQL Then you don't need to worry that quoted identifiers will be used to create them. Note you will still need to use quoted identifiers to refer to columns with names like "from", "select" or other PostgreSQL keywords. Commented Jan 3, 2018 at 3:28
  • @Tom thanks that solved my problem. To help others better understand this: Postgres requires uppercase table / column names to be quoted (reference). Commented Jan 3, 2018 at 4:43
  • @Tom could you please add the answer below so I can mark this question as answered? Thanks! Commented Jan 3, 2018 at 4:46
  • @Ryan It would be better if someone that actually knows HOW to change the case of the variable names in Python could post a working answer. Commented Jan 3, 2018 at 14:24

1 Answer 1

5

PostgreSQL requires uppercase table / column names to be quoted (reference). That is why identifiers in the SQL constructed by SQLalchemy are quoted. To avoid this, convert the column names of the dataframe to all lowercase:

with open(join(dataPath, file), 'rb') as f:
     data = xport.to_dataframe(f)
     data.columns = map(str.lower, data.columns)
     data.to_sql(name.lower(), engine, schema='y2007')
Sign up to request clarification or add additional context in comments.

2 Comments

Same problem and same solution for Oracle!
I can't believe there is no other answer for that

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.