I have a CSV file with about 280 columns, which may change from time to time. Is there a way to import a CSV file into sqlite3 and have it 'guess' the column types? I am using a Python script to do the import.
If you are able to use a third-party library in this project, I recommend pandas.
With pandas you can do this in two steps: read the CSV into a DataFrame (pandas infers the column types as it parses), then write the DataFrame out to SQLite.
For example:
import pandas, pandas.io.sql, sqlite3
# some sample csv data copied from: http://wesmckinney.com/blog/?p=635
csvfilepath = '/path/to/file.csv'
# `index_col=False` ensures pandas doesn't use the 1st column of data as the index
df = pandas.read_csv(csvfilepath, index_col=False)
# connect to an in-memory database for testing; replace ':memory:' with a file path
con = sqlite3.connect(':memory:')
pandas.io.sql.write_frame(df, 'test_tbl', con)
con.execute('select * from test_tbl').fetchone()
con.close()
Query results:
(u'C00410118',
 u'P20002978',
 u'Bachmann, Michele',
 u'HARVEY, WILLIAM',
 u'MOBILE',
 u'AL',
 366010290,
 u'RETIRED',
 u'RETIRED',
 250,
 u'20-JUN-11',
 None,
 None,
 None,
 u'SA17A',
 736166,
 u'A1FDABC23D2D545A1B83',
 u'P2012')
And with an introspective query you can see that pandas has created the table and even inferred the data types:
con.execute("select * from sqlite_master where type='table';").fetchone()[4]
Gives:
CREATE TABLE test_tbl ( [cmte_id] TEXT, [cand_id] TEXT, [cand_nm] TEXT, [contbr_nm] TEXT, [contbr_city] TEXT, [contbr_st] TEXT, [contbr_zip] INTEGER, [contbr_employer] TEXT, [contbr_occupation] TEXT, [contb_receipt_amt] INTEGER, [contb_receipt_dt] TEXT, [receipt_desc] REAL, [memo_cd] REAL, [memo_text] REAL, [form_tp] TEXT, [file_num] INTEGER, [tran_id] TEXT, [election_tp] TEXT )
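Note: in newer versions of pandas, write_frame has been removed; DataFrame.to_sql is the replacement and produces the same kind of type-mapped table. A minimal sketch of the equivalent, assuming the same file path and an in-memory database:
import pandas, sqlite3
csvfilepath = '/path/to/file.csv'
# pandas infers the column dtypes while parsing the CSV
df = pandas.read_csv(csvfilepath, index_col=False)
# replace ':memory:' with a file path to persist the database
con = sqlite3.connect(':memory:')
# to_sql creates the table and maps the inferred dtypes to SQLite column types;
# index=False keeps the DataFrame index from becoming an extra column
df.to_sql('test_tbl', con, index=False, if_exists='replace')
print(con.execute('select * from test_tbl').fetchone())
con.close()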
4 Comments
Eric
 This is perfect! Thanks a lot! I have 5 csv files with 280-350 columns each. You just saved me a lot of time defining column types.
  Eric
 This works fine for most of my files, but one gives me this error: pandas.parser.CParserError: Error tokenizing data. C error: Expected 19 fields in line 22217, saw 20. I pass the CSV text in with this: df = pandas.read_csv(io.BytesIO(string_buffer), index_col=False)
  mechanical_meat
 I'd have to see that line to comment conclusively, but malformed CSV data can certainly cause that sort of problem; see the note on skipping bad lines after these comments.
  Eric
 I tried this, and it works for most files, but sometimes I get AttributeError: 'numpy.float64' object has no attribute 'type'. Not sure why.
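On the tokenizing error mentioned above: if the offending row can't be fixed in the source file, read_csv can be told to skip malformed lines instead of raising. A sketch, assuming a reasonably recent pandas (older versions used error_bad_lines=False rather than on_bad_lines):
import pandas
csvfilepath = '/path/to/file.csv'
# rows with more fields than the header are skipped instead of raising a parser error
df = pandas.read_csv(csvfilepath, index_col=False, on_bad_lines='skip')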
  
