
I have a CSV file with about 280 columns, which may change from time to time. Is there a way to import a CSV file into sqlite3 and have it 'guess' the column types? I am using a Python script to do the import.

  • Doesn't sound like a simple CSV file; the site has some tips for handling simple CSV files. The better question here is: what data do you have that both spans nearly 300 columns and changes over time? Commented Feb 25, 2014 at 4:47

2 Answers


If you are able to use a third-party library in this project, I recommend pandas.

Using pandas you could do this in two steps:

  1. Read the CSV file into a pandas DataFrame
  2. Write the pandas DataFrame to SQLite

For example:

import sqlite3

import pandas
import pandas.io.sql

# some sample csv data copied from: http://wesmckinney.com/blog/?p=635
csvfilepath = '/path/to/file.csv'

# `index_col=False` ensures pandas doesn't use the 1st column of data as the index
df = pandas.read_csv(csvfilepath, index_col=False)

# connect to an in-memory database for testing; replace ':memory:' with a file path
con = sqlite3.connect(':memory:')
pandas.io.sql.write_frame(df, 'test_tbl', con)
con.execute('select * from test_tbl').fetchone()
con.close()

Query results:

(u'C00410118',
 u'P20002978',
 u'Bachmann, Michele',
 u'HARVEY, WILLIAM',
 u'MOBILE',
 u'AL',
 366010290,
 u'RETIRED',
 u'RETIRED',
 250,
 u'20-JUN-11',
 None,
 None,
 None,
 u'SA17A',
 736166,
 u'A1FDABC23D2D545A1B83',
 u'P2012')

And with an introspective query you can see that pandas has done the work of creating the table and has even inferred the data types:

con.execute("select * from sqlite_master where type='table';").fetchone()[4]

Gives:

CREATE TABLE test_tbl (
  [cmte_id] TEXT,
  [cand_id] TEXT,
  [cand_nm] TEXT,
  [contbr_nm] TEXT,
  [contbr_city] TEXT,
  [contbr_st] TEXT,
  [contbr_zip] INTEGER,
  [contbr_employer] TEXT,
  [contbr_occupation] TEXT,
  [contb_receipt_amt] INTEGER,
  [contb_receipt_dt] TEXT,
  [receipt_desc] REAL,
  [memo_cd] REAL,
  [memo_text] REAL,
  [form_tp] TEXT,
  [file_num] INTEGER,
  [tran_id] TEXT,
  [election_tp] TEXT )
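
Note: write_frame comes from an old pandas API and has since been removed; in current pandas versions the equivalent is the DataFrame.to_sql method, which likewise creates the table and infers the column types. A minimal sketch of the same steps with the modern API (the file path and table name are placeholders):

import sqlite3

import pandas

df = pandas.read_csv('/path/to/file.csv', index_col=False)

# to_sql accepts a sqlite3 connection directly, creates the table,
# and maps the inferred DataFrame dtypes to SQLite column types;
# index=False keeps the DataFrame's row index out of the table
con = sqlite3.connect(':memory:')
df.to_sql('test_tbl', con, index=False)
print(con.execute('select * from test_tbl').fetchone())
con.close()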

4 Comments

This is perfect! Thanks a lot! I have 5 csv files with 280-350 columns each. You just saved me a lot of time defining column types.
This works fine for most of my files, but one is giving me this error: pandas.parser.CParserError: Error tokenizing data. C error: Expected 19 fields in line 22217, saw 20. I pass the csv text in with this: df = pandas.read_csv(io.BytesIO(string_buffer), index_col=False)
I'd have to see that line to comment conclusively. I'm sure malformed CSV data can cause that sort of problem.
I tried this, and it works for most files, but sometimes I get AttributeError: 'numpy.float64' object has no attribute 'type'; not sure why.
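
For the tokenizing error mentioned in the comments, the usual cause is a row containing more delimiters than the header; as a hedged workaround (assuming pandas 1.3 or later, where the on_bad_lines parameter exists), malformed rows can be skipped instead of aborting the parse:

import pandas

# on_bad_lines='skip' drops rows whose field count doesn't match the header;
# 'warn' keeps parsing but reports each offending line number instead
df = pandas.read_csv('/path/to/file.csv', index_col=False, on_bad_lines='skip')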

Make the column headers in the CSV file the same as the column names in the sqlite3 table. Then read each value and check its type with type() before inserting it into the DB.
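
As written, this needs one extra step: values read from a CSV file are always strings, so you have to attempt conversions before type() tells you anything useful. A minimal sketch of that idea using only the standard library (the file path, table name, and conversion order are assumptions):

import csv
import sqlite3

def coerce(value):
    # try the narrowest type first; fall back to the raw string (TEXT)
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

con = sqlite3.connect('test.db')
with open('/path/to/file.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)  # must match the table's column names
    placeholders = ','.join('?' * len(header))
    sql = 'insert into test_tbl ({}) values ({})'.format(','.join(header), placeholders)
    con.executemany(sql, ([coerce(v) for v in row] for row in reader))
con.commit()
con.close()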

