
I have a CSV file with about 280 columns, which may change from time to time. Is there a way to import a CSV file into sqlite3 and have it 'guess' the column types? I am using a Python script to do the import.

  • Doesn't sound like a simple CSV file; the site has some tips for handling simple CSV files. The better question here is: what data do you have that both spans nearly 300 columns and changes over time? Commented Feb 25, 2014 at 4:47

2 Answers


If you are able to use a third-party library in this project, I recommend pandas.

Using pandas you could do this in two steps:

  1. Read the CSV file into a pandas DataFrame
  2. Write the pandas DataFrame to SQLite

For example:

import sqlite3

import pandas
import pandas.io.sql

# some sample csv data copied from: http://wesmckinney.com/blog/?p=635
csvfilepath = '/path/to/file.csv'

# `index_col=False` ensures pandas doesn't use the 1st column of data as the index
df = pandas.read_csv(csvfilepath, index_col=False)

# connect to an in-memory database for testing; replace ':memory:' with a file path
con = sqlite3.connect(':memory:')
pandas.io.sql.write_frame(df, 'test_tbl', con)
con.execute('select * from test_tbl').fetchone()
con.close()

Query results:

(u'C00410118',
 u'P20002978',
 u'Bachmann, Michele',
 u'HARVEY, WILLIAM',
 u'MOBILE',
 u'AL',
 366010290,
 u'RETIRED',
 u'RETIRED',
 250,
 u'20-JUN-11',
 None,
 None,
 None,
 u'SA17A',
 736166,
 u'A1FDABC23D2D545A1B83',
 u'P2012')

And with an introspective query you can see that pandas has done the work of creating the table and has even inferred the data types:

con.execute("select * from sqlite_master where type='table';").fetchone()[4]

Gives:

CREATE TABLE test_tbl (
  [cmte_id] TEXT,
  [cand_id] TEXT,
  [cand_nm] TEXT,
  [contbr_nm] TEXT,
  [contbr_city] TEXT,
  [contbr_st] TEXT,
  [contbr_zip] INTEGER,
  [contbr_employer] TEXT,
  [contbr_occupation] TEXT,
  [contb_receipt_amt] INTEGER,
  [contb_receipt_dt] TEXT,
  [receipt_desc] REAL,
  [memo_cd] REAL,
  [memo_text] REAL,
  [form_tp] TEXT,
  [file_num] INTEGER,
  [tran_id] TEXT,
  [election_tp] TEXT )
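
Note: write_frame comes from an old pandas API and has since been removed; in current pandas versions the equivalent is the DataFrame.to_sql method, which likewise creates the table and infers the column types. A minimal sketch of the same steps with the modern API (the file path and table name are placeholders):

import sqlite3

import pandas

df = pandas.read_csv('/path/to/file.csv', index_col=False)

# to_sql accepts a sqlite3 connection directly, creates the table,
# and maps the inferred DataFrame dtypes to SQLite column types;
# index=False keeps the DataFrame's row index out of the table
con = sqlite3.connect(':memory:')
df.to_sql('test_tbl', con, index=False)
print(con.execute('select * from test_tbl').fetchone())
con.close()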

4 Comments

This is perfect! Thanks a lot! I have 5 csv files with 280-350 columns each. You just saved me a lot of time defining column types.
This works fine for most of my files, but one is giving me this error: pandas.parser.CParserError: Error tokenizing data. C error: Expected 19 fields in line 22217, saw 20. I pass the csv text in with this: df = pandas.read_csv(io.BytesIO(string_buffer), index_col=False)
I'd have to see that line to comment conclusively. I'm sure malformed CSV data can cause that sort of problem.
I tried this, and it works for most files, but sometimes I get AttributeError: 'numpy.float64' object has no attribute 'type'; not sure why.
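
For the tokenizing error mentioned in the comments, the usual cause is a row containing more delimiters than the header; as a hedged workaround (assuming pandas 1.3 or later, where the on_bad_lines parameter exists), malformed rows can be skipped instead of aborting the parse:

import pandas

# on_bad_lines='skip' drops rows whose field count doesn't match the header;
# 'warn' keeps parsing but reports each offending line number instead
df = pandas.read_csv('/path/to/file.csv', index_col=False, on_bad_lines='skip')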

Make the column headers in the CSV file the same as the column names in the sqlite3 table. Then read each value and check its type with type() before inserting it into the DB.
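
As written, this needs one extra step: values read from a CSV file are always strings, so you have to attempt conversions before type() tells you anything useful. A minimal sketch of that idea using only the standard library (the file path, table name, and conversion order are assumptions):

import csv
import sqlite3

def coerce(value):
    # try the narrowest type first; fall back to the raw string (TEXT)
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

con = sqlite3.connect('test.db')
with open('/path/to/file.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)  # must match the table's column names
    placeholders = ','.join('?' * len(header))
    sql = 'insert into test_tbl ({}) values ({})'.format(','.join(header), placeholders)
    con.executemany(sql, ([coerce(v) for v in row] for row in reader))
con.commit()
con.close()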

