Transferring a huge CSV into a database is a good method because we can then easily use SQL queries. But we have to take two things into account.
FIRST POINT: SQL is not made of rubber either; it will not be able to stretch the memory.
For example, I converted this file to a db file:
https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9
For this db file, the query:
pd.read_sql_query("SELECT * FROM 'table' LIMIT 600000", Mydatabase)
can read at most about 600,000 records, no more, on a PC with 16 GB of RAM (operation time: 15.8 seconds).
To be a bit mischievous, I will add that reading directly from the csv file is somewhat more efficient:
giga_plik = 'c:/1/311_Service_Requests_from_2010_to_Present.csv'
Abdul = pd.read_csv(giga_plik, nrows=1100000)
(operation time: 16.5 seconds)
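For reference, Mydatabase above is just a connection to the SQLite file, and both timings can be reproduced with a simple wrapper. A minimal sketch, assuming the db and csv sit at the same paths used in the full code further down:

import time
import sqlite3
import pandas as pd

# assumed paths; adjust to where the csv / converted db actually live
Mydatabase = sqlite3.connect('c:/1/WM_csv_db77.db')
giga_plik = 'c:/1/311_Service_Requests_from_2010_to_Present.csv'

t0 = time.time()
df_sql = pd.read_sql_query("SELECT * FROM 'table' LIMIT 600000", Mydatabase)
t1 = time.time()
df_csv = pd.read_csv(giga_plik, nrows=1100000)
t2 = time.time()

print('SQL read: %.1f s for %d rows' % (t1 - t0, len(df_sql)))
print('CSV read: %.1f s for %d rows' % (t2 - t1, len(df_csv)))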
SECOND POINT: To use the SQL data converted from CSV effectively, we have to remember to store dates in a suitable format. So I propose adding this to ryguy72's code:
df['ColumnWithQuasiDate'] = pd.to_datetime(df['Date'])
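Just to show why this matters: after pd.to_datetime the column gets a real datetime dtype, so date comparisons work as expected. A tiny sketch with made-up values (the column names are only placeholders):

import pandas as pd

df = pd.DataFrame({'Date': ['01/15/2015 10:30:00 AM', '03/02/2017 08:00:00 PM']})
df['ColumnWithQuasiDate'] = pd.to_datetime(df['Date'])

print(df.dtypes)                                      # datetime64[ns], not object
print(df[df['ColumnWithQuasiDate'] > '2016-01-01'])   # date filtering now works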
The full code for the 311 file, along the lines I pointed out:
import time
import pandas as pd
from sqlalchemy import create_engine

start_time = time.time()
### sqlalchemy create_engine
plikcsv = 'c:/1/311_Service_Requests_from_2010_to_Present.csv'
WM_csv_datab7 = create_engine('sqlite:///C:/1/WM_csv_db77.db')
#----------------------------------------------------------------------
chunksize = 100000
i = 0
j = 1
## --------------------------------------------------------------------
for df in pd.read_csv(plikcsv, chunksize=chunksize, iterator=True, encoding='utf-8', low_memory=False):
    # remove spaces from column names, e.g. 'Created Date' -> 'CreatedDate'
    df = df.rename(columns={c: c.replace(' ', '') for c in df.columns})
    ## -----------------------------------------------------------------------
    df['CreatedDate'] = pd.to_datetime(df['CreatedDate'])  # to datetimes
    df['ClosedDate'] = pd.to_datetime(df['ClosedDate'])
    ## --------------------------------------------------------------------------
    # keep the index continuous across chunks
    df.index += j
    i += 1
    df.to_sql('table', WM_csv_datab7, if_exists='append')
    j = df.index[-1] + 1
print(time.time() - start_time)
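Once the db is built this way, to_sql stores the parsed dates as ISO text in SQLite, so you can filter by date range directly in SQL instead of loading everything into memory. A sketch, assuming the engine, table name and renamed columns from the code above (ComplaintType, CreatedDate and ClosedDate are my assumption of the 311 headers with spaces removed):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///C:/1/WM_csv_db77.db')

query = """
    SELECT ComplaintType, CreatedDate, ClosedDate
    FROM 'table'
    WHERE CreatedDate >= '2016-01-01' AND CreatedDate < '2016-02-01'
"""
january = pd.read_sql_query(query, engine)
print(january.shape)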
At the end I would like to add: converting a csv file directly from the Internet to a db seems to me a bad idea. I propose downloading the file and converting it locally. You can also split the csv beforehand (e.g. split -l 100 filename) into smaller files before processing them with Python.
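For the download step itself, a minimal sketch using only the standard library; the export URL is my assumption of the Socrata CSV endpoint for the dataset linked above, so verify it on the dataset page:

import urllib.request

# assumed CSV export URL for the 311 dataset; check it against the dataset page
url = 'https://data.cityofnewyork.us/api/views/erm2-nwe9/rows.csv?accessType=DOWNLOAD'
local_path = 'c:/1/311_Service_Requests_from_2010_to_Present.csv'

# urlretrieve streams straight to disk, so the multi-GB file never sits in RAM
urllib.request.urlretrieve(url, local_path)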