I am trying to process a text file of more than 1 GB and save the data into a MySQL database using Python.
I have pasted some sample code below:
import MySQLdb as mdb

conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data', host='localhost', charset="utf8")
file_path = "/home/local/user/Main/Module-1.0.4/file_processing/part-00000.txt"

file_open = open(file_path, 'r')
for line in file_open:
    result_words = line.split('\t')
    query = "insert into PerformaceReport (campaignID, keywordID, keyword, avgPosition)"
    query += " VALUES (%s,%s,'%s',%s)" % (result_words[0], result_words[1], result_words[2], result_words[3])
    cursor = conn.cursor()
    cursor.execute(query)
    conn.commit()
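For comparison, the same loop restructured around parameterized queries and `executemany` batches usually runs far faster, because rows are sent to the server in groups and committed once instead of once per row. This is a sketch of the pattern, not the original code; it uses an in-memory sqlite3 database as a stand-in so it is self-contained, but MySQLdb cursors expose the same `execute`/`executemany` interface (MySQLdb uses `%s` placeholders where sqlite3 uses `?`):

```python
import sqlite3

# Stand-in for the MySQL connection; a MySQLdb cursor has the same API.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE PerformaceReport "
               "(campaignID INTEGER, keywordID INTEGER, keyword TEXT, avgPosition REAL)")

# Stands in for iterating over the 1 GB tab-separated file.
lines = ["1\t10\tshoes\t2.5\n", "2\t20\tboots\t1.0\n"]

batch, BATCH_SIZE = [], 1000
for line in lines:
    batch.append(line.rstrip('\n').split('\t'))
    if len(batch) >= BATCH_SIZE:
        # Parameterized query: the driver quotes values, no string formatting.
        cursor.executemany("INSERT INTO PerformaceReport VALUES (?, ?, ?, ?)", batch)
        batch = []
if batch:
    cursor.executemany("INSERT INTO PerformaceReport VALUES (?, ?, ?, ?)", batch)
conn.commit()  # one commit at the end instead of one per row
```

Besides the speed, the parameterized form also avoids the quoting problems of building SQL with `%` string formatting.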
Actually there are more than 18 columns the data is being inserted into; I have pasted only four as an example.
When I run the above code, execution takes several hours.
My questions are:
- Is there an alternate way to process a 1 GB text file in Python very fast?
- Is there a framework that processes a 1 GB text file and saves the data into a database very fast?
- How can a text file of this size (1 GB) be processed and saved into a database within minutes, if that is even possible? My main concern is that the 1 GB file needs to be processed as fast as possible, not in hours.
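For loads of this size, MySQL's bulk loader `LOAD DATA LOCAL INFILE` is usually the fastest route, since it skips per-row Python overhead entirely. The sketch below only builds the statement (table and column names taken from the code above); actually executing it requires a running MySQL server with `local_infile` enabled, so the `cursor.execute` call is left as a comment:

```python
def load_data_sql(path, table, columns):
    """Build a LOAD DATA LOCAL INFILE statement for a tab-separated file."""
    return ("LOAD DATA LOCAL INFILE '%s' INTO TABLE %s "
            "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' (%s)"
            % (path, table, ', '.join(columns)))

sql = load_data_sql('/home/local/user/Main/Module-1.0.4/file_processing/part-00000.txt',
                    'PerformaceReport',
                    ['campaignID', 'keywordID', 'keyword', 'avgPosition'])
# cursor.execute(sql)  # would run the bulk load on a MySQLdb connection
```

Empty fields in the file would need cleaning first (or `SET` clauses in the statement), since the loader reads them as empty strings, not NULL.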
Edited code:

query += " VALUES (%s,%s,'%s',%s) " % (
    int(result_words[0] if result_words[0] != '' else ''),
    int(result_words[2] if result_words[2] != '' else ''),
    result_words[3] if result_words[3] != '' else '',
    result_words[4] if result_words[4] != '' else '')

I am actually submitting the values in the above format (checking whether each value exists first), but int('') raises ValueError.
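Note the parenthesization in the edited code: `int(...)` wraps the whole conditional, so an empty field still reaches `int('')`, which is exactly what raises the ValueError. One way around this (a sketch; the helper name `to_int` is my own, not from the original code) is to map empty fields to None, which the MySQL driver turns into SQL NULL when used with parameterized queries:

```python
def to_int(value):
    """Return int(value), or None for empty/blank strings (becomes SQL NULL)."""
    value = value.strip()
    return int(value) if value else None

# to_int('42') -> 42, to_int('') -> None
```

With this helper the conversion is done once per field, before the row is handed to the cursor, instead of inline in the format string.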