Background: I have 500 formatted *.txt files that I need to insert into a MySQL database. Currently I have a Python script that reads each file line by line and inserts the values into the MySQL database.
Problem: the files are quite big (~100 MB per txt file). I tested the script, and it takes too long to insert even a single file into the database.
How can I speed up the process by modifying the script?
Code:
import os
import re

# db, cursor, and INPUTFILEPATH are set up earlier in the script

for file in os.listdir(INPUTFILEPATH):
    ## index += 1
    ## print "processing %s out of %s files " % (index, totalfiles)
    inputfilename = INPUTFILEPATH + "/" + file
    open_file = open(inputfilename, 'r')
    contents = open_file.readlines()
    totalLines = len(contents)
    ## index2 = 0
    for i in range(totalLines):
        ## index2 += 1
        ## print "processing %s out of %s lines " % (index2, totalLines)
        lineString = contents[i]
        lineString = lineString.rstrip('\n')
        values = lineString.split('\t')
        # skip words containing digits, underscores, apostrophes, or periods
        if len(re.findall(r'[0123456789_\'\.]', values[0])) > 0:
            continue
        message = """INSERT INTO %s (word, year, count, volume) VALUES ('%s', '%s', '%s', '%s')""" % ('1gram', values[0], values[1], values[2], values[3])
        cursor.execute(message)
        db.commit()
cursor.close()
db.close()
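
For what it's worth, below is the kind of rewrite I'm considering but haven't tested: buffering rows and sending them with cursor.executemany, and committing once per file instead of after every row. The connection parameters, the path, and BATCH_SIZE are placeholders, and I'm assuming the MySQLdb driver I'm using supports executemany with %s placeholders as written here. Is batching like this the right direction?

import os
import re

import MySQLdb  # assumption: the script uses the MySQLdb driver

INPUTFILEPATH = "/path/to/txt/files"  # placeholder path
BATCH_SIZE = 5000                     # hypothetical batch size, not tuned

# placeholder credentials
db = MySQLdb.connect(host="localhost", user="user", passwd="passwd", db="ngrams")
cursor = db.cursor()

insert_sql = "INSERT INTO 1gram (word, year, count, volume) VALUES (%s, %s, %s, %s)"

for name in os.listdir(INPUTFILEPATH):
    batch = []
    with open(os.path.join(INPUTFILEPATH, name), 'r') as open_file:
        for line in open_file:                         # stream the file instead of readlines()
            values = line.rstrip('\n').split('\t')
            if re.search(r"[0123456789_'.]", values[0]):  # same filter as above
                continue
            batch.append((values[0], values[1], values[2], values[3]))
            if len(batch) >= BATCH_SIZE:
                cursor.executemany(insert_sql, batch)  # one round trip for many rows
                batch = []
    if batch:
        cursor.executemany(insert_sql, batch)          # flush the last partial batch
    db.commit()                                        # commit once per file, not per row

cursor.close()
db.close()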