Here are a few things you can do to improve the performance of the data loading:

  • convert the JSON file to CSV and use LOAD DATA INFILE to bulk-load it from a file (sample). It is probably the fastest way to do what you are trying to do; a rough sketch follows the list.

  • use .executemany() instead of .execute(), so the driver can batch the inserts instead of issuing them one at a time:

     # datajson is a list of dicts, so the named %(AccountNo)s placeholder
     # is filled from each record; executemany() hands the whole list to the
     # driver in one call instead of one .execute() per row
     datajson = json.load(testFile)

     insert_query = """
         INSERT INTO
             cstore (AccountNo)
         VALUES (%(AccountNo)s)
     """
     cursor.executemany(insert_query, datajson)
    
  • look into disabling or dropping the existing indexes for the duration of the insert and re-creating them once it is done, so the indexes are rebuilt once instead of being updated on every row (a sketch follows the list)

  • make sure you are not doing the data load "over the internet", so that network latency and bandwidth do not become the bottleneck; run the load as "close" to the database server as possible

  • parsing the JSON file is probably not the bottleneck, but you may also look into faster JSON parsers such as ujson (a drop-in example follows the list)

  • I remember we also had some success with data-loading performance using the ultra-fast umysql Python database driver, but the package does not appear to have been maintained for quite a while
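
For the CSV + LOAD DATA route, here is a rough sketch rather than a drop-in for your schema: the file names, the handling of extra columns, and the LOCAL INFILE option are assumptions, and LOCAL INFILE has to be enabled on both the server and the client driver (cursor and connection are the same objects as in the snippet above):

    import csv
    import json

    # convert the JSON records to a flat CSV file first
    with open("accounts.json") as json_file:      # hypothetical input file name
        records = json.load(json_file)            # list of dicts with an "AccountNo" key

    with open("accounts.csv", "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=["AccountNo"],
                                extrasaction="ignore")   # drop any other keys
        writer.writerows(records)                 # no header row, to match LOAD DATA below

    # bulk-load the whole file in a single statement
    cursor.execute("""
        LOAD DATA LOCAL INFILE 'accounts.csv'
        INTO TABLE cstore
        FIELDS TERMINATED BY ','
        (AccountNo)
    """)
    connection.commit()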
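
For the index point, the exact statements depend on the storage engine, so treat this as an illustration only: the index name is made up, DISABLE KEYS only defers non-unique index maintenance on MyISAM tables, and with InnoDB you would typically drop and re-create secondary indexes instead:

    # MyISAM: defer non-unique index maintenance until the load is finished
    cursor.execute("ALTER TABLE cstore DISABLE KEYS")

    # InnoDB alternative (hypothetical index name):
    # cursor.execute("DROP INDEX idx_accountno ON cstore")

    cursor.executemany(insert_query, datajson)
    connection.commit()

    # rebuild the indexes once, after all rows are in
    cursor.execute("ALTER TABLE cstore ENABLE KEYS")
    # cursor.execute("CREATE INDEX idx_accountno ON cstore (AccountNo)")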
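
And if you want to try a faster parser, ujson is intended as a drop-in replacement for the standard library module, so the swap is a one-line change (assuming the package is installed, e.g. pip install ujson):

    import ujson as json   # the rest of the code keeps calling json.load()

    datajson = json.load(testFile)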
