Here are a few things you can do to improve the performance of the data loading:

  • convert the JSON file to CSV and use LOAD DATA INFILE to bulk-load it from a file (sample). It is probably the fastest way to do what you are trying to do; a rough sketch follows the list.

  • use .executemany() instead of .execute(), so the driver can batch the inserts instead of issuing them one at a time:

     # datajson is a list of dicts, so the named %(AccountNo)s placeholder
     # is filled from each record; executemany() hands the whole list to the
     # driver in one call instead of one .execute() per row
     datajson = json.load(testFile)

     insert_query = """
         INSERT INTO
             cstore (AccountNo)
         VALUES (%(AccountNo)s)
     """
     cursor.executemany(insert_query, datajson)
    
  • look into disabling or dropping the existing indexes for the duration of the insert and re-creating them once it is done, so the indexes are rebuilt once instead of being updated on every row (a sketch follows the list)

  • make sure you are not doing the data load "over the internet", so that network latency and bandwidth do not become the bottleneck; run the load as "close" to the database server as possible

  • parsing the JSON file is probably not the bottleneck, but you may also look into faster JSON parsers such as ujson (a drop-in example follows the list)

  • I remember we also had some success with data-loading performance using the ultra-fast umysql Python database driver, but the package does not appear to have been maintained for quite a while
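
For the CSV + LOAD DATA route, here is a rough sketch rather than a drop-in for your schema: the file names, the handling of extra columns, and the LOCAL INFILE option are assumptions, and LOCAL INFILE has to be enabled on both the server and the client driver (cursor and connection are the same objects as in the snippet above):

    import csv
    import json

    # convert the JSON records to a flat CSV file first
    with open("accounts.json") as json_file:      # hypothetical input file name
        records = json.load(json_file)            # list of dicts with an "AccountNo" key

    with open("accounts.csv", "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=["AccountNo"],
                                extrasaction="ignore")   # drop any other keys
        writer.writerows(records)                 # no header row, to match LOAD DATA below

    # bulk-load the whole file in a single statement
    cursor.execute("""
        LOAD DATA LOCAL INFILE 'accounts.csv'
        INTO TABLE cstore
        FIELDS TERMINATED BY ','
        (AccountNo)
    """)
    connection.commit()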
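
For the index point, the exact statements depend on the storage engine, so treat this as an illustration only: the index name is made up, DISABLE KEYS only defers non-unique index maintenance on MyISAM tables, and with InnoDB you would typically drop and re-create secondary indexes instead:

    # MyISAM: defer non-unique index maintenance until the load is finished
    cursor.execute("ALTER TABLE cstore DISABLE KEYS")

    # InnoDB alternative (hypothetical index name):
    # cursor.execute("DROP INDEX idx_accountno ON cstore")

    cursor.executemany(insert_query, datajson)
    connection.commit()

    # rebuild the indexes once, after all rows are in
    cursor.execute("ALTER TABLE cstore ENABLE KEYS")
    # cursor.execute("CREATE INDEX idx_accountno ON cstore (AccountNo)")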
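
And if you want to try a faster parser, ujson is intended as a drop-in replacement for the standard library module, so the swap is a one-line change (assuming the package is installed, e.g. pip install ujson):

    import ujson as json   # the rest of the code keeps calling json.load()

    datajson = json.load(testFile)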
