
I am having trouble splitting the VALUES of a bulk insert. The idea is to read the entire contents of a CSV file and issue one INSERT for every 10 values at a time.

The code below already inserts everything in a single statement after reading the whole CSV file, but I cannot work out how to split the VALUES so that, in the future, I can insert 10 thousand values at a time.

import csv

from airflow.hooks.mysql_hook import MySqlHook


def bulk_insert(table_name, **kwargs):
    mysqlConnection = MySqlHook(mysql_conn_id='id_db')
    a = mysqlConnection.get_conn()
    c = a.cursor()

    with open('/pasta/arquivo.csv') as f:
        reader = csv.reader(f, delimiter='\t')
        sql = """INSERT INTO user (id,user_name) VALUES """
        for row in reader:
            sql += "(" + row[0] + ", '" + row[1] + "'),"
        c.execute(sql[:-1])

    a.commit()
  • I would suggest just using MySQL's LOAD DATA bulk-insert tool. There is no need to reinvent the wheel by doing this manually from a Python script. Commented Oct 31, 2019 at 2:42
  • The script is part of an ETL that runs SQL against one database, exports to CSV, and then inserts into another database. If I use LOAD DATA, I cannot split the inserts into sequential batches. Commented Oct 31, 2019 at 2:58
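For reference, the LOAD DATA alternative raised in the comments would amount to executing a statement along these lines. This is only a sketch: it assumes the CSV path and table from the question, the `load_data_sql` helper is a hypothetical name, and the statement would be run with `cursor.execute(...)` on a connection that permits LOCAL INFILE.

```python
# Sketch of the LOAD DATA alternative mentioned in the comments.
# load_data_sql is a hypothetical helper; the resulting statement would be
# passed to cursor.execute() on a connection with LOCAL INFILE enabled.
def load_data_sql(path, table, columns):
    cols = ", ".join(columns)
    return (
        f"LOAD DATA LOCAL INFILE '{path}' "
        f"INTO TABLE {table} "
        f"FIELDS TERMINATED BY '\\t' "
        f"({cols})"
    )

sql = load_data_sql('/pasta/arquivo.csv', 'user', ['id', 'user_name'])
```

As the second comment notes, this loads the whole file in one statement, so it does not by itself give batch-by-batch control over the inserts.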

1 Answer


Something like this ought to work. The batch_csv function is a generator that yields a list of up to size rows on each iteration.

The bulk_insert function is amended to use parameter substitution and the cursor's executemany method. Parameter substitution is safer than manually constructing SQL.

cursor.executemany may batch SQL inserts as in the original function, though this is implementation-dependent and should be tested.

def batch_csv(size=10):
    with open('/pasta/arquivo.csv') as f:
        reader = csv.reader(f, delimiter='\t')
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == size:  # compare the batch length, not the row length
                yield batch
                batch = []  # rebind rather than del batch[:], so the yielded list is not mutated
        if batch:  # leftover rows, if the total is not a multiple of size
            yield batch


def bulk_insert(table_name, **kwargs):
    mysqlConnection = MySqlHook(mysql_conn_id='id_db')
    a = mysqlConnection.get_conn()
    c = a.cursor()
    sql = """INSERT INTO user (id,user_name) VALUES (%s, %s)"""
    for batch in batch_csv():
        c.executemany(sql, [row[0:2] for row in batch])

    a.commit()
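The grouping behaviour can be checked on its own against a plain iterable instead of the CSV file. This is a standalone sketch; `batch_rows` is a hypothetical helper mirroring the logic of `batch_csv` without the file handling.

```python
# Standalone sketch of the batching logic in batch_csv, applied to any
# iterable instead of a CSV file. batch_rows is a hypothetical name used
# here for illustration only.
def batch_rows(rows, size=10):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:  # a full batch is ready
            yield batch
            batch = []
    if batch:  # leftover rows smaller than size
        yield batch

batches = list(batch_rows(range(25), size=10))
# 25 rows in batches of 10 -> batch sizes 10, 10, 5
```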

1 Comment

Thank you for your help. In my test the insertion was correct, but I was left in doubt whether it ran as a single statement or as several, because you put the work into executemany. To explain my doubt further: should the INSERT command work like this ... INSERT INTO user (id,user_name) VALUES (1,'name'),(2,'name'); INSERT INTO user (id,user_name) VALUES (3,'name'),(4,'name'); ... and not like this ... INSERT INTO user (id,user_name) VALUES (1,'name'),(2,'name'),(3,'name'),(4,'name'); ...? I think I should work with a.commit() to do the INSERTs in batches.
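Whether executemany sends one multi-row INSERT or many single-row ones depends on the driver. If one multi-row INSERT per batch is wanted regardless, the statement can be built with one (%s, %s) group per row and run with cursor.execute. This is only a sketch; `build_multi_row_insert` is a hypothetical helper, not part of the answer above.

```python
# Sketch: build one multi-row INSERT per batch, keeping parameter
# substitution intact. build_multi_row_insert is a hypothetical helper
# named here for illustration.
def build_multi_row_insert(batch):
    placeholders = ", ".join(["(%s, %s)"] * len(batch))
    sql = "INSERT INTO user (id,user_name) VALUES " + placeholders
    params = [value for row in batch for value in row[:2]]
    return sql, params

sql, params = build_multi_row_insert([("1", "alice"), ("2", "bob")])
# sql    -> "INSERT INTO user (id,user_name) VALUES (%s, %s), (%s, %s)"
# params -> ["1", "alice", "2", "bob"]
# each batch would then run as: c.execute(sql, params)
```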
