0

I have a pretty huge dataset that needs to be uploaded for database (in my case Postgres). If I use standard to_sql with chunksize of 5000, there will be some bad lines because some data is corrupted. However if I would try to catch exception and iterate over each line to catch them, it would take quite a bit time, iterating over each line takes quite a bit of time.

What I try to do:

def insert_do_nothing_on_conflicts(sqltable, conn, keys, data_iter):
    from sqlalchemy.dialects.postgresql import insert
    from sqlalchemy import table, column
    columns = []
    for c in keys:
        columns.append(column(c))
    if sqltable.schema:
        table_name = '{}.{}'.format(sqltable.schema, sqltable.name)
    else:
        table_name = sqltable.name
    mytable = table(table_name, *columns)
    insert_stmt = insert(mytable).values(list(data_iter))
    do_nothing_stmt = insert_stmt.on_conflict_do_nothing(index_elements=['timestamp'])
    conn.execute(do_nothing_stmt)

def write_to_db(filename):
    chunksize = 10 ** 6
    chunk_count=0
    for chunk in pd.read_csv(filename, chunksize=chunksize,on_bad_lines='skip',header = None,encoding='windows-1251'):

        while chunk_count<=5:
            chunk_count+=1
            continue
        i=0
        while i < len(chunk) + 1:
            #try:
            chunk[i:i+5000].to_sql(name='test_table_2',schema='kkt', con=engine_postgres, index=False, if_exists='append',method='insert_do_nothing_on_conflicts')
            i+=5000
            if i % 100000 ==0:
                print (i//5000, (len(chunk)//5000)+2)
        chunk_count+=1

But I receive error

ValueError: Invalid parameter method: insert_do_nothing_on_conflicts

Method should work and it presents in some code. It does not matter which method I put in code they all do not work.

3
  • What is a good line and what is a bad line? minimal reproducible example Commented Nov 4, 2023 at 23:37
  • Does it matter? I need to skip all errors: incorrect length of line, incorrect datatype etc., etc. What I need is a way to quickly write to DB skipping all errors. Commented Nov 17, 2023 at 9:09
  • remove the quotes around 'insert_do_nothing_on_conflicts' when you specify it as value of method. Commented Nov 29, 2023 at 12:02

1 Answer 1

0

"Searched all the web cannot find the answer."

In that case, let's get back to the docs:

Exception ValueError: Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.

But also

Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError, but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError.


"Method should work"

How are you so sure? According to the above, at least one argument passed to insert_do_nothing_on_conflicts either has an incorrect datatype or an incorrect value.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.