I'm new to Pandas.
I have a SQL table with 3 columns (ID is the primary key):
> ID VALUE1 VALUE2
> 1 11 28
> 2 21 (None)
> 3 31 56
> 4 41 (None)
With Pandas I load all the rows where VALUE2 is (None):
import pandas as pd
from sqlalchemy import create_engine

query = "SELECT * FROM `TABLE_NAME` WHERE (`VALUE2` IS NULL)"
engine = create_engine("mysql://user:pwd@ip/db")
df = pd.read_sql(query, con=engine)
engine.dispose()
Everything works up to this point.
After the load, the missing VALUE2 values are calculated according to some rules.
THE PROBLEM
If I update the database with
df.to_sql(TABLE_NAME, con=engine, if_exists="replace", index=False)
all the original rows that were not loaded into the dataframe are lost:
> ID VALUE1 VALUE2
> 2 21 103
> 4 41 72
Is there a way to update only these rows and leave the other original rows untouched?
I want to obtain this:
> ID VALUE1 VALUE2
> 1 11 28
> 2 21 103
> 3 31 56
> 4 41 72
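In plain SQL terms, what I am after is a per-row UPDATE keyed on ID. Here is a minimal sketch of that behaviour, assuming an in-memory SQLite database in place of MySQL and a hand-written `updates` list (a hypothetical stand-in for the recalculated dataframe rows):

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE_NAME (ID INTEGER PRIMARY KEY, VALUE1 INTEGER, VALUE2 INTEGER)"
)
conn.executemany(
    "INSERT INTO TABLE_NAME (ID, VALUE1, VALUE2) VALUES (?, ?, ?)",
    [(1, 11, 28), (2, 21, None), (3, 31, 56), (4, 41, None)],
)

# (VALUE2, ID) pairs standing in for the recalculated dataframe rows (hypothetical).
updates = [(103, 2), (72, 4)]

# Update only the matching rows; rows 1 and 3 are never touched.
conn.executemany("UPDATE TABLE_NAME SET VALUE2 = ? WHERE ID = ?", updates)
conn.commit()

rows = conn.execute(
    "SELECT ID, VALUE1, VALUE2 FROM TABLE_NAME ORDER BY ID"
).fetchall()
print(rows)  # [(1, 11, 28), (2, 21, 103), (3, 31, 56), (4, 41, 72)]
```

I would like to get the same effect starting from the dataframe, without reading the rest of the table.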
It looks like the whole table is rewritten instead of updated.
Loading the whole table just to update a few rows would work around the problem, but it would be highly inefficient, so it is not acceptable.
Any idea why this happens?