
I am learning more and more about Python, moving away from less efficient looping over lists of lists, and have been picking up NumPy/pandas.

Let's say I have a pandas DataFrame df, for example something that looks like this:

| id | Year | Value |
|----|------|-------|
| 1  | 2015 | 1.00  |
| 1  | 2014 | 0.80  |
| 2  | 2015 | 1.10  |
| 3  | 2015 | 0.60  |

etc

Now, how would I effectively, in Python, query my DB based on the values in a specific column and then append the queried values as new columns? Right now my code looks something like this:

uIDs = set(df['id'])
uIDs = ",".join(str(x) for x in uIDs)

with dbconnection as con:
    data = con.execute("SELECT id, value2 FROM table WHERE id IN (" + uIDs + ")")

for i in data.fetchall():
    # .loc avoids chained indexing, which would silently fail to assign
    df.loc[df['id'] == i[0], 'newcol'] = i[1]

Sorry for any errors in the code, I was typing off the top of my head to give an example.

As you can see, I am trying to pull all unique IDs from the pandas DataFrame, select the new values from the database, and then loop through the results to append them as a new column in the DataFrame.

Is there a more effective/efficient/Pythonic way to do this? If you are looking at not just 3 rows but 3,000 or 30,000, is this the best way to approach it?

1 Answer


Since you want an SQL-style left join, just wrap the query result in a DataFrame and merge:

new_data = pd.DataFrame(data.fetchall(), columns=['id', 'value2'])
df = pd.merge(df, new_data, on='id', how='left')
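
For 3,000 or 30,000 ids the whole round trip can be done without the row-by-row loop: pull the query result straight into a DataFrame with pd.read_sql_query and left-join it on. Below is a minimal sketch, assuming a DB-API connection; the in-memory SQLite database, the table name other_table, and the column value2 are stand-ins for whatever the real schema looks like, and exist only to make the example self-contained.

import sqlite3
import pandas as pd

# Example frame mirroring the question's data
df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'Year': [2015, 2014, 2015, 2015],
                   'Value': [1.00, 0.80, 1.10, 0.60]})

# In-memory database standing in for the real one (hypothetical table/column names)
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE other_table (id INTEGER, value2 REAL)")
con.executemany("INSERT INTO other_table VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])

# Parameterised IN (...) query: one "?" placeholder per unique id,
# which avoids building the SQL string by hand
uids = [int(x) for x in df['id'].unique()]
placeholders = ",".join("?" * len(uids))
query = "SELECT id, value2 FROM other_table WHERE id IN ({})".format(placeholders)

# read_sql_query returns the result set directly as a DataFrame,
# so there is nothing to loop over row by row
new_data = pd.read_sql_query(query, con, params=uids)

# SQL-style left join back onto the original frame
df = df.merge(new_data, on='id', how='left')
print(df)

pd.read_sql_query handles the fetch-and-convert step, and the parameterised IN (...) list keeps the SQL safe from string-building mistakes. Note that some databases cap the number of parameters in a single statement, so for very large id sets chunking the ids or loading them into a temporary table is the usual workaround.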