Equivalent of SQL query in Pandas data frame

Question

I have two tables, Table_A and Table_B. I would like to update a particular column of Table_A with the new values for that column from Table_B, and the number of rows in Table_A may not match Table_B. I know how to write the update query in SQL, but I'm not sure how to do it in pandas. I need an equivalent of the UPDATE query in pandas.

Update Query :

update table_A
    set dt_of_join = sq.dt_of_join
    from (select id_emp, max(joining) as dt_of_join
            from table_B 
            group by id_emp ) as sq
    where table_A.id_emp = sq.id_emp

I need the equivalent of the above query for a pandas DataFrame; any help is really appreciated.

Example :

Table_A
id_emp    |   dt_of_join     
  2       |   30-03-2018
  4       |   03-04-2018
  5       |   04-05-2018
  7       |   10-06-2018
  12      |   20-07-2018
  10      |   09-08-2018
  19      |   25-12-2018

Table_B below is the result of the subquery inside the above query (one row per id_emp):

Table_B
 id_emp   |   dt_of_join
   4      |    01-01-2019
   12     |    03-02-2019
   10     |    09-05-2019
   5      |    21-06-2019

After the update query succeeds, Table_A should look like this:

Table_A
id_emp    |   dt_of_join     
  2       |   30-03-2018
  4       |   01-01-2019
  5       |   21-06-2019
  7       |   10-06-2018
  12      |   03-02-2019
  10      |   09-05-2019
  19      |   25-12-2018
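The subquery in the SQL statement (one row per id_emp, keeping the latest joining date) is what Table_B above represents. If you start from a raw table instead, the aggregation step could be sketched in pandas as below. The frame name raw_b and its rows are hypothetical, made up so the result matches Table_B; the dates are parsed first because max() on dd-mm-yyyy strings compares them as text.

```python
import pandas as pd

# Hypothetical raw table: several rows per employee with a `joining` column,
# as in the SQL query. The rows are made up so the result matches Table_B.
raw_b = pd.DataFrame({
    "id_emp": [4, 4, 12, 10, 5, 5],
    "joining": ["15-06-2018", "01-01-2019", "03-02-2019",
                "09-05-2019", "21-06-2019", "02-02-2019"],
})

# Parse dd-mm-yyyy first: max() on the raw strings compares them as text,
# which would pick "15-06-2018" over "01-01-2019" for id_emp 4.
raw_b["joining"] = pd.to_datetime(raw_b["joining"], format="%d-%m-%Y")

# select id_emp, max(joining) as dt_of_join from table_B group by id_emp
sq = (raw_b.groupby("id_emp", as_index=False)["joining"].max()
           .rename(columns={"joining": "dt_of_join"}))
print(sq)
```

The result holds real datetimes, so Table_A's dt_of_join would need to be parsed the same way before combining them, or sq's column formatted back with dt.strftime("%d-%m-%Y").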

U13-Forward · Accepted Answer · 2019-07-04 11:05:35Z

1

Why not reindex:

>>> df['dt_of_join'] = df2.set_index('id_emp').reindex(df['id_emp']).reset_index()['dt_of_join'].fillna(df['dt_of_join'])
>>> df
   id_emp  dt_of_join
0       2  30-03-2018
1       4  01-01-2019
2       5  21-06-2019
3       7  10-06-2018
4      12  03-02-2019
5      10  09-05-2019
6      19  25-12-2018
>>>
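A self-contained run of the reindex approach on the question's sample data (df stands for Table_A and df2 for Table_B, as in the answer). It assumes id_emp is unique in df2, since reindex raises on duplicate labels, and that df has a default 0..n-1 index, since fillna() aligns on the index.

```python
import pandas as pd

# Sample data from the question; dates stay as dd-mm-yyyy strings.
df = pd.DataFrame({
    "id_emp": [2, 4, 5, 7, 12, 10, 19],
    "dt_of_join": ["30-03-2018", "03-04-2018", "04-05-2018", "10-06-2018",
                   "20-07-2018", "09-08-2018", "25-12-2018"],
})
df2 = pd.DataFrame({
    "id_emp": [4, 12, 10, 5],
    "dt_of_join": ["01-01-2019", "03-02-2019", "09-05-2019", "21-06-2019"],
})

# Look up each df row's id_emp in df2; ids absent from df2 come back as NaN.
new_vals = (df2.set_index("id_emp")
               .reindex(df["id_emp"])
               .reset_index()["dt_of_join"])

# new_vals now has a 0..n-1 index, so fillna() lines up with df only when
# df also has a default RangeIndex.
df["dt_of_join"] = new_vals.fillna(df["dt_of_join"])
print(df)
```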

edited Jul 4, 2019 at 11:05

answered Jul 4, 2019 at 3:35

U13-Forward



3 Comments

RAHUL VISHWAKARMA Over a year ago

Yes, I mean the same, but that NaN should not stay NaN; it should keep whatever value it had before. For the data frame you mentioned above, it should be c.

U13-Forward Over a year ago

@RAHULVISHWAKARMA I edited my answer, please accept it

RAHUL VISHWAKARMA Over a year ago

Hey, I really appreciate your effort, but it still hasn't answered my question properly. Can you please have a look at the question? I have just edited it, so you will get an idea of what I am talking about.

anky · Accepted Answer · 2019-07-04 03:59:01Z

You can use Series.map() with fillna(), which is a faster alternative for updating a single column (this assumes id_emp is a column; if it is already the index, d should be df2['dt_of_join']):

d = df2.set_index('id_emp')['dt_of_join']
df1['dt_of_join'] = df1['id_emp'].map(d).fillna(df1['dt_of_join'])
print(df1)

   id_emp      dt_of_join
0       2      30-03-2018
1       4      01-01-2019
2       5      21-06-2019
3       7      10-06-2018
4      12      03-02-2019
5      10      09-05-2019
6      19      25-12-2018
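A runnable sketch of the map/fillna approach with the question's sample data. The is_unique check is an addition, not part of the answer: Series.map() with a Series argument expects unique keys, which is why the SQL query aggregates table_B per id_emp first.

```python
import pandas as pd

# Sample data from the question; dates stay as dd-mm-yyyy strings.
df1 = pd.DataFrame({
    "id_emp": [2, 4, 5, 7, 12, 10, 19],
    "dt_of_join": ["30-03-2018", "03-04-2018", "04-05-2018", "10-06-2018",
                   "20-07-2018", "09-08-2018", "25-12-2018"],
})
df2 = pd.DataFrame({
    "id_emp": [4, 12, 10, 5],
    "dt_of_join": ["01-01-2019", "03-02-2019", "09-05-2019", "21-06-2019"],
})

# Lookup Series: index = id_emp, values = the new dt_of_join.
d = df2.set_index("id_emp")["dt_of_join"]
# Added check: map() needs one value per key, like the GROUP BY in the SQL.
assert d.index.is_unique, "aggregate df2 to one row per id_emp first"

# map() gives NaN for ids not in d; fillna() restores the old value there,
# so unmatched rows are left unchanged, as in the SQL UPDATE.
df1["dt_of_join"] = df1["id_emp"].map(d).fillna(df1["dt_of_join"])
print(df1)
```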

Parfait · Accepted Answer · 2019-07-04 12:07:55Z

0

Consider DataFrame.update after setting id_emp as the index in both. Note that update() works in place and returns None, so its result should not be assigned:

final_df = tbl1_df.set_index('id_emp')
final_df.update(tbl2_df.set_index('id_emp'))  # in place; update() returns None
final_df = final_df.reset_index()
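DataFrame.update() modifies the calling frame in place and returns None, so chaining it into an assignment leaves final_df as None. A self-contained sketch with the question's sample data that keeps the update on its own line, checks the return value, and then restores id_emp as an ordinary column:

```python
import pandas as pd

# Sample data from the question; dates stay as dd-mm-yyyy strings.
tbl1_df = pd.DataFrame({
    "id_emp": [2, 4, 5, 7, 12, 10, 19],
    "dt_of_join": ["30-03-2018", "03-04-2018", "04-05-2018", "10-06-2018",
                   "20-07-2018", "09-08-2018", "25-12-2018"],
})
tbl2_df = pd.DataFrame({
    "id_emp": [4, 12, 10, 5],
    "dt_of_join": ["01-01-2019", "03-02-2019", "09-05-2019", "21-06-2019"],
})

final_df = tbl1_df.set_index("id_emp")
# update() aligns on index and columns, overwrites matching cells with the
# non-NA values from tbl2_df, and returns None.
result = final_df.update(tbl2_df.set_index("id_emp"))
assert result is None

# Bring id_emp back as a column, matching the original layout.
final_df = final_df.reset_index()
print(final_df)
```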

edited Jul 4, 2019 at 12:07

answered Jul 4, 2019 at 12:02

Parfait

