Get value from another dataframe column based on condition

Question

I have a dataframe like below:

>>> df1
           a   b
0  [1, 2, 3]  10
1  [4, 5, 6]  20
2     [7, 8]  30

and another like:

I need to create column 'c' in df2 from column 'b' of df1 wherever the value in column 'a' of df2 appears in column 'a' of df1. In df1, each entry of column 'a' is a list.

I tried to follow the approach from the following URL, but have had no success so far: https://medium.com/@Imaadmkhan1/using-pandas-to-create-a-conditional-column-by-selecting-multiple-columns-in-two-different-b50886fabb7d

The expected result is:

jezrael · Accepted Answer · 2019-04-04 10:34:59Z

5

Use Series.map with a dictionary built by flattening the lists in df1:

d = {c: b for a, b in zip(df1['a'], df1['b']) for c in a}
print (d)
{1: 10, 2: 10, 3: 10, 4: 20, 5: 20, 6: 20, 7: 30, 8: 30}

df2['new'] = df2['a'].map(d)
print (df2)
   a  new
0  1   10
1  2   10
2  3   10
3  4   20
4  5   20
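
As a self-contained sketch of the same idea: df1 below is copied from the question, while df2 is an assumption inferred from the printed output above, extended with one value (9) that has no match so the behavior for unmatched rows is visible. The names lookup and c are illustrative choices.

```python
import pandas as pd

# df1 is copied from the question; df2 is an assumption inferred from the
# printed output above, with one extra value (9) that has no match in df1.
df1 = pd.DataFrame({"a": [[1, 2, 3], [4, 5, 6], [7, 8]], "b": [10, 20, 30]})
df2 = pd.DataFrame({"a": [1, 2, 3, 4, 5, 9]})

# Flatten every list element into a key that points at its row's b value.
lookup = {item: b for items, b in zip(df1["a"], df1["b"]) for item in items}

# map() returns NaN for keys missing from the dictionary, so the column
# becomes float as soon as one value is unmatched.
df2["c"] = df2["a"].map(lookup)
print(df2)
```

If every value in df2 is guaranteed to have a match, the new column keeps its integer dtype, as in the output above.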

EDIT: I think the problem is that column a mixes lists with plain integers. The solution is to test each value with if/else while building the dictionary:

d = {}
for a, b in zip(df1['a'], df1['b']):
    if isinstance(a, list):
        for c in a:
            d[c] = b
    else:
        d[a] = b

df2['new'] = df2['a'].map(d)
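
The loop above could be wrapped in a small helper, roughly as follows. This is a sketch under assumptions: build_lookup is a hypothetical name, the sample addresses are invented because the comments mention that the real data holds IPs, and accepting tuples and sets in addition to lists is a small extension of the original check. Testing the type explicitly matters for string data, since a string is itself iterable and would otherwise be split into characters.

```python
import pandas as pd

def build_lookup(keys, values):
    """Map each element of a list-like key (or the scalar key itself) to its value."""
    lookup = {}
    for key, value in zip(keys, values):
        # Strings are iterable too, so only lists/tuples/sets are unpacked.
        items = key if isinstance(key, (list, tuple, set)) else [key]
        for item in items:
            lookup[item] = value
    return lookup

# Hypothetical data: column a mixes lists of addresses with single addresses.
df1 = pd.DataFrame({
    "a": [["10.0.0.1", "10.0.0.2"], "10.0.0.3", ["10.0.0.4"]],
    "b": [10, 20, 30],
})
df2 = pd.DataFrame({"a": ["10.0.0.2", "10.0.0.3", "10.0.0.9"]})

# Unmatched addresses (here 10.0.0.9) come back as NaN.
df2["new"] = df2["a"].map(build_lookup(df1["a"], df1["b"]))
print(df2)
```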

edited Apr 4, 2019 at 10:34

answered Apr 4, 2019 at 9:35

jezrael


4 Comments

Binayak Chatterjee Over a year ago

The actual data is a list of IPs. I get the following error there: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <dictcomp> TypeError: 'int' object is not iterable

jezrael Over a year ago

@BinayakChatterjee - Is the data confidential?

jezrael Over a year ago

@BinayakChatterjee - Is it possible to edit the question? Data posted in comments is badly formatted. Thank you.

jezrael Over a year ago

@BinayakChatterjee - so change d = {c: b for a, b in zip(df1['a'], df1['b']) for c in a} to d = {c: b for a, b in zip(df1['b'], df1['a']) for c in a} - swap a, b

anky · Accepted Answer · 2019-04-04 09:33:15Z

4

Use:

import numpy as np
import pandas as pd

m = pd.DataFrame({'a': np.concatenate(df1['a'].values), 'b': df1['b'].repeat(df1['a'].str.len())})
df2.merge(m, on='a')
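
One point worth checking with this approach is that merge defaults to an inner join, so rows of df2 without a match are dropped. A minimal sketch of the difference, assuming a df2 that contains one unmatched value (9); the names flat, inner and left are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [[1, 2, 3], [4, 5, 6], [7, 8]], "b": [10, 20, 30]})
df2 = pd.DataFrame({"a": [1, 2, 3, 4, 5, 9]})  # assumed; 9 has no match

# One row per list element: repeat each b as many times as its list is long.
lengths = df1["a"].str.len()
flat = pd.DataFrame({
    "a": np.concatenate(df1["a"].to_numpy()),
    "b": df1["b"].repeat(lengths).to_numpy(),
})

inner = df2.merge(flat, on="a")             # default how="inner" drops a == 9
left = df2.merge(flat, on="a", how="left")  # keeps a == 9 with b = NaN
print(inner)
print(left)
```

Passing how="left" keeps every row of df2, which matches the behavior of the Series.map approach.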

answered Apr 4, 2019 at 9:33

anky


Erfan · Accepted Answer · 2019-04-04 09:34:20Z

2

First we unnest the lists in column a of df1 into rows, then we merge the result with df2 on column a:

df1 = df1.set_index('b').a.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'a'})
print(df1, '\n')

df_final = df2.merge(df1, on='a')
print(df_final)

    b    a
0  10  1.0
1  10  2.0
2  10  3.0
0  20  4.0
1  20  5.0
2  20  6.0
0  30  7.0
1  30  8.0 

   a   b
0  1  10
1  2  10
2  3  10
3  4  20
4  5  20
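
The comments on this answer note that apply(pd.Series) is slow. On pandas 0.25 or later, DataFrame.explode can do the unnesting instead; this is a swapped-in alternative, not the method used in the answer. The sketch below assumes df2 holds the values 1 to 5, as the output above suggests.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [[1, 2, 3], [4, 5, 6], [7, 8]], "b": [10, 20, 30]})
df2 = pd.DataFrame({"a": [1, 2, 3, 4, 5]})  # assumed from the output above

# explode() gives each list element its own row; the result has object
# dtype, so it is cast back to int before merging with df2's integer column.
flat = df1.explode("a").astype({"a": "int64"}).reset_index(drop=True)

df_final = df2.merge(flat, on="a")
print(df_final)
```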

answered Apr 4, 2019 at 9:34

Erfan


3 Comments

Erfan Over a year ago

Thanks for the heads-up, @jezrael. I took this method from Wen-Ben's post here: stackoverflow.com/questions/53218931/…

jezrael Over a year ago

Yes, unfortunately it is still used a lot, but it is best never to use it: it is too slow.

Erfan Over a year ago

Maybe you should post an answer explaining the dict flattening you use, @jezrael?
