
I have a DataFrame that looks like this:

FirstDF=
              C
A    B      
'a' 'blue'   43
    'green'  59
'b' 'red'    56
'c' 'green'  80
    'orange' 72

Here A and B are set as the index (a MultiIndex). I also have a second DataFrame that looks like:

SecondDF=

    A     B
0  'a'  'green'
1  'b'  'red'
2  'c'  'green'

Is there a way I can directly query the first DataFrame with the last one, and obtain an output like the following?

C
59
56
80

I did it by iterating over the second DataFrame, as shown below, but I would like to do it using pandas logic instead of for loops.

data=[]
for i in range(SecondDF.shape[0]):
    data.append(FirstDF.loc[tuple(SecondDF.iloc[i])])
data=pd.Series(data)

4 Answers


Use merge with the parameters left_index and right_on:

df = FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'])['C'].to_frame()
print (df)
    C
0  59
1  56
2  80
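
For anyone who wants to try the merge approach end to end, here is a minimal reproducible sketch. It assumes the labels are plain strings (without the literal quote characters shown in the question) and rebuilds both frames from scratch; the variable name result is only for illustration.

```python
import pandas as pd

# Rebuild the frames from the question (labels assumed to be plain strings)
FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

SecondDF = pd.DataFrame({"A": ["a", "b", "c"], "B": ["green", "red", "green"]})

# Match the MultiIndex of FirstDF against columns A and B of SecondDF
result = FirstDF.merge(SecondDF, left_index=True, right_on=["A", "B"])["C"].to_frame()
print(result)
```

The default inner join keeps only the (A, B) pairs present in both frames, which is why the 'blue' and 'orange' rows drop out.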

Another solution uses isin on the MultiIndex and filters by boolean indexing:

mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[mask, ['C']].reset_index(drop=True)
print (df)
    C
0  59
1  56
2  80

Detail:

print (FirstDF.loc[mask, ['C']])
              C
A   B          
'a' 'green'  59
'b' 'red'    56
'c' 'green'  80

EDIT:

To get the opposite (the rows of FirstDF that are not in SecondDF), you can use merge with an outer join and the indicator=True parameter, then filter by boolean indexing:

df1=FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'], indicator=True, how='outer')
print (df1)
    C    A         B     _merge
2  43  'a'    'blue'  left_only
0  59  'a'   'green'       both
1  56  'b'     'red'       both
2  80  'c'   'green'       both
2  72  'c'  'orange'  left_only

# keep only the rows that exist in FirstDF but not in SecondDF
mask = df1['_merge'] == 'left_only'
df1 = df1.loc[mask, ['C']].reset_index(drop=True)
print (df1)
    C
0  43
1  72

For the second solution, invert the boolean mask with ~:

mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[~mask, ['C']].reset_index(drop=True)
print (df)
    C
0  43
1  72
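
One property of the mask approach worth knowing: the result follows the row order of FirstDF, not the order of SecondDF. A small sketch, assuming plain string labels and a hypothetical SecondDF listed in reverse order:

```python
import pandas as pd

FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

# Hypothetical lookup frame, deliberately in reverse order
SecondDF = pd.DataFrame({"A": ["c", "a"], "B": ["green", "green"]})

# True for every row of FirstDF whose (A, B) pair occurs in SecondDF
mask = FirstDF.index.isin(SecondDF.set_index(["A", "B"]).index)

matched = FirstDF.loc[mask, "C"].tolist()     # rows found in SecondDF
unmatched = FirstDF.loc[~mask, "C"].tolist()  # everything else
print(matched)
print(unmatched)
```

If the output has to follow the order of SecondDF instead, the merge-based solution is probably the better fit.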

1 Comment

Is there a way of doing the opposite? Like, with the same dataframes, getting: C 43 72.

FirstDF.loc[list(zip(SecondDF['A'], SecondDF['B']))]

Explanation:

The idea is to take the index labels from the second DataFrame and use them to select from the first one. With a MultiIndex, you can pass a tuple of labels to get the matching row.

FirstDF.loc[('bar','two'),] 

will give you all the rows whose first index level is 'bar' and whose second index level is 'two'.

FirstDF.loc[(SecondDF['A'],SecondDF['B']),] 

takes those labels directly from SecondDF, but the catch is that it selects every combination of the 'A' and 'B' values. Adding zip restricts the selection to the pairs that appear together in a row of SecondDF.
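
To make the difference concrete, here is a small sketch. It assumes plain string labels and a hypothetical SecondDF chosen so the two selections differ. The "every combination" behaviour is imitated with per-level isin checks rather than with the .loc call above, so treat it as an illustration of the idea, not as the exact same code path.

```python
import pandas as pd

FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

# Hypothetical lookup frame: only two (A, B) pairs are wanted
SecondDF = pd.DataFrame({"A": ["a", "c"], "B": ["green", "orange"]})

# Pairwise selection: one tuple per row of SecondDF
pairs = list(zip(SecondDF["A"], SecondDF["B"]))
paired = FirstDF.loc[pairs]

# Level-by-level selection: any A value combined with any B value
level_a = FirstDF.index.get_level_values("A").isin(SecondDF["A"])
level_b = FirstDF.index.get_level_values("B").isin(SecondDF["B"])
combined = FirstDF[level_a & level_b]

print(paired["C"].tolist())
print(combined["C"].tolist())
```

The level-by-level version also picks up ('c', 'green'), which is not a row of this SecondDF; that is the extra combination zip avoids.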


You can use merge to get the result (here df1 and df2 both hold A and B as ordinary columns):

In [35]: df1
Out[35]:
   A       B   C
0  a    blue  43
1  a   green  59
2  b     red  56
3  c   green  80
4  c  orange  72

In [36]: df2
Out[36]:
   A      B
0  a  green
1  b    red
2  c  green

In [37]: pd.merge(df1, df2, on=['A', 'B'])['C']
Out[37]:
0    59
1    56
2    80
Name: C, dtype: int64


Ok I found an answer:

tuple_list = list(map(tuple,SecondDF.values))
insDF = FirstDF.loc[tuple_list].dropna()
outsDF = FirstDF.loc[~FirstDF.index.isin(tuple_list)]

This gives both the rows of FirstDF that match SecondDF (insDF) and the rows that do not (outsDF). The dropna method is used here because this lookup leaves any pairs from SecondDF that are not in FirstDF as NaN rows, so they should be dropped.
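
As far as I know, newer pandas versions raise a KeyError from .loc when one of the requested labels is missing, instead of returning NaN rows. A sketch of the same idea using reindex, which does produce NaN rows for missing labels, is below. It assumes plain string labels and a hypothetical SecondDF containing one pair, ('z', 'pink'), that is absent from FirstDF.

```python
import pandas as pd

FirstDF = pd.DataFrame(
    {
        "A": ["a", "a", "b", "c", "c"],
        "B": ["blue", "green", "red", "green", "orange"],
        "C": [43, 59, 56, 80, 72],
    }
).set_index(["A", "B"])

# Hypothetical lookup frame; ('z', 'pink') does not exist in FirstDF
SecondDF = pd.DataFrame({"A": ["a", "b", "z"], "B": ["green", "red", "pink"]})

tuple_list = list(map(tuple, SecondDF.values))

# reindex adds a NaN row for each missing label; dropna removes it again
insDF = FirstDF.reindex(tuple_list).dropna()

# Rows of FirstDF whose (A, B) pair is not listed in SecondDF
outsDF = FirstDF.loc[~FirstDF.index.isin(tuple_list)]

print(insDF)
print(outsDF)
```

Note that C in insDF becomes float because of the intermediate NaN; astype(int) converts it back if needed.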
