Pandas Dataframe find and replace

Question

I have two lists:

single = ['A','B']
double = ['AA','BB']

Data stored in dataframe df:

     0  1    2    3
0  All  1   AA  Yes
1    A  2  All   No

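where 'All' stands for every value in single (['A','B']) when it appears in column 0, and for every value in double (['AA','BB']) when it appears in column 2. I want to expand each 'All' into one row per value, giving the following DataFrame df2: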

    0  1   2    3
0   A  1  AA  Yes
1   B  1  AA  Yes
2   A  2  AA   No
3   A  2  BB   No

The order of the rows doesn't matter. This is what I am doing now:

import pandas as pd

single = ['A','B']
double = ['AA','BB']
df=pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

index = []
for i in range(len(df)):
    if df.loc[i,0] == 'All':
        index.append(i)
        for j in single:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,0] = j
df = df.drop(index).reset_index(drop=True)

index = []
for i in range(len(df)):
    if df.loc[i,2] == 'All':
        index.append(i)
        for j in double:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,2] = j
df2 = df.drop(index).reset_index(drop=True)
print(df2)

For each row with 'All' in column 0, it appends one copy of the row per value in single and then deletes the original row. It then does the same for 'All' in column 2, using double.
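The same expansion can be done without growing df one row at a time. The sketch below uses a hypothetical helper, expand_rows, and assumes a {column: values} mapping for what 'All' stands for. It collects the new rows in plain Python with itertools.product and builds the DataFrame once at the end:

```python
import itertools
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])

def expand_rows(frame, expansions):
    """Hypothetical helper: replace 'All' in each mapped column by every listed value.

    expansions maps a column label to the list of values that 'All' stands for.
    """
    rows = []
    for record in frame.itertuples(index=False):
        # Candidates per column: the full list for a mapped 'All', else the value itself.
        choices = [
            expansions[col] if col in expansions and value == 'All' else [value]
            for col, value in zip(frame.columns, record)
        ]
        # One output row per combination of candidates.
        rows.extend(itertools.product(*choices))
    return pd.DataFrame(rows, columns=frame.columns)

df2 = expand_rows(df, {0: single, 2: double})
print(df2)
```

Because itertools.product combines all columns at once, a row with 'All' in both columns 0 and 2 would expand into every pair of values in one pass.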

Is there an easier way to do this 'find and replace'?

unutbu · Accepted Answer · 2016-07-06 21:41:33Z

import pandas as pd

single = ['A','B']
double = ['AA','BB']
df = pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

first = pd.DataFrame([x for item in single 
                      for x in [('All', item), (item, item)]], columns=[0, 'first']) 
third = pd.DataFrame([x for item in double 
                      for x in [('All', item), (item, item)]], columns=[2, 'third']) 

result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sort_index replaces the deprecated sortlevel

yields

   0  1   2    3
0  A  1  AA  Yes
1  B  1  AA  Yes
2  A  2  AA   No
3  A  2  BB   No

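The main idea is to prepare two helper DataFrames that map each raw value to its expansion: 'All' maps to every value, and each concrete value maps to itself: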

first = pd.DataFrame([x for item in single 
                      for x in [('All', item), (item, item)]], columns=[0, 'first']) 
#      0 first
# 0  All     A
# 1    A     A
# 2  All     B
# 3    B     B

third = pd.DataFrame([x for item in double 
                      for x in [('All', item), (item, item)]], columns=[2, 'third']) 
#      2 third
# 0  All    AA
# 1   AA    AA
# 2  All    BB
# 3   BB    BB
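Both helper frames follow the same pattern, so one function could build them. A minimal sketch; make_lookup is a hypothetical name, not a pandas API:

```python
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']

def make_lookup(key, values, new_name):
    """Hypothetical helper: map 'All' to every value and each value to itself."""
    pairs = [pair for item in values for pair in [('All', item), (item, item)]]
    return pd.DataFrame(pairs, columns=[key, new_name])

first = make_lookup(0, single, 'first')
third = make_lookup(2, double, 'third')
print(first)
print(third)
```

One consequence of this design: a value that is neither 'All' nor in the list (say 'C') finds no match, so a left merge would leave NaN in the new column for that row.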

Left-merging df with first (on their shared column 0) and then with third (on column 2) turns every 'All' into one row per value:

result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
#      0  1    2    3 first third
# 0  All  1   AA  Yes     A    AA
# 1  All  1   AA  Yes     B    AA
# 2    A  2  All   No     A    AA
# 3    A  2  All   No     A    BB
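With more columns to expand, the nested merges can be folded with functools.reduce. A sketch, assuming a hypothetical expansions dict of {column: values}; each helper column is named new_<col> instead of first/third, and the result matches the intermediate frame above:

```python
import functools
import pandas as pd

df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])
# Hypothetical mapping: column label -> the values 'All' stands for in that column.
expansions = {0: ['A', 'B'], 2: ['AA', 'BB']}

def lookup(col, values):
    """Helper frame for one column: 'All' -> every value, each value -> itself."""
    pairs = [p for v in values for p in [('All', v), (v, v)]]
    return pd.DataFrame(pairs, columns=[col, f'new_{col}'])

# One left merge per mapped column, each joining on that column only.
merged = functools.reduce(
    lambda acc, item: acc.merge(lookup(*item), on=[item[0]], how='left'),
    expansions.items(),
    df,
)
print(merged)
```

Passing on explicitly keeps each merge from joining on any other column the frames might happen to share.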

Finally, drop the original 0 and 2 columns, rename first and third to 0 and 2, and sort the columns back into order:

result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sort_index replaces the deprecated sortlevel
#    0  1   2    3
# 0  A  1  AA  Yes
# 1  B  1  AA  Yes
# 2  A  2  AA   No
# 3  A  2  BB   No
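A different technique from the merge above, assuming pandas 0.25 or later: turn each 'All' into a list and use DataFrame.explode, which emits one row per list element. The expansions dict is an illustrative name:

```python
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])

expansions = {0: single, 2: double}  # column -> what 'All' stands for

result = df.copy()
for col, values in expansions.items():
    # Every cell becomes a list: the full list for 'All', a one-item list otherwise.
    result[col] = result[col].map(lambda v: list(values) if v == 'All' else [v])
    # explode repeats the other columns once per list element.
    result = result.explode(col)
result = result.reset_index(drop=True)
print(result)
```

The exploded columns come back with object dtype, and the index repeats until reset_index renumbers the rows.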
