I have two lists:

single = ['A','B']
double = ['AA','BB']

Data stored in dataframe df:

     0  1    2    3
0  All  1   AA  Yes
1    A  2  All   No

Here All stands for ['A','B'] in column 0 and for ['AA','BB'] in column 2. I want to obtain the following dataframe df2:

    0  1   2    3
0   A  1  AA  Yes
1   B  1  AA  Yes
2   A  2  AA   No
3   A  2  BB   No

and the order of the row index doesn't matter. I am now doing:

import pandas as pd

single = ['A','B']
double = ['AA','BB']
df=pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

index = []
for i in range(len(df)):
    if df.loc[i,0] == 'All':
        index.append(i)
        for j in single:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,0] = j
df = df.drop(index).reset_index(drop=True)

index = []
for i in range(len(df)):
    if df.loc[i,2] == 'All':
        index.append(i)
        for j in double:
            df.loc[len(df),:] = df.loc[i,:]
            df.loc[len(df)-1,2] = j
df2 = df.drop(index).reset_index(drop=True)
print(df2)

For each 'All' in column 0, it first appends one row per replacement value and then deletes the original 'All' row. It then does the same for 'All' in column 2.

Is there an easier way to do this 'find and replace'?

1 Answer

import pandas as pd

single = ['A','B']
double = ['AA','BB']
df = pd.DataFrame([['All',1,'AA','Yes'],['A',2,'All','No']])

first = pd.DataFrame([x for item in single 
                      for x in [('All', item), (item, item)]], columns=[0, 'first']) 
third = pd.DataFrame([x for item in double 
                      for x in [('All', item), (item, item)]], columns=[2, 'third']) 

result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sortlevel was removed in pandas 1.0

yields

   0  1   2    3
0  A  1  AA  Yes
1  B  1  AA  Yes
2  A  2  AA   No
3  A  2  BB   No

The main idea is to prepare two helper DataFrames:

first = pd.DataFrame([x for item in single 
                      for x in [('All', item), (item, item)]], columns=[0, 'first']) 
#      0 first
# 0  All     A
# 1    A     A
# 2  All     B
# 3    B     B

third = pd.DataFrame([x for item in double 
                      for x in [('All', item), (item, item)]], columns=[2, 'third']) 
#      2 third
# 0  All    AA
# 1   AA    AA
# 2  All    BB
# 3   BB    BB

Then the desired DataFrame is the result of merging df with first and third:

result = pd.merge(pd.merge(df, first, how='left'), third, how='left')
#      0  1    2    3 first third
# 0  All  1   AA  Yes     A    AA
# 1  All  1   AA  Yes     B    AA
# 2    A  2  All   No     A    AA
# 3    A  2  All   No     A    BB
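
When `on` is not given, `pd.merge` joins on the column labels the two frames have in common, which here is column 0 for `first` and column 2 for `third`. As a minimal sketch, the same two merges can be written with the join keys spelled out. The intermediate name `step1` is only introduced here for illustration:

```python
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])

# Lookup tables: 'All' maps to every value, each value maps to itself.
first = pd.DataFrame([x for item in single
                      for x in [('All', item), (item, item)]], columns=[0, 'first'])
third = pd.DataFrame([x for item in double
                      for x in [('All', item), (item, item)]], columns=[2, 'third'])

# Same merges as above, with the join column named explicitly.
step1 = pd.merge(df, first, how='left', on=[0])      # expands 'All' in column 0
result = pd.merge(step1, third, how='left', on=[2])  # expands 'All' in column 2
print(result)
```

Naming the key guards against an accidental join on some other shared column if the helper frames ever gain extra columns.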

Finally, drop the 0 and 2 columns and replace them with the first and third columns:

result = result.drop([0, 2], axis=1)
result = result.rename(columns={'first':0, 'third':2})
result = result.sort_index(axis=1)  # sortlevel was removed in pandas 1.0
#    0  1   2    3
# 0  A  1  AA  Yes
# 1  B  1  AA  Yes
# 2  A  2  AA   No
# 3  A  2  BB   No
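
On newer pandas versions (0.25 or later, which added `DataFrame.explode`), a different technique may be shorter than the merge approach: replace each 'All' cell with the list it stands for, then explode that column into rows. This is only a sketch under that version assumption, and the `expansions` dict is a name made up for this example:

```python
import pandas as pd

single = ['A', 'B']
double = ['AA', 'BB']
df = pd.DataFrame([['All', 1, 'AA', 'Yes'], ['A', 2, 'All', 'No']])

# Hypothetical mapping: column label -> values that 'All' stands for.
expansions = {0: single, 2: double}

result = df.copy()
for col, values in expansions.items():
    # Swap each 'All' cell for the full list; other cells stay unchanged.
    result[col] = result[col].apply(lambda v: values if v == 'All' else v)
    # explode() gives each list element its own row; reset the index so
    # it stays unique before the next column is processed.
    result = result.explode(col).reset_index(drop=True)

print(result)
```

Adding another column with its own 'All' meaning would then only need one more entry in `expansions`.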