I have a pandas data frame:

import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4],
                   'attr1':[1,1,0,0],
                   'attr2':[0,1,1,0],
                   'attr3':[1,1,1,0],
                   'attr4':[1,1,1,1]})

I want to convert it so that a new variable contains, for each row, the names of the previous data frame's columns whose value is 1.

1 Answer

Use:

df1 = df.filter(like='attr')
df = df.drop(df1.columns, axis=1)
df['var'] = df1.dot(df1.columns + ' ').str.rstrip()
print(df)
   id                      var
0   1        attr1 attr3 attr4
1   2  attr1 attr2 attr3 attr4
2   3        attr2 attr3 attr4
3   4                    attr4

Explanation:

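  1. Select only the attribute columns with DataFrame.filter
  2. Remove those columns from the original data frame with DataFrame.drop
  3. Multiply the 0/1 values by the column names (each with a trailing space) using DataFrame.dot, so that only the names whose value is 1 are kept and concatenated
  4. Finally, strip the trailing whitespace with str.rstrip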
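
As a rough illustration of step 3: the dot product for one row is a sum of products. In Python, multiplying a string by 1 keeps it and multiplying it by 0 gives an empty string. The sketch below reproduces that for the first row without pandas; the names row, names, joined and result are made up for this example.

```python
# Values of attr1..attr4 for the first row (id 1) of the question's data
row = [1, 0, 1, 1]
# Column names with a trailing space, like df1.columns + ' '
names = ['attr1 ', 'attr2 ', 'attr3 ', 'attr4 ']

# 1 * 'attr1 ' -> 'attr1 ', 0 * 'attr2 ' -> '', then concatenate everything
joined = ''.join(flag * name for flag, name in zip(row, names))

# The last kept name leaves a trailing space, which rstrip removes
result = joined.rstrip()
print(result)  # attr1 attr3 attr4
```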

Alternative solution:

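# starting again from the original df, collect the attribute column names
cols = df.columns[df.columns.str.startswith('attr')]
# drop those columns and add the joined names as var in one chained call
df = df.drop(cols, axis=1).assign(var=df[cols].dot(cols + ' ').str.rstrip())
print(df)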
   id                      var
0   1        attr1 attr3 attr4
1   2  attr1 attr2 attr3 attr4
2   3        attr2 attr3 attr4
3   4                    attr4
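
Note that the dot approach relies on the values being exactly 0 or 1; a value of 2 would repeat the name, since 2 * 'attr1 ' gives 'attr1 attr1 '. A minimal sketch of a more explicit variant, assuming an exact comparison with 1 is what is wanted (attr_cols, mask and result are illustrative names, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'attr1': [1, 1, 0, 0],
                   'attr2': [0, 1, 1, 0],
                   'attr3': [1, 1, 1, 0],
                   'attr4': [1, 1, 1, 1]})

# Names of the attribute columns, in their original order
attr_cols = [c for c in df.columns if c.startswith('attr')]

# Boolean frame: True where the value equals 1
mask = df[attr_cols].eq(1)

# For each row, join the names of the columns flagged True
joined = [' '.join(name for name, flag in zip(attr_cols, row) if flag)
          for row in mask.to_numpy()]

# Keep the non-attribute columns and add the joined names as var
result = df.drop(columns=attr_cols).assign(var=joined)
print(result)
```

This loops over rows in Python, so it will likely be slower than dot on large frames; it trades speed for not depending on integer-times-string multiplication.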

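To revert to the original columns, use str.get_dummies:

# pop removes var from df in place; get_dummies splits it on spaces into 0/1 columns
df1 = df.join(df.pop('var').str.get_dummies(' '))
print(df1)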
   id  attr1  attr2  attr3  attr4
0   1      1      0      1      1
1   2      1      1      1      1
2   3      0      1      1      1
3   4      0      0      0      1
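
Note that str.get_dummies only creates columns for names that actually occur in var, so an attribute that was 0 in every row would not come back. A sketch of restoring such columns with reindex, assuming the full list of expected names is known (expected_cols and the small combined frame are hypothetical and only serve this example):

```python
import pandas as pd

# Hypothetical combined frame in which attr2 never occurs
combined = pd.DataFrame({'id': [1, 2],
                         'var': ['attr1 attr3', 'attr3']})

# Assumed to be known in advance; get_dummies cannot infer attr2 from the data
expected_cols = ['attr1', 'attr2', 'attr3']

# Split var on spaces into indicator columns, then add missing names as all-zero columns
dummies = (combined['var'].str.get_dummies(' ')
           .reindex(columns=expected_cols, fill_value=0))

# Attach the indicators to the remaining columns
restored = combined[['id']].join(dummies)
print(restored)
```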

1 Comment

Thank you for the prompt response! How can I revert to the original data frame from the newly created one?
