How can I merge multiple columns into one column in pandas?

I have this table:

ID | A | B | C | D
1  | 1 | 1 | 0 | 3
2  | 1 | 0 | 1 | 2
3  | 0 | 0 | 1 | 8

I want to get this table:

ID | X | D
1  | A | 3
1  | B | 3
2  | A | 2
2  | C | 2
3  | C | 8

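I want to merge columns A, B and C into a single column X based on their values: X should hold the name of each column whose value is true (1). If an ID has true values in more than one of the merged columns (A/B/C), that ID should be repeated, with one row per true column.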

2 Answers

3

You could use melt to reshape the DataFrame, then use query to keep only the relevant rows and drop to remove the now-obsolete value column, like this:

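(df.melt(id_vars=['ID', 'D'], var_name='X')  # one row per (ID, D, column) combination
   .query('value == 1')                      # keep only the rows whose flag is 1
   .drop(columns=['value']))                 # the flag column is no longer needed
#   ID  D  X
#0   1  3  A
#1   2  2  A
#3   1  3  B
#7   2  2  C
#8   3  8  C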

The DataFrame after melt looks like this:

#   ID  D  X  value
#0   1  3  A      1
#1   2  2  A      1
#2   3  8  A      0
#3   1  3  B      1
#4   2  2  B      0
#5   3  8  B      0
#6   1  3  C      0
#7   2  2  C      1
#8   3  8  C      1

Because ID and D are passed as id_vars, their values are repeated once for each of the other columns (A, B and C). The names of those columns end up in X, and their values end up in the value column. After that, it's just a matter of keeping the rows where value == 1.
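
To try the steps end to end, here is a minimal, self-contained sketch. It rebuilds the sample table from the question (how df is constructed is an assumption on my part), and the final sort, index reset and column reordering are optional extras that only serve to match the layout of the desired table:

```python
import pandas as pd

# Sample data from the question (assumed to be plain integer columns)
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'A': [1, 1, 0],
    'B': [1, 0, 0],
    'C': [0, 1, 1],
    'D': [3, 2, 8],
})

# melt: ID and D are repeated once for each of the columns A, B and C,
# so 3 original rows x 3 melted columns gives 9 rows
melted = df.melt(id_vars=['ID', 'D'], var_name='X')

result = (melted
          .query('value == 1')          # keep only the flagged rows
          .drop(columns=['value'])      # the flag column is now obsolete
          .sort_values(['ID', 'X'])     # optional: group the rows by ID
          .reset_index(drop=True)       # optional: renumber the rows from 0
          .loc[:, ['ID', 'X', 'D']])    # optional: column order from the question

print(result)
```

Sorting on both ID and X keeps the row order deterministic, which sorting on ID alone does not strictly guarantee for ties.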

0

One possible solution:

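# Multiply the 0/1 flags by D: flagged cells now hold D, unflagged cells hold 0
df2 = df.set_index('ID')
(df2[['A', 'B', 'C']].multiply(df2['D'], axis='index').reset_index()
    .melt(id_vars='ID', var_name='X', value_name='D')
    .query('D > 0')  # drop the unflagged cells
    .sort_values('ID'))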

The result is:

   ID  X  D
0   1  A  3
3   1  B  3
1   2  A  2
7   2  C  2
8   3  C  8

Compared to your desired result, there is an additional index column (the leftmost one), but I don't think that matters.
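
If the leftover index labels are unwanted, reset_index(drop=True) replaces them with 0 to n-1. Below is a minimal sketch of the same approach, rebuilding the question's sample data (how df is constructed is an assumption). Note that the D > 0 filter relies on every D being positive: that holds for the sample, but a row whose D is 0 or negative would be dropped even if its flag is 1.

```python
import pandas as pd

# Sample data from the question (assumed to be plain integer columns)
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'A': [1, 1, 0],
    'B': [1, 0, 0],
    'C': [0, 1, 1],
    'D': [3, 2, 8],
})

df2 = df.set_index('ID')
result = (df2[['A', 'B', 'C']]
          .multiply(df2['D'], axis='index')   # flagged cells become D, others 0
          .reset_index()                      # bring ID back as a column
          .melt(id_vars='ID', var_name='X', value_name='D')
          .query('D > 0')                     # assumes every D is positive
          .sort_values(['ID', 'X'])           # deterministic row order
          .reset_index(drop=True))            # discard the old index labels

print(result)
```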
