Suppose I have a DataFrame with this structure:

  T1P1_T0   Count T1P1_T1  Count.1 T1P1_T3  Count.2
0     one  1207.0    four     1936     one    644.0
1     two   816.0     two     1601   seven    414.0
2   three   712.0    five     1457     NaN      NaN
3     NaN     NaN     six     4564     NaN      NaN

My desired output is this:

     Element    T1P1_T0  T1P1_T1  T1P1_T3
0        one    1207      NaN    644.0
1        two     816   1601.0      NaN
2      three     712      NaN      NaN
3       four     NaN   1936.0      NaN
4       five     NaN   1457.0      NaN
5        six     NaN   4564.0      NaN
6      seven     NaN      NaN    414.0

What I've tried is to split the initial DataFrame into three:

df1 = df.iloc[:,:2]
df2 = df.iloc[:,2:4]
df3 = df.iloc[:,4:]

I then tried to merge the first two, and then the third, using different variants of pd.merge, for example:

result = pd.merge(df1, df2, right_on=df.iloc[:,0], left_on=df.iloc[:,0])

but the result is not what I want:

   key_0 T1P1_T0   Count T1P1_T1  Count.1
0    one     one  1207.0    four     1936
1    two     two   816.0     two     1601
2  three   three   712.0    five     1457
3    NaN     NaN     NaN     six     4564

I don't know how to specify the columns with the element names as the key value for the merge operation.

Any suggestions?

Thanks

2 Answers

1

You can do this with pd.concat, reusing df1, df2 and df3 from the question:

out = pd.concat(
    [x.set_index(x.columns[0]).iloc[:, 0].dropna() for x in [df1, df2, df3]],
    keys=df.columns[::2],
    axis=1,
)

       T1P1_T0  T1P1_T1  T1P1_T3
one     1207.0      NaN    644.0
two      816.0   1601.0      NaN
three    712.0      NaN      NaN
four       NaN   1936.0      NaN
five       NaN   1457.0      NaN
six        NaN   4564.0      NaN
seven      NaN      NaN    414.0
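
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same concat idea. It rebuilds the frame from the values shown in the question and walks the columns two at a time instead of relying on df1, df2 and df3. The names pieces, pair, labels and counts are only illustrative, and it assumes each element appears at most once within a label column (duplicates would make the index non-unique and concat would fail).

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the post).
df = pd.DataFrame({
    "T1P1_T0": ["one", "two", "three", np.nan],
    "Count": [1207.0, 816.0, 712.0, np.nan],
    "T1P1_T1": ["four", "two", "five", "six"],
    "Count.1": [1936, 1601, 1457, 4564],
    "T1P1_T3": ["one", "seven", np.nan, np.nan],
    "Count.2": [644.0, 414.0, np.nan, np.nan],
})

# Walk the columns two at a time: (label column, count column).
pieces = []
for i in range(0, df.shape[1], 2):
    pair = df.iloc[:, i:i + 2].dropna()   # drop the padding rows
    labels, counts = pair.columns
    # Index the counts by element name and name the Series after the label column.
    pieces.append(pair.set_index(labels)[counts].rename(labels))

# Align the three Series on their element names (outer join on the index).
out = pd.concat(pieces, axis=1).rename_axis("Element").reset_index()
print(out)
```

The final rename_axis/reset_index step is only there to get an "Element" column as in the desired output; drop it if an index is fine.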

0

As a side note, it is worth checking whether the data can be delivered in a tidier format, so that this kind of wrangling, where errors can creep in, is not needed.

Starting from your data, a bit more wrangling gets it into the desired form; rather than merging, try concatenating:

df1 = df.iloc[:, :2].dropna()
df1 = (
    df1.set_index(df1.iloc[:, 0].rename("Element"))
    .iloc[:, -1]
    .rename(df1.iloc[:, 0].name)
)
df2 = df.iloc[:, 2:4].dropna()
df2 = (
    df2.set_index(df2.iloc[:, 0].rename("Element"))
    .iloc[:, -1]
    .rename(df2.iloc[:, 0].name)
)
df3 = df.iloc[:, 4:].dropna()
df3 = (
    df3.set_index(df3.iloc[:, 0].rename("Element"))
    .iloc[:, -1]
    .rename(df3.iloc[:, 0].name)
)

df1
Element
one      1207.0
two       816.0
three     712.0
Name: T1P1_T0, dtype: float64

df2
Element
four    1936
two     1601
five    1457
six     4564
Name: T1P1_T1, dtype: int64

df3
Element
one      644.0
seven    414.0
Name: T1P1_T3, dtype: float64

Now, concatenate:

pd.concat([df1, df2, df3], axis="columns")



         T1P1_T0  T1P1_T1  T1P1_T3
Element
one       1207.0      NaN    644.0
two        816.0   1601.0      NaN
three      712.0      NaN      NaN
four         NaN   1936.0      NaN
five         NaN   1457.0      NaN
six          NaN   4564.0      NaN
seven        NaN      NaN    414.0
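
If you would rather stay with pd.merge, as in the question, one possible way to make the element names the key is to rename each label column to a shared name and do an outer join on it. This is a sketch under the same assumptions as above (data copied from the question, unique element names per column); the key name "Element" and the helper names frames and merged are just illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the post).
df = pd.DataFrame({
    "T1P1_T0": ["one", "two", "three", np.nan],
    "Count": [1207.0, 816.0, 712.0, np.nan],
    "T1P1_T1": ["four", "two", "five", "six"],
    "Count.1": [1936, 1601, 1457, 4564],
    "T1P1_T3": ["one", "seven", np.nan, np.nan],
    "Count.2": [644.0, 414.0, np.nan, np.nan],
})

frames = []
for i in range(0, df.shape[1], 2):
    pair = df.iloc[:, i:i + 2].dropna()
    labels, counts = pair.columns
    # Give every piece the same key column ("Element") and let the count
    # column take over the original label column's name.
    frames.append(pair.rename(columns={labels: "Element", counts: labels}))

# how="outer" keeps elements that are missing from one side of the merge.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Element", how="outer"),
    frames,
)
print(merged)
```

The row order of an outer merge can differ from the concat result, so sort or reindex afterwards if the order matters.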
