2
           name                     address             contact_info    
        first_name  last_name       street  city    mobile      email
    1   AAA             BBB         XXX     YYY     02020       [email protected]
    2   111             222         333     444     239393      [email protected]

I have an Excel sheet in the above format. I want every column under name, but only the mobile column under contact_info. Could someone please let me know how I can do this? The following code gives me everything under name and contact_info:

import pandas as pd
df = pd.read_excel("test.xlsx", header=[0, 1], sheet_name="Mapping")
print(df[["name", "contact_info"]])

I am trying to get something like this:

first_name  last_name   mobile
AAA         BBB         02020
111         222        239393
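
For anyone who wants to try the answers without the spreadsheet, here is a minimal sketch that builds an equivalent frame in memory. The email values are placeholders (the real ones are not legible in the post), the mobile numbers are stored as integers as Excel would likely return them, and "street" stands in for the address sub-column.

```python
import pandas as pd

# Two-level column header mirroring the spreadsheet layout in the question.
columns = pd.MultiIndex.from_tuples([
    ("name", "first_name"),
    ("name", "last_name"),
    ("address", "street"),
    ("address", "city"),
    ("contact_info", "mobile"),
    ("contact_info", "email"),
])

# Email values are assumed placeholders, not data from the original sheet.
df = pd.DataFrame(
    [
        ["AAA", "BBB", "XXX", "YYY", 2020, "a@example.com"],
        ["111", "222", "333", "444", 239393, "b@example.com"],
    ],
    index=[1, 2],
    columns=columns,
)

# Same selection as in the question: both groups, with all of their sub-columns.
print(df[["name", "contact_info"]])
```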

5 Answers

3

By using IndexSlice + concat

idx = pd.IndexSlice
pd.concat([df.loc[:, idx['name', :]], df.loc[:, idx[:, 'mobile']]], axis=1)
Out[104]: 
        name           contact_info
  first_name last_name       mobile
1        AAA       BBB         2020
2        111       222       239393

2 Comments

Hmm, looks like concat cannot be avoided here.
@cᴏʟᴅsᴘᴇᴇᴅ Yep, because of the MultiIndex.
3

You can use df.xs here:

i = df.xs('name', axis=1)
j = df.xs('mobile', axis=1, level=-1)

pd.concat([i, j], axis=1)

  first_name last_name  contact_info
1        AAA       BBB          2020
2        111       222        239393

1 Comment

Is there any way I could do this without concat?
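
One possible way to do it without concat, as a rough sketch: build a boolean mask over the two column levels and select with a single .loc call. The frame below is a stand-in for the spreadsheet (the email values are assumed placeholders), and the names top, sub and result are only for illustration.

```python
import pandas as pd

# Stand-in for the Excel data; email values are assumed placeholders.
columns = pd.MultiIndex.from_tuples([
    ("name", "first_name"), ("name", "last_name"),
    ("address", "street"), ("address", "city"),
    ("contact_info", "mobile"), ("contact_info", "email"),
])
df = pd.DataFrame(
    [["AAA", "BBB", "XXX", "YYY", 2020, "a@example.com"],
     ["111", "222", "333", "444", 239393, "b@example.com"]],
    index=[1, 2], columns=columns,
)

# Labels of each header level, aligned with the columns.
top = df.columns.get_level_values(0)
sub = df.columns.get_level_values(1)

# Keep a column if its group is "name" or its sub-column is "mobile".
mask = (top == "name") | (sub == "mobile")

# Select in one step, then drop the outer header level for a flat result.
result = df.loc[:, mask].droplevel(0, axis=1)
print(result)
```

The mask keeps the original column order, so nothing needs to be stitched back together afterwards.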
2

Option 1
Simplest I could think of would be column slicing:

df['name'].join(df['contact_info']['mobile'])

  first_name last_name  mobile
1        AAA       BBB  020202
2        111       222  239393

Option 2
pd.DataFrame.filter

df.filter(regex='name|mobile')

        name           contact_info
  first_name last_name       mobile
1        AAA       BBB       020202
2        111       222       239393

And we can drop the level

d = df.filter(regex='name|mobile')
d.columns = d.columns.droplevel(0)
d

  first_name last_name  mobile
1        AAA       BBB  020202
2        111       222  239393

1 Comment

Nice, the filter idiom is useful here, will commit to memory.
0

Not sure why you want to avoid concat, but this does it:

df = pd.read_excel("multi-index-test.xlsx", header=[0, 1], sheet_name="Mapping")
df.drop('address', level=0, axis=1).drop('email', level=1, axis=1)

This takes advantage of MultiIndex.drop().

0

What you're looking for just requires basic indexing on a multiindex along with concat. Here's an example:

df = pd.read_excel("multi-index-test.xlsx", header=[0, 1])
df1 = df[["name"]]
df2 = df['contact_info', 'mobile']
pd.concat([df1, df2], axis=1)

I believe this solution has the benefit of being 1) simple and 2) general.
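
To illustrate the "general" part, here is a small sketch of how the same indexing could be wrapped in a helper. The function select_columns and its groups/pairs parameters are hypothetical names made up for this example, and the frame is a stand-in with assumed placeholder emails.

```python
import pandas as pd

def select_columns(df, groups=(), pairs=()):
    """Hypothetical helper: whole top-level groups plus single (group, column) pairs."""
    # df[[label]] keeps the two-level header, so all the pieces line up for concat.
    parts = [df[[g]] for g in groups] + [df[[p]] for p in pairs]
    out = pd.concat(parts, axis=1)
    # Drop the outer level to get flat column names.
    return out.droplevel(0, axis=1)

# Stand-in for the Excel data; email values are assumed placeholders.
columns = pd.MultiIndex.from_tuples([
    ("name", "first_name"), ("name", "last_name"),
    ("address", "street"), ("address", "city"),
    ("contact_info", "mobile"), ("contact_info", "email"),
])
df = pd.DataFrame(
    [["AAA", "BBB", "XXX", "YYY", 2020, "a@example.com"],
     ["111", "222", "333", "444", 239393, "b@example.com"]],
    index=[1, 2], columns=columns,
)

out = select_columns(df, groups=["name"], pairs=[("contact_info", "mobile")])
print(out)
```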
