
I have a long, unwieldy list of column names in a dataframe that I'm reading from an Excel sheet. The data is imported as a multi-indexed dataframe with two column label levels. I would like to create a list of the column names that contain a specific string so that I can drop them from the dataframe.

My thought was to use something like this:

# Create list of names for unwanted columns.
lst = [col for col in df.columns if 'ISTD' in col]
# Returns empty.

# Drop columns from dataframe.
df.drop(labels = lst, axis=1, level=0, inplace=True)

The list comes back empty, though, so I guess the issue is that I don't know how to properly select columns in multi-indexed dataframes. I'm finding the documentation difficult to understand, so I'm hoping for answers here.

Here are what my column names look like for reference:

df.columns
Out[44]: 
MultiIndex([('115  In ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('115  In ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '137  Ba  [ He Gas ] ',           'Conc. RSD'),
            (         '137  Ba  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '137  Ba  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ('159  Tb ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('159  Tb ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ('175  Lu ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('175  Lu ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '208  Pb  [ He Gas ] ',           'Conc. RSD'),
            (         '208  Pb  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '208  Pb  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ',           'Conc. RSD'),
            (          '23  Na  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ',           'Conc. RSD'),
            (          '24  Mg  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ',           'Conc. RSD'),
            (          '27  Al  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ',           'Conc. RSD'),
            (           '39  K  [ He Gas ] ',       'Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ',           'Conc. RSD'),
            (          '44  Ca  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '52  Cr  [ He Gas ] ',           'Conc. RSD'),
            (          '52  Cr  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '52  Cr  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ',           'Conc. RSD'),
            (          '55  Mn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ',           'Conc. RSD'),
            (          '56  Fe  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ',           'Conc. RSD'),
            (          '60  Ni  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ',           'Conc. RSD'),
            (          '63  Cu  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ',           'Conc. RSD'),
            (          '66  Zn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (  '7  Li ( ISTD )  [ He Gas ] ',                 'CPS'),
            (  '7  Li ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '75  As  [ He Gas ] ',           'Conc. RSD'),
            (          '75  As  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '75  As  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '78  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '82  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ',           'Conc. RSD'),
            (          '95  Mo  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (                       'Sample',      'Acq. Date-Time'),
            (                       'Sample',             'Comment'),
            (                       'Sample',           'Data File'),
            (                       'Sample',               'Level'),
            (                       'Sample',                'Rjct'),
            (                       'Sample',         'Sample Name'),
            (                       'Sample',          'Total Dil.'),
            (                       'Sample',                'Type'),
            (                       'Sample',  'Unnamed: 0_level_1'),
            (                       'Sample',         'Vial Number')]

Thanks for reading.

    Have you tried using the .tolist() after df.columns? Commented Jul 31, 2020 at 19:08

4 Answers


With multi-level columns, df.columns returns a MultiIndex, which you can think of as a list of tuples.
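To make the "list of tuples" point concrete, here is a minimal sketch using a made-up two-column index modelled on the labels in the question. It suggests why the comprehension in the question comes back empty: `in` on a tuple compares whole elements, while `in` on a string is a substring test. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical two-level column index modelled on the question's labels.
columns = pd.MultiIndex.from_tuples([
    ("115  In ( ISTD )  [ He Gas ] ", "CPS"),
    ("137  Ba  [ He Gas ] ", "Conc. RSD"),
])

# Indexing a MultiIndex yields a plain tuple: (level-0 label, level-1 label).
first_label = columns[0]

# `in` on a tuple checks whether any whole element equals "ISTD".
whole_element_match = "ISTD" in first_label

# `in` on a string checks for a substring.
substring_match = "ISTD" in first_label[0]

print(type(first_label).__name__, whole_element_match, substring_match)
```

So the test has to be applied to one element of the tuple, not to the tuple itself.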

You can iterate over them and delete them like this:

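cols = [(first, second) for first, second in df.columns if 'ISTD' in first]
df = df.drop(cols, axis=1)

This looks for "ISTD" only in the first layer (the first value of the tuples you get from df.columns), which is where it appears in your column names. Leave out the level argument when you pass full tuples: with level set, drop expects labels from that single level and raises a KeyError.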


1 Comment

Nice, it works if I get rid of the level argument in the drop function. With the level argument, it gives me a KeyError (f"labels {codes} not found in level"). Not sure what that means.

Multi-index columns are a list of tuples. You can do:

lst = [col for col in df.columns if 'ISTD' in col[0]]
df = df.drop(lst, axis=1)
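As a quick check, here is the same idea run end to end on a small, made-up dataframe (the data values and the choice of four columns are assumptions for illustration; the labels are copied from the question):

```python
import pandas as pd

# Small hypothetical frame with two-level columns like those in the question.
columns = pd.MultiIndex.from_tuples([
    ("115  In ( ISTD )  [ He Gas ] ", "CPS"),
    ("115  In ( ISTD )  [ He Gas ] ", "CPS RSD"),
    ("137  Ba  [ He Gas ] ", "Conc. RSD"),
    ("Sample", "Sample Name"),
])
df = pd.DataFrame([[1.0, 2.0, 3.0, "blank"]], columns=columns)

# Each col is a (level-0, level-1) tuple; test the level-0 string.
istd_cols = [col for col in df.columns if "ISTD" in col[0]]

# Drop by full tuple labels, without a level argument.
trimmed = df.drop(columns=istd_cols)
remaining = trimmed.columns.tolist()
print(remaining)
```

Only the Ba and Sample columns should remain in this sketch.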


You don't need to create a list: you can skip the unwanted columns while reading the file by passing a callable to "usecols".

data = pd.read_excel(directory, usecols=lambda x: "unwanted_string" not in str(x))

If you still want to make a list, you can get the header row separately, then go through that list to eliminate ones with the unwanted string.

#Read in the column names as a list:
cols = pd.read_excel(directory, header=None, nrows=1, index_col = 0).values[0]
cols = cols.tolist()

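#Keep only the elements that do not contain the unwanted string.
#(Calling cols.remove() inside a for loop over cols skips the element after each removal.)
cols = [item for item in cols if "string" not in str(item)]

#Then assign the cols list as the columns of the dataframe.
#This requires data to have exactly len(cols) columns.
data.columns = cols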
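One caveat when filtering a list of column names: removing items from a list while looping over that same list skips the element that follows each removal, so a list comprehension is the safer choice. A minimal sketch with made-up header names:

```python
# Hypothetical header names; two of them contain the unwanted string.
header = ["In (ISTD)", "Tb (ISTD)", "Ba", "Pb"]

# Removing while iterating: after "In (ISTD)" is removed, the loop's
# position moves past "Tb (ISTD)", so it is never examined.
looped = list(header)
for item in looped:
    if "ISTD" in item:
        looped.remove(item)

# Building a new list examines every element exactly once.
filtered = [item for item in header if "ISTD" not in item]

print(looped)    # ['Tb (ISTD)', 'Ba', 'Pb']
print(filtered)  # ['Ba', 'Pb']
```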


Here is yet another way. First, create a sample MultiIndex with 4 rows (each row is a tuple):

import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
        ('115  In ( ISTD )  [ He Gas ] ',           'CPS'),
        ('115  In ( ISTD )  [ He Gas ] ',       'CPS RSD'),
        (         '137  Ba  [ He Gas ] ',     'Conc. RSD'),
        (         '137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])

Now, create the mask (looking for ISTD in the first part of the multi index):

mask = np.array(['ISTD' in idx for idx in midx.get_level_values(0)])
midx[~mask]

MultiIndex([('137  Ba  [ He Gas ] ',     'Conc. RSD'),
            ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]')],
           )
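The mask approach above can also be applied directly to a dataframe. The sketch below assumes a small made-up frame and uses the index's `.str.contains` accessor (with `regex=False` for a plain substring match) in place of the list comprehension; that substitution is my own, not part of the answer.

```python
import pandas as pd

# Hypothetical frame reusing the four sample labels from this answer.
columns = pd.MultiIndex.from_tuples([
    ("115  In ( ISTD )  [ He Gas ] ", "CPS"),
    ("115  In ( ISTD )  [ He Gas ] ", "CPS RSD"),
    ("137  Ba  [ He Gas ] ", "Conc. RSD"),
    ("137  Ba  [ He Gas ] ", "Conc. [ ppb ]"),
])
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=columns)

# Boolean mask over the level-0 labels: True where the label contains "ISTD".
mask = df.columns.get_level_values(0).str.contains("ISTD", regex=False)

# Keep only the columns where the mask is False.
kept = df.loc[:, ~mask]
print(kept.columns.tolist())
```

Selecting with `df.loc[:, ~mask]` returns a new dataframe and leaves `df` unchanged.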
