Get column index from column name in python pandas

Question

In R when you need to retrieve a column index based on the name of the column you could do

idx <- which(names(my_data)==my_colum_name)

Is there a way to do the same with pandas dataframes?

DSM · Accepted Answer · 2012-10-23 00:06:36Z

670

Sure, you can use .get_loc():

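In [44]: from pandas import DataFrame

In [45]: df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

In [46]: df.columns  # this pandas version sorted dict keys; newer versions keep insertion order
Out[46]: Index([apple, orange, pear], dtype=object)

In [47]: df.columns.get_loc("pear")  # position within the column order shown above
Out[47]: 2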

although to be honest I don't often need this myself. Usually access by name does what I want it to (df["pear"], df[["apple", "orange"]], or maybe df.columns.isin(["orange", "pear"])), although I can definitely see cases where you'd want the index number.

answered Oct 23, 2012 at 0:06

DSM


6 Comments

abe Over a year ago

Column number is useful when using .iloc operator, where you must pass only integers for both rows and columns.

Tom Walker Over a year ago

Or when using libraries which want the DF converted to a numpy array and indices of columns with particular features. For example CatBoost wants a list of indices of categorical features.

haneulkim Over a year ago

Is there a way to get list of indexes?

cyclux Over a year ago

In my case I want to use the index of the column to get values from "itertuples" by column name. Fetching the indices of the column names instead of hardcoding them keeps it dynamic in case of changes to the DF.

seanv507 Over a year ago

it's also useful for the insert function (to enable you to insert after a given column)
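The use cases raised in these comments can be sketched roughly as follows. This is a minimal illustration, assuming the same df as in the answer and unique column labels; new_col is a hypothetical column name chosen only for the example.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# Integer position of a column, looked up by name (unique labels assumed).
pos = df.columns.get_loc("apple")

# .iloc accepts only integer positions, so the looked-up index fits directly.
apple_values = df.iloc[:, pos]

# With index=False, each tuple from itertuples holds only the column values,
# so the same position selects the "apple" field.
first_row = next(df.itertuples(index=False))
first_apple = first_row[pos]

# DataFrame.insert takes a position; pos + 1 places the new column right
# after "apple". "new_col" is a hypothetical name used for illustration.
df.insert(pos + 1, "new_col", [0, 0, 0])

print(list(df.columns))
```

Looking the position up once and reusing it avoids hardcoding integers that would silently go stale if the column order changes.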


cs95 · Accepted Answer · 2019-01-22 02:52:32Z

81

Here is a solution through list comprehension. cols is the list of columns to get index for:

[df.columns.get_loc(c) for c in cols if c in df]
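A small worked example of the comprehension above, assuming a recent pandas where columns keep the insertion order of the dict; the column name "kiwi" is made up to show how a missing name is handled.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})
cols = ["orange", "kiwi", "pear"]  # "kiwi" is deliberately absent from df

# `c in df` tests membership in the column labels, so missing names are
# skipped instead of raising KeyError inside get_loc.
positions = [df.columns.get_loc(c) for c in cols if c in df]
print(positions)
```

Note that silently skipping missing names means the result can be shorter than cols, so the positions no longer line up one-to-one with the requested names.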

edited Jan 22, 2019 at 2:52

cs95


answered Sep 9, 2017 at 8:20

snovik


cottontail · Accepted Answer · 2022-11-18 23:06:15Z

For returning multiple column indices, I recommend using the pandas.Index method get_indexer, if you have unique labels:

import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})
df.columns.get_indexer(['pear', 'apple'])
# Out: array([0, 1], dtype=int64)

If the labels are not unique, use get_indexer_for instead (shown here on the row index, since a DataFrame built from a dict cannot have duplicate column names). It takes the same target argument as get_indexer:

df = pd.DataFrame(
    {"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]}, 
    index=[0, 1, 1])
df.index.get_indexer_for([0, 1])
# Out: array([0, 1, 2], dtype=int64)

get_indexer also supports non-exact matching through its method and tolerance arguments, e.g. taking the nearest value within a tolerance for float labels. If two index values are equally distant from the requested label, the one with the larger index value is selected. Without method, a label that has no exact match returns -1:

df = pd.DataFrame(
    {"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]},
    index=[0, .9, 1.1])
df.index.get_indexer([0, 1])
# array([ 0, -1], dtype=int64)
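As a sketch of the non-exact matching mentioned above, the following compares exact lookup with method="nearest" and a tolerance. The index values and the tolerance of 0.2 are chosen purely for illustration; they are not taken from the answer.

```python
import pandas as pd

# method="nearest" requires a monotonic index.
idx = pd.Index([0.0, 0.5, 2.0])
targets = [0.4, 1.9, 5.0]

# Exact matching: none of the targets is present, so every entry is -1.
exact = idx.get_indexer(targets)

# Nearest matching: each target maps to the closest index value, but only
# if it lies within the tolerance; otherwise the entry stays -1.
nearest = idx.get_indexer(targets, method="nearest", tolerance=0.2)

print(exact.tolist())
print(nearest.tolist())
```

Here 0.4 and 1.9 resolve to the positions of 0.5 and 2.0, while 5.0 is farther than the tolerance from any index value and stays unmatched.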

Wes McKinney · Accepted Answer · 2012-10-23 18:27:34Z

18

DSM's solution works, but if you wanted a direct equivalent to R's which, you could do (df.columns == name).nonzero()
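A minimal sketch of that boolean-mask approach, assuming the df from the accepted answer on a recent pandas (insertion-ordered columns). nonzero() returns a tuple with one array per dimension, so the positions need to be unpacked from it.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})
name = "apple"

# Comparing the column Index with a scalar yields a boolean NumPy array.
mask = df.columns == name

# nonzero() returns a 1-tuple for a 1-D array; unpack the positions array.
(positions,) = mask.nonzero()

print(positions.tolist())
```

Like R's which, this yields an empty result rather than an error when no column matches, and returns every position if the name occurs more than once.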

answered Oct 23, 2012 at 18:27

Wes McKinney


salhin · Accepted Answer · 2023-01-11 12:36:29Z

14

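Update: Index.get_values() is "Deprecated since version 0.25.0: Use np.asarray(..) or DataFrame.values() instead." (pandas docs). Note that values is an attribute rather than a method, so it is used without parentheses.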

In case you want the column name from the column location (the other way around to the OP question), you can use:

>>> df.columns.values[location]

Using @DSM Example:

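>>> from pandas import DataFrame
>>> df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
>>> df.columns  # sorted here by an older pandas; newer versions keep insertion order
Index(['apple', 'orange', 'pear'], dtype='object')
>>> df.columns.values[1]
'orange'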

Other ways:

df.iloc[:,1].name

df.columns[location] #(thanks to @roobie-nuby for pointing that out in comments.)

edited Jan 11, 2023 at 12:36

answered Mar 2, 2018 at 11:35

salhin


4 Comments

Sheldon Over a year ago

df.columns is a useful way to access a specific col of a csv

Sanjay Manohar Over a year ago

AttributeError: 'Index' object has no attribute 'get_values' (python 3.8.8 pandas 1.3.1)

salhin Over a year ago

@SanjayManohar get_values deprecated since pandas version 0.25.0: Use np.asarray(..) or DataFrame.values() instead. (answer updated)

Dulangi_Kanchana Sep 12 at 5:34

Hi, while df.columns.values()[1] is deprecated, df.columns[1] works in 2025

Divakar · Accepted Answer · 2017-09-04 05:40:03Z

When you need to find multiple column matches, a vectorized solution using the searchsorted method can be used. With df as the dataframe and query_cols as the column names to search for (all assumed to be present in df), an implementation would be -

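import numpy as np

def column_index(df, query_cols):
    cols = df.columns.values
    sidx = np.argsort(cols)  # positions that sort the column names
    # Look up each name in sorted order, then map back to original positions
    return sidx[np.searchsorted(cols, query_cols, sorter=sidx)]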

Sample run -

In [162]: df
Out[162]: 
   apple  banana  pear  orange  peach
0      8       3     4       4      2
1      4       4     3       0      1
2      1       2     6       8      1

In [163]: column_index(df, ['peach', 'banana', 'apple'])
Out[163]: array([4, 1, 0])

JoeTheShmoe · Accepted Answer · 2020-12-23 16:34:19Z

To modify DSM's answer a bit, get_loc has some weird properties depending on the type of index in the current version of Pandas (1.1.5) so depending on your Index type you might get back an index, a mask, or a slice. This is somewhat frustrating for me because I don't want to modify the entire columns just to extract one variable's index. Much simpler is to avoid the function altogether:

list(df.columns).index('pear')

Very straightforward and probably fairly quick.
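The different return types mentioned above can be reproduced with a short sketch. The three indexes below are illustrative and not taken from the answer; the behaviour follows from whether the labels are unique and whether they are sorted.

```python
import pandas as pd

unique_index = pd.Index(list("abc"))         # unique labels
monotonic_index = pd.Index(list("abbc"))     # duplicates, sorted
non_monotonic_index = pd.Index(list("abcb")) # duplicates, unsorted

as_int = unique_index.get_loc("b")           # plain integer position
as_slice = monotonic_index.get_loc("b")      # slice covering the duplicates
as_mask = non_monotonic_index.get_loc("b")   # boolean mask over all labels

# list.index always returns a single integer: the first match.
first_match = list(non_monotonic_index).index("b")

print(as_int, as_slice, as_mask.tolist(), first_match)
```

The trade-off is that list.index hides duplicates by reporting only the first occurrence, and it builds a Python list from the columns on every call.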

QuentinJS · Accepted Answer · 2022-01-04 22:16:49Z

6

When the column might or might not exist, the following variant of the above works:

try:
    ix = list(df.columns).index('Col_X')
except ValueError:
    ix = None  # column not found

if ix is None:
    pass  # do something

answered Jan 4, 2022 at 22:16

QuentinJS


Siraj S. · Accepted Answer · 2019-11-15 07:01:17Z

4

How about this:

import numpy as np
from pandas import DataFrame

df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
out = np.argwhere(df.columns.isin(['apple', 'orange'])).ravel()
print(out)
[1 2]

answered Nov 15, 2019 at 7:01

Siraj S.


Shawn Seamons · Accepted Answer · 2021-08-20 23:37:24Z

0

import random
import pandas as pd

def char_range(c1, c2):                      # question 7001144
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)

df = pd.DataFrame()
for c in char_range('a', 'z'):
    df[c] = random.sample(range(10), 3)      # Random Data
rearranged = random.sample(range(26), 26)    # Random Order
df = df.iloc[:, rearranged]
print(df.iloc[:,:15])                        # 15 Col View

for col in df.columns:             # List of indices and columns
    print(str(df.columns.get_loc(col)) + '\t' + col)


edited Aug 20, 2021 at 23:37

answered Aug 20, 2021 at 23:29

Shawn Seamons


1 Comment

antipattern Over a year ago

This does ... something, but without any explanation is not of any use.
