How to reset index in a pandas dataframe?

Question

I have a dataframe from which I remove some rows. As a result, I get a dataframe whose index is something like [1,5,6,10,11], and I would like to reset it to [0,1,2,3,4]. How can I do it?

The following seems to work:

df = df.reset_index()
del df['index']

The following does not work:

df = df.reindex()

Shubham Sharma · Accepted Answer · 2020-06-28 05:50:04Z

1167

DataFrame.reset_index is what you're looking for. If you don't want the old index saved as a column, pass drop=True:

df = df.reset_index(drop=True)

If you don't want to reassign, modify the dataframe in place:

df.reset_index(drop=True, inplace=True)
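As a minimal sketch of both variants, assuming a small hypothetical frame with the gapped index from the question (the column name a and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose index has gaps, as in the question.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50]}, index=[1, 5, 6, 10, 11])

# drop=True discards the old index instead of inserting it as a column.
fresh = df.reset_index(drop=True)
print(list(fresh.index))    # [0, 1, 2, 3, 4]
print(list(fresh.columns))  # ['a']

# With inplace=True the frame itself is modified and the call returns None,
# so the result should not be assigned back to df.
result = df.reset_index(drop=True, inplace=True)
print(result)               # None
print(list(df.index))       # [0, 1, 2, 3, 4]
```

Writing df = df.reset_index(drop=True, inplace=True) would therefore replace df with None.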

edited Jun 28, 2020 at 5:50

Shubham Sharma


answered Dec 10, 2013 at 10:19

mkln



7 Comments

alhuelamo Over a year ago

Instead of reassigning the dataframe to the same variable, you can pass the inplace=True argument.

alyaxey Over a year ago

Note that with inplace=True the method returns None.

Tms91 Over a year ago

This solved my problem of running into ValueError: cannot insert level_0, already exists when calling df = df.reset_index() multiple times.

Amir Over a year ago

@mkln, I have increased in both columns (even with drop=True) and rows. df.reset_index() only fixed the rows. Can it be used to fix the columns as well?

Caleb McNevin Over a year ago

@Victor - If you don't "drop" the index, it will add a new index and save the old index values as a column in your dataframe.
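The level_0 error mentioned in the comments above can be reproduced with a small sketch. The frame is hypothetical; the behaviour shown is what happens when reset_index() is called repeatedly without drop=True:

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[1, 5, 6])

df = df.reset_index()    # old index is saved as a column named "index"
df = df.reset_index()    # "index" is taken, so the new column is "level_0"
print(list(df.columns))  # ['level_0', 'index', 'a']

error_message = None
try:
    df = df.reset_index()  # both fallback names are already in use
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Passing drop=True avoids the problem because no column is inserted at all.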


jezrael · Accepted Answer · 2017-08-15 11:40:58Z

Other solutions are to assign a RangeIndex or a range:

df.index = pd.RangeIndex(len(df.index))

df.index = range(len(df.index))

Assigning the index directly is faster than reset_index(drop=True) in the timings below:

import pandas as pd

# Build a 20,000-row frame whose index repeats the labels 7 and 8
df = pd.DataFrame({'a': [8, 7], 'c': [2, 4]}, index=[7, 8])
df = pd.concat([df] * 10000)
print(df.head())

In [298]: %timeit df1 = df.reset_index(drop=True)
The slowest run took 7.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 105 µs per loop

In [299]: %timeit df.index = pd.RangeIndex(len(df.index))
The slowest run took 15.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.84 µs per loop

In [300]: %timeit df.index = range(len(df.index))
The slowest run took 7.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.2 µs per loop
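For reference, a minimal sketch of what the two direct assignments leave behind, using the same two-row frame as above (the names df_a and df_b are only for illustration):

```python
import pandas as pd

df_a = pd.DataFrame({"a": [8, 7], "c": [2, 4]}, index=[7, 8])
df_b = df_a.copy()

# Assign a RangeIndex explicitly; this mutates df_a, no copy is returned.
df_a.index = pd.RangeIndex(len(df_a.index))
print(df_a.index)        # RangeIndex(start=0, stop=2, step=1)

# Assigning a plain range also works; pandas wraps it in an Index for us.
df_b.index = range(len(df_b.index))
print(list(df_b.index))  # [0, 1]
```

Both assignments change the frame in place, which is part of why they avoid the cost of building a new dataframe.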

This is an elegant solution to reset the index. Thank you! I found out that if you try to convert an hdf5 object to pandas.DataFrame object, you have to reset the index before you can edit certain sections of the DataFrame.

Does the timing change much if you do df.reset_index(drop=True, inplace=True) to avoid the copy?

My quick timeit test with pandas 2.2.3 indicates that df.reset_index(drop=True, inplace=True) @ 1.98 μs edges out RangeIndex @ 2.82 μs. So inplace is fastest of all of these options.
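Since these timings differ between pandas versions, it may be worth measuring on your own setup. The following is only a rough sketch using the standard timeit module; the helper function names and the repeat count are arbitrary choices, and no particular ranking is implied:

```python
import timeit

import pandas as pd

base = pd.DataFrame({"a": [8, 7], "c": [2, 4]}, index=[7, 8])
df = pd.concat([base] * 10000)


def with_reset_copy():
    # Returns a new frame; df itself is left untouched.
    return df.reset_index(drop=True)


def with_reset_inplace():
    # Mutates df. After the first call the index is already the default,
    # so later calls measure resetting an already-reset index.
    df.reset_index(drop=True, inplace=True)


def with_range_index():
    df.index = pd.RangeIndex(len(df.index))


runs = 1000
timings = {}
for func in (with_reset_copy, with_reset_inplace, with_range_index):
    timings[func.__name__] = timeit.timeit(func, number=runs) / runs

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} µs per call")
```

Because the in-place variants change df on the first iteration, the numbers are not a like-for-like comparison with the copying variant and should be read with that in mind.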

rsc · Accepted Answer · 2019-01-31 19:47:12Z

25

data1.reset_index(inplace=True)
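Note that without drop=True this call keeps the old index as a new column. A minimal sketch, assuming a hypothetical data1 with a gapped index:

```python
import pandas as pd

# Hypothetical stand-in for data1.
data1 = pd.DataFrame({"a": [10, 20, 30]}, index=[1, 5, 6])

data1.reset_index(inplace=True)
print(list(data1.index))     # [0, 1, 2]
print(list(data1.columns))   # ['index', 'a']
print(list(data1["index"]))  # [1, 5, 6]
```

If the old labels are not needed, adding drop=True leaves only the original columns.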

edited Jan 31, 2019 at 19:47

rsc


answered Nov 22, 2018 at 18:46

user10692571



cottontail · Accepted Answer · 2024-01-11 20:27:07Z

df.reset_index(drop=True) effectively replaces the index by the default RangeIndex. Another way to do the same thing is to straight away assign a new index using set_axis() (which I believe is what OP attempted with reindex). So the following two return the same output:

df1 = df.set_axis(range(len(df)))

df2 = df.reset_index(drop=True)
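A small sketch, on a hypothetical frame with the index from the question, of why set_axis() does the job while reindex() does not: reindex() aligns rows to labels rather than relabelling them.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40, 50]}, index=[1, 5, 6, 10, 11])

# set_axis relabels the rows positionally, like reset_index(drop=True).
df1 = df.set_axis(range(len(df)))
df2 = df.reset_index(drop=True)
print(df1.equals(df2))            # True

# reindex() with no arguments returns the frame with its labels unchanged.
unchanged = df.reindex()
print(list(unchanged.index))      # [1, 5, 6, 10, 11]

# reindex(new_labels) looks rows up by label: only label 1 exists among
# 0..4, so the other four rows are filled with NaN.
aligned = df.reindex(range(len(df)))
print(aligned["a"].isna().sum())  # 4
```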

Note that most methods/functions in pandas that remove or modify rows, such as drop_duplicates(), sort_values(), dropna(), pd.concat() etc., have an ignore_index parameter which, when passed True, resets the index to a RangeIndex in a single function call. So keep an eye out for this parameter if you are removing rows from or adding rows to a dataframe. An example:

df.dropna().reset_index(drop=True)    # <--- instead of this

df.dropna(ignore_index=True)          # <--- use this

This way, you can use the inplace parameter as well.

df1 = df.dropna().reset_index(drop=True)     # <--- must assign to dataframe
df.dropna(ignore_index=True, inplace=True)   # <--- `df` modified in-place
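The same parameter on a few of the other functions mentioned above, as a sketch on a hypothetical frame. These three are used here because, as far as I know, dropna() only gained ignore_index in more recent pandas releases, so the dropna form may not work on older installations:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 3, 2]}, index=[1, 5, 6, 10])

# Each call returns a frame with a fresh 0..n-1 index in one step.
sorted_df = df.sort_values("a", ignore_index=True)
deduped = df.drop_duplicates(ignore_index=True)
stacked = pd.concat([df, df], ignore_index=True)

print(list(sorted_df.index))  # [0, 1, 2, 3]
print(list(deduped.index))    # [0, 1, 2]
print(list(stacked.index))    # [0, 1, 2, 3, 4, 5, 6, 7]
```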

If you used groupby and want the result to have the default RangeIndex, there is the as_index parameter which, when passed False, keeps the group keys as columns and gives the result a RangeIndex in the same function call. So instead of df.groupby('col1').mean().reset_index(), use df.groupby('col1', as_index=False).mean().
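A minimal sketch comparing the two groupby forms, assuming a hypothetical frame with a key column col1 and a numeric column val:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["x", "y", "x", "y"], "val": [1.0, 2.0, 3.0, 4.0]}
)

# Group keys become the index, then are moved back into a column.
via_reset = df.groupby("col1").mean().reset_index()

# Group keys stay a regular column from the start.
via_as_index = df.groupby("col1", as_index=False).mean()

print(list(via_as_index.columns))    # ['col1', 'val']
print(list(via_as_index.index))      # [0, 1]
print(via_as_index["val"].tolist())  # [2.0, 3.0]
```

Both results hold the same columns and values; the as_index=False form just skips the intermediate indexed frame.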
