589

I have a dataframe from which I remove some rows. As a result, I get a dataframe whose index is something like [1,5,6,10,11], and I would like to reset it to [0,1,2,3,4]. How can I do it?


The following seems to work:

df = df.reset_index()
del df['index']

The following does not work:

df = df.reindex()

4 Answers

1167

DataFrame.reset_index is what you're looking for. If you don't want it saved as a column, then do:

df = df.reset_index(drop=True)

If you don't want to reassign:

df.reset_index(drop=True, inplace=True)
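As a minimal sketch of the difference between these calls, assuming a small made-up dataframe with the index from the question (the column name `a` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical data; only the index [1, 5, 6, 10, 11] comes from the question.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50]}, index=[1, 5, 6, 10, 11])

# Without drop=True, the old index is kept as a new column named "index".
kept = df.reset_index()
print(kept.columns.tolist())   # ['index', 'a']

# With drop=True, the old index is discarded.
dropped = df.reset_index(drop=True)
print(dropped.index.tolist())  # [0, 1, 2, 3, 4]

# With inplace=True, df itself is modified and the call returns None.
result = df.reset_index(drop=True, inplace=True)
print(result)                  # None
print(df.index.tolist())       # [0, 1, 2, 3, 4]
```

Because the in-place call returns None, writing `df = df.reset_index(drop=True, inplace=True)` would replace the dataframe with None.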

7 Comments

Instead of reassigning the dataframe to the same variable, you can set the inplace=True argument.
Note that with inplace=True the method returns None.
This solved my problem of running into ValueError: cannot insert level_0, already exists when calling df = df.reset_index() multiple times.
@mkln, I have increases in both columns (even with drop=True) and rows. df.reset_index() only fixed the rows. Can it be used to fix the columns as well?
@Victor - If you don't "drop" the index, it will add a new index and save the old index values as a column in your dataframe.
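The ValueError mentioned in the comments can be reproduced with a short sketch like the one below, assuming a made-up three-row dataframe. Each call without drop=True inserts the old index as a column, and the third call finds both candidate column names already taken:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[1, 5, 6])

df = df.reset_index()   # old index saved as column "index"
df = df.reset_index()   # "index" is taken, so the new column is "level_0"
print(df.columns.tolist())  # ['level_0', 'index', 'a']

error_message = None
try:
    df = df.reset_index()   # "level_0" is taken as well
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# drop=True never inserts a column, so it can be repeated safely.
df = df.reset_index(drop=True)
df = df.reset_index(drop=True)
print(df.columns.tolist())  # ['level_0', 'index', 'a']
```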
75

Another solution is to assign a RangeIndex or a range directly to the index:

df.index = pd.RangeIndex(len(df.index))

df.index = range(len(df.index))

In the timings below, assigning the index directly is faster than reset_index(drop=True):

import pandas as pd

df = pd.DataFrame({'a': [8, 7], 'c': [2, 4]}, index=[7, 8])
df = pd.concat([df] * 10000)  # 20,000 rows with a repeating, non-default index
print(df.head())

In [298]: %timeit df1 = df.reset_index(drop=True)
The slowest run took 7.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 105 µs per loop

In [299]: %timeit df.index = pd.RangeIndex(len(df.index))
The slowest run took 15.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.84 µs per loop

In [300]: %timeit df.index = range(len(df.index))
The slowest run took 7.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.2 µs per loop
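Apart from speed, it may help to confirm that direct assignment gives the same result as reset_index(drop=True). The sketch below uses a smaller version of the dataframe above (three repeats instead of 10000, an arbitrary choice to keep the output short):

```python
import pandas as pd

df = pd.DataFrame({"a": [8, 7], "c": [2, 4]}, index=[7, 8])
df = pd.concat([df] * 3)                # index is now [7, 8, 7, 8, 7, 8]

expected = df.reset_index(drop=True)    # returns a new dataframe

# Assigning to df.index replaces the labels on the existing object.
df.index = pd.RangeIndex(len(df.index))
print(df.index)                         # RangeIndex(start=0, stop=6, step=1)
print(df.equals(expected))              # True

# A plain range is converted to a RangeIndex on assignment.
df.index = range(len(df.index))
print(isinstance(df.index, pd.RangeIndex))  # True
```

Note that assignment changes `df` itself, whereas `reset_index(drop=True)` without `inplace=True` leaves the original untouched.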

3 Comments

This is an elegant solution to reset the index. Thank you! I found out that if you convert an hdf5 object to a pandas.DataFrame object, you have to reset the index before you can edit certain sections of the DataFrame.
Does the timing change much if you do df.reset_index(drop=True, inplace=True) to avoid the copy?
My quick timeit test with pandas 2.2.3 indicates that df.reset_index(drop=True, inplace=True) @ 1.98 μs edges out RangeIndex @ 2.82 μs. So inplace is fastest of all of these options.
25
data1.reset_index(inplace=True)


4

df.reset_index(drop=True) effectively replaces the index with the default RangeIndex. Another way to do the same thing is to assign a new index directly using set_axis() (which I believe is what OP attempted with reindex). The following two return the same output:

df1 = df.set_axis(range(len(df)))

df2 = df.reset_index(drop=True)
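A short sketch, using a made-up three-row dataframe, that checks the two calls agree and shows why reindex behaves differently: reindex aligns on the existing labels rather than relabelling the rows.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[1, 5, 6])

df1 = df.set_axis(range(len(df)))   # relabels the rows by position
df2 = df.reset_index(drop=True)
print(df1.equals(df2))              # True

# reindex() looks up existing labels instead of relabelling:
# labels 0 and 2 do not exist in df, so those rows come back as NaN.
df3 = df.reindex(range(len(df)))
print(df3["a"].tolist())            # [nan, 10.0, nan]
```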

Note that most pandas methods and functions that remove or modify rows, such as drop_duplicates(), sort_values(), dropna(), and pd.concat(), have an ignore_index parameter which, when passed True, resets the index to a RangeIndex in a single call. So keep an eye out for this parameter if you are removing or adding rows. An example:

df.dropna().reset_index(drop=True)    # <--- instead of this

df.dropna(ignore_index=True)          # <--- use this

This way, you can use the inplace parameter as well.

df1 = df.dropna().reset_index(drop=True)     # <--- must assign to dataframe
df.dropna(ignore_index=True, inplace=True)   # <--- `df` modified in-place
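The same parameter on the other functions mentioned can be sketched as follows, with a made-up dataframe. One caveat, as far as I know: ignore_index on dropna() is only available in pandas 2.0 and later, so this sketch sticks to calls that have accepted it for longer.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [3, 1, 3, 2]}, index=[1, 5, 6, 10])

sorted_df = df.sort_values("a", ignore_index=True)
deduped_df = df.drop_duplicates(ignore_index=True)
stacked_df = pd.concat([df, df], ignore_index=True)

print(sorted_df.index.tolist())   # [0, 1, 2, 3]
print(deduped_df.index.tolist())  # [0, 1, 2]
print(stacked_df.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```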

If you used groupby and want the default RangeIndex on the result, pass as_index=False: the group keys then stay as regular columns and the result gets a RangeIndex in the same call. So instead of df.groupby('col1').mean().reset_index(), use df.groupby('col1', as_index=False).mean().
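A small sketch of the two groupby spellings side by side, assuming a made-up dataframe with a `col1` key and a hypothetical numeric column `val`:

```python
import pandas as pd

# Hypothetical data; "val" is an illustrative column name.
df = pd.DataFrame({"col1": ["x", "y", "x"], "val": [1.0, 2.0, 3.0]})

via_reset = df.groupby("col1").mean().reset_index()
via_flag = df.groupby("col1", as_index=False).mean()

print(via_flag.columns.tolist())  # ['col1', 'val']
print(via_flag.index.tolist())    # [0, 1]
print(via_flag["val"].tolist())   # [2.0, 2.0]
```

Both results keep `col1` as an ordinary column, with one row per group.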

