Pandas dataframe - how to assign index?

Question

My code is:

import numpy as np
import pandas as pd
ser_1 = pd.Series(np.random.randn(6))
ser_2 = pd.Series(np.random.randn(6))
ser_3 = pd.Series(np.random.randn(6))
df = pd.DataFrame(data={'Col1': ser_1, 'Col2': ser_2, 'Col3': ser_3})
df

It gives me a table of randomly generated numbers:

       Col1      Col2      Col3
0 -0.594436 -0.014419  0.512523
1  0.208414  0.804857  0.261830
2  1.714547 -0.765586 -0.153386
3 -0.834847 -0.683258 -1.341085
4  2.726621  0.379711 -0.276410
5  0.151987  0.622103  0.966635

However, I would like the rows to have labels instead of 0, 1, ..., 5, so I tried:

df = pd.DataFrame(data={'Col1': ser_1, 'Col2': ser_2, 'Col3': ser_3}, index=['row0', 'row1', 'row2', 'row3', 'row4', 'row5', 'row6'])

But, as expected, it gives me NaNs:

     Col1 Col2 Col3
row0  NaN  NaN  NaN
row1  NaN  NaN  NaN
row2  NaN  NaN  NaN
row3  NaN  NaN  NaN
row4  NaN  NaN  NaN
row5  NaN  NaN  NaN
row6  NaN  NaN  NaN

My question is: what can be done so that it doesn't give NaNs and I can still label the rows?

Andy Hayden · Accepted Answer · 2015-10-09 23:17:41Z

You can set the index directly:

In [11]: df.index = ['row0', 'row1', 'row2', 'row3', 'row4', 'row5']

In [12]: df
Out[12]:
          Col1      Col2      Col3
row0 -1.094278 -0.689078 -0.465548
row1  1.555546 -0.388261  1.211150
row2 -0.143557  1.769561 -0.679080
row3 -0.064910  1.959216  0.227133
row4 -0.383729  0.113739 -0.954082
row5  0.434357 -0.646387  0.883319
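If you would rather not modify df in place, DataFrame.set_axis returns a relabeled copy and leaves the original index alone. A minimal sketch, assuming a reasonably recent pandas; the random data and the labels variable are illustrative, not from the question:

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (6 rows, 3 columns)
df = pd.DataFrame({'Col1': np.random.randn(6),
                   'Col2': np.random.randn(6),
                   'Col3': np.random.randn(6)})

# One label per row; the list length must equal len(df)
labels = ['row0', 'row1', 'row2', 'row3', 'row4', 'row5']

# set_axis returns a new DataFrame with the given row labels;
# df itself keeps its 0..5 integer index
labeled = df.set_axis(labels, axis=0)
print(labeled)
```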

Note: you can also do this with map (which is a little cleaner):

df.index = df.index.map(lambda x: 'row%s' % x)

...though I should say that this isn't something you usually need to do; keeping an integer index is A Good Thing™.
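A non-mutating variant of the map approach is DataFrame.rename, which accepts a function for its index argument and applies it to every row label. And if you later decide the integer index was the better choice after all, reset_index(drop=True) puts a default 0..n-1 index back. A minimal sketch with illustrative random data (the f-string needs Python 3.6+):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 3), columns=['Col1', 'Col2', 'Col3'])

# rename calls the lambda once per row label (0, 1, ..., 5) and returns a new frame
renamed = df.rename(index=lambda x: f'row{x}')

# Label-based and position-based lookups still reach the same row
assert renamed.loc['row2'].equals(renamed.iloc[2])

# drop=True discards the string labels instead of turning them into a column
restored = renamed.reset_index(drop=True)

print(renamed.index.tolist())   # ['row0', 'row1', 'row2', 'row3', 'row4', 'row5']
print(restored.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(df.index.tolist())        # [0, 1, 2, 3, 4, 5]  (df itself is unchanged)
```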

Alexander · Accepted Answer · 2015-10-09 23:44:05Z


A list comprehension would also work:

df.index = ['row{0}'.format(n) for n in range(df.index.shape[0])]

>>> df
          Col1      Col2      Col3
row0 -1.213463 -1.331086  0.306792
row1  0.334060 -0.127397 -0.107466
row2 -0.893235  0.580098 -0.191778
row3 -0.663146 -1.269988 -1.303429
row4  0.418924  0.316321 -0.940015
row5 -0.082087 -1.893178 -1.809514
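Deriving the labels from the frame's own length, as range(df.index.shape[0]) does here, guards against a count mismatch: assigning a list whose length differs from the number of rows raises a ValueError rather than padding with NaN. A sketch that tries the seven labels from the question on a six-row frame (the too_many and mismatch_raised names are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 3), columns=['Col1', 'Col2', 'Col3'])

# Seven labels for six rows, as in the question's constructor call
too_many = ['row0', 'row1', 'row2', 'row3', 'row4', 'row5', 'row6']

try:
    df.index = too_many
    mismatch_raised = False
except ValueError as err:
    # pandas reports a length mismatch and leaves the old index in place
    mismatch_raised = True
    print('ValueError:', err)

# Sizing the list from the frame itself always matches
df.index = ['row{0}'.format(n) for n in range(df.index.shape[0])]
print(mismatch_raised, df.index.tolist())
```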


AChampion · Accepted Answer · 2015-10-09 23:44:24Z

For you to be able to do this in the DataFrame constructor, you would need nested dicts; the index labels are used to extract the values from the nested dicts (which is why you got NaN: your Series are indexed 0 to 5, so none of the 'rowN' labels are found), e.g.:

>>> ser_1 = {'row{}'.format(i): v for i, v in enumerate(np.random.randn(6))}
>>> ser_2 = {'row{}'.format(i): v for i, v in enumerate(np.random.randn(6))}
>>> ser_3 = {'row{}'.format(i): v for i, v in enumerate(np.random.randn(6))}
>>> pd.DataFrame(data={'Col1': ser_1, 'Col2': ser_2, 'Col3':ser_3 },
...              index=('row'+str(i) for i in range(6)))
          Col1      Col2      Col3
row0 -0.431470  2.086320 -2.903402
row1  1.306443  1.431721 -0.344296
row2 -0.166202 -1.227531  0.351672
row3  0.929919  0.305378  0.233215
row4  0.553945  0.904051  0.681783
row5  1.424173  0.279041 -0.110876
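To see where the NaNs in the question come from: when the constructor receives Series together with index=, each Series is aligned on those labels, and labels such as 'row0' do not exist in a Series indexed 0 to 5. Passing plain NumPy arrays instead (here via Series.to_numpy(), available in pandas 0.24 and later) skips that alignment, so the labels are simply attached to the rows in order. A minimal sketch, reusing the question's variable names with fresh random data:

```python
import numpy as np
import pandas as pd

ser_1 = pd.Series(np.random.randn(6))
ser_2 = pd.Series(np.random.randn(6))
ser_3 = pd.Series(np.random.randn(6))
labels = [f'row{i}' for i in range(6)]

# Series are aligned on their own index (0..5); no label matches, so all NaN
aligned = pd.DataFrame({'Col1': ser_1, 'Col2': ser_2, 'Col3': ser_3}, index=labels)

# Raw arrays carry no index, so the labels are applied positionally
relabeled = pd.DataFrame({'Col1': ser_1.to_numpy(),
                          'Col2': ser_2.to_numpy(),
                          'Col3': ser_3.to_numpy()}, index=labels)

print(aligned.isna().all().all(), relabeled.notna().all().all())  # True True
```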

But this seems unnecessary when you can simply assign the index after creating the DataFrame, as per @AndyHayden's answer.
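A caveat on terminology: in pandas, DataFrame.reindex is not the same as relabeling. reindex looks rows up by the new labels, just as the constructor did, so on this frame it also produces all NaN, whereas assigning to .index renames the existing rows and keeps the data. A sketch with illustrative random data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 3), columns=['Col1', 'Col2', 'Col3'])
labels = [f'row{i}' for i in range(6)]

# reindex conforms df to the new labels; none exist in 0..5, so every value is NaN
via_reindex = df.reindex(labels)

# Assigning .index replaces the labels and keeps the data
via_assign = df.copy()
via_assign.index = labels

print(via_reindex.isna().all().all(), via_assign.notna().all().all())  # True True
```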
