My code is

import numpy as np
import pandas as pd
ser_1 = pd.Series(np.random.randn(6))
ser_2 = pd.Series(np.random.randn(6))
ser_3 = pd.Series(np.random.randn(6))
df = pd.DataFrame(data= {'Col1': ser_1, 'Col2': ser_2, 'Col3':ser_3 } ,  )
df

It gives me a table of randomly generated numbers:

    Col1    Col2    Col3
0   -0.594436   -0.014419   0.512523
1   0.208414    0.804857    0.261830
2   1.714547    -0.765586   -0.153386
3   -0.834847   -0.683258   -1.341085
4   2.726621    0.379711    -0.276410
5   0.151987    0.622103    0.966635

However, I would like the rows to have labels instead of 0, 1, ..., 5. I tried:

df = pd.DataFrame(data= {'Col1': ser_1, 'Col2': ser_2, 'Col3':ser_3 } , index=['row0', 'row1', 'row2', 'row3', 'row4', 'row5', 'row6'] )

But, as expected, it gives me NaNs:

    Col1    Col2    Col3
row0    NaN     NaN     NaN
row1    NaN     NaN     NaN
row2    NaN     NaN     NaN
row3    NaN     NaN     NaN
row4    NaN     NaN     NaN
row5    NaN     NaN     NaN
row6    NaN     NaN     NaN

The question is: what can be done so that I can label the rows without getting NaNs?

3 Answers

2

You can set the index directly:

In [11]: df.index = ['row0', 'row1', 'row2', 'row3', 'row4', 'row5']

In [12]: df
Out[12]:
          Col1      Col2      Col3
row0 -1.094278 -0.689078 -0.465548
row1  1.555546 -0.388261  1.211150
row2 -0.143557  1.769561 -0.679080
row3 -0.064910  1.959216  0.227133
row4 -0.383729  0.113739 -0.954082
row5  0.434357 -0.646387  0.883319

Note: you can also do this with map (which is a little cleaner):

df.index = df.index.map(lambda x: 'row%s' % x)

...though I should say that this isn't something you usually need to do; keeping an integer index is A Good Thing™.
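As a self-contained sketch of the map approach above (the names `labelled` and `renamed` are made up for illustration), the following also shows `DataFrame.rename` with a callable, which should give the same labels without modifying the original frame:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's DataFrame: 6 rows, default integer index 0..5
df = pd.DataFrame(np.random.randn(6, 3), columns=['Col1', 'Col2', 'Col3'])

# Assignment style: overwrite the index with mapped labels (done on a copy here)
labelled = df.copy()
labelled.index = labelled.index.map(lambda x: 'row%s' % x)

# Non-mutating style: rename applies the callable to each index label
# and returns a new DataFrame, leaving df untouched
renamed = df.rename(index=lambda x: 'row%s' % x)

print(labelled.index.tolist())
print(renamed.index.tolist())
```

Which of the two to use is mostly a matter of taste; `rename` is convenient when you want to keep the integer-indexed original around.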

1

A list comprehension would also work:

df.index = ['row{0}'.format(n) for n in range(df.index.shape[0])]

>>> df
          Col1      Col2      Col3
row0 -1.213463 -1.331086  0.306792
row1  0.334060 -0.127397 -0.107466
row2 -0.893235  0.580098 -0.191778
row3 -0.663146 -1.269988 -1.303429
row4  0.418924  0.316321 -0.940015
row5 -0.082087 -1.893178 -1.809514

0

To do this in the DataFrame constructor you would need nested dicts. The labels passed as index are used to look up values in each column's data (which is why you got NaN: your Series are labelled 0 to 5, so none of the 'row…' labels match), e.g.:

>>> ser_1 = {'row{}'.format(i): v for i, v in enumerate(np.random.randn(6))}
>>> ser_2 = {'row{}'.format(i): v for i, v in enumerate(np.random.randn(6))}
>>> ser_3 = {'row{}'.format(i): v for i, v in enumerate(np.random.randn(6))}
>>> pd.DataFrame(data={'Col1': ser_1, 'Col2': ser_2, 'Col3':ser_3 },
...              index=('row'+str(i) for i in range(6)))
          Col1      Col2      Col3
row0 -0.431470  2.086320 -2.903402
row1  1.306443  1.431721 -0.344296
row2 -0.166202 -1.227531  0.351672
row3  0.929919  0.305378  0.233215
row4  0.553945  0.904051  0.681783
row5  1.424173  0.279041 -0.110876

But this seems unnecessary when you can simply set the index after creating the DataFrame, as in @AndyHayden's answer.
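To make the alignment point concrete, here is a minimal sketch (variable names such as `aligned` and `from_arrays` are made up for illustration). It assumes the NaNs come from label alignment between each Series' own index and the `index` argument, so passing plain arrays, or Series that already carry the labels, should avoid them:

```python
import numpy as np
import pandas as pd

labels = ['row%d' % i for i in range(6)]
ser_1 = pd.Series(np.random.randn(6))  # indexed 0..5 by default

# Series input: values are looked up by label, and none of
# 'row0'..'row5' exist in ser_1's index, so every cell is NaN
aligned = pd.DataFrame({'Col1': ser_1}, index=labels)

# Array input: there are no labels to align on, so the values are
# placed positionally under the new index
from_arrays = pd.DataFrame({'Col1': ser_1.to_numpy()}, index=labels)

# Alternatively, give the Series its labels up front
ser_labelled = pd.Series(np.random.randn(6), index=labels)
from_labelled = pd.DataFrame({'Col1': ser_labelled})

print(aligned)
print(from_arrays)
```

Note that with array input the index length has to match the data: six values with the seven labels from the question ('row0' to 'row6') would raise a ValueError rather than produce NaNs.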
