Assigning index column to empty pandas dataframe

Question

I am creating an empty dataframe that I then want to add data to one row at a time. I want to index on the first column, 'customer_ID'.

I have this:

In[1]: df = pd.DataFrame(columns = ['customer_ID','a','b','c'],index=['customer_ID'])
In[2]: df
Out[2]: 
            customer_ID    a    b    c
customer_ID         NaN  NaN  NaN  NaN

So there is already a row of NaN that I don't want. Can I point the index to the first column without adding a row of data?

Aside: adding rows one at a time is usually a bad idea. Each time you do so, pandas has to make a new copy of the whole dataframe, which gives you O(N^2) performance.
– DSM, Commented Apr 7, 2017 at 2:47
Interesting comment. I am iterating through a folder of CSV files, processing each one, pulling out key stats about the customer and adding them to the df. The alternative is to create the df with the full list of customers as the index and empty data, and then fill in the data one row at a time. Would this avoid the copying?
– doctorer, Commented Apr 7, 2017 at 3:03
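To illustrate the point raised in the comments above, here is a minimal sketch of two ways to avoid growing the frame row by row. The customer IDs and stat values are made up for illustration; in the real loop each record would come from processing one CSV file.

```python
import pandas as pd

# Hypothetical per-customer stats standing in for the results of
# processing each CSV file.
processed = [("x123", 4, 5, 6), ("y456", 7, 8, 9)]

# Option 1: collect plain dicts in a list and build the frame once.
records = []
for customer_id, a, b, c in processed:
    records.append({"customer_ID": customer_id, "a": a, "b": b, "c": c})
stats = pd.DataFrame(records).set_index("customer_ID")

# Option 2 (the variant asked about in the follow-up comment):
# preallocate one NaN-filled row per known customer, then assign to
# labels that already exist, so the frame is never enlarged.
customers = ["x123", "y456"]
prealloc = pd.DataFrame(
    index=pd.Index(customers, name="customer_ID"),
    columns=["a", "b", "c"],
    dtype="float64",
)
prealloc.loc["x123"] = [4, 5, 6]
```

Option 1 is probably the simpler choice when the list of customers is not known before the loop starts.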

doctorer · Accepted Answer · 2017-09-13 06:00:27Z

8

The answer, I think, as hinted at by @JD Long, is to set the index in a separate instruction:

In[1]: df = pd.DataFrame(columns = ['customer_ID','a','b','c'])
In[2]: df.set_index('customer_ID',inplace = True)
In[3]: df
Out[3]: 
Empty DataFrame
Columns: [a, b, c]
Index: []

I can then add rows:

In[4]: id='x123'
In[5]: df.loc[id]=[4,5,6]
In[6]: df
Out[6]: 
             a  b  c
customer_ID         
x123         4  5  6
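As a possible alternative to the separate set_index step, the empty frame can be created with an index that is already named. This is only a sketch, and the row values are the same illustrative ones used above.

```python
import pandas as pd

# Create an empty frame whose (empty) index is already called customer_ID.
df = pd.DataFrame(
    columns=["a", "b", "c"],
    index=pd.Index([], name="customer_ID"),
)
was_empty = df.empty
empty_index_name = df.index.name

# Add a row by label; only the three data columns need values,
# because customer_ID is the index rather than a column.
df.loc["x123"] = [4, 5, 6]
```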

edited Sep 13, 2017 at 6:00

answered Apr 7, 2017 at 2:01

doctorer


2 Comments

Marcel M Over a year ago

The df.set_index('customer_ID') line has no effect since it does not change the df object, rather it returns a new DataFrame. You would need to use inplace=True.

Ben Farmer Over a year ago

For me this only works if I do df.loc[id]=[4,5,6] instead of df.loc[id]=[id,4,5,6]. It seems like setting "id" as the index removes it from the "columns".
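Both comments can be checked with a short sketch like the one below: set_index returns a new frame unless inplace=True is passed, and by default it moves the column out of the columns and into the index. The variable names indexed and kept are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=["customer_ID", "a", "b", "c"])

# Without inplace=True the original frame is left untouched and the
# re-indexed frame is returned.
indexed = df.set_index("customer_ID")

# drop=False keeps customer_ID as a regular column as well as the index.
kept = df.set_index("customer_ID", drop=False)

print(list(df.columns))       # original frame still has four columns
print(list(indexed.columns))  # customer_ID has moved to the index
print(list(kept.columns))     # customer_ID is both index and column
```

With the default drop=True a row assigned through .loc takes three values; with drop=False it takes four, including the ID.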

JD Long · Accepted Answer · 2017-04-07 01:34:57Z

1

Yes, and you can call dropna() at any time if you are so inclined:

df = df.set_index('customer_ID').dropna()
df
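For context, a small sketch of what this does when applied to the frame from the question, which starts out with a single all-NaN row. The names n_before and cleaned are introduced here only for illustration.

```python
import pandas as pd

# The frame from the question: index=['customer_ID'] creates one NaN row.
df = pd.DataFrame(columns=["customer_ID", "a", "b", "c"], index=["customer_ID"])
n_before = len(df)

# Move customer_ID into the index, then drop the all-NaN row.
cleaned = df.set_index("customer_ID").dropna()

print(n_before, len(cleaned))
```

Note that dropna() removes every row containing a NaN, so it is best applied before real data with missing values is added.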

answered Apr 7, 2017 at 1:34

JD Long


user2775128 · Accepted Answer · 2017-04-07 01:49:16Z

-1

Because the dataframe has no rows when you first create it.

df = pd.DataFrame({'customer_ID': ['2'],'a': ['1'],'b': ['A'],'c': ['1']})
df = df.set_index('customer_ID',drop=False)
df

answered Apr 7, 2017 at 1:49

user2775128


1 Comment

doctorer Over a year ago

No, I don't want a row in the dataframe - I want it empty. I will fill it in a loop later on
