5

I am creating an empty dataframe that I then want to add data to, one row at a time. I want to index on the first column, 'customer_ID'.

I have this:

In[1]: df = pd.DataFrame(columns = ['customer_ID','a','b','c'],index=['customer_ID'])
In[2]: df
Out[2]: 
            customer_ID    a    b    c
customer_ID         NaN  NaN  NaN  NaN

So there is already a row of NaN that I don't want. Can I point the index to the first column without adding a row of data?

2
  • Aside: adding rows one at a time is usually a bad idea. Each time you do so, pandas has to make a new copy of the whole dataframe, which gives you O(N^2) performance. Commented Apr 7, 2017 at 2:47
  • Interesting comment. I am iterating through a folder of CSV files, processing each one, pulling out key stats about the customer and adding them to the df. The alternative is to create the df with the full list of customers as the index and empty data, and then fill in the data one row at a time. Would this avoid the copying? Commented Apr 7, 2017 at 3:03
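
One common way to sidestep the row-at-a-time cost raised in the comments above is to collect each customer's stats in a plain Python list and build the dataframe once at the end. The following is only a sketch: `summarize` and the `raw` dictionary are hypothetical stand-ins for whatever per-file processing is actually done, and the stats chosen for a, b and c are placeholders.

```python
import pandas as pd

# Hypothetical helper: stands in for "process one CSV file and pull out
# the key stats for one customer". The stats themselves are made up.
def summarize(customer_id, values):
    return {"customer_ID": customer_id,
            "a": min(values), "b": max(values), "c": sum(values)}

# Hypothetical input: in practice this would come from the files on disk.
raw = {"x123": [4, 5, 6], "y456": [1, 2, 3]}

# Appending to a list is cheap; no dataframe is touched inside the loop.
rows = [summarize(cid, vals) for cid, vals in raw.items()]

# Build the dataframe once, then move customer_ID into the index.
df = pd.DataFrame(rows, columns=["customer_ID", "a", "b", "c"])
df = df.set_index("customer_ID")
```

With this layout the dataframe is constructed a single time, however many files are processed.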

3 Answers

8

The answer, I think, as hinted at by @JD Long, is to set the index in a separate instruction:

In[1]: df = pd.DataFrame(columns = ['customer_ID','a','b','c'])
In[2]: df.set_index('customer_ID',inplace = True)
In[3]: df
Out[3]: 
Empty DataFrame
Columns: [a, b, c]
Index: []

I can then add rows. Because customer_ID is now the index rather than a column, each row supplies values for a, b and c only:

In[4]: id='x123'
In[5]: df.loc[id]=[4,5,6]
In[6]: df
Out[6]: 
             a  b  c
customer_ID         
x123         4  5  6

2 Comments

The df.set_index('customer_ID') line has no effect since it does not change the df object, rather it returns a new DataFrame. You would need to use inplace=True.
For me this only works if I do df.loc[id]=[4,5,6] instead of df.loc[id]=[id,4,5,6]. It seems like setting "id" as the index removes it from the "columns".
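
Both comments can be addressed at once by giving the empty dataframe a named, empty index at construction time, so no `set_index` call is needed. This is a sketch of one alternative, not the answerer's method; it assumes only a, b and c are wanted as data columns.

```python
import pandas as pd

# Empty frame with three data columns and an empty index that is
# already named 'customer_ID'; no placeholder row is created.
df = pd.DataFrame(columns=["a", "b", "c"],
                  index=pd.Index([], name="customer_ID"))

# customer_ID is the row label, so only a, b and c values are supplied.
df.loc["x123"] = [4, 5, 6]
```

Passing four values here (including the ID) would fail, because the row must match the three remaining columns.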
1

Yes, and you can call dropna() at any time if you are so inclined:

df = df.set_index('customer_ID').dropna()
df
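
For illustration, here is that line applied to the dataframe from the question, as a self-contained sketch. The all-NaN placeholder row is removed and customer_ID ends up as the index name.

```python
import pandas as pd

# The frame from the question: one all-NaN row labelled 'customer_ID'.
df = pd.DataFrame(columns=["customer_ID", "a", "b", "c"],
                  index=["customer_ID"])

# Move the customer_ID column into the index, then drop rows that
# contain NaN, which removes the unwanted placeholder row.
df = df.set_index("customer_ID").dropna()
```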


-1

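This is because your dataframe did not have any rows when you first created it.

df = pd.DataFrame({'customer_ID': ['2'],'a': ['1'],'b': ['A'],'c': ['1']})
# set_index returns a new DataFrame, so assign the result back;
# drop=False keeps customer_ID as a regular column as well as the index
df = df.set_index('customer_ID',drop=False)
df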

1 Comment

No, I don't want a row in the dataframe - I want it empty. I will fill it in a loop later on
