3

I have initialized an empty pandas DataFrame that I am now trying to fill, but I keep running into the same error. This is the (simplified) code I am using:

import pandas as pd
cols = list("ABC")
df = pd.DataFrame(columns=cols)
# set the values for the first two rows
df.loc[0:2,:] = [[1,2],[3,4],[5,6]]

On running the above code I get the following error:

ValueError: cannot copy sequence with size 3 to array axis with dimension 0

I am not sure what's causing this. I tried the same with a single row at a time and it works (df.loc[0,:] = [1,2,3]). I thought this would be the logical extension for handling more than one row, but clearly I am wrong. What's the correct way to do this? I need to enter values for multiple rows and columns at once. I can do it using a loop, but that's not what I am looking for.

Any help would be great. Thanks

4 Answers

5

Since you already have the columns from the empty dataframe, use them in the DataFrame constructor, i.e.

import numpy as np
import pandas as pd
cols = list("ABC")
df = pd.DataFrame(columns=cols)

# transpose so that each inner list becomes a column
df = pd.DataFrame(np.array([[1,2],[3,4],[5,6]]).T,columns=df.columns)

   A  B  C
0  1  3  5
1  2  4  6

If you want to use loc specifically, reindex the dataframe first and then assign, i.e.

arr = np.array([[1,2],[3,4],[5,6]]).T
df = df.reindex(np.arange(arr.shape[0]))
df.loc[0:arr.shape[0],:] = arr

   A  B  C
0  1  3  5
1  2  4  6
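
As a rough sketch of how the reindex approach might extend to data that arrives in batches: the helper name assign_rows below is hypothetical (it is not part of pandas), and it assumes the new labels are not already present in the index.

```python
import numpy as np
import pandas as pd

def assign_rows(df, labels, values):
    """Hypothetical helper: create the row labels first, then fill them via loc."""
    labels = list(labels)
    # reindex adds the missing labels as all-NaN rows
    df = df.reindex(list(df.index) + labels)
    # every label now exists, so loc can assign the whole block at once
    df.loc[labels, :] = np.asarray(values)
    return df

df = pd.DataFrame(columns=list("ABC"))
df = assign_rows(df, [0, 1], [[1, 3, 5], [2, 4, 6]])
df = assign_rows(df, [2, 3, 4], [[7, 10, 13], [8, 11, 14], [9, 12, 15]])
print(df)
```

Here each inner list is one row, so no transpose is needed. reindex returns a new dataframe, which is why the helper returns df rather than modifying it in place.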


1

How about adding data by index, as below? You can call the function from outside whenever you receive new data.

def add_to_df(index, data):
    # zip(*data) transposes the column-wise lists into rows;
    # df is the empty dataframe defined in the question
    for idx, row in zip(index, zip(*data)):
        df.loc[idx] = row

# Set values for the first two rows
data1 = [[1,2],[3,4],[5,6]]
index1 = [0,1]
add_to_df(index1, data1)
print(df)
print()

# Set values for the next three rows
data2 = [[7,8,9],[10,11,12],[13,14,15]]
index2 = [2,3,4]
add_to_df(index2, data2)
print(df)

Result

>>> 
     A    B    C
0  1.0  3.0  5.0
1  2.0  4.0  6.0

     A     B     C
0  1.0   3.0   5.0
1  2.0   4.0   6.0
2  7.0  10.0  13.0
3  8.0  11.0  14.0
4  9.0  12.0  15.0
>>> 


1

From the documentation and some experiments, my guess is that loc only lets you insert one new key at a time. However, you can insert multiple keys first with reindex, as @Dark shows.

The .loc/[] operations can perform enlargement when setting a non-existent key for that axis.

http://pandas-docs.github.io/pandas-docs-travis/indexing.html#setting-with-enlargement
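
A minimal sketch of that single-key enlargement, reusing the column names from the question (the row values are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(columns=list("ABC"))

# Label 0 does not exist yet, so loc enlarges the frame by one row
df.loc[0] = [1, 2, 3]
# Same again for label 1: one new key per assignment
df.loc[1] = [4, 5, 6]

print(df)
```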

Also, loc[:2, :] selects existing rows whose labels fall in the slice (label slices in loc include the end point). However, there is nothing in the empty df for you to select: there are no rows, yet you are trying to assign 3 rows. Hence the error message:

ValueError: cannot copy sequence with size 3 to array axis with dimension 0

BTW, [[1,2],[3,4],[5,6]] is 3 rows (of 2 values each) rather than 2.
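
A quick way to check the shape point, as a sketch that uses NumPy only to inspect the dimensions:

```python
import numpy as np
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]
arr = np.array(data)

print(arr.shape)    # (3, 2): three rows of two values
print(arr.T.shape)  # (2, 3): two rows of three values after transposing

# Only the transposed array matches the three columns A, B, C
df = pd.DataFrame(arr.T, columns=list("ABC"))
print(df)
```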

7 Comments

You say you don't want to use the constructor, but you are using one inside concat.
@Dark That's true... Will revise.
"they only allow you to insert 1 key at a time": no, we can assign multiple rows at once. The point is that the index has to exist before assigning. See my answer.
@Dark I think that is slightly different. I will change the wording. The part I cite is about inserting a key with loc. What you do is first insert the keys with reindex and then insert the rows.
@Dark Thanks for the feedback. Edited.
0

Does this give the output you are looking for?

   import pandas as pd
   df=pd.DataFrame({'A':[1,2],'B':[3,4],'C':[5,6]})

Output :

       A  B  C
    0  1  3  5
    1  2  4  6

1 Comment

That's not the question. Once I have an empty dataframe, I want to fill it in. A lot of calculations are happening on the fly and I don't have the values beforehand.
