
Let's say I want to create an empty DataFrame and fill it with values from a loop.

import pandas as pd
import numpy as np

years = [2013, 2014, 2015]
dn = pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1]}).set_index('Incidents')
    print(df1)
    # Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.
    dn = dn.append(df1, ignore_index=False)

The append gives a diagonal-looking result, even with ignore_index=False:

>>> dn
           2013  2014  2015
Incidents                  
C             1   NaN   NaN
B             1   NaN   NaN
A             1   NaN   NaN
C           NaN     1   NaN
B           NaN     1   NaN
A           NaN     1   NaN
C           NaN   NaN     1
B           NaN   NaN     1
A           NaN   NaN     1

[9 rows x 3 columns]

It should look like this:

>>> dn
           2013  2014  2015
Incidents                  
C             1     1     1
B             1     1     1
A             1     1     1

[3 rows x 3 columns]

Is there a better way of doing this? And is there a way to fix the append?

I have pandas version '0.13.1-557-g300610e'

Comment: Do you need Incidents laid out this way, or would a normal DataFrame (just a matrix with names) be fine for you? (Commented Mar 7, 2015 at 1:58)

2 Answers

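import pandas as pd

years = [2013, 2014, 2015]
dn = []  # collect the per-year pieces in a plain Python list
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1]}).set_index('Incidents')
    dn.append(df1)  # list.append: cheap, no DataFrame copy
# Concatenate once, side by side (axis=1), aligning rows on Incidents.
dn = pd.concat(dn, axis=1)
print(dn)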

yields

           2013  2014  2015
Incidents                  
C             1     1     1
B             1     1     1
A             1     1     1
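For comparison with the diagonal result in the question: DataFrame.append stacked the pieces row-wise, which is what pd.concat does with its default axis=0. The sketch below (a hypothetical illustration, not part of the original answer) builds the same three pieces and concatenates them both ways, so the only difference is the axis.

```python
import pandas as pd

years = [2013, 2014, 2015]
# Build the same three one-column pieces as in the question.
pieces = [
    pd.DataFrame({'Incidents': ['C', 'B', 'A'], year: [1, 1, 1]}).set_index('Incidents')
    for year in years
]

# axis=0 stacks rows: each piece only has its own year column,
# so the other year columns are filled with NaN (the "diagonal" result).
stacked = pd.concat(pieces, axis=0)

# axis=1 places the pieces side by side, aligning rows on the Incidents index.
side_by_side = pd.concat(pieces, axis=1)

print(stacked.shape)       # (9, 3)
print(side_by_side.shape)  # (3, 3)
```

In other words, nothing was wrong with the index handling; the pieces were simply being combined along the wrong axis.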

Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat with each iteration of the loop.

Each time you call pd.concat, new space is allocated for a new DataFrame, and all the data from each component DataFrame is copied into it. If you call pd.concat from within the for-loop, you end up doing on the order of n**2 copies, where n is the number of years.
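A back-of-the-envelope count of that argument, as a sketch under simplifying assumptions: every year contributes one equally sized piece, and each pd.concat call copies all of its inputs exactly once (real costs also depend on dtypes and index alignment). The function names are hypothetical.

```python
def pieces_copied_in_loop(n):
    """Pieces copied when pd.concat is called once per iteration.

    Iteration i concatenates the running result (i - 1 pieces)
    with the new piece, so it copies i pieces in total.
    """
    return sum(i for i in range(1, n + 1))  # equals n * (n + 1) / 2


def pieces_copied_once(n):
    """Pieces copied when all n pieces are concatenated in a single call."""
    return n


for n in (3, 10, 100):
    print(n, pieces_copied_in_loop(n), pieces_copied_once(n))
```

For 100 years that is 5050 piece copies versus 100, which is where the quadratic versus linear difference comes from.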

If you accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then pandas only needs to copy each of the n pieces once to make dn.
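If you want to keep the incremental loop structure (the closest thing to "fixing" the original append), one option, not taken from either answer, is to grow the frame column-wise with DataFrame.join instead of row-wise. This is only a sketch: each join still allocates a new frame, so it has the same quadratic copying cost described above and is only reasonable for a small number of years. how='outer' keeps incidents that appear in only some years (filled with NaN); depending on the pandas version, the resulting rows may come out sorted.

```python
import pandas as pd

years = [2013, 2014, 2015]
dn = None
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1]}).set_index('Incidents')
    # Add the new year as a column, aligning rows on the Incidents index.
    dn = df1 if dn is None else dn.join(df1, how='outer')
print(dn)
```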




As far as I know, you should avoid adding rows to a DataFrame one at a time, because it is slow.

What I usually do is:

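import pandas as pd

n = 5  # hypothetical number of iterations

l1 = []
l2 = []

for i in range(n):
    # Placeholder computations; replace with your own logic.
    v1 = i
    v2 = i * i
    l1.append(v1)
    l2.append(v2)

# Build the DataFrame once, after all values have been collected.
d = pd.DataFrame({'l1': l1, 'l2': l2})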
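Applied to the question's data, the same collect-then-build idea might look like the sketch below. The values are the question's placeholder 1s, and the names incidents and columns are illustrative; in real code each year's list would come from whatever the loop computes.

```python
import pandas as pd

years = [2013, 2014, 2015]
incidents = ['C', 'B', 'A']

# Collect one plain Python list per year inside the loop...
columns = {}
for year in years:
    columns[year] = [1, 1, 1]  # placeholder values, as in the question

# ...and build the DataFrame once, after the loop.
dn = pd.DataFrame(columns, index=pd.Index(incidents, name='Incidents'))
print(dn)
```

Because the dict keys become the columns and the index is given explicitly, this yields the 3 x 3 layout the question asks for without any concatenation at all.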

Comment: Thanks for your answer. Could you tell me why we should avoid adding rows one at a time?
