Filling empty python dataframe using loops

Question

Let's say I want to create an empty DataFrame and fill it with values from a loop.

import pandas as pd
import numpy as np

years = [2013, 2014, 2015]
dn=pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': [ 'C', 'B','A'],
                 year: [1, 1, 1 ],
                }).set_index('Incidents')
    print (df1)
    dn=dn.append(df1, ignore_index = False)

The append gives a block-diagonal result, with each year's rows stacked below the previous ones, even when ignore_index is False:

>>> dn
       2013  2014  2015
Incidents                  
C             1   NaN   NaN
B             1   NaN   NaN
A             1   NaN   NaN
C           NaN     1   NaN
B           NaN     1   NaN
A           NaN     1   NaN
C           NaN   NaN     1
B           NaN   NaN     1
A           NaN   NaN     1

[9 rows x 3 columns]

It should look like this:

>>> dn
       2013  2014  2015
Incidents                  
C             1   1   1
B             1   1   1
A             1   1   1

[3 rows x 3 columns]

Is there a better way of doing this? And is there a way to fix the append?

I have pandas version '0.13.1-557-g300610e'

Do you need Incidents as the index, or is a plain DataFrame fine for you (I mean just a matrix with column names)?
– Donbeo, Commented Mar 7, 2015 at 1:58

unutbu · Accepted Answer · 2015-03-07 02:10:15Z

import pandas as pd

years = [2013, 2014, 2015]
dn = []
for year in years:
    df1 = pd.DataFrame({'Incidents': [ 'C', 'B','A'],
                 year: [1, 1, 1 ],
                }).set_index('Incidents')
    dn.append(df1)
dn = pd.concat(dn, axis=1)
print(dn)

yields

           2013  2014  2015
Incidents                  
C             1     1     1
B             1     1     1
A             1     1     1
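As a minimal sketch of why the direction of concatenation matters here: stacking the per-year frames row-wise reproduces the NaN-filled shape from the question, while joining them column-wise aligns rows on the shared index. The names `frames`, `stacked`, and `side_by_side` are illustrative and not part of the original code.

```python
import pandas as pd

years = [2013, 2014, 2015]
frames = [
    pd.DataFrame({'Incidents': ['C', 'B', 'A'], year: [1, 1, 1]}).set_index('Incidents')
    for year in years
]

# axis=0 (the default) stacks frames vertically: rows are appended and
# columns missing from a given frame are filled with NaN.
stacked = pd.concat(frames, axis=0)

# axis=1 places frames side by side, aligning rows on the shared index.
side_by_side = pd.concat(frames, axis=1)

print(stacked.shape)       # (9, 3)
print(side_by_side.shape)  # (3, 3)
```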

Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat with each iteration of the loop.

Each time you call pd.concat new space is allocated for a new DataFrame, and all the data from each component DataFrame is copied into the new DataFrame. If you call pd.concat from within the for-loop then you end up doing on the order of n**2 copies, where n is the number of years.

If you accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then Pandas only needs to perform n copies to make dn.
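The copy-count argument can be illustrated with a rough tally. This is a sketch under assumptions, not a benchmark: it concatenates single-row frames along the rows (rather than columns, as in the answer's code) and simply counts how many rows each resulting DataFrame holds, as a stand-in for the data copied. The value of `n` and the variable names are arbitrary.

```python
import pandas as pd

n = 50  # number of partial DataFrames (arbitrary, for illustration)
parts = [pd.DataFrame({'value': [i]}) for i in range(n)]

# Pattern 1: concat inside the loop. Each call builds a new DataFrame,
# so the rows accumulated so far are copied again on every iteration.
rows_copied_in_loop = 0
acc = parts[0]
for part in parts[1:]:
    acc = pd.concat([acc, part], ignore_index=True)
    rows_copied_in_loop += len(acc)

# Pattern 2: a single concat after the loop. Each row is copied once.
once = pd.concat(parts, ignore_index=True)
rows_copied_once = len(once)

print(rows_copied_in_loop)  # 1274, i.e. 2 + 3 + ... + 50, roughly n**2 / 2
print(rows_copied_once)     # 50
```

Both patterns produce the same final DataFrame; only the amount of intermediate copying differs.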

Donbeo · Answer · 2015-03-07 01:57:23Z

As far as I know, you should avoid adding rows to the DataFrame one at a time, because it is slow.

What I usually do is:

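import pandas as pd

n = 3  # number of iterations (placeholder value)
l1 = []
l2 = []

for i in range(n):
    v1 = i      # placeholder: compute value v1
    v2 = i * 2  # placeholder: compute value v2
    l1.append(v1)
    l2.append(v2)

# Build the DataFrame once, after the loop has finished
d = pd.DataFrame({'l1': l1, 'l2': l2})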
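Applied to the data from the question, the same accumulate-then-build idea might look like the sketch below. It assumes each year's values can be collected as a plain list; the dict name `columns` and the constant values are placeholders for whatever the real loop computes.

```python
import pandas as pd

years = [2013, 2014, 2015]
incidents = ['C', 'B', 'A']

# Collect one list of values per year in a plain dict (cheap to grow).
columns = {}
for year in years:
    # Placeholder values; real code would compute these for each year.
    columns[year] = [1, 1, 1]

# Construct the DataFrame once, with Incidents as the named index.
dn = pd.DataFrame(columns, index=pd.Index(incidents, name='Incidents'))
print(dn)
```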

Thanks for your answer. Could you tell me why we should avoid adding rows line by line?
– aerin
