Using Pandas Dataframe Append in For Loop

Question

I continue to use pandas.dataframe.append, however appending in the for-loop continues to get written over and information is duplicated. The out put of the for-loop should look exactly like the original dataframe (I am not including functions to eliminate any complexity). Any help would be appreciated.

import pandas as pd
df = pd.DataFrame({
    'date': ['2019-01-01','2019-01-01','2019-01-01',
             '2019-02-01','2019-02-01','2019-02-01',
             '2019-03-01','2019-03-01','2019-03-01',],
    'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B', 'Asset B',
             'Asset C', 'Asset C', 'Asset C'],
    'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100]
})
print(df.sort_values(by=['Asset']))

         date    Asset  Monthly Value
0  2019-01-01  Asset A           2100
1  2019-01-01  Asset A           8100
2  2019-01-01  Asset A           1400
3  2019-02-01  Asset B           1400
4  2019-02-01  Asset B           3100
5  2019-02-01  Asset B           1600
6  2019-03-01  Asset C           2400
7  2019-03-01  Asset C           2100
8  2019-03-01  Asset C           2100

This for loop creates multiple appendages to the df and duplicates rows

assetlist = list(df['Asset'].unique())

for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    dfcopy = df_subset.copy()
    newdf = newdf.append(dfcopy)
print(newdf)

This output is incorrect, it should look exactly like the original dataframe.

date    Asset   Monthly Value
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100

That was a type-o. Edited. Thank you, but still doesnt work. — Starbucks
– Starbucks, Commented Jan 15, 2020 at 21:44
I'm a bit confused what you are trying to accomplish... You are iterating over "Asset"s, filtering down to distinct groups, and then recombining those groups into a single DataFrame... won't your desired result be exactly the same as df, just in a different order? Can you give an example of the output you expect? — 0x5453
– 0x5453, Commented Jan 15, 2020 at 21:46
Yes, it should look exactly like the same. I have several functions that I took out to save time. — Starbucks
– Starbucks, Commented Jan 15, 2020 at 21:47
It might be easier (and faster) to throw all of your dfcopys into a list and then use pd.concat to combine them at the end. — 0x5453
– 0x5453, Commented Jan 15, 2020 at 21:49

YOLO · Accepted Answer · 2020-01-15 21:51:47Z

You are missing one line I think:

assetlist = list(df['Asset'].unique())
newdf = pd.DataFrame()   # <-- define it as a data frame
for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    dfcopy = df_subset.copy()
    newdf = newdf.append(dfcopy)
print(newdf)

         date    Asset  Monthly Value
0  2019-01-01  Asset A           2100
1  2019-01-01  Asset A           8100
2  2019-01-01  Asset A           1400
3  2019-02-01  Asset B           1400
4  2019-02-01  Asset B           3100
5  2019-02-01  Asset B           1600
6  2019-03-01  Asset C           2400
7  2019-03-01  Asset C           2100
8  2019-03-01  Asset C           2100

However, an easier way to do this is:

newdf = pd.concat([df.query("Asset == @asset") for asset in assetlist])

Not only easier, but entirely more efficient. pd.concat+ list.append doesn't lead to the quadratic copying of DataFrame.append in a loop

Collectives™ on Stack Overflow

Using Pandas Dataframe Append in For Loop

1 Answer 1

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Linked

Related