Changes made while iterating over a DataFrame don't save

Question

I have a DataFrame with rows that are "duplicates" in a sense: they share the same name and age, but each row is missing different values. Let's say I have a row A = ['name' : john, 'age' : 15, 'email' : NaN, 'school' : middle] and a row B = ['name' : john, 'age' : 15, 'email' : [email protected], 'school' : NaN]. Both A and B should end up as ['name' : john, 'age' : 15, 'email' : [email protected], 'school' : middle].

So far I have tried iterating over the DataFrame with iterrows() and changing the values, but the changes don't save. My code:

```python
import pandas as pd

# Keep only rows whose (name, age) pair occurs more than once.
duplicated = df[df.duplicated(['name', 'age'], keep=False)].sort_values('name')
row_iterator = duplicated.iterrows()

_, last = next(row_iterator)
for k, row in row_iterator:
    if row['name'] == last['name']:
        for i in duplicated.columns:
            if row[i] == last[i]:
                continue
            # Fill a missing value in either row from the other row.
            if pd.isna(row[i]):
                row[i] = last[i]
            if pd.isna(last[i]):
                last[i] = row[i]
    last = row
```

df is the DataFrame that holds all the data. I first select only the duplicated rows into duplicated, then iterate through them and try to make the changes as I go. But by the end, the changes I made are lost. What am I doing wrong?
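The pandas documentation for `DataFrame.iterrows` warns that, depending on the dtypes, each yielded row is a copy rather than a view, so writing to it has no effect on the frame. The question's code has a second layer on top of that: `duplicated` is built from `df` by boolean indexing, which returns a new DataFrame, so even writes that did land in `duplicated` would not reach `df`. A minimal sketch of the first effect, assuming a small hypothetical frame shaped like rows A and B (the email address is a made-up placeholder, and "filled" is an arbitrary stand-in value), together with writing back by index label through `DataFrame.at`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame modelled on rows A and B from the question;
# "john@example.com" is a placeholder, not the original address.
df = pd.DataFrame({
    "name": ["john", "john"],
    "age": [15, 15],
    "email": [np.nan, "john@example.com"],
    "school": ["middle", np.nan],
})

# Assigning to the Series yielded by iterrows() changes that Series only.
for idx, row in df.iterrows():
    if pd.isna(row["email"]):
        row["email"] = "filled"  # placeholder value
missing_after_iterrows = int(df["email"].isna().sum())
print(missing_after_iterrows)  # 1: df itself was not modified

# Writing through the DataFrame by index label with .at does persist.
for idx in df.index:
    if pd.isna(df.at[idx, "email"]):
        df.at[idx, "email"] = "filled"
print(int(df["email"].isna().sum()))  # 0
```

A label-based loop like this works for small frames, but vectorised operations such as `groupby` avoid the Python-level loop entirely.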

Erfan · Accepted Answer · 2019-11-20 19:12:11Z


There are two ways to solve your problem:

Method 1: using bfill, ffill and drop_duplicates:

df = df.bfill().ffill().drop_duplicates()

   name  age           email  school
0  john   15  [email protected]  middle
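One caveat worth knowing about Method 1: `bfill()` and `ffill()` on the whole frame propagate values across row boundaries without regard to name or age, so once the frame holds more than one person, a missing value can be filled from someone else's row. A minimal sketch, assuming a hypothetical second person "mary" and a placeholder email address, that shows the leak and a group-scoped alternative using `DataFrameGroupBy.bfill` and `DataFrameGroupBy.ffill`. The grouped version also keeps both of john's rows in their original positions, which is what the question asked for:

```python
import numpy as np
import pandas as pd

# Hypothetical data: john's two rows (as in the question) plus an unrelated,
# made-up person "mary" whose email is missing.
df = pd.DataFrame({
    "name": ["john", "mary", "john"],
    "age": [15, 16, 15],
    "email": [np.nan, np.nan, "john@example.com"],
    "school": ["middle", "high", np.nan],
})

# Whole-frame fill: mary's missing email is back-filled from john's row,
# and john's last row gets mary's school via the forward fill.
leaky = df.bfill().ffill()
print(leaky)

# Group-wise fill: values only propagate between rows sharing name and age.
keys = ["name", "age"]
value_cols = [c for c in df.columns if c not in keys]
filled = df.copy()
filled[value_cols] = df.groupby(keys)[value_cols].bfill()
filled[value_cols] = filled.groupby(keys)[value_cols].ffill()
print(filled)
```

Calling `drop_duplicates()` on `filled` afterwards would collapse john's two identical rows into one, if a single row per person is preferred.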

Method 2: GroupBy.first:

df = df.groupby(['name', 'age']).first().reset_index()

   name  age           email  school
0  john   15  [email protected]  middle
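Method 2 works because `GroupBy.first` takes, for each column separately, the first non-null value within each group, so different output columns can come from different input rows. A minimal sketch on a hypothetical two-row frame shaped like rows A and B (the email address is a made-up placeholder):

```python
import numpy as np
import pandas as pd

# Hypothetical rows A and B from the question; the address is a placeholder.
df = pd.DataFrame({
    "name": ["john", "john"],
    "age": [15, 15],
    "email": [np.nan, "john@example.com"],
    "school": ["middle", np.nan],
})

# email comes from row B and school from row A, because first()
# skips NaN independently in each column of the group.
merged = df.groupby(["name", "age"]).first().reset_index()
print(merged)
```

Note that this collapses each (name, age) group into a single row rather than updating both original rows.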


4 Comments

Michael Kročka Over a year ago

Hi! Thanks for the quick reply. I tried the second method and it works very nicely. The only problem is that it changes the order of the rows; is there any way to avoid that? I also tried the first method, but it gave me weird numbers, like halving the non-NaN value.

Erfan Over a year ago

It's hard to understand what you mean. The best way is to provide an example dataframe so I can see what happens when my method is applied. That way I can adjust it accordingly.

piRSquared Over a year ago

@Erfan groupby with sort=False?

Michael Kročka Over a year ago

Very nice. Groupby with sort=False works. Thank you!
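The reordering comes from `groupby`'s default `sort=True`, which sorts the group keys; with `sort=False`, groups appear in the order they first occur in the original DataFrame. A minimal sketch on a hypothetical frame where the names are deliberately not in alphabetical order (the names and addresses are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "zoe" appears before "adam", i.e. not alphabetical.
df = pd.DataFrame({
    "name": ["zoe", "adam", "zoe"],
    "age": [15, 16, 15],
    "email": [np.nan, "adam@example.com", "zoe@example.com"],
    "school": ["middle", "high", np.nan],
})

keys = ["name", "age"]

# Default sort=True: groups are ordered by key, so adam comes first.
sorted_names = df.groupby(keys).first().reset_index()["name"].tolist()
print(sorted_names)  # ['adam', 'zoe']

# sort=False: groups keep their order of first appearance, so zoe comes first.
original_order_names = (
    df.groupby(keys, sort=False).first().reset_index()["name"].tolist()
)
print(original_order_names)  # ['zoe', 'adam']
```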
