I have a DataFrame that contains rows which are "duplicates" in a sense. Say I have a row A = {'name': 'john', 'age': 15, 'email': NaN, 'school': 'middle'} and a row B = {'name': 'john', 'age': 15, 'email': [email protected], 'school': NaN}. After merging, both A and B should read {'name': 'john', 'age': 15, 'email': [email protected], 'school': 'middle'}.
So far I have tried iterating over the DataFrame with iterrows() and changing the values as I go, but the changes are not saved. My code:
duplicated = df[df.duplicated(['name', 'age'], keep=False)].sort_values('name')
row_iterator = duplicated.iterrows()
_, last = next(row_iterator)
for k, row in row_iterator:
    if row['name'] == last['name']:
        for i in duplicated.columns:
            if row[i] == last[i]:
                continue
            if pd.isna(row[i]):
                row[i] = last[i]
            if pd.isna(last[i]):
                last[i] = row[i]
    last = row
df is the DataFrame that holds all the data. I first select only the duplicated rows into duplicated, then iterate over that DataFrame and try to fill in the missing values as I go. However, once the loop finishes, duplicated is unchanged and none of my assignments have taken effect. What am I doing wrong?
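
For reference, here is a minimal reproducible sketch of the situation. It rests on some assumptions: the sample data is hypothetical (modelled on rows A and B, with placeholder email addresses), and the group-wise fill at the end is only one possible way to get the result I am after, not necessarily the right one. It also shows what I suspect is happening: iterrows() appears to hand back a copy of each row, so writing to row never reaches the DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on rows A and B above;
# the email addresses are placeholders, not real values.
df = pd.DataFrame({
    "name": ["john", "john", "mary"],
    "age": [15, 15, 16],
    "email": [np.nan, "john@example.com", "mary@example.com"],
    "school": ["middle", np.nan, np.nan],
})

# With mixed column dtypes, iterrows() builds a new Series for every row,
# so this assignment modifies the copy and leaves df untouched.
for _, row in df.iterrows():
    row["school"] = "changed"
unchanged = not (df["school"] == "changed").any()

# Possible approach (an assumption, not a confirmed answer): within each
# (name, age) group, fill missing values from the other rows of the group.
keys = ["name", "age"]
cols = ["email", "school"]
df[cols] = df.groupby(keys)[cols].transform(lambda s: s.ffill().bfill())
```

Only missing values are filled in this sketch. If two rows of the same group hold different non-missing values in a column, each row keeps its own value.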