Pandas not saving changes when iterating rows

Question

Let's say I have the following DataFrame:

   Shots Goals StG
0  1     2     0.5
1  3     1     0.33
2  4     4     1

Now I want to multiply the variable Shots by a random value (multiplier in the code) and recalculate the variable StG, which is simply Shots/Goals. The code I used is:

for index, row in df.iterrows():
    multiplier = np.random.randint(1, 5 + 1)
    row['Shots'] *= multiplier
    row['StG'] = float(row['Shots']) / float(row['Goals'])

Then I saved the .csv and it was identical to the original one, so after the loop I simply used print(df) and got:

   Shots Goals StG
0  1     2     0.5
1  3     1     0.33
2  4     4     1

If I print the values row by row during the loop, I can see them change, but it's as if they aren't saved in the df.

I think this is because I'm only accessing the values, not the actual DataFrame.

I assume I should write something like df.row[], but that fails because DataFrame has no row property.
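For reference, one common way to make the change stick is to address each cell through the DataFrame itself, for example with df.at[index, column], instead of through the row object. A minimal sketch of the question's loop rewritten that way, assuming the sample data above; it loops over df.index because the row object is no longer needed, and it swaps np.random.randint for NumPy's seeded Generator (np.random.default_rng) purely so runs are reproducible.

```python
import numpy as np
import pandas as pd

# Sample data from the question (StG values copied as given there)
df = pd.DataFrame({'Shots': [1, 3, 4],
                   'Goals': [2, 1, 4],
                   'StG': [0.5, 0.33, 1.0]})
original_shots = df['Shots'].copy()

rng = np.random.default_rng(0)  # seeded only so runs are reproducible
for index in df.index:
    multiplier = int(rng.integers(1, 5 + 1))  # random int in 1..5
    # Write through df.at, so the change lands in df itself
    df.at[index, 'Shots'] = df.at[index, 'Shots'] * multiplier
    df.at[index, 'StG'] = df.at[index, 'Shots'] / df.at[index, 'Goals']

print(df)
```

Row-by-row writes like this work, but they are slow on large frames; whole-column operations do the same job in one step.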

Thanks for the help.

____EDIT____

for index, row in df.iterrows():
    multiplier = np.random.randint(1, 5 + 1)
    row['Impresions'] *= multiplier
    row['Clicks'] *= np.random.randint(1, multiplier + 1)
    row['Ctr'] = float(row['Clicks']) / float(row['Impresions'])
    row['Mult'] = multiplier
    # print(row['Clicks'], row['Impresions'], row['Ctr'], row['Mult'])

The main condition is that the number of Clicks can never be higher than the number of Impressions.

Then I recalculate the ratio Clicks/Impressions and store it in Ctr.

I'm not sure whether multiplying the entire column is the best way to maintain the condition Impr >= Clicks for each row, hence I went row by row.
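Whole-column operations can keep the condition too. If each row's click multiplier is drawn from 1 up to that row's impression multiplier (as in the loop above), then Clicks <= Impresions still holds after scaling, because Clicks * cm <= Impresions * cm <= Impresions * m. A minimal sketch, assuming the column names from the edit and hypothetical sample values; it uses NumPy's Generator.integers, which accepts an array as the upper bound, so all rows are drawn at once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the edit
df = pd.DataFrame({'Impresions': [100, 40, 7],
                   'Clicks': [10, 40, 0]})

rng = np.random.default_rng(42)
n = len(df)

# One impression multiplier per row, in 1..5 (same range as the loop)
mult = rng.integers(1, 5 + 1, size=n)
# Per-row click multiplier in 1..mult; the upper bound applies elementwise
click_mult = rng.integers(1, mult + 1, size=n)

df['Impresions'] *= mult
df['Clicks'] *= click_mult
df['Ctr'] = df['Clicks'] / df['Impresions']
df['Mult'] = mult

print(df)
```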

see related: stackoverflow.com/questions/31458794/…

– EdChum, Commented Apr 11, 2017 at 16:06

sgrg · Accepted Answer · 2017-04-11 16:21:37Z

3

From the pandas docs for iterrows() (pandas.DataFrame.iterrows):

"You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."
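This can be checked directly with the question's data. Because the columns mix int and float dtypes, the values are converted into a new array before the rows are yielded, so each row Series is detached from df and writing to it leaves df untouched. A minimal check, assuming the sample frame from the question:

```python
import pandas as pd

df = pd.DataFrame({'Shots': [1, 3, 4],
                   'Goals': [2, 1, 4],
                   'StG': [0.5, 0.33, 1.0]})
before = df.copy()

for index, row in df.iterrows():
    row['Shots'] *= 2  # modifies the row Series only

print(df.equals(before))  # True: df was not modified
```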

The good news is you don't need to iterate over rows - you can perform the operations on columns:

# Generate an array of random integers of same length as your DataFrame
multipliers = np.random.randint(1, 5+1, size=len(df))

# Multiply corresponding elements from df['Shots'] and multipliers
df['Shots'] *= multipliers

# Recalculate df['StG']
df['StG'] = df['Shots']/df['Goals']

edited Apr 11, 2017 at 16:21

answered Apr 11, 2017 at 16:16


3 Comments

DDDDEEEEXXXX Over a year ago

Hello sgrg, I tried to propose a simple example. I'll keep in mind to check the docs before posting a question. Imagine now that I also want to multiply the column Shots, but with the rule that the number of shots can never be bigger than goals in the same row. I iterated row by row, making the multiplier for goals always between 0 and the number of goals in the same row, which ensures the condition. Give me 5 minutes and I'll add the original code.

sgrg Over a year ago

It's not quite clear what you're asking, and your updated question has a different example, but you can also filter columns based on conditions. See stackoverflow.com/questions/18196203/… for an example.
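Along the lines of that suggestion, a boolean condition on two columns can find rows that break the Clicks <= Impresions rule after a whole-column update, and .loc can then change only those rows. A minimal sketch with hypothetical values; capping Clicks at Impresions is just one possible policy, assumed here for illustration.

```python
import pandas as pd

# Hypothetical values after scaling; row 1 violates Clicks <= Impresions
df = pd.DataFrame({'Impresions': [200, 40, 35],
                   'Clicks': [30, 80, 0]})

# Boolean mask selecting the rows that break the condition
bad = df['Clicks'] > df['Impresions']
print(df[bad])

# One possible fix: cap Clicks at Impresions for those rows only
df.loc[bad, 'Clicks'] = df.loc[bad, 'Impresions']
df['Ctr'] = df['Clicks'] / df['Impresions']
```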

sgrg Over a year ago

Also you're best off posting this as a new question for more traction (and linking back to this question for reference) :)

Haleemur Ali · Accepted Answer · 2017-04-11 17:02:03Z

Define a function that returns a series:

def f(x):
    m = np.random.randint(1,5+1)
    return pd.Series([x.Shots * m, x.Shots/x.Goals * m])

Apply the function to the DataFrame row-wise. It returns another DataFrame, which can be used to replace some columns in the existing DataFrame or to create new ones:

df[['Shots', 'StG']] = df.apply(f, axis=1)

This approach is very flexible as long as the new column values depend only on other values in the same row.
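The same pattern can cover the edited question, since every new value there (Impresions, Clicks, Ctr, Mult) depends only on its own row. A sketch, assuming the edit's column names and hypothetical data; scale_row is an illustrative helper, and returning a Series with named entries gives the result columns the right labels, which are then copied back one by one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

def scale_row(x):
    # Hypothetical helper: x is one row of df (a Series)
    m = int(rng.integers(1, 5 + 1))   # impression multiplier, 1..5
    cm = int(rng.integers(1, m + 1))  # click multiplier, 1..m
    impr = x['Impresions'] * m
    clicks = x['Clicks'] * cm
    return pd.Series({'Impresions': impr, 'Clicks': clicks,
                      'Ctr': clicks / impr, 'Mult': m})

df = pd.DataFrame({'Impresions': [100, 40, 7],
                   'Clicks': [10, 40, 0]})
new_cols = df.apply(scale_row, axis=1)
for col in new_cols.columns:
    df[col] = new_cols[col]  # replaces existing columns, adds new ones
print(df)
```

Because the returned Series mixes ints and a float, the resulting columns come back as float64; cast with astype if integer columns are needed.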
