
Let's say I have the following dataframe:

   Shots  Goals  StG
0  1      2      0.5
1  3      1      0.33
2  4      4      1

Now I want to multiply the Shots variable by a random value (multiplier in the code) and recalculate the StG variable, which is simply Shots/Goals. The code I used is:

for index,row in df.iterrows():
        multiplier = (np.random.randint(1,5+1))
        row['Shots'] *= multiplier
        row['StG']=float(row['Shots'])/float(row['Goals'])

Then I saved the .csv and it was identical to the original one, so after the for loop I simply used print(df) and got:

Shots Goals StG
0  1     2    0.5
1  3     1    0.33
2  4     4    1 

If I print the values row by row during the for loop I can see that they change, but it's as if they are never saved to the df.

I think it is because I'm only accessing the values, not the actual dataframe.

I should add something like df.row[], but that reports that DataFrame has no row property.

Thanks for the help.

____EDIT____

for index,row in df.iterrows():
        multiplier = (np.random.randint(1,5+1))
        row['Impresions']*=multiplier
        row['Clicks']*=(np.random.randint(1,multiplier+1))
        row['Ctr']= float(row['Clicks'])/float(row['Impresions'])
        row['Mult']=multiplier
        #print (row['Clicks'],row['Impresions'],row['Ctr'],row['Mult'])

The main condition is that the number of Clicks can never be higher than the number of Impressions.

Then I recalculate the Clicks/Impressions ratio as Ctr.

I am not sure whether multiplying the entire column is the best way to maintain the condition that Impressions >= Clicks for each row, hence I went row by row.


2 Answers


From the pandas docs about iterrows() (pandas.DataFrame.iterrows):

"You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."

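A minimal sketch of what that warning means in practice, using the sample data from the question. It assumes mixed int/float columns, in which case iterrows() hands back a copy of each row. If a row-by-row loop is really wanted, writing through df.at[index, column] is one way to reach the DataFrame itself; the seed and the variable names `original` and `unchanged` are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (mixed int/float columns).
df = pd.DataFrame({"Shots": [1, 3, 4], "Goals": [2, 1, 4], "StG": [0.5, 0.33, 1.0]})
original = df.copy()

# Writing to `row` only changes a temporary Series, not df.
for index, row in df.iterrows():
    row["Shots"] *= 2
unchanged = df.equals(original)  # df still holds the original values

# Writing through df.at[index, column] updates the DataFrame itself.
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
for index in df.index:
    multiplier = int(rng.integers(1, 5 + 1))  # random integer from 1 to 5
    df.at[index, "Shots"] = df.at[index, "Shots"] * multiplier
    df.at[index, "StG"] = df.at[index, "Shots"] / df.at[index, "Goals"]
```

This loop works, but it is usually much slower than operating on whole columns.
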
The good news is you don't need to iterate over rows - you can perform the operations on columns:

# Generate an array of random integers of same length as your DataFrame
multipliers = np.random.randint(1, 5+1, size=len(df))

# Multiply corresponding elements from df['Shots'] and multipliers
df['Shots'] *= multipliers

# Recalculate df['StG']
df['StG'] = df['Shots']/df['Goals']
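
The same column-wise idea can probably be extended to the edited example in the question. The sketch below assumes that Clicks <= Impresions already holds in every row and that all values are non-negative; in that case, drawing each row's clicks multiplier from 1 up to that row's impressions multiplier keeps the condition intact. The sample data and the names `ads`, `mult` and `click_mult` are made up for illustration, and the column spelling follows the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the edited question.
ads = pd.DataFrame({"Impresions": [100, 50, 10], "Clicks": [10, 50, 0]})

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# One impressions multiplier per row, from 1 to 5 inclusive.
mult = rng.integers(1, 5 + 1, size=len(ads))
# One clicks multiplier per row, from 1 up to that row's impressions multiplier.
click_mult = rng.integers(1, mult + 1)

ads["Impresions"] *= mult
ads["Clicks"] *= click_mult
ads["Ctr"] = ads["Clicks"] / ads["Impresions"]
ads["Mult"] = mult
```

Because click_mult never exceeds mult, a row that started with Clicks <= Impresions still satisfies it afterwards.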

3 Comments

Hello sgrg, I tried to propose a simple example. I'll keep in mind to check the docs before posting a question. Imagine now that I also want to multiply the Shots column, but the logic is that the number of shots can never be bigger than the number of goals in the same row. I iterated row by row so that the multiplier for goals is always between 0 and the number of goals in the same row, which ensures that condition. Give me 5 minutes and I'll add the original code.
It's not quite clear what you're asking, and your updated question uses a different example, but you can also filter columns based on conditions. See stackoverflow.com/questions/18196203/… for an example.
Also you're best off posting this as a new question for more traction (and linking back to this question for reference) :)

Define a function that returns a series:

def f(x):
    m = np.random.randint(1,5+1)
    return pd.Series([x.Shots * m, x.Shots/x.Goals * m])

Apply the function to the data frame row-wise. It returns another data frame, which can be used to replace columns in the existing data frame or to create new ones:

df[['Shots', 'StG']] = df.apply(f, axis=1)

This approach is very flexible as long as the new column values depend only on other values in the same row.
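
A slightly more explicit variant of the same idea, as a sketch: returning a Series with named entries makes the result's columns self-describing. The function name `scale_row`, the `updated` frame and the seed are assumptions for illustration. Note that Shots ends up as a float column here, because apply passes each mixed-type row as floats.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Shots": [1, 3, 4], "Goals": [2, 1, 4], "StG": [0.5, 0.33, 1.0]})
rng = np.random.default_rng(0)  # seed chosen only for reproducibility

def scale_row(row):
    """Scale one row's Shots by a random factor and recompute StG."""
    m = rng.integers(1, 5 + 1)  # random integer from 1 to 5
    shots = row["Shots"] * m
    return pd.Series({"Shots": shots, "StG": shots / row["Goals"]})

# The result has one column per entry of the returned Series.
updated = df.apply(scale_row, axis=1)
df[["Shots", "StG"]] = updated[["Shots", "StG"]]
```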
