I am just getting into Python, and I am trying to write a for-loop that iterates over every row and, on each iteration, randomly selects two columns that meet a given condition and changes their values. The loop runs without errors; however, the dataframe itself is not changed.

A reproducible example:

import pandas as pd

df = pd.DataFrame({'A': [10,40,10,20,10],
                   'B': [10,10,50,40,50],
                   'C': [10,20,10,10,10],
                   'D': [10,30,10,10,50],
                   'E': [10,10,40,10,10],
                   'F': [2,3,2,2,3]})

df:


    A   B   C   D   E   F
0   10  10  10  10  10  2
1   40  10  20  30  10  3
2   10  50  10  10  40  2
3   20  40  10  10  10  2
4   10  50  10  50  10  3

This is my for-loop. It iterates over all rows and checks whether the value in column F equals 2; if so, it randomly selects two columns whose value is 10 and adds 100 to them.

for index, i in df.iterrows():
  if i['F'] == 2:
    i[i==10].sample(2, axis=0)+100
    print(i[i==10].sample(2, axis=0)+100)

This is the output of the loop:

E    110
C    110
Name: 0, dtype: int64
C    110
D    110
Name: 2, dtype: int64
C    110
D    110
Name: 3, dtype: int64

This is what the dataframe is expected to look like:

df:


    A   B   C   D   E   F
0   10  10  110 10  110 2
1   40  10  20  30  10  3
2   10  50  110 110 40  2
3   20  40  110 110 10  2
4   10  50  10  50  10  3

However, the columns in the dataframe are not changed. Any idea what's going wrong?

2 Answers

This line:

i[i==10].sample(2, axis=0)+100

.sample returns a new Series, and the result of adding 100 to it is never assigned anywhere, so the original dataframe (df) is not updated at all.
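
To see this concretely, here is a minimal sketch using the question's data. The names `before`, `result` and `unchanged` are only for illustration. The loop computes the new values on every matching row, but nothing is ever written back to df:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 40, 10, 20, 10],
                   'B': [10, 10, 50, 40, 50],
                   'C': [10, 20, 10, 10, 10],
                   'D': [10, 30, 10, 10, 50],
                   'E': [10, 10, 40, 10, 10],
                   'F': [2, 3, 2, 2, 3]})
before = df.copy()  # snapshot to compare against afterwards

for index, row in df.iterrows():
    if row['F'] == 2:
        # Builds a brand-new Series; df is never written to here
        result = row[row == 10].sample(2) + 100

# Compare df with the snapshot taken before the loop
unchanged = df.equals(before)
print(unchanged)
```

The comparison should report that df is identical to the snapshot, which matches what the question observed.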

Try this:

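for index, i in df.iterrows():
    if i['F'] == 2:
        cond = (i == 10)

        # You can only sample 2 columns if at least
        # 2 columns in this row meet the condition
        if cond.sum() >= 2:
            idx = i[cond].sample(2).index
            # Write through df.loc: i may be a copy of the row,
            # so changing i is not guaranteed to change df
            df.loc[index, idx] += 100
            print(df.loc[index, idx])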

4 Comments

Works as intended, thank you very much. Just a question, what if we want to exclude a column from the sample, for example, column E or D and E?
You change the condition: cond = (i == 10) & not i.index.isin(['E', 'D', 'F'])
It didn't work: it raised SyntaxError: invalid syntax, and the error came from not. I tried changing & to and, which raised ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I also tried using not in and adding ( ), but none of them worked.
To clarify my first comment: what if I want the loop to skip specific columns so that their values don't change? For example, column E has 10, but the loop should exclude it, and its value should not be changed. Thank you so much.
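
For the exclusion asked about in these comments: `not` cannot negate a boolean array element by element, but the `~` operator can. Below is a minimal sketch using the question's data. It assumes D, E and F are the columns to skip, and the name `excluded` is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 40, 10, 20, 10],
                   'B': [10, 10, 50, 40, 50],
                   'C': [10, 20, 10, 10, 10],
                   'D': [10, 30, 10, 10, 50],
                   'E': [10, 10, 40, 10, 10],
                   'F': [2, 3, 2, 2, 3]})

excluded = ['D', 'E', 'F']  # assumed list of columns to leave alone

for index, row in df.iterrows():
    if row['F'] == 2:
        # ~ negates the boolean array element-wise (not cannot do this)
        cond = (row == 10) & ~row.index.isin(excluded)

        # Skip rows with fewer than 2 eligible columns
        if cond.sum() >= 2:
            idx = row[cond].sample(2).index
            # Write through df.loc so the change lands in df itself
            df.loc[index, idx] += 100

print(df)
```

With these excluded columns, row 3 has only one eligible column (C), so the guard skips it and it stays as it was.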

You should not modify the original df in place. Make a copy and iterate:

df2 = df.copy()
for index, i in df.iterrows():
    if i['F'] == 2:
        s = i[i==10].sample(2, axis=0)+100
        df2.loc[index,i.index.isin(s.index)] = s
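
The same copy-and-iterate idea can be wrapped in a function. This is a sketch under some assumptions: the helper name `add_100_to_two_random_tens` and its `seed` parameter are hypothetical, and the guard for rows with fewer than two matching columns is an addition not present in the code above.

```python
import numpy as np
import pandas as pd


def add_100_to_two_random_tens(df, seed=None):
    """Hypothetical helper: return a modified copy and leave df untouched."""
    rng = np.random.RandomState(seed)  # optional seed for repeatable picks
    out = df.copy()
    for index, row in df.iterrows():
        if row['F'] != 2:
            continue
        candidates = row[row == 10]
        # Sampling 2 columns needs at least 2 candidates in this row
        if len(candidates) >= 2:
            cols = candidates.sample(2, random_state=rng).index
            out.loc[index, cols] += 100
    return out


df = pd.DataFrame({'A': [10, 40, 10, 20, 10],
                   'B': [10, 10, 50, 40, 50],
                   'C': [10, 20, 10, 10, 10],
                   'D': [10, 30, 10, 10, 50],
                   'E': [10, 10, 40, 10, 10],
                   'F': [2, 3, 2, 2, 3]})

df2 = add_100_to_two_random_tens(df, seed=0)

# Number of cells that differ from the original, per row
changed_per_row = (df2 != df).sum(axis=1)
print(changed_per_row.tolist())  # [2, 0, 2, 2, 0]
```

Which two columns are picked depends on the seed, but the count of changed cells per row does not.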
