8

Say I have a dataframe that looks like:

d = {'option1': ['1', '0', '1', '1'], 'option2': ['0', '0', '1', '0'], 'option3': ['1', '1', '0', '0'], 'views': ['6', '10', '5', '2']
df = pd.DataFrame(data=d)

print(df)

  option1 option2 option3 views
0       1       0       1     6
1       0       0       1    10
2       1       1       0     5
3       1       0       0     2

I'm trying to build a for loop that iterates over each column (except the column "views") and each row. If the value of a cell is not 0, I want to replace it with the corresponding value of the column "views" from the same row.

The following output is required (should be easier to understand):

  option1 option2 option3 views
0       6       0       6     6
1       0       0      10    10
2       5       5       0     5
3       2       0       0     2

I tried something like:

df_range = len(df)

for column in df:
    for i in range(df_range):
        if column != 0:
            column = df.views[i]

But I know I'm missing something, it does not work.

Also please note that in my real dataframe, I have dozens of columns, so I need something that iterates over each column automatically. Thanks!!

I saw this thread Update a dataframe in pandas while iterating row by row but it doesn't exactly apply to my problem, because I'm not only going row by row, I also need to go column by column.

7
  • Try using iterrows Commented Feb 23, 2018 at 15:30
  • Using iterrows would work for the rows, but I'm also trying to iterate over the columns. That's why my question is not exaclty the same as the other thread pointed as the answer... Commented Feb 23, 2018 at 15:42
  • I'll reopen it and take a harder look at the dupiness. Commented Feb 23, 2018 at 15:46
  • When you are iterating over a DataFrame with for column in df, your column variable will be the column name. column != 0: won't work because of that. If you are trying to access that specific cell, you need to check df[column].iloc[i] !=0. Know that this is horribly inefficient though. Commented Feb 23, 2018 at 15:50
  • 1
    Is this a representative example? If so, I would just multiply option columns with the view column: df.loc[:, 'option1':'option3'].mul(df['views'], axis=0) You can assign this back in the same way df.loc[:, 'option1':'option3']. = df.loc[:, 'option1':'option3'].mul(df['views'], axis=0) Commented Feb 23, 2018 at 15:59

4 Answers 4

8

You can also achieve the result you want this way:

for col in df:
    if col == 'views':
        continue
    for i, row_value in df[col].iteritems():
        df[col][i] = row_value * df['views'][i]

Notice the following about this solution:

1) This solution operates on each value in the dataframe individually and so is less efficient than broadcasting, because it's performing two loops (one outer, one inner).

2) This solution assumes that option1...option N are binary because essentially this solution is multiplying each binary value in option1...option N with the values in views.

3) This solution will work for any number of option columns. The option columns may have any labels you desire.

4) This solution assumes there is a column labeled views.

Sign up to request clarification or add additional context in comments.

1 Comment

.iteritems() is deprecated in current versions of Pandas; use .items() instead.
3

You don't need to iterate through rows. This one should be faster: Ensure that the columns values are integers.

## convert column type to integer
for i in df:
    df[i] = df[i].astype(int)

## update columns
for col in df:
    if col != 'views':
        df[col] = df[col] * df['views']

df

    option1     option2     option3     views
0      6          0            6          6
1      0          0           10         10  
2      5          5            0          5
3      2          0            0          2

Comments

2
dataSet = pd.read_excel("dataset.xlsx")
i = 0 ;
for column in dataSet:
    for i in dataSet[column].iteritems():
        if (column == 'views'):
            print (i)

1 Comment

Welcome to Stack Overflow! Thank you for the code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by describing why this is a good solution to the problem, and would make it more useful to future readers with other similar questions. Please edit your answer to add some explanation, including the assumptions you've made.
0

I think this would work:

df=df.astype(int)
df[df.columns[:-1]]= np.where(df[df.columns[:-1]]>0, 1, 0)
df[df.columns[:-1]]= df[df.columns[:-1]].mul(df['views'].as_matrix(), axis=0)

1 Comment

if your dataframe is binary you don't have to use : df[df.columns[:-1]]= np.where(df[df.columns[:-1]]>0, 1, 0)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.