Iteration over columns and rows in Pandas Dataframe

Question

Say I have a dataframe that looks like:

import pandas as pd

d = {'option1': ['1', '0', '1', '1'], 'option2': ['0', '0', '1', '0'], 'option3': ['1', '1', '0', '0'], 'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d)

print(df)

  option1 option2 option3 views
0       1       0       1     6
1       0       0       1    10
2       1       1       0     5
3       1       0       0     2

I'm trying to build a for loop that iterates over each column (except the column "views") and each row. If the value of a cell is not 0, I want to replace it with the corresponding value of the column "views" from the same row.

The following output is required (should be easier to understand):

  option1 option2 option3 views
0       6       0       6     6
1       0       0      10    10
2       5       5       0     5
3       2       0       0     2

I tried something like:

df_range = len(df)

for column in df:
    for i in range(df_range):
        if column != 0:
            column = df.views[i]

But I know I'm missing something, because it does not work.

Also please note that in my real dataframe, I have dozens of columns, so I need something that iterates over each column automatically. Thanks!!

I saw the thread "Update a dataframe in pandas while iterating row by row", but it doesn't exactly apply to my problem: I'm not only going row by row, I also need to go column by column.

Using iterrows would work for the rows, but I'm also trying to iterate over the columns. That's why my question is not exactly the same as the other thread pointed to as the answer...
– Notna, Commented Feb 23, 2018 at 15:42
When you iterate over a DataFrame with for column in df, your column variable is the column name, so column != 0 won't work. To access a specific cell, you need to check df[column].iloc[i] != 0. Know that this is horribly inefficient though.
– user2285236, Commented Feb 23, 2018 at 15:50
Is this a representative example? If so, I would just multiply the option columns by the views column: df.loc[:, 'option1':'option3'].mul(df['views'], axis=0). You can assign this back in the same way: df.loc[:, 'option1':'option3'] = df.loc[:, 'option1':'option3'].mul(df['views'], axis=0)
– user2285236, Commented Feb 23, 2018 at 15:59
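To illustrate the point made in the comments, here is a minimal sketch of a corrected version of the loop from the question. It assumes the string values are first converted to integers with astype(int), which the question does not do, and it is meant only to show what the loop variable holds; it is slow on large frames.

```python
import pandas as pd

# Sample data from the question (the values are strings there).
d = {'option1': ['1', '0', '1', '1'],
     'option2': ['0', '0', '1', '0'],
     'option3': ['1', '1', '0', '0'],
     'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d).astype(int)  # assumption: convert to integers first

# Iterating over a DataFrame yields the column labels, not the cell values.
labels = [column for column in df]
print(labels)

# Corrected version of the question's loop (cell-by-cell, for illustration only).
for column in df:
    if column == 'views':
        continue
    for i in range(len(df)):
        if df[column].iloc[i] != 0:
            # Write through .loc so the original frame is modified.
            df.loc[df.index[i], column] = df['views'].iloc[i]

print(df)
```

Because the comparison is now made against the cell value rather than the column name, non-zero cells are overwritten with the views value from the same row.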

Keith Dowd · Accepted Answer · 2018-02-23 17:02:51Z

You can also achieve the result you want this way:

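# The question's values are strings, so convert them to integers first.
df = df.astype(int)

for col in df:
    if col == 'views':
        continue
    # .items() yields (index label, value) pairs for the column.
    for i, row_value in df[col].items():
        # Assign through .loc to avoid chained assignment.
        df.loc[i, col] = row_value * df.loc[i, 'views']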

Notice the following about this solution:

1) This solution operates on each value in the dataframe individually and so is less efficient than broadcasting, because it's performing two loops (one outer, one inner).

2) This solution assumes that option1...option N are binary because essentially this solution is multiplying each binary value in option1...option N with the values in views.

3) This solution will work for any number of option columns. The option columns may have any labels you desire.

4) This solution assumes there is a column labeled views.

Note: .iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; use .items() instead.
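If the option columns are not strictly binary (assumption 2 above), multiplying no longer gives the views value. The following is a rough sketch of one way to handle that case with NumPy broadcasting; the sample values and the name option_cols are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the option columns hold counts rather than 0/1 flags.
df = pd.DataFrame({'option1': [3, 0, 1, 2],
                   'option2': [0, 0, 4, 0],
                   'views': [6, 10, 5, 2]})

option_cols = df.columns.drop('views')   # every column except 'views'
views = df['views'].to_numpy()[:, None]  # shape (n_rows, 1), broadcasts across columns

# Keep zeros; replace any non-zero cell with that row's 'views' value.
df[option_cols] = np.where(df[option_cols].to_numpy() != 0, views, 0)
print(df)
```

This replaces both Python loops with a single array operation, so it should also scale better to dozens of columns.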

YOLO · Accepted Answer · 2018-02-23 18:02:07Z

You don't need to iterate through the rows. Multiplying whole columns should be faster. First, ensure that the column values are integers.

## convert column type to integer
for i in df:
    df[i] = df[i].astype(int)

## update columns
for col in df:
    if col != 'views':
        df[col] = df[col] * df['views']

df

   option1  option2  option3  views
0        6        0        6      6
1        0        0       10     10
2        5        5        0      5
3        2        0        0      2
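The per-column loop above can also be collapsed into a single call, along the lines of the multiplication suggested in the comments on the question. This is only a sketch: option_cols is a name introduced here, and selecting columns with df.columns.drop('views') is one assumption about how to pick "everything except views".

```python
import pandas as pd

# Sample data from the question, converted to integers.
d = {'option1': ['1', '0', '1', '1'],
     'option2': ['0', '0', '1', '0'],
     'option3': ['1', '1', '0', '0'],
     'views': ['6', '10', '5', '2']}
df = pd.DataFrame(data=d).astype(int)

# Every column label except 'views', however many there are.
option_cols = df.columns.drop('views')

# axis=0 aligns the 'views' Series with the row index, so each row
# is multiplied by its own 'views' value.
df[option_cols] = df[option_cols].mul(df['views'], axis=0)
print(df)
```

As with the loop version, this relies on the option columns being 0/1 flags.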

luqman ahmad · Accepted Answer · 2018-12-24 15:16:43Z

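import pandas as pd

dataSet = pd.read_excel("dataset.xlsx")
for column in dataSet:
    # .items() yields (index, value) pairs; .iteritems() was removed in pandas 2.0.
    for item in dataSet[column].items():
        # Only the pairs from the 'views' column are printed.
        if column == 'views':
            print(item)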

sepehr Over a year ago

Welcome to Stack Overflow! Thank you for the code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by describing why this is a good solution to the problem, and would make it more useful to future readers with other similar questions. Please edit your answer to add some explanation, including the assumptions you've made.

user4398691 · Accepted Answer · 2018-02-23 16:34:38Z

I think this would work:

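import numpy as np

df = df.astype(int)
# df.columns[:-1] assumes 'views' is the last column.
df[df.columns[:-1]] = np.where(df[df.columns[:-1]] > 0, 1, 0)
# .as_matrix() was removed in pandas 1.0; .to_numpy() is its replacement.
df[df.columns[:-1]] = df[df.columns[:-1]].mul(df['views'].to_numpy(), axis=0)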

user4398691 Over a year ago

If your dataframe is already binary, you can skip this line: df[df.columns[:-1]] = np.where(df[df.columns[:-1]] > 0, 1, 0)
