sum columns in dataframe python (different columns each row) [duplicate]

Question

I have a dataframe with 3 columns a, b, c like below:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 5, 3], 'b': [2, 0, 6, 1], 'c': [4, 3, 1, 4]})

I want to add a column d that is the sum of some columns in df, but not the same columns for each row. For example:

Only rows 1 and 3 sum the same columns; rows 0 and 2 each sum different columns.

What I found on Stack Overflow always sums the same columns for the whole dataframe, but in this case it is different.

What is the best way to do it?

`but is not the same column for each row` What is the logic behind the columns chosen for each row?
– Quang Hoang, Commented Jan 6, 2021 at 17:26
What are the rules? Otherwise how are people going to create an approach for solving your issue?
– Dani Mesejo, Commented Jan 6, 2021 at 17:35

LoukasPap · Accepted Answer · 2021-01-06 18:01:16Z

Because the columns that make up d follow no rule, the only way to fill it is to set each row separately.

df['d'] = 0
# Assign through df.loc[row, column]; chained df['d'].iloc[i] = ... may not update df
df.loc[0, 'd'] = df.loc[0, 'b']
df.loc[1, 'd'] = df.loc[1, 'a'] + df.loc[1, 'c']
df.loc[2, 'd'] = df.loc[2, 'a']
df.loc[3, 'd'] = df.loc[3, 'a'] + df.loc[3, 'c']

If rows 1 and 3 follow a rule (here, odd-indexed rows take a + c):

mask = df.index % 2 == 1
df.loc[mask, 'd'] = df.loc[mask, 'a'] + df.loc[mask, 'c']

The same with a for-loop:

for i in df.index:
    if i % 2 == 1:
        # .loc with a row label and a column name avoids chained assignment
        df.loc[i, 'd'] = df.loc[i, 'a'] + df.loc[i, 'c']
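When the choice of columns is known per row but follows no formula, one possible alternative to row-by-row assignment is a boolean mask with the same shape as the dataframe. The sketch below is only an illustration: the `picks` list is a hypothetical way to write down which columns go into d on each row, mirroring the assignments above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 5, 3], 'b': [2, 0, 6, 1], 'c': [4, 3, 1, 4]})

# Hypothetical per-row selection (an assumption for illustration):
# row 0 -> b, row 1 -> a + c, row 2 -> a, row 3 -> a + c
picks = [['b'], ['a', 'c'], ['a'], ['a', 'c']]

# Boolean mask shaped like df: True where a cell takes part in the sum.
mask = pd.DataFrame(
    [[col in cols for col in df.columns] for cols in picks],
    index=df.index,
    columns=df.columns,
)

# where() keeps the selected cells and replaces the rest with 0,
# then sum(axis=1) adds up each row.
df['d'] = df.where(mask, 0).sum(axis=1)
print(df['d'].tolist())  # [2, 4, 5, 7]
```

This keeps the per-row choice in one data structure instead of one assignment statement per row, which may be easier to maintain as the number of rows grows.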

edited Jan 6, 2021 at 18:01

answered Jan 6, 2021 at 17:32

10 Comments

actnmk Over a year ago

Sorry, I made a mistake in the expected df: column d of rows 1 and 3 follows the same rule, the sum of columns a and c. How can I do it for both rows at the same time?

LoukasPap Over a year ago

Please tell me if it works, so I can change it if not. @actnmk

actnmk Over a year ago

it works! thank you so much

smci Over a year ago

L. Papadopoulos, I didn't say you plagiarized this. I did say that the question was a blatant dupe, which I already flagged 30 min ago, and when the question is known to be a dupe, the proper SO behavior is to vote to close as a dupe in favor of the target question, not to answer it. Additionally, you could adapt this answer and post it there (answer would need to restate the example formula you're trying to solve, obviously).

LoukasPap Over a year ago

@smci I agree that duplicates must be closed. But actnmk is a beginner, as he says, and my answer is beginner-friendly because it is short and to the point, without confusing explanations. The other answer you mention does not need to get shorter; it is like a wiki answer that analyses a problem and gives multiple solutions.

smci · Accepted Answer · 2021-01-06 20:41:36Z

The dynamic way uses pd.eval(), as per [this solution][1]. This evaluates each row's formula individually, so df['formula'] (a column of string expressions) can be different on each row and nothing is hardcoded in your code. A lot is going on in this one-liner; see the explanation in the Notes below.

df.apply(lambda row: pd.eval(row['formula'], local_dict=row.to_dict()), axis=1)

0    2
1    4
2    5
3    4
#    ^--- this is the result

and if you want to assign that result to a dataframe column, say df['z']:

df['z'] = df.apply(lambda row: pd.eval(row['formula'], local_dict=row.to_dict()), axis=1)

Alternatively, you could use pd.eval(..., inplace=True), but then the formula would need to contain an actual assignment, e.g. 'z=a+b', and the 'z' column would need to have been declared already: df['z'] = np.nan. That part is slightly annoying to implement, so I didn't.
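For a self-contained run, here is a minimal sketch of the one-liner on the question's dataframe. The actual formula column is not shown in this answer, so the strings below are hypothetical, picked only so that the result matches the output listed above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 5, 3], 'b': [2, 0, 6, 1], 'c': [4, 3, 1, 4]})

# Assumption: one string expression per row (hypothetical values, not from the question).
df['formula'] = ['b', 'a+c', 'a', 'a+b']

# Evaluate each row's own formula, passing that row's values as local variables.
df['z'] = df.apply(
    lambda row: pd.eval(row['formula'], local_dict=row.to_dict()),
    axis=1,
)
print(df['z'].tolist())
```

Because every row triggers its own pd.eval() call, this is likely to be slow on large dataframes; it trades speed for the flexibility of per-row formulas.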

NOTES:

- We use pd.eval(...) to dynamically evaluate the ['formula'] column,
  - ...using the pd.eval(..., local_dict=...) argument to pass in the variables for that row.
- To evaluate an expression on each dataframe row, we use df.apply(..., axis=1). We have to provide a lambda function to tell it what to evaluate.
- So how does pd.eval() know how to map the strings a, b, c to their values on that individual row?
  - When we call df.apply(..., axis=1) row-wise like that, each row gets passed in as an individual Series, so within our apply(..., axis=1) we can no longer reference the dataframe as df or its columns as df['a'], df['b'], ...
  - So instead we need to pass in that row as a Python dict, hence the local_dict=row.to_dict() argument to pd.eval, inside the lambda function.
- The pd.eval() approach can handle arbitrarily complicated formulas in the variables, not just simple sums; it can handle e.g. (a + c**2)/(b+c). You could reference external constants, or external functions, e.g. log10.
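To make the name lookup concrete, the sketch below evaluates the more complicated formula mentioned in the notes on a single row. The variable names `row_vars` and `value` are illustrative only, and the row is taken from the question's dataframe.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 5, 3], 'b': [2, 0, 6, 1], 'c': [4, 3, 1, 4]})

# Take the first row as a Series and convert it to a dict of column name -> value.
row_vars = df.iloc[0].to_dict()
print(row_vars)

# pd.eval resolves a, b and c from local_dict, as if they were local variables.
# For this row the expression is (1 + 4**2) / (2 + 4).
value = pd.eval('(a + c**2) / (b + c)', local_dict=row_vars)
print(value)
```

The same dict is what the lambda builds with row.to_dict() on every row inside df.apply(..., axis=1).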

References: [1]: Compute dataframe columns from a string formula in variables?
