-1

I have a dataframe with 3 columns a, b, c like below:

df = pd.DataFrame({'a':[1,1,5,3], 'b':[2,0,6,1], 'c':[4,3,1,4]})

I want to add a column d that is the sum of some columns in df, but not the same columns for each row. For example:

[Image: expected dataframe with the new column d]

Only rows 1 and 3 are summed from the same columns; rows 0 and 2 are summed from other columns.

What I found on Stack Overflow always uses the same columns for the whole dataframe, but this case is different.

What is the best way to do this?

10
  • 2
    ` but is not the same column for each row` What is the logic behind the columns chosen for each row? Commented Jan 6, 2021 at 17:26
  • 2
    You mean they are selected randomly? Commented Jan 6, 2021 at 17:28
  • 1
    Why is the third row equal to a? Commented Jan 6, 2021 at 17:30
  • 1
    What are the rules? Otherwise, how are people going to create an approach for solving your issue? Commented Jan 6, 2021 at 17:35
  • 1
    Is d=a+c the rule for every odd row? Commented Jan 6, 2021 at 17:45

2 Answers

0

Because column d follows no fixed rule, the only way to fill it is to assign each row separately.

df['d'] = 0
# Use a single .loc[row, column] call; chained df['d'].iloc[i] = ... may write to a copy
df.loc[0, 'd'] = df.loc[0, 'b']
df.loc[1, 'd'] = df.loc[1, 'a'] + df.loc[1, 'c']
df.loc[2, 'd'] = df.loc[2, 'a']
df.loc[3, 'd'] = df.loc[3, 'a'] + df.loc[3, 'c']

If rows 1 and 3 follow the same rule (d = a + c for the odd rows):

# Boolean mask selecting the odd rows, applied to both sides of the assignment
odd = (df.index % 2) == 1
df.loc[odd, 'd'] = df.loc[odd, 'a'] + df.loc[odd, 'c']

Alternatively, with a for loop:

for i in range(len(df)):
    if i % 2 == 1:
        # Odd rows: d = a + c
        df.loc[i, 'd'] = df.loc[i, 'a'] + df.loc[i, 'c']
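
The odd-row rule can also be written without an explicit loop. This is a minimal sketch, assuming the default integer index and that rows not covered by the rule should stay at 0 (that fallback value is an assumption, not something the question specifies):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 5, 3], 'b': [2, 0, 6, 1], 'c': [4, 3, 1, 4]})

# True for rows 1 and 3 (odd positions in the default RangeIndex)
odd = df.index % 2 == 1

# Where the mask is True take a + c, otherwise fall back to 0 (assumed default)
df['d'] = np.where(odd, df['a'] + df['c'], 0)

print(df['d'].tolist())  # [0, 4, 0, 7]
```

np.where evaluates a + c for every row and then picks by mask, which is fine for cheap arithmetic like this.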

10 Comments

Sorry, I made a mistake in the expected df: column d of rows 1 and 3 follows the same rule, the sum of columns a and c. How can I do it for both rows at the same time?
Please tell me if it works, so I can change it if not. @actnmk
it works! thank you so much
L. Papadopoulos, I didn't say you plagiarized this. I did say that the question was a blatant dupe, which I already flagged 30 min ago, and when the question is known to be a dupe, the proper SO behavior is to vote to close as a dupe in favor of the target question, not to answer it. Additionally, you could adapt this answer and post it there (answer would need to restate the example formula you're trying to solve, obviously).
@smci I agree that duplicates must be closed. But actnmk is a beginner, as he says, and my answer is beginner-friendly because it is small and on point, without confusing explanations. The other answer you mention does not need to get smaller; it is like a wiki answer that analyses a problem and gives multiple solutions.
-2

The dynamic way uses pd.eval(), as per [this solution][1]. This assumes df has a 'formula' column holding each row's expression as a string (e.g. 'a+b'). Each row's formula is evaluated individually, so df['formula'] can differ from row to row and nothing is hardcoded in your code. There's a lot going on in this one-liner; see the explanation in the Notes below.

df.apply(lambda row: pd.eval(row['formula'], local_dict=row.to_dict()), axis=1)

0    2
1    4
2    5
3    4
#    ^--- this is the result

and if you want to assign that result to a dataframe column, say df['z']:

  • df['z'] = df.apply(lambda row: pd.eval(row['formula'], local_dict=row.to_dict()), axis=1)
  • alternatively you could use pd.eval(..., inplace=True), but then the formula would need to contain an actual assignment, e.g. 'z=a+b', and the 'z' column would need to have been declared already: df['z'] = np.nan. That part is slightly annoying to implement, so I didn't.

NOTES:

  1. we use pd.eval(...) to dynamically evaluate the ['formula'] column
    • ...using the pd.eval(.., local_dict=...) argument to pass in the variables for that row
  2. to evaluate an expression on each dataframe row, we use df.apply(..., axis=1). We have to provide some lambda function to tell it what to evaluate.
  3. So how does pd.eval() know how to map the strings a,b,c to their values on that individual row?
    • When we call df.apply(..., axis=1) row-wise like that, each row gets passed in as an individual Series, so within our apply(... axis=1), we can no longer reference the dataframe as df or its columns as df['a'], df['b'], ...
    • So instead we need to pass in that row as a Python dict, hence the local_dict=row.to_dict() argument to pd.eval, inside the lambda function.
  4. The pd.eval() approach can handle arbitrarily complicated formulas in the variables, not just simple sums; it can handle e.g. (a + c**2)/(b+c). You could reference external constants, or external functions e.g. log10.
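
Putting the notes together, here is a minimal end-to-end sketch. The answer does not show how df['formula'] was built, so the formula strings below are assumed for illustration (they follow the per-row rules from the other answer), and engine='python' is chosen only so the sketch does not depend on numexpr being installed:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 5, 3], 'b': [2, 0, 6, 1], 'c': [4, 3, 1, 4]})

# Assumed per-row formulas stored as strings (illustrative, not from the question)
df['formula'] = ['b', 'a + c', 'a', 'a + c']

def evaluate_row(row):
    # row is a Series; its dict maps the names a, b, c to this row's values
    return pd.eval(row['formula'], engine='python', local_dict=row.to_dict())

# axis=1 passes each row to evaluate_row in turn
df['d'] = df.apply(evaluate_row, axis=1)

print(df['d'].tolist())  # [2, 4, 5, 7]
```

Because every row triggers its own parse and evaluation, this is likely to be slow on large dataframes; it trades speed for the flexibility of per-row formulas. Evaluating strings as code is also only appropriate when the formulas come from a trusted source.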

References: [1]: Compute dataframe columns from a string formula in variables?
