pandas apply function with if statement inside the custom function

Question

def cal_properties(pressure):

    if pressure >= 0 and pressure <= 1000:
        density = 1 / pressure  # myfunction(pressure)
    else:
        density = pressure * 10

    return density

print(df)


WELL_NKNME  10A74  10A75  10A77  10A78  11A74  11A75  11A77  11A78
Date                                                              
2022-06-05    0.0    0.0    0.0    0.0    0.0  122.8   56.3   96.3
2022-06-06    0.0    0.0    0.0    0.0    0.0  118.3   52.0   85.3
2022-06-07    0.0    0.0    0.0    0.0    0.0  119.5   52.9   87.4

df=df.apply(lambda row: cal_properties(row),axis=1)

Then I got an error related to the if statement:


----> 7 df=df.apply(lambda row: cal_properties(row),axis=1)
      8 df

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\frame.py in apply(
    self,
    func,
    axis,
    raw,
    result_type,
    args,
    **kwargs
)
   8738             kwargs=kwargs,
   8739         )
-> 8740         return op.apply()
   8741 
   8742     def applymap(

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\apply.py in apply(self)
    686             return self.apply_raw()
    687 
--> 688         return self.apply_standard()
    689 
    690     def agg(self):

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    810 
    811     def apply_standard(self):
--> 812         results, res_index = self.apply_series_generator()
    813 
    814         # wrap results

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    826             for i, v in enumerate(series_gen):
    827                 # ignore SettingWithCopy here in case the user mutates
--> 828                 results[i] = self.f(v)
    829                 if isinstance(results[i], ABCSeries):
    830                     # If we have a view on v, we need to make a copy because

C:\Temp\1\ipykernel_840\3896317687.py in <lambda>(row)
      5 # print(df.iloc[0:1,:])
      6 # print(df.to_dict())
----> 7 df=df.apply(lambda row: cal_properties(row),axis=1)
      8 df

C:\Temp\1\ipykernel_840\2054338456.py in cal_properties(pressure)
      1 def cal_properties(pressure):
      2 
----> 3     if pressure>=0 and pressure<=1000:
      4         density=1/pressure  #myfunction(pressure)
      5     else:

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1535     @final
   1536     def __nonzero__(self):
-> 1537         raise ValueError(
   1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the DataFrame as a dictionary so you can run the code. Without the if statement, the code works fine. I am not sure how to solve this. Thanks for your help.

print(df.to_dict())

{'10A74': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '10A75': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '10A77': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '10A78': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '11A74': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '11A75': {Timestamp('2022-06-05 00:00:00'): 122.8, Timestamp('2022-06-06 00:00:00'): 118.3, Timestamp('2022-06-07 00:00:00'): 119.5}, '11A77': {Timestamp('2022-06-05 00:00:00'): 56.3, Timestamp('2022-06-06 00:00:00'): 52.0, Timestamp('2022-06-07 00:00:00'): 52.9}, '11A78': {Timestamp('2022-06-05 00:00:00'): 96.3, Timestamp('2022-06-06 00:00:00'): 85.3, Timestamp('2022-06-07 00:00:00'): 87.4}}
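To make the failure concrete, here is a minimal sketch that rebuilds the DataFrame from the dictionary above and evaluates the same condition on a single row. With `axis=1`, `apply` passes each row as a Series, so `pressure>=0` is a boolean Series, and Python's `and` must reduce that Series to a single True/False, which pandas refuses to do. The helper name `build_df` is illustrative only.

```python
import pandas as pd

def build_df():
    # Rebuild the sample data shown by df.to_dict() above
    dates = pd.to_datetime(["2022-06-05", "2022-06-06", "2022-06-07"])
    data = {
        "10A74": [0.0, 0.0, 0.0],
        "10A75": [0.0, 0.0, 0.0],
        "10A77": [0.0, 0.0, 0.0],
        "10A78": [0.0, 0.0, 0.0],
        "11A74": [0.0, 0.0, 0.0],
        "11A75": [122.8, 118.3, 119.5],
        "11A77": [56.3, 52.0, 52.9],
        "11A78": [96.3, 85.3, 87.4],
    }
    df = pd.DataFrame(data, index=dates)
    df.index.name = "Date"
    df.columns.name = "WELL_NKNME"
    return df

df = build_df()
row = df.iloc[0]  # what the lambda receives with axis=1: a whole row as a Series

comparison = row >= 0  # element-wise comparison: a boolean Series, not a single bool
print(type(comparison).__name__)

error_message = None
try:
    if row >= 0 and row <= 1000:  # `and` calls bool() on the Series
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Combining element-wise conditions uses & instead of `and`
in_range = (row >= 0) & (row <= 1000)
print(in_range.all())  # every value in this row lies between 0 and 1000
```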

Why are you writing a function and then wrapping it in another lambda to apply? Just apply the function you wrote.
– SuperStew, Commented Jun 10, 2022 at 14:44
Also, do you mean to apply it to the whole df, or just to one of the series?
– SuperStew, Commented Jun 10, 2022 at 14:45
You are calling cal_properties(row), so pressure is a whole row. So what does if pressure>=0 and pressure<=1000 mean?
– Yevhen Kuzmovych, Commented Jun 10, 2022 at 14:45
If your intent is to apply the function to each cell, use df.applymap. Otherwise, the function is passed a whole row (axis=1), and the exception trace is pretty self-explanatory.
– Marat, Commented Jun 10, 2022 at 14:46
Yes, it is applied to each cell of the whole DataFrame. Without the if statement, using axis=1 works fine; the error only occurs with the if statement. Thanks.
– roudan, Commented Jun 10, 2022 at 14:48
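Since the function is meant to run on each cell but the asker wants to keep `apply(axis=1)`, one option is to map the scalar function over the elements of each row with `Series.map`, so the if statement only ever sees a single number. A minimal sketch, assuming a hypothetical `cal_properties_scalar` that uses a strict `> 0` lower bound so that a zero pressure does not hit `1 / 0`, on a small subset of the data:

```python
import pandas as pd

def cal_properties_scalar(pressure):
    # Hypothetical scalar version: strict > 0 keeps 0 away from 1 / pressure
    if 0 < pressure <= 1000:
        return 1 / pressure
    return pressure * 10

df = pd.DataFrame(
    {"11A74": [0.0, 0.0], "11A75": [122.8, 118.3]},
    index=pd.to_datetime(["2022-06-05", "2022-06-06"]),
)

# Each row arrives as a Series; Series.map calls the scalar function once per cell,
# and apply stitches the returned row Series back into a DataFrame.
result = df.apply(lambda row: row.map(cal_properties_scalar), axis=1)
print(result)
```

This keeps the custom function written in plain scalar Python, at the cost of a Python-level call per cell.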

Marat · Accepted Answer · 2022-06-10 15:08:28Z

2

It seems like a job for np.where instead:

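import numpy as np

# & combines the boolean DataFrames element-wise (plain `and` would raise the same ValueError)
df.loc[:, :] = np.where((df >= 0) & (df <= 1000), 1/df, df*10)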

Same logic can be applied row-wise:

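import numpy as np
import pandas as pd

def cal_properties(pressure_row):
    # pressure_row is a whole row (a Series), so the condition is evaluated element-wise
    return pd.Series(
        np.where(pressure_row.between(0, 1000), 1/pressure_row, pressure_row*10),
        index=pressure_row.index
    )

df = df.apply(cal_properties, axis=1)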
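One detail worth checking in the code above: `(df >= 0)` and `between(0, 1000)` both include 0, and pandas returns `inf` rather than raising when a float is divided by zero, so the all-zero columns in this data come out as `inf`. A minimal sketch comparing the inclusive bound with a strict `> 0` bound (the same change the other answer makes), on a small subset of the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"10A74": [0.0, 0.0], "11A75": [122.8, 118.3]})

# Inclusive lower bound: 0 counts as "in range", so it picks 1/0 -> inf
inclusive = pd.DataFrame(
    np.where((df >= 0) & (df <= 1000), 1 / df, df * 10),
    index=df.index, columns=df.columns,
)

# Strict lower bound: 0 falls through to the else branch (0 * 10 == 0)
strict = pd.DataFrame(
    np.where((df > 0) & (df <= 1000), 1 / df, df * 10),
    index=df.index, columns=df.columns,
)
print(inclusive)
print(strict)
```

np.where evaluates both branches for every cell before choosing, so the bound only decides which precomputed value is kept.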

edited Jun 10, 2022 at 15:08

answered Jun 10, 2022 at 14:56

Marat

4 Comments

ArchAngelPwn Over a year ago

I was about to ask the same question: why wouldn't np.where() or np.select() work better in this situation? OP, if this isn't what you are looking for, can you explain why?

roudan Over a year ago

Yes, it works. Thanks, Marat, but I still prefer to use apply() with a lambda since my real function is much more complicated than the one above.

Marat Over a year ago

@roudan I updated the answer. Let me know if this is what you're looking for.

roudan Over a year ago

Thank you, Marat. Your code really helped me understand how this works with the apply function. I used the same approach to solve my other puzzle. I really appreciate it.
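The np.select suggestion in the comments scales to functions with more branches than np.where handles comfortably: conditions are checked in order, and the first one that matches picks the value. A minimal sketch that re-expresses the same logic (with a strict `> 0` bound) as ordered conditions on a small subset of the data; a real multi-branch function would add more entries to both lists:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"10A74": [0.0, 0.0], "11A75": [122.8, 118.3]})
a = df.to_numpy()

conditions = [
    a <= 0,     # checked first: non-positive pressure
    a <= 1000,  # only decides cells with 0 < a <= 1000
    a > 1000,
]
with np.errstate(divide="ignore"):  # 1 / a is computed for every cell, zeros included
    choices = [a * 10, 1 / a, a * 10]
values = np.select(conditions, choices)

result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```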

Onyambu · Accepted Answer · 2022-06-10 15:22:44Z

We know you cannot divide by 0.

Change your function to use a strict > 0 lower bound; then you can use applymap, since you are doing the calculations cell-wise rather than row-wise or column-wise. Hence:

def cal_properties(pressure):

    if pressure > 0 and pressure <= 1000:
        density = 1 / pressure  # myfunction(pressure)
    else:
        density = pressure * 10

    return density

df.applymap(cal_properties)
 
            10A74  10A75  10A77  10A78  11A74     11A75     11A77     11A78
2022-06-05    0.0    0.0    0.0    0.0    0.0  0.008143  0.017762  0.010384
2022-06-06    0.0    0.0    0.0    0.0    0.0  0.008453  0.019231  0.011723
2022-06-07    0.0    0.0    0.0    0.0    0.0  0.008368  0.018904  0.011442
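Note that `DataFrame.applymap` was deprecated in pandas 2.1 and renamed to `DataFrame.map`, with the same cell-by-cell behavior. A minimal sketch that works on either side of that change, assuming the installed pandas provides at least one of the two methods, on a subset of the data:

```python
import pandas as pd

def cal_properties(pressure):
    # Strict > 0 lower bound, as in this answer, so 0 never reaches 1 / pressure
    if pressure > 0 and pressure <= 1000:
        return 1 / pressure
    return pressure * 10

df = pd.DataFrame(
    {"10A74": [0.0, 0.0, 0.0], "11A75": [122.8, 118.3, 119.5]},
    index=pd.to_datetime(["2022-06-05", "2022-06-06", "2022-06-07"]),
)

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(cal_properties)
print(result.round(6))
```

Checking the class rather than the instance avoids picking up a column that happens to be named `map`.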

Yes, it works as Marat mentioned, but I still prefer to use apply(axis=1). Thanks.
