I have a pandas DataFrame that contains some batting statistics from the 2001 Arizona Diamondbacks. I'm pretty new to Python/pandas, so I was trying to add a few columns using lambda functions like these:

PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else 'NaN'

Later on I want to work with more data that is very similar, and I will need to add these columns, plus many more in the future. So I made a separate Python module containing the functions, a list pairing each function with the column name it should have, and a function that iterates through the list and adds the columns to the end of the DataFrame:

import pandas as pd 


PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else 'NaN'

stat_functions = [['pa', PA_lambda], ['obp',OBP_lambda], ['avg', AVG_lambda]]
def format_df(df):
    for func in stat_functions:
        df['func[0]'] = df.apply(func[1], axis=1)

I'm not sure whether I need the pandas import in there, but whenever I import the module into my Jupyter Notebook and call format_df, only the first function, PA_lambda, is run, and its result is saved in the DataFrame under the column label 'func'. I thought that creating a list with the column name and the function itself would work, but once it tries to apply OBP_lambda to the df it raises this error:

AttributeError: 'Series' object has no attribute 'PA'

Sorry this is a little long; it's my first post here, but if you have a solution I am very eager to learn.

3 Answers

1

You don't need apply for that; you can perform these operations directly on columns in pandas:

df['pa'] = df['AB'] + df['BB'] + df['HBP'] + df['SH'] + df['SF']
df['obp'] = (df['H'] + df['BB'] + df['HBP']) / df['pa']
df['avg'] = df['H'] / df['AB']
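
If the goal is to reuse these formulas across many DataFrames, the same column arithmetic can live inside a function. Here is a minimal sketch, assuming the column names from the question. The name add_batting_stats and the sample rows are made up for illustration, and where() is one possible way to turn zero denominators into NaN instead of inf:

```python
import pandas as pd


def add_batting_stats(df):
    """Return a copy of df with pa, obp and avg columns added (hypothetical helper)."""
    out = df.copy()
    out['pa'] = out['AB'] + out['BB'] + out['HBP'] + out['SH'] + out['SF']
    # where() keeps values that satisfy the condition and puts NaN elsewhere,
    # so a zero denominator produces NaN rather than inf.
    out['obp'] = (out['H'] + out['BB'] + out['HBP']) / out['pa'].where(out['pa'] > 0)
    out['avg'] = out['H'] / out['AB'].where(out['AB'] > 0)
    return out


# Made-up rows, not real 2001 Diamondbacks data.
sample = pd.DataFrame({
    'AB': [4, 0], 'H': [2, 0], 'BB': [1, 0],
    'HBP': [0, 0], 'SH': [0, 0], 'SF': [0, 0],
})
result = add_batting_stats(sample)
print(result[['pa', 'obp', 'avg']])
```

Returning a copy leaves the caller's DataFrame untouched; if you would rather modify it in place, drop the copy() call.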

1 Comment

Right, I did that at first. But I plan on adding these columns to a number of DataFrames down the road, along with columns that use even more complex formulas, so I want a single function that adds the desired columns and then, boom, easy peasy, I have the data I need for any DataFrame.
0

Your format_df(df) function currently loops through each function and saves every result to the same column, because 'func[0]' is a plain string literal rather than the value of func[0]. One fix is to turn the key in the last line of the function into an f-string (put an f before the string) and wrap the expression in braces so that it is evaluated at run time.

def format_df(df):
    for func in stat_functions:
        df[f'{func[0]}'] = df.apply(func[1], axis=1)
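
One detail worth checking here: an f-string only evaluates expressions that sit inside braces. A small sketch of the difference, using a stand-in func entry invented for illustration:

```python
func = ['pa', None]  # stand-in for one [label, function] entry

plain = 'func[0]'            # ordinary string literal
no_braces = f'func[0]'       # f-string without braces: still the literal text
with_braces = f'{func[0]}'   # braces make Python evaluate func[0]

print(plain, no_braces, with_braces)  # prints: func[0] func[0] pa
```

Since func[0] is already a string, indexing with df[func[0]] directly gives the same column name without any formatting.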


0

You need to use the label element of each func item itself, not a string containing its name, when creating the new column in the df.

Like this:

for func in stat_functions: 
    df[func[0]] = df.apply(func[1], axis=1)

Notice how this code references the value of func[0], not the string 'func[0]', when creating the new column in the DataFrame.
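
As far as I can tell from the question's code, one more change is likely needed: the labels in stat_functions are lowercase ('pa'), while OBP_lambda reads row.PA, so the AttributeError would still appear once the column is named 'pa'. Below is a sketch of the module with labels that match what the lambdas look up. The uppercase labels, the tuple layout, the use of np.nan in place of the string 'NaN', and the sample rows are my assumptions, not part of the original post:

```python
import numpy as np
import pandas as pd

# (label, function) pairs; order matters because OBP reads the PA column.
stat_functions = [
    ('PA', lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF),
    ('OBP', lambda row: (row.H + row.BB + row.HBP) / row.PA if row.PA > 0 else np.nan),
    ('AVG', lambda row: row.H / row.AB if row.AB > 0 else np.nan),
]


def format_df(df):
    """Add each stat column in list order, modifying df in place."""
    for label, func in stat_functions:
        df[label] = df.apply(func, axis=1)
    return df


# Made-up rows, not real 2001 Diamondbacks data.
sample = pd.DataFrame({
    'AB': [4, 0], 'H': [2, 0], 'BB': [1, 0],
    'HBP': [0, 0], 'SH': [0, 0], 'SF': [0, 0],
})
format_df(sample)
print(sample[['PA', 'OBP', 'AVG']])
```

Using np.nan rather than 'NaN' keeps the new columns numeric, so later calculations such as mean() skip the missing values instead of failing on strings.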
