I have a pandas DataFrame that contains batting statistics from the 2001 Arizona Diamondbacks. I'm fairly new to Python and pandas, and I was trying to add a few columns using lambda functions like these:
PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else 'NaN'
Later I want to work with more data that is very similar, and I will need to add these columns, and many more, to it. So I made a separate Python module containing the functions, a list pairing each function with the column name it should get, and a function that iterates through the list and adds the columns onto the end of the DataFrame:
import pandas as pd
PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else 'NaN'
stat_functions = [['pa', PA_lambda], ['obp', OBP_lambda], ['avg', AVG_lambda]]
def format_df(df):
    for func in stat_functions:
        df['func[0]'] = df.apply(func[1], axis=1)
I'm not sure whether the module needs to import pandas at all. Whenever I import the module into my Jupyter Notebook and call format_df, only the first function, PA_lambda, runs, and its result is saved into the DataFrame under the column label 'func'. I thought that a list holding each column name alongside its function would work, but as soon as it tries to apply OBP_lambda to the df it raises this error:
AttributeError: 'Series' object has no attribute 'PA'
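In case it helps, here is a minimal self-contained sketch that should reproduce the behaviour. The two rows of numbers are made up for illustration and are not the real 2001 data, and `error_message` is just a helper name introduced here to capture the exception text.

```python
import pandas as pd

# Made-up numbers for two hypothetical batters, not the real 2001 data.
df = pd.DataFrame({
    "AB": [10, 0], "H": [3, 0], "BB": [1, 0],
    "HBP": [0, 0], "SH": [0, 0], "SF": [1, 0],
})

PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'

stat_functions = [['pa', PA_lambda], ['obp', OBP_lambda]]

def format_df(df):
    for func in stat_functions:
        # Same assignment as in the module: the quoted string is used
        # literally as the column label on every pass through the loop.
        df['func[0]'] = df.apply(func[1], axis=1)

# Capture the exception so the resulting columns can still be inspected.
error_message = None
try:
    format_df(df)
except AttributeError as exc:
    error_message = str(exc)

print(list(df.columns))  # original columns plus one new column
print(error_message)     # text of the AttributeError raised by OBP_lambda
```

Running this adds exactly one new column (holding the PA_lambda results) before the second lambda fails, which matches what I see in the notebook.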
Sorry this is a little long; it's my first post here. If you have a solution, I am very eager to learn.