I'm trying to work out the cleanest way to handle a list comprehension when the function it calls raises an error for some of the inputs.
Here's an example that works:
import numpy as np
import pandas as pd

# Make up a dataframe
df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, 5897, 4856]})
# Define some function
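def calc_some_stuff(input_df, city):
    # Filter on the frame passed in, not the global df
    temp_df = input_df[input_df.city == city]
    # Take the single matching row explicitly; int() on a Series is deprecated
    value = temp_df.population_m.iloc[0] * temp_df.density_kms.iloc[0] / 5
    return {'city': city, 'value': int(value)}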
# Use a list comprehension to cycle through cities calculating the random thing
cities=['London','Paris','New York']
pd.DataFrame([calc_some_stuff(df,c) for c in cities])
There are a few ways this can break: NaNs in the data, or a city with no data at all.
### First type of break, replace df with this (so introduce nan)
df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, np.nan, 4856]})
### Second type of break, missing data (introducing a new city without data here)
cities=['London','Paris','New York','Berlin']
I've tried some hacky solutions, such as if 'value' in locals() else None, but that's a big mess. I also tried catching the two types of error with if, elif, else, but that gets big and messy when the real function is much larger than the example here.
The output I'm looking for (made up numbers) is:
city,value
London,4568
Paris,NA
New York,4862
Berlin,NA
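One way this could be approached, as a minimal sketch rather than a definitive answer: leave the real function alone and wrap the call in a small helper that catches the exceptions and returns a placeholder row. The name safe_calc is hypothetical, and the exception list is an assumption based on the two failure modes described above (int() on NaN raises ValueError; an empty selection raises IndexError here).

```python
import numpy as np
import pandas as pd

# Data with both failure modes: a NaN density for Paris, and no row for Berlin
df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, np.nan, 4856]})

def calc_some_stuff(input_df, city):
    temp_df = input_df[input_df.city == city]
    value = temp_df.population_m.iloc[0] * temp_df.density_kms.iloc[0] / 5
    return {'city': city, 'value': int(value)}

def safe_calc(input_df, city):
    # Hypothetical wrapper: the error handling lives here, not in the real function
    try:
        return calc_some_stuff(input_df, city)
    except (ValueError, TypeError, IndexError):
        # Assumed set of exceptions; widen or narrow it to match the real function
        return {'city': city, 'value': np.nan}

cities = ['London', 'Paris', 'New York', 'Berlin']
result = pd.DataFrame([safe_calc(df, c) for c in cities])
print(result)
```

The list comprehension stays a one-liner, and the failing cities come back as NaN in the value column. Because of the NaNs, that column is stored as float rather than int.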
df.eval("value = population_m * density_kms/5", inplace=True)
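If the real calculation can be written column-wise like the df.eval line above, a vectorised sketch avoids the per-city function altogether. NaN inputs propagate to NaN results on their own, and reindexing on the list of cities adds a NaN row for any city with no data. This assumes each city appears at most once in df; the variable name out is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, np.nan, 4856]})
cities = ['London', 'Paris', 'New York', 'Berlin']

out = (
    df.eval("value = population_m * density_kms / 5")  # returns a copy with the new column
      .set_index('city')['value']
      .reindex(pd.Index(cities, name='city'))          # cities missing from df become NaN
      .reset_index()
)
print(out)
```

Unlike the original function, this keeps the values as floats instead of truncating them to int.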