How to deal with errors in a list comprehension

Question

I'm trying to work out the cleanest way to handle a list comprehension when the function it calls raises an error.

Here's an example that works:

import pandas as pd
import numpy as np

# Make up a dataframe
df=pd.DataFrame({'city':['London','Paris','New York'],
            'population_m':[6.7,7.2,12.1],
            'density_kms':[5752,5897,4856]})

# Define some function 
def calc_some_stuff(input_df,city):
    # Filter on the dataframe that was passed in, not the global df
    temp_df=input_df[input_df.city==city]
    return({
        'city':city,
        'value':int(temp_df.population_m * temp_df.density_kms / 5)})

# Use a list comprehension to cycle through cities calculating the random thing
cities=['London','Paris','New York']
pd.DataFrame([calc_some_stuff(df,c) for c in cities])

There are a few ways this can break: NaNs in the data, or missing data.

### First type of break, replace df with this (so introduce nan)
df=pd.DataFrame({'city':['London','Paris','New York'],
            'population_m':[6.7,7.2,12.1],
            'density_kms':[5752,np.nan,4856]})

### Second type of break, missing data (introducing a new city without data here)
cities=['London','Paris','New York','Berlin']

I've tried some hacky solutions, such as using if 'value' in locals() else None, but that's a big mess. I also tried catching the two types of errors with if, elif, else, but that gets big and messy when the real function is much larger than my example here.

The output I'm looking for (made up numbers) is:

city,value
London,4568
Paris,NA
New York,4862
Berlin,NA

Did this do the required job? df.eval("value = population_m * density_kms/5", inplace=True)
– Khaled Koubaa, Commented Sep 22, 2022 at 12:33
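
For reference, a rough sketch of the vectorized idea behind this comment. It is an assumption about how one might apply it, not something from the thread: it uses plain column arithmetic instead of df.eval, leaves the values as floats rather than casting to int, and uses reindex to add a row for any city with no data. The name result is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, np.nan, 4856]})
cities = ['London', 'Paris', 'New York', 'Berlin']

result = (
    # NaN inputs simply propagate to NaN outputs, no exception is raised
    df.assign(value=df.population_m * df.density_kms / 5)
      .set_index('city')['value']
      # Cities absent from df (here Berlin) get a NaN row
      .reindex(cities)
      .rename_axis('city')
      .reset_index()
)
print(result)
```

This only works when the calculation can be written as column operations; a larger function with branching logic would still need the per-city approach.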

matszwecja · Accepted Answer · 2022-09-22 12:32:41Z

The simplest way is to catch exceptions inside your custom function. The problem isn't really related to the list comprehension, but to the fact that your function cannot handle missing or undefined data.

import pandas as pd
import numpy as np
# Make up a dataframe
### First type of break, replace df with this (so introduce nan)
df=pd.DataFrame({'city':['London','Paris','New York'],
            'population_m':[6.7,7.2,12.1],
            'density_kms':[5752,np.nan,4856]})

### Second type of break, missing data (introducing a new city without data here)
cities=['London','Paris','New York','Berlin']

# Define some function 
def calc_some_stuff(input_df,city):
    # Filter on the dataframe that was passed in, not the global df
    temp_df=input_df[input_df.city==city]
    try:
        return({
            'city':city,
            'value':int(temp_df.population_m * temp_df.density_kms / 5)})
    except (ValueError, TypeError):
        return({
            'city':city,
            'value':np.nan})

# Use a list comprehension to cycle through cities calculating the random thing
print(pd.DataFrame([calc_some_stuff(df,c) for c in cities]))
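
If the real function is large and you would rather not edit it, the same try/except can live in a small wrapper instead. The following is a minimal sketch of that variation, not part of the answer above: with_fallback and safe_calc are hypothetical names, and calc_some_stuff is rewritten here to use .iloc[0], so the exceptions to catch are ValueError (NaN) and IndexError (no matching row).

```python
import functools

import numpy as np
import pandas as pd


def with_fallback(func, fallback, exceptions):
    """Wrap func so that the listed exceptions produce fallback(*args) instead."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except exceptions:
            return fallback(*args, **kwargs)
    return wrapper


df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, np.nan, 4856]})
cities = ['London', 'Paris', 'New York', 'Berlin']


def calc_some_stuff(input_df, city):
    temp_df = input_df[input_df.city == city]
    value = temp_df.population_m * temp_df.density_kms / 5
    # .iloc[0] raises IndexError when no row matches;
    # int() raises ValueError when the value is NaN
    return {'city': city, 'value': int(value.iloc[0])}


safe_calc = with_fallback(
    calc_some_stuff,
    fallback=lambda input_df, city: {'city': city, 'value': np.nan},
    exceptions=(ValueError, IndexError),
)

result = pd.DataFrame([safe_calc(df, c) for c in cities])
print(result)
```

Listing the exception types explicitly keeps unrelated bugs (a misspelled column name, for example) from being silently turned into NaN.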

Adam J · Accepted Answer · 2022-09-22 12:30:57Z

List comprehensions can be quite hard to debug, especially when they get nested. Here is a little routine I use when writing one:

1. If it doesn't work right away, write the logic first using for loops.
2. Use print statements to debug the loop if you are not getting the expected output.
3. When it works and you get the expected output, convert the for loop to a list comprehension.

Extra: you can try breaking down nested list comprehensions by moving some of the logic into a function, as you already do. Another option is to keep the logic explicit and stick to a for loop, so that it stays readable for your future self and anyone else who might read your code.
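
The steps above can be sketched roughly as follows. The helper parse_number and the sample data are made up for illustration and are not from the question.

```python
def parse_number(text):
    """Hypothetical helper: return float(text), or None if it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


raw = ['1.5', 'abc', '3', '']

# Steps 1 and 2: explicit loop, with a print to inspect each intermediate value
parsed_loop = []
for item in raw:
    value = parse_number(item)
    print(f'{item!r} -> {value!r}')
    parsed_loop.append(value)

# Step 3: once the loop behaves as expected, collapse it into a comprehension
parsed_comp = [parse_number(item) for item in raw]

print(parsed_loop == parsed_comp)
```

Because the error handling sits inside the helper, the comprehension itself stays a single readable line.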
