
Let's say I have a PySpark DataFrame with the following columns:

user, score, country, risky/safe, payment_id

I made a list of thresholds: [10, 20, 30]

Now I want to add two new columns for each threshold:

  1. % of risky payments with score above the threshold out of all payments (risky and safe)
  2. % of risky distinct users with at least one score above the threshold out of all users (risky and safe)

Both should be grouped by country.
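To pin down the two definitions, here is a minimal plain-Python sketch that computes both shares per country on a few rows. The toy rows and the helper name `shares_by_country` are hypothetical, the labels are assumed to be the strings 'risk' and 'safe', and "above the threshold" is read as `>=`, as in the code further down. Values are fractions; multiply by 100 for percentages.

```python
from collections import defaultdict

# Hypothetical toy rows: (country, user, score, label)
payments = [
    ("A", "u1", 25, "risk"),
    ("A", "u1", 5, "risk"),
    ("A", "u2", 15, "safe"),
    ("A", "u3", 12, "risk"),
    ("B", "u4", 35, "risk"),
    ("B", "u5", 8, "safe"),
]
thresholds = [10, 20, 30]


def shares_by_country(rows, thresholds):
    """Return {country: {threshold: (payment_share, user_share)}}."""
    by_country = defaultdict(list)
    for country, user, score, label in rows:
        by_country[country].append((user, score, label))

    out = {}
    for country, items in by_country.items():
        # Denominators use ALL payments and ALL distinct users (risky and safe).
        all_users = {user for user, _, _ in items}
        out[country] = {}
        for t in thresholds:
            # Numerators use only risky payments scoring at or above the threshold.
            hits = [user for user, score, label in items
                    if label == "risk" and score >= t]
            out[country][t] = (len(hits) / len(items),
                               len(set(hits)) / len(all_users))
    return out


shares = shares_by_country(payments, thresholds)
```

For country A at threshold 10, two of the four payments qualify, and two of the three distinct users have at least one qualifying payment.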

The result should be something like this:

Country | % payments thresh 10 | % users thresh 10 | % payments thresh 20 ... 
A
B
C

I was able to make it work with an external for loop, but I want all of the columns in one DataFrame.

from pyspark.sql import functions as F

thresholds = [10, 20, 30]

for thresh in thresholds:
    # Produces a separate result per threshold, which is what I want to avoid
    result = (df
        .select('country', 'risk/safe', 'user', 'payment', 'score')
        .where(F.col('risk/safe') == 'risk')
        .groupBy('country')
        .agg((F.sum(F.when(F.col('score') >= thresh, 1)) / F.count('country'))
             .alias('% payments')))
  • Shouldn't you divide by F.count('payment') to get the % of payments over the threshold for every country? Commented Aug 9, 2022 at 6:55

1 Answer

Build the aggregate expressions with a list comprehension, then unpack the lists inside agg().

from pyspark.sql import functions as func

# One payment-share and one user-share expression per threshold
pay_aggs = [(func.sum((func.col('score') >= thresh).cast('int')) / func.count('country')).alias('% pay ' + str(thresh)) for thresh in thresholds]
user_aggs = [(func.countDistinct(func.when(func.col('score') >= thresh, func.col('user'))) / func.countDistinct('user')).alias('% user ' + str(thresh)) for thresh in thresholds]

# 'score' has to be selected as well, because the aggregations reference it
df. \
    select('country', 'risk/safe', 'user', 'payment', 'score'). \
    where(func.col('risk/safe') == 'risk'). \
    groupBy('country'). \
    agg(*pay_aggs, *user_aggs)

The pay_aggs list generates the following aggregations (you can print the list to check):

# [Column<'(sum(CAST((score >= 10) AS INT)) / count(country)) AS `% pay 10`'>,
#  Column<'(sum(CAST((score >= 20) AS INT)) / count(country)) AS `% pay 20`'>,
#  Column<'(sum(CAST((score >= 30) AS INT)) / count(country)) AS `% pay 30`'>]