
I'm "translating" a series of operations that were in Tableau into a PySpark script. One column in particular I'm having serious trouble replicating; here is the Tableau version of the code:

min(
    min(
        IF [COLUMN_A] > 0 THEN [COLUMN_B] ELSE 9999 END,
        IF [COLUMN_C] > 0 THEN [COLUMN_D] ELSE 99999 END),
    min(
        IF [COLUMN_E] > 0 THEN [COLUMN_F] ELSE 99999 END,
        IF [COLUMN_G] != 0 THEN [COLUMN_H] ELSE 99999 END)
)

Is it possible to create a version of that code using the when and otherwise functions in PySpark? I've been trying, but I can't seem to solve it...

1 Answer

PySpark definitely supports chaining complex conditions like this. I suspect your point of failure is the syntax, or not wrapping the 9999 scalar with fun.lit.

Anyway, assuming your columns are called 'A', 'B', ..., 'H', the following should work:

import pyspark.sql.functions as fun
...
fun.least(
    fun.least(
        fun.when(fun.col('A') > 0, fun.col('B')).otherwise(fun.lit(9999)),
        fun.when(fun.col('C') > 0, fun.col('D')).otherwise(fun.lit(9999))
    ),
    fun.least(
        fun.when(fun.col('E') > 0, fun.col('F')).otherwise(fun.lit(9999)),
        fun.when(fun.col('G') != 0, fun.col('H')).otherwise(fun.lit(9999))
    )
)
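If you want to sanity-check the translation without spinning up Spark, here is a tiny pure-Python model of the same row-wise logic, using plain min and conditional expressions in place of fun.least and fun.when. The function name, dict-based row, and sample values are made up for illustration:

```python
# Pure-Python model of the nested least/when expression above.
# Each row is a dict; the sentinel 9999 stands in for the ELSE branch.
def least_when(row):
    first = min(
        row['B'] if row['A'] > 0 else 9999,
        row['D'] if row['C'] > 0 else 9999,
    )
    second = min(
        row['F'] if row['E'] > 0 else 9999,
        row['H'] if row['G'] != 0 else 9999,
    )
    return min(first, second)

# Hypothetical sample row: only the guards on A and G pass,
# so the candidates are B (42), 9999, 9999, and H (50).
row = {'A': 1, 'B': 42, 'C': 0, 'D': 5, 'E': -1, 'F': 7, 'G': 3, 'H': 50}
print(least_when(row))  # -> 42
```

Each branch of least_when corresponds line-by-line to one fun.when(...).otherwise(...) in the PySpark expression, which makes it easy to compare outputs against a few rows exported from Tableau.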

3 Comments

Unfortunately, when I apply this code I'm getting a "TypeError: 'Column' object is not callable"
Whoops. fun.min is an aggregate that operates over the rows of a single column. You need fun.least to take the minimum across columns within each row. I corrected the answer.
Just realized your 'not callable' error was being caused by a spelling mistake in otherwise. That is fixed too.
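As a plain-Python analogy for the min-vs-least distinction in the comment above (the sample values are made up): fun.min behaves like Python's min over a whole list-column, while fun.least behaves like min applied across one row's columns.

```python
# Aggregate min collapses all rows of one column into a single value
# (what fun.min does).
column_b = [10, 3, 7]          # one column, three rows
aggregate_min = min(column_b)  # -> 3

# Row-wise least compares several columns within a single row
# (what fun.least does).
row = {'B': 10, 'D': 12, 'F': 7}                  # one row, three columns
per_row_least = min(row['B'], row['D'], row['F'])  # -> 7
```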
