
I have a DataFrame with the following data:

scala> nonFinalExpDF.show
+---+----------+
| ID|      DATE|
+---+----------+
|  1|      null|
|  2|2016-10-25|
|  2|2016-10-26|
|  2|2016-09-28|
|  3|2016-11-10|
|  3|2016-10-12|
+---+----------+

From this DataFrame I want to derive the DataFrame below:

+---+----------+----------+
| ID|      DATE| INDICATOR|
+---+----------+----------+
|  1|      null|         1|
|  2|2016-10-25|         0|
|  2|2016-10-26|         1|
|  2|2016-09-28|         0|
|  3|2016-11-10|         1|
|  3|2016-10-12|         0|
+---+----------+----------+

Logic:

  1. For the latest DATE (the max date) of an ID, the INDICATOR value is 1; for the other rows of that ID it is 0.
  2. For an ID whose only DATE value is null (ID 1 above), the INDICATOR is also 1.

Please suggest a simple way to do this.

Comment: Post your code; what have you tried so far? (Commented Nov 4, 2016 at 13:04)

1 Answer


Try:

df.createOrReplaceTempView("df")
spark.sql("""
  SELECT id, date,
    CAST(LEAD(COALESCE(date, TO_DATE('1900-01-01')), 1)
      OVER (PARTITION BY id ORDER BY date) IS NULL AS INT) AS indicator
  FROM df""")

LEAD looks at the next row within each id (ordered ascending by date), so it is NULL only for the last row of the partition, i.e. the max-date row; casting IS NULL to INT gives the 1/0 indicator. The COALESCE substitutes a sentinel date for a null in the following row, so a null date there is not mistaken for "no next row". An ID with a single null date (ID 1) is the last row of its partition and correctly gets 1.
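The rule the query implements, within each ID, the last row when sorted ascending by DATE (nulls first) gets 1 and all others get 0, can also be sketched in plain Scala without Spark. The object and method names below are illustrative, not part of any API:

```scala
// Illustrative sketch: per-ID max-date indicator, no Spark required.
// Dates are ISO strings (yyyy-MM-dd), so plain string comparison orders
// them correctly; None sorts before Some, matching SQL's NULLS FIRST.
object IndicatorSketch {
  def addIndicator(rows: Seq[(Int, Option[String])]): Seq[(Int, Option[String], Int)] = {
    rows.groupBy(_._1).values.flatMap { group =>
      // maxBy uses the implicit Ordering[Option[String]] (None < Some).
      // Note: if two rows of an ID tie on the max date, both get 1 here;
      // that simplification differs from LEAD, which flags only one row.
      val last = group.maxBy(_._2)
      group.map { case (id, date) =>
        (id, date, if ((id, date) == last) 1 else 0)
      }
    }.toSeq
  }
}
```

This mirrors the window-function behavior on the sample data: ID 1's lone null row gets 1, and for IDs 2 and 3 only the max-date row gets 1.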

1 Comment

It's working. I used registerTempTable instead of createOrReplaceTempView.
