0

I want to aggregate on the Identifiant column with count of different state and represent all the state.

Identifiant state
ID01 NY
ID02 NY
ID01 CA
ID03 CA
ID01 CA
ID03 NY
ID01 NY
ID01 CA
ID01 NY

I'd like to obtain this dataset:

Identifiant NY CA
ID01 3 3
ID02 1 0
ID03 1 1

1 Answer 1

1

Group by Identifiant and pivot State column:

from pyspark.sql import functions as F

result = (df.groupBy("Identifiant")
          .pivot("State")
          .count().na.fill(0)
          )

result.show()
#+-----------+---+---+
#|Identifiant| CA| NY|
#+-----------+---+---+
#|       ID03|  1|  1|
#|       ID01|  3|  3|
#|       ID02|  0|  1|
#+-----------+---+---+
Sign up to request clarification or add additional context in comments.

1 Comment

Nice, but you can simplify the code: df.groupby('Identifiant').pivot('State').count().fillna(0)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.