
I have event data per device_id, where each event is sometimes successful and sometimes unsuccessful.

device_id status
1 Successful
1 UnSuccessful
1 UnSuccessful
1 UnSuccessful
1 Successful
2 Successful
2 UnSuccessful
2 UnSuccessful

Is there a way to do a group by and get the counts for each device_id in a single row, like this:

device_id success_count unsuccessful_count
1 2 3
2 1 2

I have been trying several ways using group by, but I haven't been able to get the success_count and unsuccessful_count for a device_id in a single row.

1 Answer

You need to group your data by device id and then pivot by status and count:

df.groupBy("device_id").pivot("status").count()
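A minimal, self-contained sketch of that approach (the sample data below mirrors the question; the SparkSession setup is assumed and not part of the original answer):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data mirroring the question.
data = [
    (1, "Successful"), (1, "UnSuccessful"), (1, "UnSuccessful"),
    (1, "UnSuccessful"), (1, "Successful"),
    (2, "Successful"), (2, "UnSuccessful"), (2, "UnSuccessful"),
]
df = spark.createDataFrame(data, ["device_id", "status"])

# One row per device_id, one column per distinct status value,
# each cell holding the number of rows with that status.
result = df.groupBy("device_id").pivot("status").count()
result.show()
# (row order may vary)
# +---------+----------+------------+
# |device_id|Successful|UnSuccessful|
# +---------+----------+------------+
# |        1|         2|           3|
# |        2|         1|           2|
# +---------+----------+------------+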

2 Comments

I somehow get a ClassCastException: Py4JJavaError: An error occurred while calling o1604.showString. : java.lang.ClassCastException: class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')... Any idea why this might happen?
I'm accepting the answer, since the issue above is specific to my use case and the solution does what I asked for.
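
As a side note, and not part of the accepted answer: a conditional aggregation is an assumed alternative that produces the exact success_count / unsuccessful_count column names from the question without pivoting (reusing the df built in the sketch above):

from pyspark.sql import functions as F

# Count only the rows matching each status; F.when(...) is null when the
# condition is false, and F.count ignores nulls.
counts = df.groupBy("device_id").agg(
    F.count(F.when(F.col("status") == "Successful", True)).alias("success_count"),
    F.count(F.when(F.col("status") == "UnSuccessful", True)).alias("unsuccessful_count"),
)
counts.show()
# (row order may vary)
# +---------+-------------+------------------+
# |device_id|success_count|unsuccessful_count|
# +---------+-------------+------------------+
# |        1|            2|                 3|
# |        2|            1|                 2|
# +---------+-------------+------------------+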
