I've been trying to perform a groupBy and a count() on a Spark DataFrame, but it takes far longer than I'd expect.
The line below takes about 13 seconds to run. That seems excessive to me, but I don't know how to reduce the processing time.
from pyspark.sql.functions import count

matched.limit(100).groupBy('Date', 'Period').agg(count("*").alias('cnt')).show()
I'm running Spark 2.4 with the following configuration:

Driver: 2 vCPU, 8 GB RAM
10 Executors: 2 vCPU, 8 GB RAM each
Can anyone give me a hint on how to speed this up?