I am trying to extract a sample from a Spark DataFrame (df_spark) with 100 million rows and convert it to a pandas DataFrame using the code below:
df = df_spark.sample(withReplacement = False, fraction = 0.05, seed = 11).collect().toPandas()
Unfortunately, I'm getting the following error:
AttributeError: 'list' object has no attribute 'toPandas'
I also tried converting it to an RDD first and then to pandas, and got the same error.
Once I have the sampled list, what is the correct way to convert it to a pandas DataFrame or a Spark DataFrame?
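For reference, the error most likely comes from the collect() call: sample() returns a Spark DataFrame, but collect() turns it into a plain Python list of Row objects, and a list has no toPandas() method. Below is a minimal sketch of the same pipeline without collect(). It assumes df_spark is a pyspark.sql.DataFrame; the helper name sample_to_pandas is hypothetical and only used for illustration.

```python
def sample_to_pandas(df_spark, fraction=0.05, seed=11):
    """Sample a Spark DataFrame and return the sample as a pandas DataFrame.

    Assumption: df_spark is a pyspark.sql.DataFrame. The function name is
    hypothetical and not part of PySpark.
    """
    # sample() returns another Spark DataFrame, so toPandas() is still available.
    sampled = df_spark.sample(withReplacement=False, fraction=fraction, seed=seed)

    # No collect() here: collect() would return a Python list of Row objects,
    # which is what raises "'list' object has no attribute 'toPandas'".
    return sampled.toPandas()
```

Usage would be df = sample_to_pandas(df_spark). Note that toPandas() brings the whole sample to the driver, and 5% of 100 million rows is about 5 million rows, so the driver needs enough memory for that. If the list of Row objects has already been collected, passing it to spark.createDataFrame(rows) should give back a Spark DataFrame, assuming spark is the active SparkSession.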