I'm trying to convert a list into a DataFrame in PySpark so that I can then join it onto a larger DataFrame as a column. The data in the list are randomly generated first names, like so:
from faker import Faker
from pyspark.sql.functions import *
import pyspark.sql.functions as F
from pyspark.sql.types import *
faker = Faker("en_GB")
list1 = [faker.first_name() for _ in range(0, 100)]
firstname = sc.parallelize([list1])
schema = StructType([
StructField('FirstName', StringType(), True)
])
df = spark.createDataFrame(firstname, schema)
display(df)
But I'm getting this error:
PythonException: 'ValueError: Length of object (100) does not match with length of fields (1)'.
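From reading around, I suspect createDataFrame wants one record per row, with each record's length matching the number of schema fields, so a one-column frame would need each name wrapped on its own. A rough sketch of the shape I think it expects (untested, reusing list1 and the spark session from above) is:

from pyspark.sql.types import StructType, StructField, StringType
# wrap each name in a single-element tuple so every row has exactly one field
rows = [(name,) for name in list1]
schema = StructType([StructField('FirstName', StringType(), True)])
df = spark.createDataFrame(rows, schema)

but I don't really understand why my sc.parallelize([list1]) version behaves differently.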
Any ideas on what's causing this and how to fix it would be appreciated!
Many thanks,
Carolina