
Did my research, but didn't find anything on this. I want to convert a simple pandas.DataFrame to a spark dataframe, like this:

import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist())

The error I get is:

TypeError: Can not infer schema for type: <class 'str'>

I tried something even simpler:

df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)

And I get:

TypeError: Can not infer schema for type: <class 'numpy.int64'>

Any help? Do I need to manually specify a schema or something?

sc_sql is a pyspark.sql.SQLContext; I'm in a Jupyter notebook on Python 3.4 and Spark 1.6.

Thanks!

  • I tried the code and it works fine; there is no error. Commented May 24, 2016 at 11:36
  • It doesn't for me, with or without a schema... Commented May 24, 2016 at 11:39
  • Which Spark version are you using? Commented May 24, 2016 at 11:40
  • I'm on Spark 1.6.1. Commented May 24, 2016 at 11:43
  • What version of Pandas do you use? Commented May 25, 2016 at 3:47

1 Answer


It's related to your Spark version: more recent Spark releases make type inference smarter, so this code works there without an explicit schema. On Spark 1.6 you can fix it by specifying the schema yourself:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

mySchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
])
sc_sql.createDataFrame(df, schema=mySchema)
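To see why inference failed in the first place: pandas stores column values as NumPy scalar types (e.g. numpy.int64), which is exactly the type named in the error. A minimal sketch (a hypothetical workaround, not part of the answer above) is to convert the rows to native Python types first; the resulting list of tuples can then be passed to createDataFrame together with the column names:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})

# pandas hands back NumPy scalars, not Python ints —
# this is the <class 'numpy.int64'> from the error message
print(type(df['col2'][0]))

# build rows of plain Python types that Spark 1.6 can infer
rows = [(str(a), int(b)) for a, b in zip(df['col1'], df['col2'])]
print(rows)  # [('a', 1), ('b', 2), ('c', 3)]

# then (assuming sc_sql is your SQLContext):
# sc_sql.createDataFrame(rows, schema=df.columns.tolist())
```

This trades the explicit StructType for per-row conversion, which is slower for large frames but avoids writing out the schema by hand.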

