
Did my research, but didn't find anything on this. I want to convert a simple pandas.DataFrame to a spark dataframe, like this:

import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist())

The error I get is:

TypeError: Can not infer schema for type: <class 'str'>

I tried something even simpler:

df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)

And I get:

TypeError: Can not infer schema for type: <class 'numpy.int64'>

Any help? Do I need to manually specify a schema or something?

sc_sql is a pyspark.sql.SQLContext; I'm in a Jupyter notebook on Python 3.4 and Spark 1.6.

Thanks!

  • I tried the code and it works fine; there is no error. Commented May 24, 2016 at 11:36
  • It doesn't for me, with or without a schema... Commented May 24, 2016 at 11:39
  • Which Spark version are you using? Commented May 24, 2016 at 11:40
  • I'm on Spark 1.6.1. Commented May 24, 2016 at 11:43
  • What version of Pandas do you use? Commented May 25, 2016 at 3:47

1 Answer


It's related to your Spark version: more recent Spark releases make type inference smarter, so this code works there without an explicit schema. On Spark 1.6 you can fix it by specifying the schema yourself:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

mySchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
])
sc_sql.createDataFrame(df, schema=mySchema)
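To see why inference failed in the first place: pandas stores column values as NumPy scalar types (e.g. numpy.int64), which is exactly the type named in the error. A minimal sketch (a hypothetical workaround, not part of the answer above) is to convert the rows to native Python types first; the resulting list of tuples can then be passed to createDataFrame together with the column names:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})

# pandas hands back NumPy scalars, not Python ints —
# this is the <class 'numpy.int64'> from the error message
print(type(df['col2'][0]))

# build rows of plain Python types that Spark 1.6 can infer
rows = [(str(a), int(b)) for a, b in zip(df['col1'], df['col2'])]
print(rows)  # [('a', 1), ('b', 2), ('c', 3)]

# then (assuming sc_sql is your SQLContext):
# sc_sql.createDataFrame(rows, schema=df.columns.tolist())
```

This trades the explicit StructType for per-row conversion, which is slower for large frames but avoids writing out the schema by hand.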

