I did my research, but I couldn't find anything on this. I want to convert a simple pandas.DataFrame to a Spark DataFrame, like this:
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist())
The error I get is:
TypeError: Can not infer schema for type: <class 'str'>
I tried something even simpler:
df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)
And I get:
TypeError: Can not infer schema for type: <class 'numpy.int64'>
Any help? Do I need to manually specify a schema, or something like that?
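For example, would it be something like this? (Just a sketch of what I imagine a manual schema would look like, assuming StructType/StructField from pyspark.sql.types are the right tools; I haven't verified it.)

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Guessed manual schema for the df above: 'col1' holds strings, 'col2' holds ints
schema = StructType([
    StructField('col1', StringType(), True),   # nullable string column
    StructField('col2', IntegerType(), True),  # nullable integer column
])

spark_df = sc_sql.createDataFrame(df, schema=schema)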
sc_sql is a pyspark.sql.SQLContext. I'm in a Jupyter notebook on Python 3.4 with Spark 1.6.
Thanks!