I am using PySpark 2.0 to create a DataFrame by reading a CSV file:

data = spark.read.csv('data.csv', header=True)

I find the type of the data using

type(data)

The result is

pyspark.sql.dataframe.DataFrame

I am trying to convert some of the columns in data to LabeledPoint objects so that I can train a classifier.

from pyspark.sql.types import *
from pyspark.sql.functions import col
from pyspark.mllib.regression import LabeledPoint

data.select(['label', 'features']) \
    .map(lambda row: LabeledPoint(row.label, row.features))

I ran into this error:

AttributeError: 'DataFrame' object has no attribute 'map'

Any idea what causes this error? Is there a way to generate LabeledPoint objects from a DataFrame in order to perform classification?

1 Answer

Use .rdd.map:

>>> data.select(...).rdd.map(...)

DataFrame.map has been removed in Spark 2.
