pyspark error: 'DataFrame' object has no attribute 'map'

Question

I am using pyspark 2.0 to create a DataFrame object by reading a csv using:

data = spark.read.csv('data.csv', header=True)

I find the type of the data using

type(data)

The result is

pyspark.sql.dataframe.DataFrame

I am trying to convert some columns in data to LabeledPoint objects in order to apply a classification algorithm.

from pyspark.sql.types import *
from pyspark.sql.functions import col
from pyspark.mllib.regression import LabeledPoint

data.select(['label', 'features']) \
    .map(lambda row: LabeledPoint(row.label, row.features))

I ran into this error:

AttributeError: 'DataFrame' object has no attribute 'map'

Any idea what causes this error? Is there a way to generate LabeledPoint objects from a DataFrame in order to perform classification?

Does this answer your question? AttributeError: 'DataFrame' object has no attribute 'map'
– Yosi Dahari, Commented Feb 20, 2021 at 14:01

user6022341 · Accepted Answer · 2016-09-08 01:29:04Z

20

Use .rdd.map:

>>> data.select(...).rdd.map(...)

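DataFrame.map was removed from the PySpark API in Spark 2.0. In Spark 1.x it was a shorthand for .rdd.map, so calling .rdd.map explicitly gives the same result.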
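To make this concrete for the LabeledPoint case in the question, here is a minimal sketch rather than a definitive implementation. It assumes the features live in separate numeric CSV columns (the names f1 and f2 are hypothetical, not from the question), and it casts every value to float because spark.read.csv without inferSchema reads all columns as strings. The helper names row_to_label_and_features and to_labeled_points are made up for illustration.

```python
def row_to_label_and_features(row, label_col, feature_cols):
    """Convert one row into (label, [features]), casting every value to float.

    Works on anything indexable by column name, including a pyspark.sql.Row.
    """
    label = float(row[label_col])
    features = [float(row[name]) for name in feature_cols]
    return label, features


def to_labeled_points(df, label_col, feature_cols):
    """Return an RDD of LabeledPoint built from the given DataFrame columns."""
    # Imported here so the pure-Python helper above can be used without Spark.
    from pyspark.mllib.regression import LabeledPoint

    def convert(row):
        label, features = row_to_label_and_features(row, label_col, feature_cols)
        return LabeledPoint(label, features)

    # DataFrame has no .map in Spark 2.x; go through the underlying RDD of Rows.
    return df.select([label_col] + list(feature_cols)).rdd.map(convert)


# Hypothetical usage, assuming data.csv has columns named label, f1 and f2:
# data = spark.read.csv('data.csv', header=True)
# points = to_labeled_points(data, 'label', ['f1', 'f2'])
# points.take(2)
```

The result is an RDD, not a DataFrame, which is what the pyspark.mllib classifiers expect as training input.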

answered Sep 8, 2016 at 1:29

community wiki

user6022341
