
I am 'translating' Python code to PySpark. I would like to use an existing column as the index for a DataFrame; in Python I did this with pandas. The small piece of code below shows what I did. Thanks for helping.

df.set_index('colx', drop=False, inplace=True)
# Sort the index
df.sort_index(inplace=True)

I expect the result to be a DataFrame with 'colx' as the index.


2 Answers


That is not how Spark works; a DataFrame has no concept of an index.

You can add an index column with zipWithIndex by converting the DataFrame to an RDD and back, but that gives you a new column, so it is not the same thing.
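A minimal sketch of that approach, assuming an existing DataFrame df (the sample data and the column name row_idx are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("b", 2)], ["colx", "coly"])

# zipWithIndex pairs every Row with a sequential index on the RDD side
rdd_with_idx = df.rdd.zipWithIndex()

# Flatten back into a DataFrame: the original columns plus the new index column
df_with_idx = rdd_with_idx.map(lambda pair: (*pair[0], pair[1])) \
                          .toDF(df.columns + ["row_idx"])
df_with_idx.show()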


Add an index to the PySpark DataFrame as a column and use it:

rdd_df = df.rdd.zipWithIndex()
df_index = rdd_df.toDF()
# the original Row lands in '_1' and the generated index in '_2';
# extract the columns back out of the struct
df_index = df_index.withColumn('colA', df_index['_1'].getItem("colA"))
df_index = df_index.withColumn('colB', df_index['_1'].getItem("colB"))
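
One possible continuation, assuming the column names above: after toDF() the generated index sits in '_2', so renaming it and ordering by it roughly mirrors the set_index + sort_index pattern from the question.

# '_2' holds the index produced by zipWithIndex; '_1' is the original Row struct
df_index = df_index.withColumnRenamed('_2', 'index').drop('_1')
df_index = df_index.orderBy('index')
df_index.show()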

