I am 'translating' Python code to PySpark and would like to use an existing column as the index for a DataFrame. In plain Python I did this with pandas; the small snippet below shows what I did. Thanks for helping.
df.set_index('colx',drop=False,inplace=True)
# Sort the index
df.sort_index(inplace=True)
I expect the result to be a dataframe with 'colx' as index.
df = df.sort("colx")but that's primarily for display purposes and you can't rely on that order for computations (without using aWindow). Or maybe you want to add arow_numberordering bycolx?