I am currently reading Spark: The Definitive Guide, and it has an example that sorts a DataFrame with orderBy and an expr, but the example does not work for me:
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import Row
schema = StructType([
    StructField("origin", StringType(), True),
    StructField("destination", StringType(), True),
    StructField("count", LongType(), True)
])
rows = [
    Row("US", "Germany", 5),
    Row("US", "France", 1),
    Row("US", "UK", 10)
]
parallelizedRows = spark.sparkContext.parallelize(rows)
df = spark.createDataFrame(parallelizedRows, schema)
Now, to sort the DataFrame in descending order using expr, I run:
df.orderBy(expr("count desc")).show(3)
The output is still in ascending order. It does work when I use the Column class instead:
df.orderBy(col("count").desc()).show(3)
Any idea why expr isn't working?