
I have a DataFrame and want to add a column that holds one fixed value for the first n rows and, for the remaining rows, the value from another column. Something like this:

frame.select("*")
.withColumn("newColumn", if(row number < 5) "hello, world" else col("someth_else"))
    There are no row numbers in Spark. Commented Mar 16, 2017 at 23:06

1 Answer


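If you are using Spark >= 2.x, you can use monotonically_increasing_id() to create a row index for the DataFrame, then use when/otherwise to conditionally create the new column based on that index. Note that the generated IDs are guaranteed to be increasing and unique, but not consecutive: they run 0, 1, 2, ... only within a single partition, so the comparison below acts as a row-number check only when the data sits in one partition, as in this small example: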

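// Needed outside spark-shell; `spark` is the active SparkSession.
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(1,3,5,7,8).toDF("A")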

df.withColumn("rn", monotonically_increasing_id()).
   withColumn("new", when($"rn" <= 2, lit("hello world")).otherwise($"A")).show

+---+---+-----------+
|  A| rn|        new|
+---+---+-----------+
|  1|  0|hello world|
|  3|  1|hello world|
|  5|  2|hello world|
|  7|  3|          7|
|  8|  4|          8|
+---+---+-----------+
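
When the DataFrame spans several partitions, the IDs from monotonically_increasing_id() jump between partitions, so comparing them against n no longer selects exactly the first n rows. One possible workaround, sketched here under assumptions, is to turn those IDs into a consecutive 1-based index with row_number() over a window. The names n, id, rn and result are illustrative and not part of the original answer, and the SparkSession setup assumes a local run. A window without partitionBy moves all rows into a single partition, so this is only reasonable for modestly sized data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Assumption: local session for illustration; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("first-n-rows").getOrCreate()
import spark.implicits._

val df = Seq(1, 3, 5, 7, 8).toDF("A")
val n = 3 // hypothetical: number of leading rows that get the fixed value

// Step 1: attach an increasing (but possibly non-consecutive) id.
val withId = df.withColumn("id", monotonically_increasing_id())

// Step 2: convert it into a consecutive, 1-based row number.
// No partitionBy: every row is pulled into one partition for the ranking.
val w = Window.orderBy($"id")

val result = withId
  .withColumn("rn", row_number().over(w))
  // Cast explicitly so both branches of when/otherwise are strings.
  .withColumn("new", when($"rn" <= n, lit("hello world")).otherwise($"A".cast("string")))
  .drop("id")

result.show()
```

Because row_number() starts at 1, the condition is rn <= n rather than the rn <= 2 used with the 0-based IDs above.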