Linked Questions
10 questions linked to/from How do I add a persistent column of row ids to Spark DataFrame?
55 votes, 4 answers, 96k views
Append a column to Data Frame in Apache Spark 1.3
Is it possible to add a column to a Data Frame, and if so, what would be the neatest and most efficient method?
More specifically, the column may serve as row IDs for the existing Data Frame.
In a simplified case, ...
4 votes, 2 answers, 13k views
Add sequence number column in dataframe using Scala
Below is the logic to add a sequence number column to a dataframe. It's working as expected when I am reading data from delimited files. Today I have a new task to read the data from an Oracle table and add ...
4 votes, 1 answer, 16k views
Auto-incrementing pyspark dataframe column values
I am trying to generate an additional column in a dataframe with auto-incrementing values based on the global value. However, all the rows are generated with the same value and the value is not ...
1 vote, 1 answer, 8k views
Using monotonically_increasing_id won't give consecutive IDs (pyspark)
I want to create an ID column for my pyspark dataframe. I have a column A that has repeated numbers; I want to take all the different values and assign an ID to each value.
I have:
+----+
| A|
+---...
0 votes, 1 answer, 2k views
Dataframe change first n rows
I've got a dataframe and I want to add another column which for the first n rows is one value, and for the rest is the value in another column... something like this
frame.select("*")
.withColumn("...
2 votes, 1 answer, 2k views
How "stable" is monotonically_increasing_id() in Spark?
I'm looking for an inexpensive way to distinguish duplicates and/or uniquely identify rows. I've been looking at the Spark built-ins monotonically_increasing_id() and uuid().
The problem with uuid() ...
2 votes, 0 answers, 585 views
Generating unique ID for incremental update of Existing RDD in spark
I am attempting to do an incremental update to my RDD using union in Spark. For that I have RDD1 (already existing).
RDD1 :
JavaPairRDD<String,String>(uniqueID,data)
where the first String ...
0 votes, 0 answers, 506 views
monotonically_increasing_id is generating 2 different unique IDs for the same record in Spark 2.3.1?
I am creating a column in my dataframe using monotonically_increasing_id; over 2-3 transformations, the ID changes for a few of the records.
e.g.
val newDf = df.withColumn("rowId", ...
0 votes, 1 answer, 310 views
Data loss while reading a huge file in Spark Scala
val data = spark.read
  .text(filepath)
  .toDF("val")
  .withColumn("id", monotonically_increasing_id())
val count = data.count()
This code works fine when I am reading a file that contains up to 50k+...
1 vote, 0 answers, 202 views
How do I add a persistent column of row ids to Spark DataFrame - #2
Basically I want the same thing as in this SO question. The accepted answer states that the issue is fixed with Spark 2.0 / Spark 2.1. I am using Spark 2.1.1.
However, I still experience the same (a ...