I am trying to generate an additional column in a dataframe with auto incrementing values based on the global value.However all the rows are generated with the same value and the value is not incrementing.
Here is the code
def autoIncrement():
global rec
if (rec == 0) : rec = 1
else : rec = rec + 1
return int(rec)
rec=14
UDF
autoIncrementUDF = udf(autoIncrement, IntegerType())
df1 = hiveContext.sql("select id,name,location,state,datetime,zipcode from demo.target")
df1.withColumn("id2", autoIncrementUDF()).show()
Here is the result df
+---+------+--------+----------+-------------------+-------+---+
| id| name|location| state| datetime|zipcode|id2|
+---+------+--------+----------+-------------------+-------+---+
| 20|pankaj| Chennai| TamilNadu|2018-03-26 11:00:00| NULL| 15|
| 10|geetha| Newyork|New Jersey|2018-03-27 10:00:00| NULL| 15|
| 25| pawan| Chennai| TamilNadu|2018-03-27 11:25:00| NULL| 15|
| 30|Manish| Gurgoan| Gujarat|2018-03-27 11:00:00| NULL| 15|
+---+------+--------+----------+-------------------+-------+---+
But i am expecting the below result
+---+------+--------+----------+-------------------+-------+---+
| id| name|location| state| datetime|zipcode|id2|
+---+------+--------+----------+-------------------+-------+---+
| 20|pankaj| Chennai| TamilNadu|2018-03-26 11:00:00| NULL| 15|
| 10|geetha| Newyork|New Jersey|2018-03-27 10:00:00| NULL| 16|
| 25| pawan| Chennai| TamilNadu|2018-03-27 11:25:00| NULL| 17|
| 30|Manish| Gurgoan| Gujarat|2018-03-27 11:00:00| NULL| 18|
+---+------+--------+----------+-------------------+-------+---+
Any help is appreciated.
UDFcan be executed in different workers, a pythonglobalvariable makes no sense as globals are bound to processes.