
My PySpark DataFrame is `values`:

+------+
|w_vote|
+------+
|   0.1|
|   0.2|
|  0.25|
|   0.3|
|  0.31|
|  0.36|
|  0.41|
|   0.5|
+------+

I want to loop over each value of the DataFrame using PySpark.

My code :

out = []
for i in values.collect():
    print(i)

What I basically want to do is the R-style loop `for (i in 1:nrow(values))`.

I am trying the code below in PySpark, but it gives the result as:

Row(w_vote=0.1)
Row(w_vote=0.2)
Row(w_vote=0.25)
Row(w_vote=0.3)
Row(w_vote=0.31)
Row(w_vote=0.36)
Row(w_vote=0.41)

But I want the result as 0.1, 0.2, 0.25, etc.

1 Answer


collect returns a list of Row objects. A Row is kind of like a dict, except you access elements as attributes, not keys.

Accordingly, you can just do this:

result = [row.w_vote for row in values.collect()]

Or this:

result = [row.asDict()['w_vote'] for row in values.collect()]

As a for loop:

result = []

for row in values.collect():
    result.append(row.w_vote)
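To see why attribute access works, note that a PySpark Row behaves much like a named tuple. The sketch below uses a plain `namedtuple` as a stand-in for `pyspark.sql.Row` (so it runs without a Spark session; the data is the hypothetical collected output from the question):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row: both expose fields as attributes.
Row = namedtuple("Row", ["w_vote"])

# Hypothetical output of values.collect(), matching the question's data
collected = [Row(0.1), Row(0.2), Row(0.25)]

# Same pattern as the answer: pull the attribute out of each Row
result = [row.w_vote for row in collected]
print(result)  # → [0.1, 0.2, 0.25]
```

With a real DataFrame, `values.collect()` produces the `collected` list above and the rest is identical.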

1 Comment

Can you please suggest how to incorporate that in a for loop, instead of the above?
