Convert array of rows into array of strings in pyspark

Question

I have a DataFrame with two columns, and calling df.collect() on it gives me the array below.

array = [Row(name=u'Alice', age=10), Row(name=u'Bob', age=15)]

I want to get an output array like this:

new_array = ['Alice', 'Bob']

Could anyone please let me know how to extract the above output using PySpark? Any help would be appreciated.

Thanks

cph_sto · Accepted Answer · 2019-02-19 07:50:51Z

3

# Creating the base dataframe.
values = [('Alice',10),('Bob',15)]
df = sqlContext.createDataFrame(values,['name','age'])
df.show()
    +-----+---+
    | name|age|
    +-----+---+
    |Alice| 10|
    |  Bob| 15|
    +-----+---+

df.collect()
    [Row(name='Alice', age=10), Row(name='Bob', age=15)]

# Use a list comprehension to pull the name field out of each Row.
new_list = [row.name for row in df.collect()]
print(new_list)
    ['Alice', 'Bob']
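
A Row behaves like a named tuple, so the same value can be read by attribute or by position. The sketch below uses collections.namedtuple as a stand-in for pyspark.sql.Row (an assumption made only so the snippet runs without a Spark session); the comprehension has the same shape you would apply to the result of df.collect().

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, used only so this runs without Spark.
Row = namedtuple("Row", ["name", "age"])

# Same shape as the result of df.collect() in the answer above.
rows = [Row(name="Alice", age=10), Row(name="Bob", age=15)]

# Read the field by attribute name ...
names_by_attr = [row.name for row in rows]

# ... or by position (name is the first column).
names_by_index = [row[0] for row in rows]

print(names_by_attr)
print(names_by_index)
```

Positional access can be handy when a column name is not a valid Python attribute name.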

edited Feb 19, 2019 at 7:50

answered Feb 19, 2019 at 7:41


3 Comments

Valli69 Over a year ago

Thanks for your reply. When I do df.collect() I get an array like [Row(name=u'Alice', age=10), Row(name=u'Bob', age=15)]. So when I use row.name, I get [u'Alice', u'Bob'] instead of ['Alice', 'Bob'].

cph_sto Over a year ago

It's the same thing. Don't worry, all good. The u prefix has no effect on the data; it is just Python 2's explicit representation of a unicode object (as opposed to a byte string).
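
A quick check of that claim, assuming Python 3, where every str is already Unicode and the u prefix is accepted but redundant. In Python 2 the prefix only appears in the repr of the list; printing an individual element shows the plain text.

```python
# In Python 3 the u prefix is accepted but changes nothing.
with_prefix = [u'Alice', u'Bob']
without_prefix = ['Alice', 'Bob']

# The two lists hold equal values of the same type.
same_values = with_prefix == without_prefix
same_type = type(with_prefix[0]) is type(without_prefix[0])

print(same_values)  # True
print(same_type)    # True
```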

Valli69 Over a year ago

Oh ok. Thank you

Jim Todd · Accepted Answer · 2019-02-19 07:40:49Z

0

I see two columns, name and age, in the df. You want only the name column to be displayed.

You can select it like:

df.select("name").show()

This will show you only the names.

Tip: You can also use df.show() instead of df.collect(). That displays the data in tabular form instead of as Row(...) objects.
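
Note that show() only prints the table; to get a Python list you still need collect(). One possible way to combine the two answers is to select the column first, so that only that column is brought back to the driver. The helper name column_to_list is hypothetical; select and collect are the standard DataFrame methods.

```python
def column_to_list(df, column):
    """Return the values of one DataFrame column as a plain Python list.

    column_to_list is a hypothetical helper name, not part of PySpark.
    """
    # select() keeps only the requested column, so collect() brings back
    # one-field Row objects instead of full rows.
    rows = df.select(column).collect()
    # Each Row has exactly one field, so position 0 is the value.
    return [row[0] for row in rows]


# Usage, assuming df is the DataFrame from the question:
# new_array = column_to_list(df, "name")
```

As with any collect(), this pulls the whole column into driver memory, so it is only a sketch for data that fits there.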

answered Feb 19, 2019 at 7:40
