
I've got a DataFrame like this:

from pyspark.sql import SparkSession
from pyspark import Row

spark = SparkSession.builder \
    .appName('DataFrame') \
    .master('local[*]') \
    .getOrCreate()

df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])

+---+---+------+---+
|  a|  b|     c|  d|
+---+---+------+---+
|  1|   |[0, 1]|foo|
|  2|   |[0, 1]|bar|
|  3|   |[0, 1]|foo|
+---+---+------+---+

I would like to create a column "e" holding the first element of column "c" and a column "f" holding the second element of column "c", so that the result looks like this:

+---+---+------+---+---+---+
|a  |b  |c     |d  |e  |f  |
+---+---+------+---+---+---+
|1  |   |[0, 1]|foo|0  |1  |
|2  |   |[0, 1]|bar|0  |1  |
|3  |   |[0, 1]|foo|0  |1  |
+---+---+------+---+---+---+

1 Answer

df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])

# Index into the array column: c[0] becomes column 'e', c[1] becomes column 'f'
df2 = df.withColumn('e', df['c'][0]).withColumn('f', df['c'][1])
df2.show()

+---+---+------+---+---+---+
|a  |b  |c     |d  |e  |f  |
+---+---+------+---+---+---+
|1  |   |[0, 1]|foo|0  |1  |
|2  |   |[0, 1]|bar|0  |1  |
|3  |   |[0, 1]|foo|0  |1  |
+---+---+------+---+---+---+