I have an input PySpark DataFrame that looks like the following:

df = spark.createDataFrame(
    [("1111", "[clark, john, silvie]"),
     ("2222", "[bob, charles, seth]"),
     ("3333", "[jane, luke, adam]"),  
    ],
    ["column1", "column2"]
)

| column1 | column2 |
| ------- | ------- |
| 1111    | [clark kent, john, silvie] |
| 2222    | [bob, charles, seth rog]  |
| 3333    | [jane, luke max, adam]    |

My goal is to convert column2 from StringType() to an ArrayType() of StringType().

So far I have only partially succeeded: the column is converted to an ArrayType, but list entries that contain more than one word are split into separate elements, as follows:

from pyspark.sql.functions import expr

df_out = df.withColumn('column2', expr(r"regexp_extract_all(column2, '(\\w+)', 1)"))

This gives me the following (my regex skills aren't that good):

| column1 | column2 |
| ------- | ------- |
| 1111    | ["clark", "kent", "john", "silvie"] |
| 2222    | ["bob", "charles", "seth", "rog"]  |
| 3333    | ["jane", "luke", "max", "adam"]    |

But I'm actually looking to get something like:

| column1 | column2 |
| ------- | ------- |
| 1111    | ["clark kent", "john", "silvie"] |
| 2222    | ["bob", "charles", "seth rog"]  |
| 3333    | ["jane", "luke max", "adam"]    |

1 Answer

Your expected output does not match your input, so I modified the input to match it. Let me know if this is what you want.

Use translate to remove the square brackets, then split the result on commas.

df = spark.createDataFrame(
    [("1111", "[clark kent, john, silvie]"),
     ("2222", "[bob, charles, seth rog]"),
     ("3333", "[jane, luke max, adam]"),  
    ],
    ["column1", "column2"]
)



from pyspark.sql.functions import split, translate
df.withColumn('column2', split(translate('column2', '[]', ''), ',')).show(truncate=False)


+-------+----------------------------+
|column1|column2                     |
+-------+----------------------------+
|1111   |[clark kent,  john,  silvie]|
|2222   |[bob,  charles,  seth rog]  |
|3333   |[jane,  luke max,  adam]    |
+-------+----------------------------+
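
Note that the output above keeps the space after each comma as part of the following element (hence the double spaces). If that matters, one possible variation is to strip the brackets with a regex and split on a comma together with any surrounding whitespace. The sketch below is only an illustration under assumptions: the names `BRACKETS`, `SEPARATOR`, `parse_list_string`, and `with_parsed_column` are hypothetical helpers, not part of the question or of PySpark, and the pure-Python function is there only to check the patterns without a Spark session.

```python
import re

# Hypothetical patterns (assumptions for this sketch):
# an opening bracket at the start or a closing bracket at the end,
# together with any whitespace just inside it.
BRACKETS = r"^\[\s*|\s*\]$"
# A comma plus any whitespace around it.
SEPARATOR = r"\s*,\s*"


def parse_list_string(value):
    """Pure-Python mirror of the Spark expression below, for checking the patterns."""
    inner = re.sub(BRACKETS, "", value.strip())
    return re.split(SEPARATOR, inner)


def with_parsed_column(df, name="column2"):
    """Return df with the string column `name` converted to an array of strings."""
    # Imported here so the pattern check above also runs without PySpark installed.
    from pyspark.sql.functions import col, regexp_replace, split, trim

    inner = regexp_replace(trim(col(name)), BRACKETS, "")
    return df.withColumn(name, split(inner, SEPARATOR))


# parse_list_string("[bob, charles, seth rog]") returns ['bob', 'charles', 'seth rog']
```

With the DataFrame from this answer, `with_parsed_column(df).show(truncate=False)` should give arrays whose elements have no leading spaces, while multi-word entries such as `clark kent` stay intact.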