1

I want to append the rows into a single string

+--------------------+
|   defectDescription|
+--------------------+
|ACEView NA : Daework|
|ACEView NA : Documen|
|ACEView NA : ACev   |
|ACEView NA : Dragdro|
+--------------------+

Expected Output: ACEView NA : Daework ACEView NA : Documen ACEView NA : ACev ACEView NA : Dragdro

2 Comments

  • Possible duplicate of Cannot print the contents of RDD – Commented Feb 15, 2017 at 9:37
  • I tried your solution, it's not working – Commented Feb 15, 2017 at 9:58

4 Answers

10

If you indeed want to get all the data into a single string, you can do it using collect:

val rows = df.select("defectDescription").collect().map(_.getString(0)).mkString(" ")

You first select the relevant column (so the result contains only that column) and collect it, which gives you an array of rows. The map turns each row into its string value (there is just one column, at index 0). Finally, mkString joins them into a single string with a space as the separator.

Note that this brings the entire dataframe to the driver, which might cause out-of-memory errors. If you need just some of the data, you can use take(n) instead of collect to limit the number of rows to n, as in the sketch below.
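A minimal sketch of that variant (n = 2 is just an illustrative value, not from the answer):

// Bring at most the first 2 rows to the driver instead of the whole column.
val firstTwo = df.select("defectDescription")
  .take(2)              // Array[Row], at most 2 elements
  .map(_.getString(0))
  .mkString(" ")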


4 Comments

I'm getting an output like this "[Ljava.lang.String;@5eeee124"
Strange, it works for me. I created the dataframe with val df = Seq("a","b","c","d").toDF("defectDescription") and it simply worked.
No need to collect; indeed, if the data is big enough, the driver explodes.
@juanchito The collect is critical here, as the OP wanted everything in a SINGLE string. Of course, this will explode if the data is too big, but that is what the OP requested.
1

Another way to do this is as follows:

val str1 = df.select("defectDescription").collect.mkString(",")
val str  = str1.replaceAll("[\\[\\]]","")

The 1st line selects the particular column and then collects that subset. collect is an action that returns all the elements of the dataset as an array to the driver program; it is usually useful after a filter or another operation that returns a sufficiently small subset of the data.

mkString has an overload that lets you provide a delimiter to separate each element in the collection (here a comma).

The 2nd line just removes the extra square brackets that Row.toString puts around each value.
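For illustration, a minimal sketch of the intermediate values, using hypothetical two-row sample data and assuming an active SparkSession named spark (neither is part of the original answer):

import spark.implicits._                            // needed for toDF on a Seq

val df = Seq("a", "b").toDF("defectDescription")    // hypothetical stand-in data

val str1 = df.select("defectDescription").collect.mkString(",")  // "[a],[b]" - Row.toString adds the brackets
val str  = str1.replaceAll("[\\[\\]]", "")                       // "a,b"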


1
df.createTempView("table")
val res = spark.sqlContext.sql("select defectDescription from table")
  .collectAsList.toString.replace("[", "").replace("]", "")

Initially create a temporary view of the dataframe, run the query and collect the result as a list, then convert it to a string. Finally, remove the brackets to get the required output.
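A minimal sketch of the intermediate values (the two-row Seq sample, the toDF import, and the SparkSession named spark are assumptions added for illustration):

import spark.implicits._

val df = Seq("a", "b").toDF("defectDescription")    // hypothetical stand-in data
df.createTempView("table")

val list = spark.sqlContext.sql("select defectDescription from table").collectAsList  // [[a], [b]]
val res  = list.toString.replace("[", "").replace("]", "")                            // "a, b"

Note that the separator comes from java.util.List.toString, so the result is comma-separated rather than space-separated as in the expected output.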


0

Let's leverage parallel computing by not prematurely collecting the data while there is associative processing to be done:

import org.apache.spark.sql.Row

// Extract the single column value from a Row.
def str(r: Row) = r.getString(0)
// Concatenate two single-column Rows into one.
def cat(r0: Row, r1: Row) = Row(s"${str(r0)} ${str(r1)}")

str(df.select("defectDescription").reduce(cat))

This allows parallel concatenations to be done on all executors before concatenating their results in the driver.
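For example, reusing the str and cat helpers above on a hypothetical three-row DataFrame (the sample data and the SparkSession named spark are assumptions, not part of the answer):

import spark.implicits._

val sample = Seq("a", "b", "c").toDF("defectDescription")   // hypothetical data

// Each partition is reduced on its own executor; the driver only merges
// one partial Row per partition, so the order across partitions may vary.
println(str(sample.select("defectDescription").reduce(cat)))  // e.g. "a b c"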

