
I have a String like the one below: lines are separated by newlines and fields by spaces. The first row is the header.

col1 col2 col3 col4 col5 col6 col7 col8
val1 val2 val3 val4 val5 val6 val7 val8
val9 val10 val11 val12 val13 val14 val15 val16
val17 val18 val19 val20 val21 val22 val23 val24

How can I build a Spark DataFrame from this String in Java?

2 Answers


I believe @Shankar Koirala has already provided a Java solution by treating the text/string file as a CSV file (with the custom separator " " instead of ","). Below is a Scala equivalent of the same approach:

val spark = org.apache.spark.sql.SparkSession.builder.
  master("local").
  appName("Spark custom CSV").
  getOrCreate

val df = spark.read.
  format("csv").
  option("header", "true").
  option("delimiter", " ").
  load("/path/to/textfile")

df.show
+-----+-----+-----+-----+-----+-----+-----+-----+
| col1| col2| col3| col4| col5| col6| col7| col8|
+-----+-----+-----+-----+-----+-----+-----+-----+
| val1| val2| val3| val4| val5| val6| val7| val8|
| val9|val10|val11|val12|val13|val14|val15|val16|
|val17|val18|val19|val20|val21|val22|val23|val24|
+-----+-----+-----+-----+-----+-----+-----+-----+

[UPDATE] Create DataFrame from string content

val s: String = """col1 col2 col3 col4 col5 col6 col7 col8
                  |val1 val2 val3 val4 val5 val6 val7 val8
                  |val9 val10 val11 val12 val13 val14 val15 val16
                  |val17 val18 val19 val20 val21 val22 val23 val24
                  |""".stripMargin

// remove header line
val s2 = s.substring(s.indexOf('\n') + 1)

// create RDD (sc is the SparkContext, i.e. spark.sparkContext)
val rdd = sc.parallelize( s2.split("\n").map(_.split(" ")) )

// create DataFrame (toDF comes from the session implicits)
import spark.implicits._
val df = rdd.map{ case Array(c1, c2, c3, c4, c5, c6, c7, c8) => (c1, c2, c3, c4, c5, c6, c7, c8) }.
  toDF("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8")

df.show
+-----+-----+-----+-----+-----+-----+-----+-----+
| col1| col2| col3| col4| col5| col6| col7| col8|
+-----+-----+-----+-----+-----+-----+-----+-----+
| val1| val2| val3| val4| val5| val6| val7| val8|
| val9|val10|val11|val12|val13|val14|val15|val16|
|val17|val18|val19|val20|val21|val22|val23|val24|
+-----+-----+-----+-----+-----+-----+-----+-----+

4 Comments

Yes @Shankar, and you are giving a solution that reads from a csv file. What I have is a String extracted from a particular file. I don't want to write the String back into a csv file and read it again. How can I convert the String I have into a DataFrame?
Ahh, my oversight. Please see expanded answer (code is in Scala, though).
Could you help me with this piece of code in Java? sc.parallelize( s2.split("\n").map(_.split(" ")) )
One way to process the string in Java would be similar to this.
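For a small in-memory string you do not strictly need parallelize on the Java side; here is a rough sketch of the same idea (the class name and the inlined sample string are just for illustration, not taken from the linked code):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class StringToDataFrame {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("String to DataFrame")
        .getOrCreate();

    // The raw string: first line is the header, fields are space-separated.
    String s = "col1 col2 col3 col4 col5 col6 col7 col8\n"
             + "val1 val2 val3 val4 val5 val6 val7 val8\n"
             + "val9 val10 val11 val12 val13 val14 val15 val16\n"
             + "val17 val18 val19 val20 val21 val22 val23 val24";

    String[] lines = s.split("\n");

    // Build the schema from the header row.
    List<StructField> fields = Arrays.stream(lines[0].split(" "))
        .map(name -> DataTypes.createStructField(name, DataTypes.StringType, true))
        .collect(Collectors.toList());
    StructType schema = DataTypes.createStructType(fields);

    // Turn each remaining line into a Row of its space-separated fields.
    List<Row> rows = Arrays.stream(lines).skip(1)
        .map(line -> RowFactory.create((Object[]) line.split(" ")))
        .collect(Collectors.toList());

    Dataset<Row> df = spark.createDataFrame(rows, schema);
    df.show();
  }
}

If you want to mirror the Scala code more literally, you could instead wrap the rows in a JavaRDD via JavaSparkContext.parallelize and pass that to createDataFrame together with the schema.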

You can read a csv file with the Spark Java API as follows. Create the Spark session:

SparkSession spark = SparkSession.builder()
  .master("local[*]")
  .appName("Example")
  .getOrCreate();

//read file with header true and delimiter as " " (space)
Dataset<Row> df = spark.read()
    .option("delimiter", " ")
    .option("header", true)
    .csv("path to file");
df.show();

1 Comment

It's not a csv file, it's a string.
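If you want to keep this answer's CSV-reader approach but feed it the in-memory string instead of a file, Spark 2.2+ can parse a Dataset<String> as CSV. A minimal sketch under that assumption (the class name and sample string are illustrative):

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvFromString {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("Example")
        .getOrCreate();

    String s = "col1 col2 col3 col4 col5 col6 col7 col8\n"
             + "val1 val2 val3 val4 val5 val6 val7 val8";

    // Wrap the string as a Dataset<String>, one element per line,
    // and let the CSV reader parse it with a space delimiter.
    Dataset<String> lines = spark.createDataset(
        Arrays.asList(s.split("\n")), Encoders.STRING());

    Dataset<Row> df = spark.read()
        .option("delimiter", " ")
        .option("header", true)
        .csv(lines);

    df.show();
  }
}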
