
Given a dataset with multiple lines:

0,1,2

7,8,9

18,19,5

How can I produce the following result in Spark:

Array(Array(Array(0), Array(1), Array(2)), Array(Array(7), Array(8), Array(9)), Array(Array(18), Array(19), Array(5)))

1 Answer


If you are talking about RDD[Array[Array[Int]]] in Spark, which is the equivalent of Array[Array[Array[Int]]] in Scala, then you can do the following.

Suppose you have a text file (/home/test.csv) containing:

0,1,2
7,8,9
18,19,5

Then you can do:

scala> val data = sc.textFile("/home/test.csv")
data: org.apache.spark.rdd.RDD[String] = /home/test.csv MapPartitionsRDD[4] at textFile at <console>:24

scala> val array = data.map(line => line.split(",").map(x => Array(x.toInt)))
array: org.apache.spark.rdd.RDD[Array[Array[Int]]] = MapPartitionsRDD[5] at map at <console>:26
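The per-record transformation can be checked without a cluster: the function passed to `map` above works on a plain String, so a minimal sketch (the helper name `toNested` is mine, not from the answer) is:

```scala
object NestedDemo {
  // The function Spark applies to each line of the RDD:
  // split the CSV line and wrap every field in its own Array[Int].
  def toNested(line: String): Array[Array[Int]] =
    line.split(",").map(x => Array(x.toInt))

  def main(args: Array[String]): Unit = {
    val row = toNested("0,1,2")
    // Render the nested structure the way the question writes it.
    println(row.map(_.mkString("Array(", ",", ")")).mkString("Array(", ",", ")"))
    // prints Array(Array(0),Array(1),Array(2))
  }
}
```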

You can take this one step further to get RDD[Array[Array[Array[Int]]]], where each value of the RDD is exactly the type you asked for. For that, use wholeTextFiles, which reads a file into a Tuple2 of (filename, file contents):

scala> val data = sc.wholeTextFiles("/home/test.csv")
data: org.apache.spark.rdd.RDD[(String, String)] = /home/test.csv MapPartitionsRDD[3] at wholeTextFiles at <console>:24

scala> val array = data.map(t2 => t2._2.split("\n").map(line => line.split(",").map(x => Array(x.toInt))))
array: org.apache.spark.rdd.RDD[Array[Array[Array[Int]]]] = MapPartitionsRDD[4] at map at <console>:26
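The whole-file variant can likewise be checked on a plain String standing in for the file contents (`t2._2`); a sketch assuming the example file's text (the helper name `fileToNested` is mine):

```scala
object WholeFileDemo {
  // The function applied to wholeTextFiles' (filename, contents) value:
  // split the file body into lines, then each line into nested arrays.
  def fileToNested(contents: String): Array[Array[Array[Int]]] =
    contents.split("\n").map(line => line.split(",").map(x => Array(x.toInt)))

  def main(args: Array[String]): Unit = {
    val result = fileToNested("0,1,2\n7,8,9\n18,19,5")
    // result(2) corresponds to the third line, "18,19,5".
    println(result(2)(1).head)
    // prints 19
  }
}
```

Note that String.split with no limit drops trailing empty strings, so a trailing newline in the file contents does not produce an empty row.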