I am working in Spark, using Scala.
I have two CSV files: one contains the column names and the other contains the data. How can I combine them so that the resulting DataFrame has the schema applied to the data? I then need to run operations like groupBy and count on it, since I have to count the distinct values in those columns.
Any help here would be much appreciated.
I wrote the code below. It reads both files into DataFrames and then combines them with a union. Now, how can I make the first row the schema? Or is there another way to proceed? Any suggestions are welcome.
import org.apache.spark.sql.SparkSession

// One SparkSession is enough; the separate SparkConf/SparkContext,
// the second session, and the SQLContext were redundant
val spark = SparkSession
  .builder()
  .appName("SparkSQL")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Both files are pipe-delimited; splitting each line yields a single
// array-of-strings column per row
val lines = spark.sparkContext
  .textFile("C:/Users/ayushgup/Downloads/home_data_usage_2018122723_1372672.csv")
  .map(_.split("""\|"""))
  .toDF()
val header = spark.sparkContext
  .textFile("C:/Users/ayushgup/Downloads/Header.csv")
  .map(_.split("""\|"""))
  .toDF()

// unionAll is deprecated; union puts the header row on top of the data rows,
// but it is still just another data row, not a schema
val file = header.union(lines)
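For reference, something along these lines is the alternative I am considering: build a StructType from the header file and let Spark's CSV reader apply it to the data file directly, instead of unioning the header in as a row. This is an untested sketch; the paths are the same as above, and typing every column as a string is just an assumption for illustration.

import org.apache.spark.sql.functions.{col, countDistinct}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Build a schema from the single pipe-delimited header line; every column
// is typed as a string here, an assumption made for simplicity
val headerLine = spark.sparkContext
  .textFile("C:/Users/ayushgup/Downloads/Header.csv")
  .first()
val schema = StructType(headerLine.split("""\|""").map(name =>
  StructField(name.trim, StringType, nullable = true)))

// Let the CSV reader apply the schema to the data file directly,
// so the header never has to be stacked on as a data row
val df = spark.read
  .option("delimiter", "|")
  .schema(schema)
  .csv("C:/Users/ayushgup/Downloads/home_data_usage_2018122723_1372672.csv")

// Count the distinct values of every column in a single pass
df.select(df.columns.map(c => countDistinct(col(c)).as(c)): _*).show()

With the schema applied at read time, the columns have real names, so groupBy and count should also work directly on df.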