I have a scenario where I need to combine the data from several columns into a single column.
Below is the available data:
+-----------------------+----------+-----------------------+------+
|BaseTime               |SGNL_NAME |SGNL_TIME              |SGNL_V|
+-----------------------+----------+-----------------------+------+
|2019-11-21 18:19:15.817|Acc       |2019-11-21 18:18:16.645|0.0   |
|2019-11-21 18:19:15.817|Acc       |2019-11-21 18:18:16.645|0.0   |
|2019-11-21 18:19:15.817|Acc       |2019-11-21 18:18:16.645|0.0   |
|2019-11-21 18:19:15.817|Acc       |2019-11-21 18:18:17.645|0.0   |
|2019-11-21 18:19:15.817|Acc       |2019-11-21 18:18:17.645|0.0   |
+-----------------------+----------+-----------------------+------+
The expected output is shown below: a new column, SGNL, is created whose value is an array with one element that combines SGNL_NAME, SGNL_TIME and SGNL_V.
"SGNL": [
  {
    "SGNL_NAME": "Acc",
    "SGNL_TIME": 1574128316834,
    "SGNL_V": 0.0
  }
]
+-----------------------+----------------------------------------------------------------+
|BaseTime               |SGNL                                                            |
+-----------------------+----------------------------------------------------------------+
|2019-11-21 18:19:15.817|[{"SGNL_NAME": "Acc" ,"SGNL_TIME": 1574128316834,"SGNL_V": 0.0}]|
|2019-11-21 18:19:15.817|[{"SGNL_NAME": "Acc" ,"SGNL_TIME": 1574128316834,"SGNL_V": 0.0}]|
|2019-11-21 18:19:15.817|[{"SGNL_NAME": "Acc" ,"SGNL_TIME": 1574128316834,"SGNL_V": 0.0}]|
|2019-11-21 18:19:15.817|[{"SGNL_NAME": "Acc" ,"SGNL_TIME": 1574128316834,"SGNL_V": 0.0}]|
|2019-11-21 18:19:15.817|[{"SGNL_NAME": "Acc" ,"SGNL_TIME": 1574128316834,"SGNL_V": 0.0}]|
+-----------------------+----------------------------------------------------------------+
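To pin down the per-row mapping described above, here is a minimal plain-Python sketch (not Spark code). The helper names `to_epoch_millis` and `to_sgnl_row` are hypothetical, and it assumes that the timestamps are in UTC and that SGNL_V should be cast from string to a number; neither assumption is confirmed by the data above.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS.fff' and return epoch milliseconds.

    Assumption: the timestamp is interpreted as UTC.
    """
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def to_sgnl_row(row: dict) -> dict:
    """Keep BaseTime and fold the three SGNL_* fields into a one-element array."""
    return {
        "BaseTime": row["BaseTime"],
        "SGNL": [
            {
                "SGNL_NAME": row["SGNL_NAME"],
                "SGNL_TIME": to_epoch_millis(row["SGNL_TIME"]),
                # Assumption: SGNL_V arrives as a string and is emitted as a float.
                "SGNL_V": float(row["SGNL_V"]),
            }
        ],
    }


sample = {
    "BaseTime": "2019-11-21 18:19:15.817",
    "SGNL_NAME": "Acc",
    "SGNL_TIME": "2019-11-21 18:18:16.645",
    "SGNL_V": "0.0",
}
result = to_sgnl_row(sample)
print(json.dumps(result["SGNL"]))
```

The goal is to get this same shape (one array-of-struct column next to BaseTime) for every row of the DataFrame.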
The schema of the input is given below:
root
|-- BaseTime: timestamp (nullable = true)
|-- SGNL_NAME: string (nullable = true)
|-- SGNL_TIME: timestamp (nullable = true)
|-- SGNL_V: string (nullable = true)
I am trying to write a UDF to combine these columns. Are there any other solutions available?