I am working with a JSON object and want to convert its "hours" field into a relational table using Spark SQL DataFrames/Datasets.

I tried to use "explode", but it does not work here: "hours" is a struct of structs rather than an array, and explode only accepts array or map columns.

The JSON object is:

{
  "business_id": "abc",
  "full_address": "random_address",
  "hours": {
    "Monday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Tuesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Friday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Wednesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Thursday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Sunday": {
      "close": "00:00",
      "open": "11:00"
    },
    "Saturday": {
      "close": "02:00",
      "open": "11:00"
    }
  }
}

I want to load it into a relational table like this:

CREATE TABLE "business_hours" (
     "id" integer NOT NULL PRIMARY KEY,
     "business_id" integer NOT NULL FOREIGN KEY REFERENCES "businesses",
     "day" integer NOT NULL,
     "open_time" time,
     "close_time" time
)

1 Answer

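You can do this by reading the day names from the schema of "hours", building one struct per day, and exploding an array of those structs: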

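import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType
import spark.implicits._ // for the $"..." syntax; already in scope in spark-shell

// Read the day names from the schema of the "hours" struct
val days = df.schema
  .fields
  .filter(_.name == "hours")
  .head
  .dataType
  .asInstanceOf[StructType]
  .fieldNames

val solution = df
  .select(
    $"business_id",
    $"full_address",
    // Build one struct per day, collect them in an array, and explode it into rows
    explode(
      array(
        days.map(d => struct(
          lit(d).as("day"),
          col(s"hours.$d.open").as("open_time"),
          col(s"hours.$d.close").as("close_time")
        )): _*
      )
    )
  )
  // explode names its output column "col"; expand the struct into top-level columns
  .select($"business_id", $"full_address", $"col.*")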

scala> solution.show
+-----------+--------------+---------+---------+----------+
|business_id|  full_address|      day|open_time|close_time|
+-----------+--------------+---------+---------+----------+
|        abc|random_address|   Friday|    11:00|     02:00|
|        abc|random_address|   Monday|    11:00|     02:00|
|        abc|random_address| Saturday|    11:00|     02:00|
|        abc|random_address|   Sunday|    11:00|     00:00|
|        abc|random_address| Thursday|    11:00|     02:00|
|        abc|random_address|  Tuesday|    11:00|     02:00|
|        abc|random_address|Wednesday|    11:00|     02:00|
+-----------+--------------+---------+---------+----------+
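
The target table in the question stores "day" as an integer and has an "id" column, while the result above keeps the day name and has no id. A minimal sketch of one way to close that gap is below. The Monday = 1 through Sunday = 7 numbering is an assumption (the question does not say which numbering it wants), the names exploded, dayNumbers and businessHours are made up for illustration, and the times are left as "HH:mm" strings. The small exploded DataFrame only stands in for solution so that the snippet can be pasted into spark-shell on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, typedLit}

// In spark-shell this returns the session that already exists
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("business-hours-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for `solution`: one row per (business, day), same column names
val exploded = Seq(
  ("abc", "random_address", "Monday", "11:00", "02:00"),
  ("abc", "random_address", "Sunday", "11:00", "00:00")
).toDF("business_id", "full_address", "day", "open_time", "close_time")

// Assumed numbering, Monday = 1 ... Sunday = 7; adjust to your own convention
val dayNumbers = typedLit(Map(
  "Monday" -> 1, "Tuesday" -> 2, "Wednesday" -> 3, "Thursday" -> 4,
  "Friday" -> 5, "Saturday" -> 6, "Sunday" -> 7
))

val businessHours = exploded.select(
  monotonically_increasing_id().as("id"), // unique, but not consecutive
  col("business_id"),
  dayNumbers(col("day")).as("day"),       // map lookup keyed by the day name
  col("open_time"),
  col("close_time")
)
```

With the real data, replace exploded with solution. Note that business_id is a string in the sample JSON ("abc"), so it would still need its own mapping to an integer key before it could match the integer column in the CREATE TABLE statement.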