0

I have a data table with two columns ID and Details that looks like: enter image description here

Here is the code to create the above dataset:

select stack(4,
             115,array(map('AA_Desc', 'Sale', 'AA_Date', '202403', 'AA_Block', '0')),
             214,array(map('AA_Desc', 'Ret', 'AA_Date', '202402', 'AA_Block', '1')),
             54,array(map('AA_Desc', 'Sale', 'AA_Date', '202312', 'AA_Block', '0')),
             31, array(map('AA_Desc', 'Can', 'AA_Date', '202312'), map('AA_Desc', 'Ret', 'AA_Date', '202312','AA_Block', '0'))
            ) as (id, arr_col)

The data type for column 'details' is array<map<string,string>> and I want to extract certain values from the details field for every ID. Ideally an output in the below format would be great either in HIVE or Impala. I have not worked with this data type structure before so any help would be appreciated. Thanks

Output:

enter image description here

1 Answer 1

0

USING IMPALA

From the 'details' values you provided, it might be easier to try using a data type like array<struct<map<string,string>,map<string,string>,map<string,string>>>

To create this, you could try something like

create TABLE AAtable
(
  id BIGINT,

  details ARRAY < STRUCT <
    AADesc: MAP <STRING, STRING>,
    AADate: MAP <STRING, STRING>,
    AABlock: MAP <STRING, STRING> > >
)
STORED AS PARQUET;

And then to extract what you want, use a query like

select tb.id, dt.aadesc.value, dt.aadate.value, dt.aablock.value 
    from AAtable tb INNER JOIN tb.details dt
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.