The problem breaks down into three tasks:
1. Remove the irrelevant characters '[', ']', and '"' from each array string.
2. Explode each array into rows with posexplode, as you suggested. The important point is to keep each element's position (index) within its array.
3. Combine the two exploded results by position while keeping missing values as NULL. That calls for a LEFT JOIN on the position (this assumes the first array is at least as long as the second).
The Hive SQL is below:
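WITH data1 AS (
    -- Sample row: each array is stored as a string
    SELECT 'A' AS col1
          ,'B' AS col2
          ,'["11", "12", "13"]' AS array_str1
          ,'["1", "2"]' AS array_str2
)
,table1 AS (
    -- Strip '"', '[' and ']', split on ',', and keep each element's position
    SELECT col1
          ,col2
          ,pos
          ,TRIM(element) AS element  -- drop the space left after each comma
    FROM data1
    LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str1, '"', ''), '[', ''), ']', ''), ',')) exploded AS pos, element
)
,table2 AS (
    -- Same steps for the second array
    SELECT col1
          ,col2
          ,pos
          ,TRIM(element) AS element
    FROM data1
    LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str2, '"', ''), '[', ''), ']', ''), ',')) exploded AS pos, element
)
-- Match elements by position; positions missing from table2 come back as NULL
SELECT t1.col1
      ,t1.col2
      ,t1.element
      ,t2.element
FROM table1 t1
LEFT JOIN table2 t2
    ON t1.col1 = t2.col1
    AND t1.col2 = t2.col2
    AND t1.pos = t2.pos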
Output:
col1  col2  element  element
A     B     11       1
A     B     12       2
A     B     13       NULL
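
A LEFT JOIN only preserves every element when the first array is the longer one. If the second array can be longer, its extra elements are dropped. Below is a minimal sketch of a FULL OUTER JOIN variant, assuming the same sample row with the two array strings swapped. The aliases element1 and element2, the extra pos output column, and the ORDER BY are illustrative additions, not part of the query above.

```sql
WITH data1 AS (
    -- Assumed sample row: the second array is now the longer one
    SELECT 'A' AS col1
          ,'B' AS col2
          ,'["1", "2"]' AS array_str1
          ,'["11", "12", "13"]' AS array_str2
)
,table1 AS (
    SELECT col1
          ,col2
          ,pos
          ,TRIM(element) AS element
    FROM data1
    LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str1, '"', ''), '[', ''), ']', ''), ',')) exploded AS pos, element
)
,table2 AS (
    SELECT col1
          ,col2
          ,pos
          ,TRIM(element) AS element
    FROM data1
    LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str2, '"', ''), '[', ''), ']', ''), ',')) exploded AS pos, element
)
-- FULL OUTER JOIN keeps unmatched positions from either side;
-- COALESCE fills the key columns from whichever side has the row
SELECT COALESCE(t1.col1, t2.col1) AS col1
      ,COALESCE(t1.col2, t2.col2) AS col2
      ,COALESCE(t1.pos, t2.pos) AS pos
      ,t1.element AS element1
      ,t2.element AS element2
FROM table1 t1
FULL OUTER JOIN table2 t2
    ON t1.col1 = t2.col1
    AND t1.col2 = t2.col2
    AND t1.pos = t2.pos
ORDER BY pos
```

With this sample row, the third row should have NULL in element1 and 13 in element2, because position 2 exists only in the second array.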