0

I have a hive table with one of the example row:

A, B, [“11”, “12”, “13”], [“1”, “2”] The array lengths of columns 3 and 4 are sometimes the same, and sometimes not.

Output:

A, B, 11, 1

A, B, 12, 2

A, B, 13, [null]

I am new to hive and really need help figuring this out.

The length of arrays keeps changing. It’s not static. sometimes it’s of length 3 or 4 or 5. That’s where i’m stuck.

I used the posexplode function in hive

1
  • Please provide enough code so others can better understand or reproduce the problem. Commented Sep 5, 2024 at 10:41

1 Answer 1

0

Here, we need to divide it into three tasks:

  1. Remove irrelevant characters, such as '[', ']', and '"'

  2. explode array into rows using posexplode as you said,it is important to note that holding the array numbers

  3. concat the results from the explode result while keeping any missing values as null,so we need left join.

the hive sql is below:

WITH data1
AS (
    SELECT 'A' col1
        ,'B' col2
        ,'["11", "12", "13"]' AS array_str1
        ,'["1", "2"]' AS array_str2
    )
    ,table1
AS (
    SELECT col1
        ,col2
        ,pos
        ,element
    FROM data1 LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str1, '"', ''), '[', ''), ']', ''), ',')) pos AS pos
        ,element
    )
    ,table2
AS (
    SELECT col1
        ,col2
        ,pos
        ,element
    FROM data1 LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str2, '"', ''), '[', ''), ']', ''), ',')) pos AS pos
        ,element
    )
SELECT t1.col1
    ,t1.col2
    ,t1.element
    ,t2.element
FROM table1 t1
LEFT JOIN table2 t2 ON t1.col1 = t2.col1
    AND t1.col2 = t2.col2
    AND t1.pos = t2.pos

output:

col1    col2    element element
    A   B   11  1
    A   B    12  2
    A   B    13 NULL
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.