3

Running: PostgreSQL 9.6.2

I have data stored in a table that is in the form of a key/value pair. The "key" is actually the path of a json object, each one being a property. So for example if the key was "cogs","props1","value", then the json object would be like so:

{
  "cogs":{
     "props1": {
       "value": 100    
      }
  }
}

I'd like to somehow reconstruct a json object via a SQL query if possible. Here is the test data set:

drop table if exists test_table;
CREATE TABLE test_table
(
    id serial,
    file_id integer NOT NULL,
    key character varying[],
    value character varying,
    status character varying
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

insert into test_table (file_id, key, value, status)
values (1, '{"cogs","description"}', 'some awesome cog', 'approved');
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","display"}', 'Giant Cog', null);
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","props1","value"}', '100', 'not verified');
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","props1","id"}', 26, 'approved');
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","props1","dimensions"}', '{"200", "300"}', null);
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","props2","value"}', '200', 'not verified');
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","props2","id"}', 27, 'approved');
insert into test_table (file_id, key, value, status)
values (1, '{"cogs","props2","dimensions"}', '{"700", "800"}', null);

insert into test_table (file_id, key, value, status)
values (1, '{"widgets","description"}', 'some awesome widget', 'approved');
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","display"}', 'Giant Widget', null);
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","props1","value"}', '100', 'not verified');
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","props1","id"}', 28, 'approved');
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","props1","dimensions"}', '{"200", "300"}', null);
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","props2","value"}', '200', 'not verified');
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","props2","id"}', 29, 'approved');
insert into test_table (file_id, key, value, status)
values (1, '{"widgets","props2","dimensions"}', '{"900", "1000"}', null);

The output I'm looking for is in this format:

{
    "cogs": {
        "description": "some awesome cog",
        "display": "Giant Cog",
        "props1": {
            "value": 100,
            "id": 26,
            "dimensions": [200, 300]
        },
        "props2": {
            "value": 200,
            "id": 27,
            "dimensions": [700, 800]
        }
    },
    "widgets": {
        "description": "some awesome widget",
        "display": "Giant Widget",
        "props1": {
            "value": 100,
            "id": 28,
            "dimensions": [200, 300]
        },
        "props2": {
            "value": 200,
            "id": 29,
            "dimensions": [900, 1000]
        }
    }
}

Some issues I'm facing:

  1. The "value" column can hold text, numbers, and an array. For whatever reason, the server-side code using knex.js is storing an array of integers (ie, [100,300]) into postgres as the following format: {"100","300"}. I need to ensure I extract this out as an array of integers as well.

  2. Trying to make this dynamic as possible. Maybe a recursive procedure to figure out what depth of the "key" path exists.... rather than hard-coding array lookup values.

  3. json_object_agg works well to group together properties into a single object. However it breaks when hitting a null value. So if the "key" column has only two values (ie, "cogs","description"), and I attempt to aggregate up an array of length three (ie, "cogs","props1","value"), it will break unless I filter on only arrays of length 3.

  4. Preserve the ordering of the input. @klin solution below is amazing and gets me 95% of the way there. However I failed to mention to also preserve the ordering...

1
  • please - version Commented May 23, 2017 at 18:08

1 Answer 1

3

A dynamic solution needs some work.

First, we need a function to convert a text array and a value to a jsonb object.

create or replace function keys_to_object(keys text[], val text)
returns jsonb language plpgsql as $$
declare
    i int;
    rslt jsonb = to_jsonb(val);
begin
    for i in select generate_subscripts(keys, 1, true) loop
        rslt := jsonb_build_object(keys[i], rslt);
    end loop;
    return rslt;
end $$;

select keys_to_object(array['key', 'subkey', 'subsub'], 'value');

              keys_to_object              
------------------------------------------
 {"key": {"subkey": {"subsub": "value"}}}
(1 row)

Next, another function to merge jsonb objects (see Merging JSONB values in PostgreSQL).

create or replace function jsonb_merge(a jsonb, b jsonb) 
returns jsonb language sql as $$ 
select 
    jsonb_object_agg(
        coalesce(ka, kb), 
        case 
            when va isnull then vb 
            when vb isnull then va 
            when jsonb_typeof(va) <> 'object' or jsonb_typeof(vb) <> 'object' then vb 
            else jsonb_merge(va, vb) end 
        ) 
    from jsonb_each(a) e1(ka, va) 
    full join jsonb_each(b) e2(kb, vb) on ka = kb 
$$;

select jsonb_merge('{"key": {"subkey1": "value1"}}', '{"key": {"subkey2": "value2"}}');

                     jsonb_merge                     
-----------------------------------------------------
 {"key": {"subkey1": "value1", "subkey2": "value2"}}
(1 row) 

Finally, let's create an aggregate based on the above function,

create aggregate jsonb_merge_agg(jsonb)
(
    sfunc = jsonb_merge,
    stype = jsonb
);

and we are done:

select jsonb_pretty(jsonb_merge_agg(keys_to_object(key, translate(value, '{}"', '[]'))))
from test_table;

                 jsonb_pretty                 
----------------------------------------------
 {                                           +
     "cogs": {                               +
         "props1": {                         +
             "id": "26",                     +
             "value": "100",                 +
             "dimensions": "[200, 300]"      +
         },                                  +
         "props2": {                         +
             "id": "27",                     +
             "value": "200",                 +
             "dimensions": "[700, 800]"      +
         },                                  +
         "display": "Giant Cog",             +
         "description": "some awesome cog"   +
     },                                      +
     "widgets": {                            +
         "props1": {                         +
             "id": "28",                     +
             "value": "100",                 +
             "dimensions": "[200, 300]"      +
         },                                  +
         "props2": {                         +
             "id": "29",                     +
             "value": "200",                 +
             "dimensions": "[900, 1000]"     +
         },                                  +
         "display": "Giant Widget",          +
         "description": "some awesome widget"+
     }                                       +
 }
(1 row)
Sign up to request clarification or add additional context in comments.

5 Comments

I'm drinking a cold one in your name this evening! Amazing that this works! Clearly I have much to learn :)
Is it possible to preserve the row ordering during the aggregation? I've been reading up on custom aggregates and still a bit confused by it, thus my reply here.
Unfortunately not. By the JSON definition a collection of name/value pairs is unordered. To store ordered list of values you should use JSON array. Vide json.org. Postgres JSONB does not guarantee any order in a json collection.
However, you can try to use json instead of jsonb. Replace jsonb with json in the whole script and remove jsonb_pretty() (this function has no json equivalent). The order of objects should remain as in the table, this is undocumented feature of the json type though.
Made these changes and can confirm that it definitely preserved order! How great. Read up on a lot of this and it seems JSON type performs worse than JSONB... however in my case it's un-noticeable. Thank you for your help!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.