2

I have a user model with a payment_info JSONB column that contains the following json example:

{
    "customer_id": "cst_K5gCsCkKAU",
    "subscriptions": [
        {
            "status": "active",
            "external_id": "sub_3Q9Q4bP2zW"
        }
    ]
}

I am a newbie with JSON queries, but created the following against the Postgres (PG) db that seem to work, that is: I search for all users that has a certain external_id value:

SELECT payment_info->'subscriptions' as Subscriptions
    FROM public."user"
    , jsonb_array_elements(payment_info->'subscriptions') as subs
    where (subs->>'external_id')::text = 'sub_3Q9Q4bP2zW'

How do I do the same in SQLAlchemy? I tried several thinks that I found on the web (SO) but it does't work. I tried:

  1. JSONB Comparator

    query = misc.setup_query(db_session, User).filter(
        User.payment_info.comparator.contains(
            ('subscriptions', 'external_id') == payment_subscription_id))
    

    That results in following error:

    sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) operator does not exist: jsonb @> boolean
    LINE 3: WHERE "user".payment_info @> false
                                  ^
    HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
    
  2. json_contains function:

    from sqlalchemy import func
    query = misc.setup_query(db_session, User).filter(
        func.json_contains(User.payment_info,
                           payment_subscription_id,
                           ['subscriptions', 'external_id']))
    

    That results in:

    LINE 3: WHERE json_contains("user".payment_info, 'sub_QxyMEmU', ARRA...
               ^
    HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
    
  3. Path to the key:

    query = misc.setup_query(db_session, User).filter(
        User.payment_info['subscriptions', 'external_id'].astext == payment_subscription_id)
    

    That results in empty result, with the following query:

    SELECT *
    FROM "user" 
    WHERE ("user".payment_info #>> %(payment_info_1)s) = %(param_1)s
    

What am I doing wrong, and how can I make it work? BTW: Do I need to put an index on the external_id? (not present yet)

1 Answer 1

1

You could pretty much implement your original approach using an aliased function expression:

misc.setup_query(db_session, User).\
    select_from(
        User,
        func.jsonb_array_elements(User.payment_info['subscriptions']).
            alias('subs')).\
    filter(column('subs', type_=JSONB)['external_id'].astext == 'sub_3Q9Q4bP2zW')

which compiles to

SELECT "user".id AS user_id, "user".payment_info AS user_payment_info 
FROM "user", jsonb_array_elements("user".payment_info -> %(payment_info_1)s) AS subs 
WHERE (subs ->> %(subs_1)s) = %(param_1)s

On the other hand you could use the containment operator:

misc.setup_query(db_session, User).\
    filter(User.payment_info['subscriptions'].contains(
        [{'external_id': 'sub_3Q9Q4bP2zW'}]))

Note that the outermost list is required, as it is part of the "path" to check. With that same logic you could omit extracting the array:

misc.setup_query(db_session, User).\
    filter(User.payment_info.contains(
        {'subscriptions': [{'external_id': 'sub_3Q9Q4bP2zW'}]}))

The above @> using approaches are indexable using a GIN index. The 1st requires a functional index, because it extracts the array first:

CREATE INDEX user_payment_info_subscriptions_idx ON "user"
USING GIN ((payment_info -> 'subscriptions'));

The 2nd would require indexing the entire payment_info jsonb column. Creating GIN indexes can be done in SQLAlchemy model definitions with Postgresql-specific index options:

class User(Base):
    ...

Index('user_payment_info_subscriptions_idx',
      User.payment_info['subscriptions'],
      postgresql_using='gin')

As to why the various attempts proved unsuccessful:

  1. You're not supposed to access the comparator directly. It provides the operators for the type. In addition you're passing contains() the result of the expression

     ('subscriptions', 'external_id') == payment_subscription_id
    

    which is False, most likely (depends on what payment_subscription_id is). That is to say it is evaluated in Python.

  2. There is no json_contains() function in Postgresql (unlike MySQL). Use the @> operator, or SQL/JSON path functions, such as jsonb_path_exists().

  3. You have the wrong path. User.payment_info['subscriptions', 'external_id'].astext would match something like {"subscriptions": {"external_id": "foo"}}, but in your data subscriptions references an array.

Sign up to request clarification or add additional context in comments.

7 Comments

Whaauuu, I am impressed. Thanks a lot. In the meantime I got it working as "raw" sql. I found it difficult to found good information about this topic, especially if it concerns an json array. Both seem to work out-of-the-box, which one performs better? Or should I prefer? Do I need to put an index on the "external_id"? and if so, how do I do that?
In general it's challenging to answer questions about performance without your data. But I've a hunch that given enough data and the indexes in place the @> approach wins.
Thanks again (sorry about the index comment question, overlooked when writting :(
Concerning the Index: I am using the containment operator, and you mention that the Index should be on the complete payment_info jsonb column. Why is that? And should it be like this?: Index('user_payment_info_subscriptions_idx',payment_info, postgresql_using='gin')
Yeppers, that index works for User.payment_info.contains({'subscriptions': [{'external_id': 'sub_3Q9Q4bP2zW'}]}).
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.