We're using Python (3.12) with Pydantic models to represent schemas for our MongoDB collections, which we instantiate with SomeModel.model_validate(<results of pymongo query>). We define relationships between collections using DBRefs, but we don't have an elegant way to handle these in a type-safe way in Python.

To handle lazy loading (avoiding fetching the entire object graph on every query) and situations where a referenced document might have been deleted, these fields are typed as Union[RelatedModel, DBRef].

Example Models:

import uuid
from typing import Optional, Union

from bson.dbref import DBRef
from pydantic import BaseModel, ConfigDict, EmailStr, Field


class User(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    email: EmailStr


class Post(BaseModel):
    # DBRef is not a Pydantic-aware type, so arbitrary types must be allowed.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    title: str
    author: Union[User, DBRef]



# --- Simulation of data loading (e.g., using Motor/Beanie) ---
async def load_post_from_db(post_id: uuid.UUID, fetch_author: bool = False) -> Post:
    # If fetch_author is False, author is left as a DBRef;
    # otherwise the referenced User is loaded.
    ...

The Problem:

When working with a Post object, Pyright (or mypy) correctly flags an error if we try to access attributes specific to the User model on the post.author field, because the field could hold a DBRef at runtime.

async def get_author_email(post_id: uuid.UUID) -> Optional[EmailStr]:
    # Assume fetch_author=True was used here for this example
    post = await load_post_from_db(post_id, fetch_author=True)

    # Type error from Pyright/mypy, e.g. mypy reports:
    # error: Item "DBRef" of "Union[User, DBRef]" has no attribute "email"
    # return post.author.email # <-- This line causes the error

    # Workaround:
    if isinstance(post.author, User):
        return post.author.email # Type checker is happy inside this block
    return None

# Even if we *know* fetch_author=True was used, the type checker doesn't.

This forces us to use runtime checks (isinstance) or assertions (assert isinstance(post.author, User)) frequently, purely to satisfy the type checker, even when our application logic guarantees the field should be resolved. This adds verbosity and couples type safety concerns tightly with runtime checks. Using # type: ignore feels like avoiding the problem.
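For illustration, the repeated isinstance checks described above can at least be centralized in one helper. This is only a minimal sketch: `expect_loaded` and `FakeDBRef` are hypothetical names, and plain dataclasses stand in for the Pydantic models and for bson's DBRef so that the snippet runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Type, TypeVar, Union

T = TypeVar("T")


@dataclass
class FakeDBRef:
    """Hypothetical stand-in for bson.dbref.DBRef."""
    collection: str
    id: str


@dataclass
class User:
    """Stand-in for the Pydantic User model."""
    email: str


def expect_loaded(value: Union[T, FakeDBRef], model: Type[T]) -> T:
    """Return value narrowed to model, or raise if it is still a reference."""
    if isinstance(value, model):
        return value
    raise LookupError(
        f"expected a loaded {model.__name__}, got {type(value).__name__}"
    )


author: Union[User, FakeDBRef] = User(email="a@example.com")
# The helper returns a User, so attribute access type-checks.
email = expect_loaded(author, User).email
```

This does not remove the runtime check; it only moves it to one place and turns a silent None into an explicit error when the reference was not resolved.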

Question: I think this pattern is pretty common and not limited to MongoDB; a similar scenario likely arises with relational databases and ORMs. Is there an elegant way to do this?

Unfortunately, Python's typing isn't as powerful as, e.g., TypeScript's: it doesn't allow derived types, so we can't express something like a type PostWithAuthor that is identical to Post except that author has type User rather than User | DBRef. If that were possible, we could use overloads or type guards to build ad-hoc types that tell the type system which fields are known to hold DBRefs and which hold the actual objects at each point in the execution flow.

I don't think this is actually solvable in Python, so is there a better approach to this whole thing?

  • Can't you use a generic version of Post and overload load_post_from_db? Commented Apr 25 at 12:58
  • Yeah, the canonical way to solve this is something along the lines of class Post[AuthorT: (User, DBRef)](BaseModel):\n\tauthor: AuthorT Commented Apr 25 at 20:54
  • @dROOOze Would be lovely if you could flesh this out a tiny bit more in an answer because I have follow-up questions ;-) Commented Apr 29 at 6:20
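The generic-plus-overload idea from the comments could look roughly like the sketch below. It is an illustration under assumptions, not a confirmed solution: `FakeDBRef` and `load_post` are hypothetical names, plain dataclasses stand in for the Pydantic models and bson's DBRef so the snippet runs standalone, and the loader fabricates data instead of querying MongoDB. It uses `TypeVar`/`Generic` rather than the Python 3.12 `class Post[AuthorT: ...]` syntax from the comment; the two spellings express the same thing.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Generic, Literal, TypeVar, Union, overload


@dataclass
class FakeDBRef:
    """Hypothetical stand-in for bson.dbref.DBRef."""
    collection: str
    id: uuid.UUID


@dataclass
class User:
    email: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


# Constrained type variable: the author is either loaded or a bare reference.
AuthorT = TypeVar("AuthorT", User, FakeDBRef)


@dataclass
class Post(Generic[AuthorT]):
    title: str
    author: AuthorT
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@overload
async def load_post(
    post_id: uuid.UUID, fetch_author: Literal[True]
) -> Post[User]: ...
@overload
async def load_post(
    post_id: uuid.UUID, fetch_author: Literal[False] = ...
) -> Post[FakeDBRef]: ...
async def load_post(
    post_id: uuid.UUID, fetch_author: bool = False
) -> Union[Post[User], Post[FakeDBRef]]:
    # No database here: documents are fabricated to illustrate the typing only.
    if fetch_author:
        return Post(title="hello", author=User(email="a@example.com"), id=post_id)
    return Post(title="hello", author=FakeDBRef("users", uuid.uuid4()), id=post_id)


async def get_author_email(post_id: uuid.UUID) -> str:
    post = await load_post(post_id, fetch_author=True)
    # post is Post[User] here, so no isinstance check is needed.
    return post.author.email
```

With these overloads, a call that passes a literal True is typed as Post[User] and a call that omits the flag is typed as Post[FakeDBRef]. A call that passes a plain bool variable matches neither overload, so a third overload returning the union would be needed for that case.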
