We're using Python 3.12 with Pydantic models to represent the schemas of our MongoDB collections, which we instantiate with SomeModel.model_validate(<result of a pymongo query>). We define relationships between collections using DBRefs, but we don't have an elegant, type-safe way to handle them in Python.
To support lazy loading (so we don't fetch the entire object graph on every query) and to cope with referenced documents that may have been deleted, these fields are typed as Union[RelatedModel, DBRef].
Example Models:
    import uuid
    from typing import Optional, Union

    from bson.dbref import DBRef
    from pydantic import BaseModel, ConfigDict, EmailStr, Field


    class User(BaseModel):
        id: uuid.UUID = Field(default_factory=uuid.uuid4)
        email: EmailStr


    class Post(BaseModel):
        # DBRef is not a Pydantic-native type, so arbitrary types must be allowed.
        model_config = ConfigDict(arbitrary_types_allowed=True)

        id: uuid.UUID = Field(default_factory=uuid.uuid4)
        title: str
        author: Union[User, DBRef]


    # --- Simulation of data loading (e.g., using Motor/Beanie) ---
    async def load_post_from_db(post_id: uuid.UUID, fetch_author: bool = False) -> Post:
        # If fetch_author is False, author is left as a DBRef;
        # otherwise the referenced User is loaded.
        ...
The Problem:
When working with a Post object, Pyright (or mypy) correctly flags a potential error if we try to access User-specific attributes on the post.author field, because it could be a DBRef at runtime.
    async def get_author_email(post_id: uuid.UUID) -> Optional[EmailStr]:
        # Assume fetch_author=True was used here for this example
        post = await load_post_from_db(post_id, fetch_author=True)

        # Type error from mypy (Pyright reports an equivalent error):
        # error: Item "DBRef" of "Union[User, DBRef]" has no attribute "email"
        # return post.author.email  # <-- This line causes the error

        # Workaround:
        if isinstance(post.author, User):
            return post.author.email  # Type checker is happy inside this block
        return None

    # Even if we *know* fetch_author=True was used, the type checker doesn't.
This forces us to use runtime checks (isinstance) or assertions (assert isinstance(post.author, User)) frequently, purely to satisfy the type checker, even when our application logic guarantees the field should be resolved. This adds verbosity and couples type safety concerns tightly with runtime checks. Using # type: ignore feels like avoiding the problem.
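For illustration, the repeated checks could at least be centralized in one helper. This is only a minimal sketch: Ref and User are plain dataclasses standing in for bson's DBRef and the Pydantic model so that it runs without dependencies, and expect_resolved and UnresolvedReferenceError are hypothetical names. It does not remove the runtime check, it just moves it to one place and gives the type checker a narrowed return type.

```python
from dataclasses import dataclass
from typing import Type, TypeVar, Union

ModelT = TypeVar("ModelT")


@dataclass
class Ref:
    """Hypothetical stand-in for bson's DBRef, so the sketch runs without pymongo."""
    collection: str
    id: str


@dataclass
class User:
    """Stand-in for the Pydantic User model."""
    email: str


class UnresolvedReferenceError(TypeError):
    """Raised when a field still holds a reference instead of a loaded document."""


def expect_resolved(value: Union[ModelT, Ref], model_type: Type[ModelT]) -> ModelT:
    """Return value narrowed to model_type, or raise if it is still a reference."""
    if isinstance(value, model_type):
        return value
    raise UnresolvedReferenceError(
        f"expected a loaded {model_type.__name__}, got {type(value).__name__}"
    )


author: Union[User, Ref] = User(email="a@example.com")
print(expect_resolved(author, User).email)
```

With something like this, the call site shrinks to expect_resolved(post.author, User).email, and a wrong assumption about fetch_author fails loudly instead of silently returning None.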
Question: I think this pattern is pretty common and not limited to MongoDB; a similar scenario likely comes up with relational databases and ORMs. Is there an elegant way to do this?
Unfortunately, Python's type system isn't as expressive as TypeScript's here: it has no derived types, so we can't express something like a type PostWithAuthor that is identical to Post except that author has type User rather than User | DBRef. If it did, we could use overloads or type guards to build ad-hoc types that tell the type checker which fields are known to hold DBRefs and which hold the actual models at each point in the execution flow.
I don't think this is actually solvable in Python, so is there a better approach to this whole thing?
    class Post[AuthorT: (User, DBRef)](BaseModel):
        author: AuthorT
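A generic model like this could be paired with typing.overload so that the loader's return type depends on fetch_author. The following is a rough sketch under several assumptions: Ref, User and Post are plain dataclasses standing in for bson's DBRef and the Pydantic models, USERS and POSTS are hypothetical in-memory dictionaries replacing real queries, the loader is synchronous for brevity, and the classic TypeVar/Generic spelling is used instead of the 3.12 syntax.

```python
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar, Union, overload


@dataclass
class Ref:
    """Hypothetical stand-in for bson's DBRef."""
    collection: str
    id: str


@dataclass
class User:
    """Stand-in for the Pydantic User model."""
    id: str
    email: str


# Constrained type variable: the author is either a loaded User or a reference.
AuthorT = TypeVar("AuthorT", User, Ref)


@dataclass
class Post(Generic[AuthorT]):
    """Stand-in for the generic Pydantic Post model."""
    title: str
    author: AuthorT


# Hypothetical in-memory "collections" replacing real database queries.
USERS = {"u1": User(id="u1", email="a@example.com")}
POSTS = {"p1": {"title": "Hello", "author_id": "u1"}}


@overload
def load_post(post_id: str, fetch_author: Literal[True]) -> Post[User]: ...
@overload
def load_post(post_id: str, fetch_author: Literal[False] = ...) -> Post[Ref]: ...
def load_post(post_id: str, fetch_author: bool = False) -> Union[Post[User], Post[Ref]]:
    raw = POSTS[post_id]
    if fetch_author:
        # Resolve the reference and return a Post whose author is a User.
        return Post(title=raw["title"], author=USERS[raw["author_id"]])
    # Leave the author as an unresolved reference.
    return Post(title=raw["title"], author=Ref("users", raw["author_id"]))


def get_author_email(post_id: str) -> str:
    post = load_post(post_id, fetch_author=True)  # inferred as Post[User]
    return post.author.email  # no isinstance check needed


print(get_author_email("p1"))
```

The trade-off is that every lazily loaded field needs its own type parameter, so this probably scales best for models with only one or two references.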