
I've written a converter between Pydantic models of arbitrary complexity (with submodels, nulls and lists) and neomodel models (an OGM library for Neo4j). In simplified terms, it covers the following transitions (in red) and, if the target is an OGM model, saves the result in the database:

[Figure: conversions between Pydantic models, dicts and neomodel OGM models; the covered transitions are drawn in red]

The converter's main trick is pretty simple: it looks at your complex data (nested dictionaries, lists of objects, etc.) and figures out how to map the whole thing onto Neo4j's graph structure. It doesn't just handle flat data; it walks the entire object hierarchy and builds the corresponding nodes and relationships in the database. It works both ways, too: you can pull a complex graph from Neo4j and get your nested Python objects back. It's especially handy when objects in your data reference each other in cycles, which would otherwise cause endless loops and stack overflows.
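converter.py itself isn't reproduced in this post, so the following is not its implementation. It is a minimal, dependency-free sketch of one common way to make such a traversal cycle-safe: remember each object by id() the first time it is seen, so a back-reference becomes an edge to an existing node rather than another descent. Node, to_graph() and from_graph() are hypothetical stand-ins for the Pydantic/OGM models and the converter's methods.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # identity equality: a field-wise __eq__ would recurse forever on a cycle
class Node:
    """Hypothetical stand-in for a model with a self-referencing relationship."""
    name: str
    links: List["Node"] = field(default_factory=list)


# (graph node id -> name, list of (source id, target id) edges)
Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def to_graph(root: Node) -> Graph:
    """Flatten an object graph into nodes and edges, visiting each object once."""
    ids: Dict[int, int] = {id(root): 0}  # memo: id(obj) -> graph node id
    nodes: Dict[int, str] = {}
    edges: List[Tuple[int, int]] = []
    stack = [root]  # explicit stack instead of recursion: long chains can't overflow
    while stack:
        obj = stack.pop()
        nid = ids[id(obj)]
        nodes[nid] = obj.name
        for child in obj.links:
            if id(child) not in ids:  # first sighting: allocate an id, visit later
                ids[id(child)] = len(ids)
                stack.append(child)
            edges.append((nid, ids[id(child)]))  # back-references become plain edges
    return nodes, edges


def from_graph(graph: Graph, root_id: int = 0) -> Node:
    """Rebuild objects: create every node first, then wire the references."""
    nodes, edges = graph
    objs = {nid: Node(name) for nid, name in nodes.items()}
    for src, dst in edges:
        objs[src].links.append(objs[dst])
    return objs[root_id]


a, b, c = Node("NodeA"), Node("NodeB"), Node("NodeC")
a.links, b.links, c.links = [b], [c], [a]  # A -> B -> C -> A
print(to_graph(a))  # ({0: 'NodeA', 1: 'NodeB', 2: 'NodeC'}, [(0, 1), (1, 2), (2, 0)])
back = from_graph(to_graph(a))
print(back.links[0].links[0].links[0] is back)  # True: the cycle survives the round trip
```

Creating all objects before wiring references is what lets from_graph() rebuild a cycle without recursion; the same two-phase idea would apply when turning a Neo4j graph back into nested objects.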

Limitation: OGM models are needed.

The whole implementation, with a docker-compose file and tests, is available here (>1 KLOC in total).

There's a ton of code below, so I'll put my questions here:

  1. Assuming code separation is not a problem, and that the 1+ KLOC converter.py is pasted as a single chunk only for this post's readability, how can I improve my code in general?
  2. Some functions, like _get_property_type() or _convert_value(), are tightly coupled to neomodel's internals. What bothers me are the endless if-elses: I haven't found a simple way to get the Python datatype of a neomodel property, so I ended up with this spaghetti. Is there a better way to solve this?
  3. There are about 1001 places with a generic try - except Exception:, because the docstrings of neomodel's internal methods either don't exist or don't mention which errors are raised and when. Although defensive programming is reasonable here, is there a better way to decide which errors to catch when there isn't enough information about which errors can be thrown?

Example:

from datetime import date
from typing import List, Optional
from neomodel import (
    StructuredNode, StringProperty, IntegerProperty, FloatProperty, BooleanProperty,
    RelationshipTo, config
)
from pydantic import BaseModel, Field
from converter import Converter
import json

# Configure Neo4j connection
config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'


# ===== Models for complex nested structure =====
class ItemPydantic(BaseModel):
    name: str
    price: float


class OrderPydantic(BaseModel):
    uid: str
    items: List[ItemPydantic] = Field(default_factory=list)


class CustomerPydantic(BaseModel):
    name: str
    email: str
    orders: List[OrderPydantic] = Field(default_factory=list)


class ItemOGM(StructuredNode):
    name = StringProperty()
    price = FloatProperty()


class OrderOGM(StructuredNode):
    uid = StringProperty()
    items = RelationshipTo(ItemOGM, 'CONTAINS')


class CustomerOGM(StructuredNode):
    name = StringProperty()
    email = StringProperty()
    orders = RelationshipTo(OrderOGM, 'PLACED')


# ===== Models for cyclic references =====
class CyclicPydantic(BaseModel):
    name: str
    links: List['CyclicPydantic'] = Field(default_factory=list)


CyclicPydantic.model_rebuild()  # Resolve forward references


class CyclicOGM(StructuredNode):
    name = StringProperty()
    links = RelationshipTo('CyclicOGM', 'LINKS_TO')


# Register models
def register_models():
    # Register nested structure models
    Converter.register_models(ItemPydantic, ItemOGM)
    Converter.register_models(OrderPydantic, OrderOGM)
    Converter.register_models(CustomerPydantic, CustomerOGM)

    # Register cyclic models
    Converter.register_models(CyclicPydantic, CyclicOGM)

    # Register custom type converters
    Converter.register_type_converter(
        date, str, lambda d: d.isoformat()
    )
    Converter.register_type_converter(
        str, date, lambda s: date.fromisoformat(s)
    )


# ===== Example 1: Complex Nested Structure =====
def example_complex_structure():
    print("\n=== EXAMPLE 1: COMPLEX NESTED STRUCTURE ===")

    # Create sample data
    item1 = ItemPydantic(name="Laptop", price=999.99)
    item2 = ItemPydantic(name="Headphones", price=89.99)
    item3 = ItemPydantic(name="Mouse", price=24.99)

    order1 = OrderPydantic(uid="ORD-001", items=[item1, item2])
    order2 = OrderPydantic(uid="ORD-002", items=[item3])

    customer = CustomerPydantic(
        name="John Doe",
        email="[email protected]",
        orders=[order1, order2]
    )

    # Print original structure
    print("Original Pydantic Structure:")
    print(json.dumps({
        "name": customer.name,
        "email": customer.email,
        "orders": [
            {
                "uid": order.uid,
                "items": [
                    {"name": item.name, "price": item.price}
                    for item in order.items
                ]
            }
            for order in customer.orders
        ]
    }, indent=2))

    # Convert to OGM
    customer_ogm = Converter.to_ogm(customer)

    # Extract data from OGM
    ogm_orders = list(customer_ogm.orders.all())
    ogm_items = []
    for order in ogm_orders:
        ogm_items.extend(list(order.items.all()))

    print("\nConverted to OGM:")
    print(json.dumps({
        "name": customer_ogm.name,
        "email": customer_ogm.email,
        "orders_count": len(ogm_orders),
        "order_uids": [order.uid for order in ogm_orders],
        "items_count": len(ogm_items),
        "items": [{"name": item.name, "price": item.price} for item in ogm_items]
    }, indent=2))

    # Convert to dict
    customer_dict = Converter.ogm_to_dict(customer_ogm)
    print("\nConverted to Dict:")
    print(json.dumps(customer_dict, default=str, indent=2))



    # Convert back to Pydantic
    customer_py = Converter.to_pydantic(customer_ogm)
    print("\nRound-trip to Pydantic:")
    print(json.dumps({
        "name": customer_py.name,
        "email": customer_py.email,
        "orders_count": len(customer_py.orders),
        "order_uids": [order.uid for order in customer_py.orders],
        "items": [
            {"name": item.name, "price": item.price}
            for order in customer_py.orders
            for item in order.items
        ]
    }, indent=2))


# ===== Example 2: Cyclic References =====
def example_cyclic_references():
    print("\n=== EXAMPLE 2: CYCLIC REFERENCES ===")

    # Create cycle: A -> B -> C -> A
    node_a = CyclicPydantic(name="NodeA")
    node_b = CyclicPydantic(name="NodeB")
    node_c = CyclicPydantic(name="NodeC")

    node_a.links = [node_b]
    node_b.links = [node_c]
    node_c.links = [node_a]  # Creates cycle

    print("Original Cyclic Structure:")
    print(json.dumps({
        "node": node_a.name,
        "links_to": node_a.links[0].name,
        "links_to_links_to": node_a.links[0].links[0].name,
        "links_to_links_to_links_to": node_a.links[0].links[0].links[0].name,
        "cycle_detected": node_a.links[0].links[0].links[0].name == node_a.name
    }, indent=2))

    # Convert to OGM
    node_a_ogm = Converter.to_ogm(node_a)

    # Extract data from OGM
    ogm_bs = list(node_a_ogm.links.all())
    ogm_cs = list(ogm_bs[0].links.all())
    ogm_as = list(ogm_cs[0].links.all())

    print("\nConverted to OGM:")
    print(json.dumps({
        "node": node_a_ogm.name,
        "links_to": ogm_bs[0].name,
        "links_to_links_to": ogm_cs[0].name,
        "links_to_links_to_links_to": ogm_as[0].name,
        "cycle_detected": ogm_as[0].name == node_a_ogm.name
    }, indent=2))

    # Convert to dict
    node_dict = Converter.ogm_to_dict(node_a_ogm, max_depth=4)
    print("\nConverted to Dict (trimmed for readability):")
    print(json.dumps({
        "name": node_dict["name"],
        "links": [
            {
                "name": node_dict["links"][0]["name"],
                "links": [
                    {
                        "name": node_dict["links"][0]["links"][0]["name"],
                        "has_links_back": "links" in node_dict["links"][0]["links"][0]
                    }
                ]
            }
        ]
    }, indent=2))

    # Convert back to Pydantic
    node_py = Converter.to_pydantic(node_a_ogm)
    print("\nRound-trip to Pydantic:")
    print(json.dumps({
        "node": node_py.name,
        "links_to": node_py.links[0].name,
        "links_to_links_to": node_py.links[0].links[0].name,
        "links_to_links_to_links_to": node_py.links[0].links[0].links[0].name,
        "cycle_detected": node_py.links[0].links[0].links[0].name == node_py.name
    }, indent=2))


if __name__ == "__main__":
    # Register all models
    register_models()

    # Run examples
    example_complex_structure()
    example_cyclic_references()

Example output:

# python3 example.py
=== EXAMPLE 1: COMPLEX NESTED STRUCTURE ===
Original Pydantic Structure:
{
  "name": "John Doe",
  "email": "[email protected]",
  "orders": [
    {
      "uid": "ORD-001",
      "items": [
        {
          "name": "Laptop",
          "price": 999.99
        },
        {
          "name": "Headphones",
          "price": 89.99
        }
      ]
    },
    {
      "uid": "ORD-002",
      "items": [
        {
          "name": "Mouse",
          "price": 24.99
        }
      ]
    }
  ]
}

Converted to OGM:
{
  "name": "John Doe",
  "email": "[email protected]",
  "orders_count": 2,
  "order_uids": [
    "ORD-002",
    "ORD-001"
  ],
  "items_count": 3,
  "items": [
    {
      "name": "Mouse",
      "price": 24.99
    },
    {
      "name": "Headphones",
      "price": 89.99
    },
    {
      "name": "Laptop",
      "price": 999.99
    }
  ]
}

Converted to Dict:
{
  "name": "John Doe",
  "email": "[email protected]",
  "orders": [
    {
      "uid": "ORD-002",
      "items": [
        {
          "name": "Mouse",
          "price": 24.99
        }
      ]
    },
    {
      "uid": "ORD-001",
      "items": [
        {
          "name": "Headphones",
          "price": 89.99
        },
        {
          "name": "Laptop",
          "price": 999.99
        }
      ]
    }
  ]
}

Round-trip to Pydantic:
{
  "name": "John Doe",
  "email": "[email protected]",
  "orders_count": 2,
  "order_uids": [
    "ORD-002",
    "ORD-001"
  ],
  "items": [
    {
      "name": "Mouse",
      "price": 24.99
    },
    {
      "name": "Headphones",
      "price": 89.99
    },
    {
      "name": "Laptop",
      "price": 999.99
    }
  ]
}

=== EXAMPLE 2: CYCLIC REFERENCES ===
Original Cyclic Structure:
{
  "node": "NodeA",
  "links_to": "NodeB",
  "links_to_links_to": "NodeC",
  "links_to_links_to_links_to": "NodeA",
  "cycle_detected": true
}

Converted to OGM:
{
  "node": "NodeA",
  "links_to": "NodeB",
  "links_to_links_to": "NodeC",
  "links_to_links_to_links_to": "NodeA",
  "cycle_detected": true
}

Converted to Dict (trimmed for readability):
{
  "name": "NodeA",
  "links": [
    {
      "name": "NodeB",
      "links": [
        {
          "name": "NodeC",
          "has_links_back": true
        }
      ]
    }
  ]
}

Round-trip to Pydantic:
{
  "node": "NodeA",
  "links_to": "NodeB",
  "links_to_links_to": "NodeC",
  "links_to_links_to_links_to": "NodeA",
  "cycle_detected": true
}

2 Answers


We have a triangle of {dict, pydantic, OGM}, and the promise is we can bounce an object back and forth between a pair, or go all the way around the triangle, and bring that object back alive. Looks great!

On the plus side, these are quite concrete promises that are readily verified. What makes testing rather daunting is that it's a very broad set of promises, so during testing we should cast a wide net. The hypothesis library is exceptionally good at constructing crazy test inputs, ones you would never have thought to devise manually. It is wonderfully devious, trying to find a loophole in your code it can sneak through. And if it finds nothing, well, that improves your confidence in the ~1 KLOC of source code.
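hypothesis generates inputs and shrinks failures for you. As a dependency-free illustration of the same property ("convert there and back, get an equal object"), here is a hand-rolled randomized check. The dataclasses and customer_from_dict() are hypothetical stand-ins mirroring the question's Customer/Order/Item models, not the real Converter; wrapping to_ogm()/to_pydantic() the same way would need a running Neo4j, and, judging by the example output, relationship order isn't preserved, so that comparison would have to ignore list order.

```python
import random
import string
from dataclasses import asdict, dataclass, field
from typing import Callable, List, Optional


@dataclass
class Item:
    name: str
    price: float


@dataclass
class Order:
    uid: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Customer:
    name: str
    email: str
    orders: List[Order] = field(default_factory=list)


def customer_from_dict(d: dict) -> Customer:
    """Hypothetical inverse of dataclasses.asdict() for these models."""
    return Customer(
        name=d["name"],
        email=d["email"],
        orders=[
            Order(uid=o["uid"], items=[Item(**i) for i in o["items"]])
            for o in d["orders"]
        ],
    )


def random_text(rng: random.Random) -> str:
    # Empty strings and non-ASCII characters are deliberately in the mix
    alphabet = string.ascii_letters + " éß漢"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))


def random_customer(rng: random.Random) -> Customer:
    def price() -> float:
        return rng.choice([0.0, -1.5, 1e308, rng.uniform(-1e6, 1e6)])

    return Customer(
        name=random_text(rng),
        email=random_text(rng),
        orders=[
            Order(uid=random_text(rng),
                  items=[Item(random_text(rng), price()) for _ in range(rng.randint(0, 3))])
            for _ in range(rng.randint(0, 3))  # empty lists are included
        ],
    )


def find_round_trip_failure(
    to_dict: Callable[[Customer], dict],
    from_dict: Callable[[dict], Customer],
    cases: int = 500,
    seed: int = 0,
) -> Optional[Customer]:
    """Return the first generated input that doesn't survive the round trip."""
    rng = random.Random(seed)  # fixed seed: any failure is reproducible
    for _ in range(cases):
        original = random_customer(rng)
        if from_dict(to_dict(original)) != original:
            return original
    return None


def lossy_from_dict(d: dict) -> Customer:
    # Deliberate bug for the demo: silently drops orders without items
    c = customer_from_dict(d)
    c.orders = [o for o in c.orders if o.items]
    return c


print(find_round_trip_failure(asdict, customer_from_dict))  # None: no counterexample
print(find_round_trip_failure(asdict, lossy_from_dict) is not None)  # True: bug caught
```

Unlike hypothesis, this doesn't shrink a failing input to a minimal one, which is a big part of why the library is worth adopting.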

specific exceptions

  3. ... generic try - except Exception:

Yeah, I share your concern about this.

For example, in to_pydantic():

        try:
            pydantic_data[prop_name] = getattr(ogm_instance, prop_name)
        except Exception:
            pass

I'm skeptical that there are many ways for getattr() to raise.

The generic technique I'd advise here is to use coverage, perhaps via pytest --cov, and try hard to cover the pass statement or similar error handlers like a continue statement. Having done that, I predict you'll find the exception can only be AttributeError, and even that could be avoided with

        sentinel = object()

and then assigning ... = getattr(ogm_instance, prop_name, sentinel).
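A minimal sketch of the sentinel idea outside neomodel. extract_properties() is a hypothetical helper, and prop_names stands in for however converter.py enumerates the OGM properties.

```python
from typing import Any, Dict, Iterable

_MISSING = object()  # unique sentinel: can't be confused with a real value, None included


def extract_properties(instance: Any, prop_names: Iterable[str]) -> Dict[str, Any]:
    """Copy the named attributes present on `instance`, skipping absent ones."""
    data: Dict[str, Any] = {}
    for name in prop_names:
        # getattr() with a default swallows AttributeError only; any other
        # exception (e.g. raised inside a property getter) still propagates.
        value = getattr(instance, name, _MISSING)
        if value is not _MISSING:
            data[name] = value
    return data


class Example:
    name = "NodeA"
    email = None  # a legitimate None is kept, unlike a missing attribute


print(extract_properties(Example(), ["name", "email", "no_such_prop"]))
# {'name': 'NodeA', 'email': None}
```

A bug inside a getter now surfaces as its real exception instead of a silently missing field.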

More generally, if e.g. get_type_hints() or parse_obj() fails, it's not clear to me that "log + ignore" is the correct approach. I mean, if they didn't hold up their end of the contract, if they didn't keep their promise, how can you keep yours? It seems more appropriate to view their exception as fatal and to let it bubble up the call stack to the application, which can deal with it as it chooses, possibly failing so the user can see the trouble and sort it out.
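If the library wants to add context before such an error propagates, exception chaining does that without swallowing anything. A sketch, assuming the library has its own ConversionError (a stand-in is defined here so the code runs); resolve_field_types() is hypothetical.

```python
from typing import Any, Dict, get_type_hints


class ConversionError(Exception):
    """Stand-in for the library's own exception type."""


def resolve_field_types(model_cls: type) -> Dict[str, Any]:
    """Return the resolved annotations of `model_cls`.

    Only NameError (an unresolvable forward reference) is translated. It is
    re-raised rather than logged and ignored, and `from exc` keeps the
    original exception attached as __cause__.
    """
    try:
        return get_type_hints(model_cls)
    except NameError as exc:
        raise ConversionError(
            f"cannot resolve type hints of {model_cls.__name__}: {exc}"
        ) from exc


class Good:
    x: int


class Dangling:
    y: "UndefinedModel"  # forward reference that never gets defined


print(resolve_field_types(Good))  # {'x': <class 'int'>}
```

The caller still sees the failure; it just arrives with a message that names the model involved.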

Once it's released into the wild and a hundred apps use the library, I imagine you'll get a handful of bug reports leading to a new library release. But after a while that will settle down, and you and your users will have discovered the set of things that can go wrong when the code runs under realistic conditions. They have the code, so they can apply a trivial except FooError: patch to get things back on track in their environment.

helpers

Staying with just to_pydantic(), that method is Too Long, and I'm not even counting the 15 lines of beautiful docstring. (Thank you for all the docstrings!)

The primary way of noticing this is that you can't read the whole text of the method without scrolling vertically. It doesn't matter how big your 4K screen is, or what (legible!) font size you choose: it won't fit on one screen. Another way you might notice is that the indentation is getting a bit carried away, with e.g. pass indented five levels. Both are signals that it's time to look for units of work that can be broken out as separate helper functions, each with its own docstring and its own contract.

But wait, there's more! Look at those great comments introducing processing sections, like # Check max depth. The helpers are practically writing themselves at this point! Nuke those terrific comments in favor of def _check_max_depth():, def _extract_basic_properties():, def _infer_pydantic_class(): and so on.

It's a standard aphorism that "a line of code not yet executed is a line of code that's likely buggy." A huge advantage of decomposing a giant function into several building blocks is it becomes much easier to unit test each of the building blocks and exercise all lines of target code.


Here are some improvements to your type hints:

Your Converter.register_type_converter type hints are too generic, IMO. They allow you to pass a function that does not actually convert from source to target (according to the type hints).

Consider this simplified example:

from typing import Any, Dict, Callable, Tuple, Type


class Converter:
    _type_converters: Dict[Tuple[Type, Type], Callable[[Any], Any]] = {}

    @classmethod
    def register_type_converter(
        cls,
        source_type: Type,
        target_type: Type,
        converter_func: Callable[[Any], Any],
    ) -> None:
        cls._type_converters[(source_type, target_type)] = converter_func


def int_to_str(value: int) -> str:
    return str(value)


if __name__ == "__main__":
    Converter.register_type_converter(int, str, int_to_str)
    Converter.register_type_converter(float, str, int_to_str)

This does not make any sense, obviously, but the type checker will accept it, due to the liberal use of Any.

Much better would be:

from typing import Any, Dict, Callable, Tuple, Type, TypeVar


S = TypeVar("S")
T = TypeVar("T")


class Converter:
    _type_converters: Dict[Tuple[Type, Type], Callable[[Any], Any]] = {}

    @classmethod
    def register_type_converter(
        cls,
        source_type: Type[S],
        target_type: Type[T],
        converter_func: Callable[[S], T],
    ) -> None:
        cls._type_converters[(source_type, target_type)] = converter_func


def int_to_str(value: int) -> str:
    return str(value)


if __name__ == "__main__":
    Converter.register_type_converter(int, str, int_to_str)
    Converter.register_type_converter(float, str, int_to_str)

Running pyright (or mypy) will correctly find the error in the second type converter:

/tmp/type_test/main.py
  /tmp/type_test/main.py:29:51 - error: Argument of type "(value: int) -> str" cannot be assigned to parameter "converter_func" of type "(S@register_type_converter) -> T@register_type_converter" in function "register_type_converter"
    Type "(value: int) -> str" is not assignable to type "(float) -> str"
      Parameter 1: type "float" is incompatible with type "int"
        "float" is not assignable to "int" (reportArgumentType)
1 error, 0 warnings, 0 informations

You can then reuse the T type variable for the actual conversion method, so you (and the type checker) know that the return value of the conversion is either of type T or None:

    @classmethod
    def _convert_value(cls, value: Any, target_type: Type[T]) -> Optional[T]:
        ...

As for reducing the endless if statements there, you can handle the basic types in one go and register the string-to-date and string-to-datetime conversions as converter functions. You can also exit early if the source and target types are identical (nothing to do):

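from datetime import date, datetime
from typing import Any, Optional, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class ConversionError(Exception):
    """Stand-in; presumably the converter's existing exception class."""


class Converter: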
    ...

    @classmethod
    def _convert_value(cls, value: Any, target_type: Type[T]) -> Optional[T]:
        if value is None:
            return None

        source_type = type(value)
        if source_type == target_type:
            return value

        # Check for direct registered converter
        converter = cls._type_converters.get((source_type, target_type))
        if converter:
            return converter(value)

        # Handle basic type conversions
        # The constructor does the conversion, e.g. int("42") -> 42.
        # Beware: bool("False") is True, so register a str -> bool converter if that matters.
        if target_type in (str, int, float, bool):
            return target_type(value)

        # If the target_type is a class and the value is a dict, attempt to create an instance
        if (
            isinstance(target_type, type)
            and issubclass(target_type, BaseModel)
            and isinstance(value, dict)
        ):
            return target_type(**value)

        # If we get here, return None
        return None


def str_to_datetime(value: str) -> datetime:
    try:
        # Try ISO format first
        return datetime.fromisoformat(value)
    except ValueError:
        # Try other common formats
        for fmt in ["%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", "%d/%m/%Y"]:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue

        # If all else fails, raise an error
        raise ConversionError(f"Cannot convert string '{value}' to datetime")


def str_to_date(value: str) -> date:
    try:
        # Try ISO format first
        return date.fromisoformat(value)
    except ValueError:
        # Try other common formats
        for fmt in ["%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", "%d/%m/%Y"]:
            try:
                return datetime.strptime(value, fmt).date()
            except ValueError:
                continue

        # If all else fails, raise an error
        raise ConversionError(f"Cannot convert string '{value}' to date")



Converter.register_type_converter(str, datetime, str_to_datetime)
Converter.register_type_converter(datetime, str, datetime.isoformat)
Converter.register_type_converter(str, date, str_to_date)
Converter.register_type_converter(date, str, date.isoformat)

Note that I return None when the conversion fails; otherwise the return type would still have to be Any because of your final fallback. Raising a TypeError may even be warranted here. Either can be handled in the calling code.
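The question also asks about the if/elif chain in _get_property_type(). One way to flatten it, sketched here as an assumption rather than a drop-in replacement, is a lookup table keyed by property class, walking the MRO so subclasses resolve to their nearest mapped ancestor. To keep it runnable without neomodel, the property classes below are empty stand-ins that borrow neomodel's names; in real code you would import neomodel's classes and check how they actually relate to each other.

```python
from datetime import date, datetime
from typing import Dict, Optional


# Empty stand-ins named after neomodel's property classes (not the real ones)
class Property: ...
class StringProperty(Property): ...
class IntegerProperty(Property): ...
class FloatProperty(Property): ...
class BooleanProperty(Property): ...
class DateProperty(Property): ...
class DateTimeProperty(Property): ...


PROPERTY_TYPES: Dict[type, type] = {
    StringProperty: str,
    IntegerProperty: int,
    FloatProperty: float,
    BooleanProperty: bool,
    DateProperty: date,
    DateTimeProperty: datetime,
}


def get_property_type(prop: Property) -> Optional[type]:
    """Return the Python type for a property instance, or None if unmapped."""
    # Walking the MRO lets a user-defined subclass (e.g. of StringProperty)
    # resolve to its nearest mapped ancestor without another branch.
    for klass in type(prop).__mro__:
        if klass in PROPERTY_TYPES:
            return PROPERTY_TYPES[klass]
    return None


class CustomStringProperty(StringProperty): ...  # hypothetical user subclass


print(get_property_type(CustomStringProperty()))  # <class 'str'>
```

Supporting a new property type then means adding one table entry instead of another elif.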

You could even allow registering a type conversion function via a decorator with side effects:

from typing import Any, Dict, Callable, Tuple, Type, TypeVar
from inspect import signature
from datetime import datetime

S = TypeVar("S")
T = TypeVar("T")

class Converter:
    _type_converters: Dict[Tuple[Type, Type], Callable[[Any], Any]] = {}

    @classmethod
    def type_converter(cls, func: Callable[[S], T]) -> Callable[[S], T]:
        sig = signature(func)
        cls._type_converters[
            (tuple(sig.parameters.values())[0].annotation, sig.return_annotation)
        ] = func
        return func

    ...


@Converter.type_converter
def str_to_datetime(value: str) -> datetime:
    try:
        # Try ISO format first
        return datetime.fromisoformat(value)
    except ValueError:
        # Try other common formats
        for fmt in ["%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", "%d/%m/%Y"]:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue

        # If all else fails, raise an error
        raise ConversionError(f"Cannot convert string '{value}' to datetime")
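One caveat with reading annotations through signature(): if the module uses from __future__ import annotations, .annotation and .return_annotation are plain strings, so the registry key becomes ('str', 'datetime') and a lookup by type(value) never matches. typing.get_type_hints() resolves them. A sketch of that variant; the convert() lookup helper is hypothetical.

```python
from datetime import date
from inspect import signature
from typing import Any, Callable, Dict, Tuple, TypeVar, get_type_hints

S = TypeVar("S")
T = TypeVar("T")


class Converter:
    _type_converters: Dict[Tuple[type, type], Callable[[Any], Any]] = {}

    @classmethod
    def type_converter(cls, func: Callable[[S], T]) -> Callable[[S], T]:
        # get_type_hints() evaluates string annotations against func's module
        # globals, so this also works under `from __future__ import annotations`.
        hints = get_type_hints(func)
        params = list(signature(func).parameters)
        if not params or params[0] not in hints or "return" not in hints:
            raise TypeError(f"{func.__name__} needs an annotated first parameter and return type")
        cls._type_converters[(hints[params[0]], hints["return"])] = func
        return func

    @classmethod
    def convert(cls, value: Any, target_type: type) -> Any:
        """Hypothetical lookup helper: apply the converter registered for the pair."""
        return cls._type_converters[(type(value), target_type)](value)


@Converter.type_converter
def str_to_date(value: "str") -> "date":  # string annotations, as PEP 563 would produce
    return date.fromisoformat(value)


print(Converter.convert("2024-01-02", date))  # 2024-01-02
```

Failing loudly on unannotated functions also catches the case where the decorator would otherwise register a key built from inspect's empty-annotation marker.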
