This is an actual work problem we had to solve. Put simply: given a structure (e.g. nested dictionaries) and a mapping of old dictionary keys to new ones, produce a new structure that is anatomically identical to the original, uses the new dictionary keys, and preserves every other value.
- How to encode this mapping?
- How to go about the translation?
Context
We receive these dictionaries as JSON files through an API and, because of external constraints, the sender doesn't have access to our internal nomenclature system. So we need to convert the names ourselves.
Assembling the mappings is quite a laborious manual effort, as it involves figuring out semantics and talking to people. We are obviously working on a better solution, but these constraints will hold us for a while longer.
Details
Suppose a system receives JSON messages such as:
msg = {
    "id": 1,
    "summary": {
        "origin": {
            "url": "url",
            "slug": "slug"
        },
        "tags": ["a", "b"]
    },
    "items": [
        {
            "id": "abc",
            "price": 50
        },
        {
            "id": "def",
            "price": 110,
            "discount": 50
        }   
    ]
}
But in order to move the data forward, the names of the dictionary keys must follow a specific nomenclature. So they must be translated, like so:
translated_msg = {
    "IDENTIF": 1,
    "SUMM": {
        "ORIG": {
            "WEBADDRESS": "url",
            "LOCATOR": "slug"
        },
        "TAGS": ["a", "b"]
    },
    "PURCHASEDGOODS": [
        {
            "GOODSID": "abc",
            "GOODSPRICE": 50
        },
        {
            "GOODSID": "def",
            "GOODSPRICE": 110,
            "GIVENDISCOUNT": 50
        }   
    ]
}
The new terminology comes from a translation dictionary that has to be built manually by someone who is familiar with the data and the nomenclature to be followed. This field map must also encode the anatomy of the original structure, because there may be multiple fields with the same name at different depths. Notice the two id fields above.
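To illustrate why a plain name-to-name mapping is not enough, here is a minimal sketch. `rename_flat` and `flat_map` are hypothetical names used only for this illustration; they are not part of the solution.

```python
# Hypothetical flat mapping: one entry per key name, ignoring position.
flat_map = {"id": "IDENTIF", "items": "PURCHASEDGOODS"}

msg = {"id": 1, "items": [{"id": "abc"}]}


def rename_flat(struct, mapping):
    """Rename keys by name only, wherever they occur in the structure."""
    if isinstance(struct, dict):
        return {
            mapping.get(key, key): rename_flat(value, mapping)
            for key, value in struct.items()
        }
    if isinstance(struct, list):
        return [rename_flat(item, mapping) for item in struct]
    return struct


flat_result = rename_flat(msg, flat_map)
# Both "id" fields end up with the same new name, so the inner one
# cannot be given a different name such as "GOODSID".
```

With a flat map, the top-level id and the id inside each item are indistinguishable, which is why the field map needs to carry the position of each key.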
Solution
With all this in mind, here is a field map structure which fits the criteria. Its syntax is part of the solution I came up with and can be modified.
field_map = {
    "/id": "IDENTIF",
    "/summary": "SUMM",
    "/summary/origin": "ORIG",
    "/summary/origin/url": "WEBADDRESS",
    "/summary/origin/slug": "LOCATOR",
    "/summary/tags": "TAGS",
    "/items": "PURCHASEDGOODS",
    "/items//id": "GOODSID",
    "/items//price": "GOODSPRICE",
    "/items//discount": "GIVENDISCOUNT",
}
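Notice that /items//discount has two consecutive slashes. Each slash means going one level deeper within the structure; since list elements have no key, stepping into a list adds a slash with nothing after it, which produces the double slash.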
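Since the field map is assembled by hand, it may help to list every path a sample message contains. The following is a small sketch under the path rules described above; `iter_field_paths` is a hypothetical helper, not part of the solution.

```python
def iter_field_paths(struct, prefix=""):
    """Yield the field-map key of every dictionary key in a nested structure."""
    if isinstance(struct, dict):
        for key, value in struct.items():
            path = f"{prefix}/{key}"
            yield path
            yield from iter_field_paths(value, path)
    elif isinstance(struct, (list, tuple)):
        for item in struct:
            # A list level contributes an empty segment, hence the extra "/".
            yield from iter_field_paths(item, prefix + "/")


sample = {
    "id": 1,
    "items": [{"id": "abc", "price": 50}, {"id": "def", "discount": 50}],
}
paths = sorted(set(iter_field_paths(sample)))
nested_list_paths = list(iter_field_paths({"a": [[{"b": 0}]]}))
```

Comparing `paths` against the keys of a field map would show which fields are still unmapped. Each extra level of list nesting adds one more slash, so a key inside a list of lists gets three.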
Inspired by https://stackoverflow.com/a/40857703/10504841, here is a recursive solution that, given a structure and a field map, walks through the entire structure and builds a translated copy:
from collections.abc import Iterable
from typing import Union
def is_valid_iterable(struct):
    return isinstance(struct, Iterable) and not isinstance(
        struct, (str, bytes)
    )
def is_key_in_dict(key, dict_):
    try:
        _ = dict_[key]
        return True
    except KeyError:
        return False
def translate_nested_structure(
    structure: Union[dict, list, tuple], trans_dict: dict, prefix: str = ""
) -> Union[dict, list, tuple]:
    """
    Translate dictionary keys in a nested structure using a translation
    dictionary. Maintains the same structure and primitive values.
    Useful for translating JSON and Avro messages.
    If a key is present in the structure but not in the translation dictionary,
    it is understood as undesired and removed from the output structure.
    If a (sub)structure is made of only lists or tuples, the output
    is simply a copy of the given (sub)structure.
    Supported types and content limitation for dictionary (sub)structures
    ------------------------------------------------------
    Keys can be of any primitive type or None.
    Tuple keys are somewhat supported, but not fully tested and not documented.
    "/" is not allowed inside string keys, see translation dictionary syntax.
    Values can be lists, tuples, dicts, any primitive or None.
    Translation dictionary syntax
    ------------------------------
    The translation dictionary must capture the anatomy of the nested
    structure, as different nested keys may share the same name.
    The syntax for the translation dictionary keys is made of
    "/"s and `orig_key`s.
    "/" indicates going one level deeper within the structure,
    so "/" may not be present inside string keys in the structure.
    Also, the number of preceding "/" should match the nesting level
    of the (sub)structure.
    An `orig_key` is a piece of string which contains
    the name of the specified original key in the structure.
    The syntax for the keys is easier to understand if thought of backwards:
    every key must end with an `orig_key`, since those are
    what need to be translated. A single preceding "/"
    indicates `orig_key` is a key inside another dictionary
    (e.g. "/start/in_a_dict"). In this case,
    unless `orig_key` is the first key (e.g. "/test"), the "/"
    must be preceded by another `orig_key` (e.g. "/start/test").
    Multiple preceding "/" indicate `orig_key` is in a
    list or tuple (e.g. "/start//in_a_list", "//start").
    Since the translation dictionary values contain the desired
    new translated (sub)structure keys, the syntax and supported types are
    the same as the original structure syntax for keys. See above.
    Parameters
    ----------
    structure: [dict | list | tuple]
        Nested dict, list or tuple.
    trans_dict: dict
        Translation dictionary, see example below.
    prefix: str
        Prefix used to find keys in the translation dictionary. Leave it
        empty when calling; it is filled in during recursion.
    Returns
    -------
    translated_structure: [dict | list | tuple]
        Same structure, but with translated dictionary keys
    Examples
    --------
    >>> sample_msg = {
    ...     "a": {
    ...         "b": ["c", "d"],
    ...         "e": [
    ...             {
    ...                 "f": {"g": "h"},
    ...             },
    ...             {
    ...                 "f": {"g": "h", "g2": "h2"},
    ...             },
    ...         ],
    ...         "i": None,
    ...         "j": [],
    ...     },
    ... }
    >>> sample_translated_msg = {
    ...     "aaaa": {
    ...         "bbbb": ["c", "d"],
    ...         "eeee": [
    ...             {
    ...                 "ffff": {"gggg": "h"},
    ...             },
    ...             {
    ...                 "ffff": {"gggg": "h", "gggg2222": "h2"},
    ...             },
    ...         ],
    ...         "iiii": None,
    ...         "jjjj": [],
    ...     },
    ... }
    >>> sample_field_map = {
    ...     "/a": "aaaa",
    ...     "/a/b": "bbbb",
    ...     "/a/e": "eeee",
    ...     "/a/e//f": "ffff",
    ...     "/a/e//f/g": "gggg",
    ...     "/a/e//f/g2": "gggg2222",
    ...     "/a/i": "iiii",
    ...     "/a/j": "jjjj",
    ... }
    >>> translated_msg = translate_nested_structure(
    ...         sample_msg, sample_field_map
    ...     )
    >>> translated_msg == sample_translated_msg
    True
    TODO
    ----
    - Improve the trans dict syntax?
    """
    def translate_dict(dict_struct, trans_dict, prefix=""):
        if not isinstance(dict_struct, dict):
            raise TypeError(f"Expected dict, received {type(dict_struct)}")
        new_dict = dict()
        for key, value in dict_struct.items():
            new_prefix = "/".join([prefix, str(key)])
            if not is_key_in_dict(new_prefix, trans_dict):
                continue
            new_key = trans_dict[new_prefix]
            if is_valid_iterable(value):
                new_value = translate_nested_structure(
                    value, trans_dict, new_prefix
                )
            else:
                new_value = value
            new_dict[new_key] = new_value
        return new_dict
    def translate_simple_struct(simple_struct, trans_dict, prefix=""):
        if not isinstance(simple_struct, (list, tuple)):
            raise TypeError(
                f"Expected list or tuple, received {type(simple_struct)}"
            )
        cls_ = type(simple_struct)
        new_simple_struct = cls_([])
        for item in simple_struct:
            new_prefix = "/".join([prefix, ""])
            if is_valid_iterable(item):
                new_item = translate_nested_structure(
                    item, trans_dict, new_prefix
                )
            else:
                new_item = item
            new_simple_struct += cls_([new_item])
        return new_simple_struct
    if isinstance(structure, dict):
        return translate_dict(structure, trans_dict, prefix)
    else:
        return translate_simple_struct(structure, trans_dict, prefix)
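For a quick check of the behaviour described in the docstring (mapped keys are renamed, unmapped keys are dropped), here is a condensed sketch of the same walk. `translate_keys` is a hypothetical name and is not a replacement for the function above; unlike it, this version passes unsupported value types through unchanged instead of raising.

```python
def translate_keys(struct, field_map, prefix=""):
    """Rename mapped dictionary keys and drop unmapped ones, recursively."""
    if isinstance(struct, dict):
        out = {}
        for key, value in struct.items():
            path = f"{prefix}/{key}"
            if path in field_map:  # keys missing from the map are dropped
                out[field_map[path]] = translate_keys(value, field_map, path)
        return out
    if isinstance(struct, (list, tuple)):
        # Rebuild the same container type; a list level adds an empty segment.
        return type(struct)(
            translate_keys(item, field_map, prefix + "/") for item in struct
        )
    return struct


msg = {
    "id": 1,
    "items": [
        {"id": "abc", "price": 50},
        {"id": "def", "price": 110, "discount": 50},
    ],
}
# "/items//discount" is left out on purpose to show that it gets dropped.
field_map = {
    "/id": "IDENTIF",
    "/items": "PURCHASEDGOODS",
    "/items//id": "GOODSID",
    "/items//price": "GOODSPRICE",
}
result = translate_keys(msg, field_map)
```

Here the two id fields receive different names because their paths differ, and the discount field disappears because it has no entry in the map.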
About tuples as dictionary keys: I tested a bit and it is possible to encode tuples in the current version of the field map encoding, but the syntax can become quite complicated, so I decided to leave them out for now. The encoding should be as human-friendly as possible.
- What are your thoughts on the code itself?
- Do you have any suggestions on how to improve the encoding syntax?
- What about increasing the level of abstraction and supporting more structures, such as sets, classes or custom Iterables?
- I'd also like to hear if other people face similar problems. How often, if at all, do people need to translate dictionary keys like this?

