Dmitry Romanoff

Posted on Jun 4

Exploring Embedded Documents in AWS DocumentDB with Python

#aws #programming #python #tutorial

AWS DocumentDB, a scalable, highly available, and fully managed document database service that supports MongoDB workloads, offers developers flexibility in modeling data using embedded documents. These structures are particularly useful when organizing related data hierarchically or when optimizing for read operations.

In this article, we'll walk through two practical examples using Python and pymongo to demonstrate:

How to create and insert nested embedded documents
How to insert large arrays of embedded sub-documents in a parent document

🔍 What Are Embedded Documents?

Embedded documents are sub-documents stored within a parent document in a MongoDB-compatible database like AWS DocumentDB. They enable logical grouping of data and can improve read performance by reducing the need for joins or multiple queries.

AWS DocumentDB supports these structures just like MongoDB does, allowing developers to model complex data relationships efficiently.
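
As a small illustration of the idea, an embedded document in pymongo is simply a nested Python dict. The sketch below runs locally without a database; the field names (`customer`, `address`, `items`) and the helper `get_by_path` are hypothetical and only meant to show how dot notation maps onto the nesting.

```python
# Hypothetical order document; every field name here is illustrative only.
order = {
    "order_id": 1001,
    "customer": {                                        # embedded document
        "name": "Alice",
        "address": {"city": "Berlin", "zip": "10115"},   # nested one level deeper
    },
    "items": [                                           # array of embedded documents
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def get_by_path(doc, dotted_path):
    """Resolve a dot-notation path such as 'customer.address.city' on a local dict."""
    current = doc
    for key in dotted_path.split("."):
        current = current[key]
    return current


city = get_by_path(order, "customer.address.city")
print(city)
```

The same dotted string is what a server-side filter would use, for example `{"customer.address.city": "Berlin"}` passed to `find()`.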

✅ Connecting to AWS DocumentDB

Before diving into embedded documents, here's how we set up the connection:

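from pymongo import MongoClient, errors

# Replace the <...> placeholders with your own cluster's values.
MONGO_URI = "mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
# Path to the Amazon-provided CA bundle, downloaded beforehand.
CA_FILE = "global-bundle.pem"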

client = MongoClient(
    MONGO_URI,
    tls=True,
    tlsCAFile=CA_FILE,
    serverSelectionTimeoutMS=5000
)

try:
    client.server_info()
    print("Connected to AWS DocumentDB successfully.")
except errors.ServerSelectionTimeoutError as err:
    print(f"Connection failed: {err}")
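
One practical detail when filling in the URI placeholders: usernames and passwords containing characters such as `@` or `/` need to be percent-encoded. The following is a minimal sketch using only the standard library; the helper name `build_docdb_uri`, the sample credentials, and the host name are assumptions made for illustration.

```python
from urllib.parse import quote_plus


def build_docdb_uri(username, password, host, port=27017):
    """Build a connection string with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb://{user}:{pwd}@{host}:{port}/"
        "?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
    )


# Hypothetical credentials and endpoint, for illustration only.
uri = build_docdb_uri("app_user", "p@ss/word", "example-cluster.example.com")
print(uri)
```

The resulting string can be passed to `MongoClient` in place of the hand-written `MONGO_URI`.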

🧱 Example 1: Deeply Nested Embedded Documents

Let’s create a set of documents, each wrapped in a random number of nesting levels (from 0 to MAX_LEVEL). This simulates hierarchical data of varying depth.

import random

db = client["test_db"]
collection = db["embedded_docs"]
collection.delete_many({})  # Clean up

MAX_LEVEL = 5

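def create_nested_doc(level, base_doc):
    """Wrap base_doc in `level` layers of {"level": i, "data": ...}."""
    doc = base_doc
    for i in range(level):
        # Each pass adds one outer layer, so the outermost "level" is level - 1.
        doc = {"level": i, "data": doc}
    return doc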

for i in range(100):
    level = random.randint(0, MAX_LEVEL)
    base = {"index": i, "value": f"Item {i}", "embed_level": level}
    doc = create_nested_doc(level, base)
    collection.insert_one(doc)

print("Inserted 100 documents with various embedding depths.")

🔎 Why Use Deep Nesting?

Represent hierarchical data (like categories and subcategories)
Encapsulate logically related information
Minimize the number of collections/tables
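
To read the innermost fields of the documents from Example 1 back out, the dot-notation path has to repeat `data` once per nesting level. The sketch below shows one way to build that path and to unwrap a document locally; `nested_field_path` and `unwrap` are hypothetical helpers, and `create_nested_doc` is repeated so the snippet runs on its own.

```python
def create_nested_doc(level, base_doc):
    # Same wrapping logic as in Example 1.
    doc = base_doc
    for i in range(level):
        doc = {"level": i, "data": doc}
    return doc


def nested_field_path(level, field):
    """Build the dot-notation path to `field` inside the innermost document."""
    return ".".join(["data"] * level + [field])


def unwrap(doc):
    """Follow the "data" chain of wrapper layers down to the base document."""
    while "level" in doc and "data" in doc:
        doc = doc["data"]
    return doc


sample = create_nested_doc(3, {"index": 7, "value": "Item 7", "embed_level": 3})
path = nested_field_path(3, "index")
print(path)
print(unwrap(sample))
```

Against a live collection, a filter such as `{nested_field_path(3, "index"): 7}` passed to `collection.find_one()` would target the document with index 7, provided it was stored at depth 3.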

🧱 Example 2: Large Embedded Document Arrays

Next, we’ll explore how DocumentDB handles documents that contain large arrays of embedded sub-documents.

import random
import time

db = client["test_db"]
collection = db["large_embedded_docs"]
collection.delete_many({})  # Clean up

sizes = [5000, 10000, 20000, 40000, 100000]

for count in sizes:
    embedded_docs = [{"item_id": i, "value": random.randint(0, 100)} for i in range(count)]
    parent_doc = {
        "doc_type": f"{count}_embedded",
        "created_at": time.time(),
        "embedded_data": embedded_docs
    }

    try:
        collection.insert_one(parent_doc)
        print(f"✅ Inserted document with {count} embedded documents.")
    except Exception as e:
        print(f"❌ Failed to insert document with {count} embedded documents: {e}")
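
BSON documents are capped at 16 MB, so it can be useful to estimate a document's encoded size before inserting it. The following is a rough, standard-library-only sketch that follows the BSON layout for the few types used in this example (dicts, lists, strings, ints, floats, booleans, None); `estimate_bson_size` is a hypothetical helper, not part of pymongo. With pymongo installed, `len(bson.encode(doc))` should give the exact figure instead.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # 16 MB document limit


def estimate_bson_size(value):
    """Approximate the BSON-encoded size in bytes of a dict or list."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # BSON stores arrays as documents keyed by "0", "1", "2", ...
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        raise TypeError("expected a dict or list")
    size = 4 + 1  # int32 length prefix + trailing null byte
    for key, item in items:
        size += 1 + len(key.encode("utf-8")) + 1  # type byte + key as C string
        size += _value_size(item)
    return size


def _value_size(item):
    if isinstance(item, bool):  # check bool first: bool is a subclass of int
        return 1
    if item is None:
        return 0
    if isinstance(item, int):
        return 4 if -2**31 <= item < 2**31 else 8  # int32 vs. int64
    if isinstance(item, float):
        return 8
    if isinstance(item, str):
        return 4 + len(item.encode("utf-8")) + 1
    if isinstance(item, (dict, list)):
        return estimate_bson_size(item)
    raise TypeError(f"unsupported type: {type(item).__name__}")


for count in (5000, 100000):
    parent_doc = {
        "doc_type": f"{count}_embedded",
        "created_at": 0.0,  # placeholder float; time.time() encodes to the same size
        "embedded_data": [{"item_id": i, "value": 0} for i in range(count)],
    }
    size = estimate_bson_size(parent_doc)
    print(f"{count} items: ~{size} bytes, fits: {size < MAX_BSON_BYTES}")
```

By this estimate, even the 100,000-item document stays well below the limit, because each embedded sub-document here is only a few dozen bytes.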

⚙️ Use Cases for Large Embedded Arrays

Product listings with many variants
Log events grouped by session or time window
Survey responses or form fields in batch

🚧 Considerations and Limits

While embedded documents are powerful, remember the following:

Document size limit: AWS DocumentDB supports documents up to 16 MB.
Performance trade-offs: Extremely deep or wide documents can impact query and write performance.
Indexing: You can index fields inside embedded documents, but deeply nested indexing can get complex.
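
When a single array risks becoming too wide, one possible workaround is to spread the embedded items across several parent documents. The sketch below illustrates that idea under stated assumptions: the helper `split_into_buckets` and the fields `bucket_no` and `item_count` are hypothetical and do not appear in the examples above.

```python
def split_into_buckets(doc_type, items, bucket_size):
    """Yield parent documents that each hold at most `bucket_size` embedded items."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    for bucket_no, start in enumerate(range(0, len(items), bucket_size)):
        chunk = items[start:start + bucket_size]
        yield {
            "doc_type": doc_type,
            "bucket_no": bucket_no,       # position of this bucket in the sequence
            "item_count": len(chunk),
            "embedded_data": chunk,
        }


items = [{"item_id": i, "value": i % 100} for i in range(25_000)]
buckets = list(split_into_buckets("25000_embedded", items, bucket_size=10_000))
print([b["item_count"] for b in buckets])
```

With a live connection, `collection.insert_many(buckets)` would write the buckets in one call. The right bucket size depends on the workload and is left open here.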

🧪 Final Thoughts

Embedded documents are a key feature of schema-less databases like AWS DocumentDB. They allow for rich, nested data structures, improving data modeling flexibility and access performance when used appropriately.

These code examples should give you a solid foundation for experimenting with embedded document strategies in your own AWS DocumentDB projects.
