Dmitry Romanoff

Exploring Embedded Documents in AWS DocumentDB with Python

AWS DocumentDB, a scalable, highly available, and fully managed document database service that supports MongoDB workloads, offers developers flexibility in modeling data using embedded documents. These structures are particularly useful when organizing related data hierarchically or when optimizing for read operations.

In this article, we'll walk through two practical examples using Python and pymongo to demonstrate:

  • How to create and insert nested embedded documents
  • How to insert large arrays of embedded sub-documents in a parent document

🔍 What Are Embedded Documents?

Embedded documents are sub-documents stored within a parent document in a MongoDB-compatible database like AWS DocumentDB. They enable logical grouping of data and can improve read performance by reducing the need for joins or multiple queries.

AWS DocumentDB supports these structures just like MongoDB does, allowing developers to model complex data relationships efficiently.
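As a rough illustration, an embedded document is just a nested dictionary from Python's point of view, and MongoDB-style dot notation addresses the fields inside it. The sketch below uses a made-up `order` document and a hypothetical helper, `get_by_path`, that mimics how a dot-notation path is resolved; neither comes from the examples in this article.

```python
# Illustrative only: the field names and the helper are assumptions,
# not part of any DocumentDB or pymongo API.
order = {
    "order_id": 1001,
    "customer": {                      # embedded document
        "name": "Alice",
        "address": {"city": "Berlin", "zip": "10115"},  # nested one level deeper
    },
    "items": [                         # array of embedded documents
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def get_by_path(doc, path):
    """Resolve a dot-notation path such as 'customer.address.city' in a dict.

    Numeric segments index into lists, e.g. 'items.0.sku'.
    """
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


print(get_by_path(order, "customer.address.city"))  # Berlin
print(get_by_path(order, "items.1.sku"))            # B-7
```

On the server side, the same path would serve as a filter key, for example `collection.find_one({"customer.address.city": "Berlin"})`.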


✅ Connecting to AWS DocumentDB

Before diving into embedded documents, here's how we set up the connection:

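from pymongo import MongoClient, errors

# Replace the placeholders with your cluster's credentials and endpoint.
MONGO_URI = "mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
# Path to the CA certificate bundle used to verify the TLS connection.
CA_FILE = "global-bundle.pem"

client = MongoClient(
    MONGO_URI,
    tls=True,
    tlsCAFile=CA_FILE,
    serverSelectionTimeoutMS=5000  # give up after 5 seconds if no server is reachable
)

try:
    # server_info() forces a round trip, so connection problems surface here.
    client.server_info()
    print("Connected to AWS DocumentDB successfully.")
except errors.ServerSelectionTimeoutError as err:
    print(f"Connection failed: {err}")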

🧱 Example 1: Deeply Nested Embedded Documents

Let's create a set of documents where each one includes multiple levels of nesting. This simulates a deeply hierarchical data structure.

import random

db = client["test_db"]
collection = db["embedded_docs"]
collection.delete_many({})  # Clean up

MAX_LEVEL = 5

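def create_nested_doc(level, base_doc):
    """Wrap base_doc in `level` layers of {"level": i, "data": ...}.

    The innermost wrapper has level 0 and the outermost has level - 1.
    With level == 0 the base document is returned unchanged.
    """
    doc = base_doc
    for i in range(level):
        doc = {"level": i, "data": doc}
    return doc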

for i in range(100):
    level = random.randint(0, MAX_LEVEL)
    base = {"index": i, "value": f"Item {i}", "embed_level": level}
    doc = create_nested_doc(level, base)
    collection.insert_one(doc)

print("Inserted 100 documents with various embedding depths.")

🔎 Why Use Deep Nesting?

  • Represent hierarchical data (like categories and subcategories)
  • Encapsulate logically related information
  • Minimize the number of collections/tables
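To read a value back out of the documents from Example 1, the query path has to repeat `data` once per wrapper. Here is a minimal sketch, assuming the `create_nested_doc` layout shown earlier. The helper names `nested_field_path`, `any_depth_filter`, and `find_by_index` are hypothetical, and the query function is only defined, not run, since it needs a live `collection`.

```python
def nested_field_path(level, field):
    """Build the dot-notation path to `field` under `level` wrappers.

    Example 1 wraps the base document in {"level": i, "data": ...} layers,
    so each layer adds one "data." prefix.
    """
    return "data." * level + field


def any_depth_filter(index, max_level):
    """Build a filter matching a base document at any depth 0..max_level.

    Useful because each document in Example 1 gets a random depth.
    """
    return {
        "$or": [
            {nested_field_path(level, "index"): index}
            for level in range(max_level + 1)
        ]
    }


def find_by_index(collection, level, index):
    """Look up a base document by its "index" value at a known depth.

    `collection` is expected to be a pymongo Collection such as
    db["embedded_docs"]; nothing is queried until this is called.
    """
    return collection.find_one({nested_field_path(level, "index"): index})


print(nested_field_path(0, "index"))  # index
print(nested_field_path(3, "index"))  # data.data.data.index
```

With a connection in place, `collection.find_one(any_depth_filter(42, MAX_LEVEL))` would look for item 42 regardless of how deeply it was wrapped.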

🧱 Example 2: Large Embedded Document Arrays

Next, we'll explore how DocumentDB handles documents that contain large arrays of embedded sub-documents.

import random
import time

db = client["test_db"]
collection = db["large_embedded_docs"]
collection.delete_many({})  # Clean up

sizes = [5000, 10000, 20000, 40000, 100000]

for count in sizes:
    embedded_docs = [{"item_id": i, "value": random.randint(0, 100)} for i in range(count)]
    parent_doc = {
        "doc_type": f"{count}_embedded",
        "created_at": time.time(),
        "embedded_data": embedded_docs
    }

    try:
        collection.insert_one(parent_doc)
        print(f"✅ Inserted document with {count} embedded documents.")
    except Exception as e:
        print(f"❌ Failed to insert document with {count} embedded documents: {e}")

⚙️ Use Cases for Large Embedded Arrays

  • Product listings with many variants
  • Log events grouped by session or time window
  • Survey responses or form fields in batch

🚧 Considerations and Limits

While embedded documents are powerful, remember the following:

  • Document size limit: AWS DocumentDB supports documents up to 16 MB.
  • Performance trade-offs: Extremely deep or wide documents can impact query and write performance.
  • Indexing: You can index fields inside embedded documents, but deeply nested indexing can get complex.
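One common way to stay clear of the size limit and keep individual documents manageable is to split a very large array across several parent documents, often called bucketing. The following is a minimal sketch of that idea, not something the examples above do. The function `make_buckets`, the `bucket_no` field, and the bucket size of 1,000 are all assumptions chosen for illustration.

```python
def make_buckets(doc_type, items, bucket_size=1000):
    """Split `items` into parent documents holding at most `bucket_size` each.

    Every bucket carries the same doc_type plus a running bucket_no so the
    pieces can be queried and reassembled in order.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    buckets = []
    for start in range(0, len(items), bucket_size):
        buckets.append({
            "doc_type": doc_type,
            "bucket_no": start // bucket_size,
            "embedded_data": items[start:start + bucket_size],
        })
    return buckets


items = [{"item_id": i, "value": i % 100} for i in range(2500)]
buckets = make_buckets("2500_embedded", items)
print(len(buckets))                                # 3
print([len(b["embedded_data"]) for b in buckets])  # [1000, 1000, 500]
```

The resulting list could then be written in a single call with `collection.insert_many(buckets)`.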


🧪 Final Thoughts

Embedded documents are a key feature of schema-less databases like AWS DocumentDB. They allow for rich, nested data structures, improving data modeling flexibility and access performance when used appropriately.

These code examples should give you a solid foundation for experimenting with embedded document strategies in your own AWS DocumentDB projects.

