AWS DocumentDB, a scalable, highly available, and fully managed document database service that supports MongoDB workloads, offers developers flexibility in modeling data using embedded documents. These structures are particularly useful when organizing related data hierarchically or when optimizing for read operations.
In this article, we'll walk through two practical examples using Python and pymongo to demonstrate:
- How to create and insert nested embedded documents
- How to insert large arrays of embedded sub-documents in a parent document
What Are Embedded Documents?
Embedded documents are sub-documents stored within a parent document in a MongoDB-compatible database like AWS DocumentDB. They enable logical grouping of data and can improve read performance by reducing the need for joins or multiple queries.
AWS DocumentDB supports these structures just like MongoDB does, allowing developers to model complex data relationships efficiently.
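As a small illustration, an embedded document in pymongo is simply a nested Python dict, and an array of sub-documents is a list of dicts. The sketch below uses a hypothetical order document; the field names and the `get_path` helper are assumptions made for illustration, not part of any DocumentDB API.

```python
# Hypothetical order document; all field names are illustrative only.
order = {
    "order_id": 1001,
    "customer": {  # embedded document
        "name": "Ada",
        "address": {"city": "Seattle", "zip": "98101"},  # nested one level deeper
    },
    "items": [  # array of embedded sub-documents
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def get_path(doc, dotted_path):
    """Walk a dotted path such as 'customer.address.city' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        current = current[key]
    return current


city = get_path(order, "customer.address.city")
total_qty = sum(item["qty"] for item in order["items"])
```

The same dotted path is what a query filter would use to reach into an embedded document, for example `{"customer.address.city": "Seattle"}`.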
Connecting to AWS DocumentDB
Before diving into embedded documents, here's how we set up the connection:
from pymongo import MongoClient, errors

# Replace the placeholders with your cluster's credentials and endpoint.
MONGO_URI = "mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
CA_FILE = "global-bundle.pem"  # CA bundle used to verify the TLS connection

client = MongoClient(
    MONGO_URI,
    tls=True,
    tlsCAFile=CA_FILE,
    serverSelectionTimeoutMS=5000,  # fail fast if the cluster is unreachable
)

try:
    client.server_info()  # forces a round trip to the server
    print("Connected to AWS DocumentDB successfully.")
except errors.ServerSelectionTimeoutError as err:
    print(f"Connection failed: {err}")
Example 1: Deeply Nested Embedded Documents
Let's create a set of documents where each one includes multiple levels of nesting. This simulates a deeply hierarchical data structure.
import random

db = client["test_db"]
collection = db["embedded_docs"]
collection.delete_many({})  # Clean up

MAX_LEVEL = 5

def create_nested_doc(level, base_doc):
    doc = base_doc
    for i in range(level):
        doc = {"level": i, "data": doc}
    return doc

for i in range(100):
    level = random.randint(0, MAX_LEVEL)
    base = {"index": i, "value": f"Item {i}", "embed_level": level}
    doc = create_nested_doc(level, base)
    collection.insert_one(doc)

print("Inserted 100 documents with various embedding depths.")
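Because each wrapper layer stores the previous document under a `data` key, the base fields end up at a different dotted path depending on the nesting depth. The following is a minimal sketch of how one might build that path and unwrap a document again. The helpers `nested_field_path` and `unwrap` are hypothetical names introduced here, and `create_nested_doc` is repeated only so the block runs on its own without a database connection.

```python
def create_nested_doc(level, base_doc):
    """Same wrapping logic as Example 1, repeated so this block is self-contained."""
    doc = base_doc
    for i in range(level):
        doc = {"level": i, "data": doc}
    return doc


def nested_field_path(level, field):
    """Dotted path to `field` of a base document wrapped `level` times."""
    return ".".join(["data"] * level + [field])


def unwrap(doc):
    """Follow the 'data' keys of the wrapper layers down to the base document."""
    while "data" in doc and "level" in doc:
        doc = doc["data"]
    return doc


base = {"index": 7, "value": "Item 7", "embed_level": 3}
wrapped = create_nested_doc(3, base)

path = nested_field_path(3, "index")  # dotted path through three wrapper layers
query_filter = {path: 7}              # filter dict that could be passed to find_one
```

With a live connection, `collection.find_one(query_filter)` would be the natural way to use the filter. Since the depth varies per document in Example 1, the stored `embed_level` value is what tells you how many `data` segments the path needs.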
Why Use Deep Nesting?
- Represent hierarchical data (like categories and subcategories)
- Encapsulate logically related information
- Minimize the number of collections/tables
Example 2: Large Embedded Document Arrays
Next, we'll explore how DocumentDB handles documents that contain large arrays of embedded sub-documents.
import random
import time

db = client["test_db"]
collection = db["large_embedded_docs"]
collection.delete_many({})  # Clean up

sizes = [5000, 10000, 20000, 40000, 100000]

for count in sizes:
    embedded_docs = [{"item_id": i, "value": random.randint(0, 100)} for i in range(count)]
    parent_doc = {
        "doc_type": f"{count}_embedded",
        "created_at": time.time(),
        "embedded_data": embedded_docs
    }
    try:
        collection.insert_one(parent_doc)
        print(f"Inserted document with {count} embedded documents.")
    except Exception as e:
        # A document that exceeds the size limit would be rejected here.
        print(f"Failed to insert document with {count} embedded documents: {e}")
Use Cases for Large Embedded Arrays
- Product listings with many variants
- Log events grouped by session or time window
- Survey responses or form fields in batch
Considerations and Limits
While embedded documents are powerful, remember the following:
- Document size limit: AWS DocumentDB supports documents up to 16 MB.
- Performance trade-offs: Extremely deep or wide documents can impact query and write performance.
- Indexing: You can index fields inside embedded documents, but deeply nested indexing can get complex.
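One common way to keep very wide documents away from the size limit, which the examples above do not use, is to split a large array across several smaller "bucket" documents. The sketch below is only an illustration of that idea: the `parent_key` and `bucket` fields, the helper names, and the bucket size of 10,000 are all assumptions, and the right bucket size depends on how large each sub-document is.

```python
import random


def chunked(items, bucket_size):
    """Yield consecutive slices of `items`, each at most `bucket_size` long."""
    for start in range(0, len(items), bucket_size):
        yield items[start:start + bucket_size]


def build_bucket_docs(parent_key, items, bucket_size=10000):
    """Split one oversized parent into several smaller bucket documents."""
    return [
        {
            "parent_key": parent_key,  # hypothetical field linking the buckets together
            "bucket": n,               # position of this bucket in the sequence
            "count": len(chunk),
            "embedded_data": chunk,
        }
        for n, chunk in enumerate(chunked(items, bucket_size))
    ]


items = [{"item_id": i, "value": random.randint(0, 100)} for i in range(25000)]
buckets = build_bucket_docs("session-1", items, bucket_size=10000)
# With a live connection, these could be written with collection.insert_many(buckets).
```

The trade-off is that reading the full array now means fetching several documents, so this only makes sense when a single parent would otherwise grow too large or too slow to update.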
Final Thoughts
Embedded documents are a key feature of schema-less databases like AWS DocumentDB. They allow for rich, nested data structures, improving data modeling flexibility and access performance when used appropriately.
These code examples should give you a solid foundation for experimenting with embedded document strategies in your own AWS DocumentDB projects.