
Your Guide to Optimizing Slow Queries

MongoDB · 21 min read · May 20, 2025


This article was written by Tim Kelly. Find him on LinkedIn.

This article covers how to optimize MongoDB queries — if you haven’t pinpointed which ones need improvement, start with Diagnosing Slow Queries.

Query performance slow-downs in MongoDB almost always come down to two things: unoptimized query shapes or unoptimized indexing. And usually, it’s a little of both.

This guide isn’t here to throw endless theory at you. If you’re here, you already know how to read query plans, spot slow queries, and you want to move past just observing performance problems — you want to fix them.

We’ll walk through how to:

  • Write better queries.
  • Build smarter indexes.
  • Structure faster aggregation pipelines.
  • Tune Atlas Search.
  • Think about data modelling in the way MongoDB expects.

This is about building muscle memory to spot mistakes before they hit production. Not just “queries that work” — queries that work fast, scale well, and keep your systems speedy, even when the data grows.

Query optimization

Before diving into indexing strategies, it’s important to understand the foundations of efficient querying in MongoDB. Indexes are critical, but query structure matters just as much. Poorly structured queries can still perform badly even with the right indexes.

MongoDB’s query optimizer automatically selects the best plan available, but it can only optimize what it’s given. Writing efficient queries is the first step toward making indexes effective.

Create efficient query shapes

To really understand what makes a query efficient in MongoDB, it helps to look at how MongoDB thinks about queries behind the scenes. MongoDB groups similar queries together by their “shape,” the structure of the query, not the specific values. This lets it cache query plans and make smart decisions about how to execute them. But not all shapes are created equal. Some are efficient and take advantage of indexes. Others are more like saying, “Just look at everything and figure it out,” which can be painfully slow.

Let’s explore the difference by running a bad query shape and then improving it. We’ll use the sample_mflix dataset that comes with MongoDB Atlas. It’s a movie database with plenty of fields to play with.

Try this query first:

db.movies.find({ fullplot: /space/ }).sort({ year: -1 });

At a glance, it seems fine — we’re just searching for movies with “space” in the plot and sorting them. But if we check the Query Profiler, we can see that it gets flagged as a slow query.

A whopping 1.53 seconds! MongoDB is doing a full collection scan. That means it’s reading every single document to find matches, because it has no index to help it narrow things down. Even worse, it then sorts that entire result set. This is a classic bad query shape: It looks harmless, but it puts a lot of strain on the server.
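If you want to confirm what the profiler is reporting, running the same query through explain (a quick sketch; your timings and counts will differ) makes the collection scan visible:

// Confirm the collection scan behind the slow query.
db.movies.find({ fullplot: /space/ }).sort({ year: -1 }).explain("executionStats")

// In the output, look for "stage": "COLLSCAN" in the winning plan and a
// totalDocsExamined far higher than nReturned, both signs that MongoDB
// is reading (and then sorting) everything.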

Now, try something better:

db.movies.find({ year: { $gt: 2000 } }).sort({ year: -1 });

This doesn’t even show up in the Query Profiler, but we can run it with .explain("executionStats") to see more details. I abbreviated mine to show just some of the key details.

{
  winningPlan: {
    stage: "FETCH",
    inputStage: {
      stage: "IXSCAN",
      indexName: "year_-1",
      indexBounds: {
        year: ["(2000, inf]"]
      }
    }
  },
  executionStats: {
    nReturned: 11988,
    executionTimeMillis: 373,
    totalKeysExamined: 11988,
    totalDocsExamined: 11988,
    executionStages: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        indexName: "year_-1"
      }
    }
  },
  queryShapeHash: "C8696354...",
  planCacheShapeHash: "462B5671"
}

Only 0.373 seconds! This one filters by a numeric field that’s indexed and sorts by the same field, so MongoDB can use an index from the start. The result is a much more efficient query shape. If you inspect the explain output, you’ll see that it uses an index scan instead of a full collection scan. That’s exactly what we want — MongoDB does less work, and we get faster results.

The goal is always to shape our queries so that MongoDB doesn’t have to guess or brute-force its way to an answer. Use indexed fields in our filters and sorts, limit the results early, and be intentional about what we’re asking for. Small changes in structure can lead to huge differences in performance.

Project only necessary fields

When we query for documents, MongoDB fetches full documents by default — including every field, even the ones we don’t need. But most of the time, our app or UI only needs a few fields. By explicitly projecting only the fields we care about, we reduce the amount of data MongoDB has to read, sort, and send back over the network.

Let’s use the sample_mflix.movies collection as an example. Suppose we want to list recently released movies, but we only care about the title and release year.

Here’s how we’d write that:

db.movies.find(
  { year: { $gt: 2015 } },
  { title: 1, year: 1, _id: 0 }
).sort({ year: -1 }).limit(5);

This query:

  • Filters for movies released after 2015.
  • Sorts them from most recent to oldest.
  • Limits the result to five documents.
  • Projects just the title and year fields — everything else (like full plot, cast, ratings, etc.) is excluded.

Here’s what the output might look like:

[
  { "title": "Avengers: Endgame", "year": 2019 },
  { "title": "Joker", "year": 2019 },
  { "title": "Spider-Man: Far from Home", "year": 2019 },
  { "title": "Parasite", "year": 2019 },
  { "title": "Knives Out", "year": 2019 }
]

Notice how clean that is. No unnecessary fields, no huge blobs of text, no large arrays — just exactly what we asked for.

Explicit projection like this avoids carrying unused data through the query pipeline and down into our app. That means less work for MongoDB and faster results for us. Especially on large collections, this can have a big impact on performance.

Maximize query selectivity

Let’s look at this idea using the sample_mflix dataset so we can ground it in something real. MongoDB works best when your queries are highly selective — that means they match only a small portion of documents. This lets MongoDB use indexes effectively and skip over large chunks of irrelevant data. The more precise your query, the more helpful the index becomes.

A common example of poor selectivity is using operators like $ne or $nin, which ask MongoDB to return everything except a value. Here’s what that looks like in the movies collection:

db.movies.find({ rated: { $ne: "PG-13" } });

This query returns all movies that aren’t rated PG-13. Seems harmless, but from MongoDB’s point of view, it’s a tough one. It can’t use the rated_1 index efficiently because the database doesn’t know in advance which values will match — you’re asking it to rule something out, not match something directly. The query might scan a large part of the collection just to be sure it hasn’t missed anything.

Compare that to this more selective query:

db.movies.find({ rated: "PG" });

This one is clean and direct. MongoDB knows exactly what you’re looking for and can jump straight to it using the rated_1 index. This is a highly selective query — only a small portion of the data matches, and MongoDB doesn’t need to touch the rest.

This is simple, fast, and efficient. The key takeaway is that even though operators like $ne or $nin feel flexible, they don’t play well with indexes. When possible, prefer direct matches with exact values. If you find yourself needing to rely on low-selectivity queries often, it might be a sign that your data model needs to evolve to better support the questions you’re asking.

Covered queries

A covered query is a query that can be fulfilled entirely using the index, without reading the actual documents. Covered queries are faster because index entries are smaller and often fully reside in memory.

Example:

db.inventory.createIndex({ type: 1, item: 1 });

db.inventory.find(
  { type: "food", item: /^c/ },
  { item: 1, _id: 0 }
);

Because both the filter and projection fields are in the index (and _id is explicitly excluded), MongoDB can satisfy the query without accessing the collection.

Covered queries reduce I/O and avoid the need to load full documents into memory at all. Because they never touch the underlying collection, they can be especially efficient for high-throughput, read-heavy applications.
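To verify a query is actually covered, check its explain output: there should be no FETCH stage, and totalDocsExamined should be 0. A small sketch using the query above:

// A covered query examines index keys only, never the documents themselves.
const stats = db.inventory.find(
  { type: "food", item: /^c/ },
  { item: 1, _id: 0 }
).explain("executionStats").executionStats;

printjson({
  totalKeysExamined: stats.totalKeysExamined,
  totalDocsExamined: stats.totalDocsExamined // expect 0 when the query is covered
});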

Hints and server-side operations

In some cases, especially when debugging or fine-tuning queries, we may want to manually control which index MongoDB uses, via .hint():

db.posts.find({ author_name: "Tim" }).hint({ author_name: 1 });

We can also optimize updates using operators like $inc to increment fields server-side rather than read-modify-write patterns:

db.counters.updateOne(
  { _id: "pageViews" },
  { $inc: { count: 1 } }
);

This reduces contention and avoids race conditions.

Optimizing queries with indexing strategies

One of the most common reasons for slow queries in MongoDB is the absence — or misuse — of indexes. Indexes allow MongoDB to skip scanning every document in the collection (COLLSCAN) and quickly narrow down the documents that match a query condition. The goal is to take a step back, tailor our indexes to our application, and learn our query patterns with .explain(). In this section, we’ll look at the most commonly used indexing strategies and how to apply them.

MongoDB supports a wide variety of indexes beyond what we’ll cover here, including geospatial, hashed, wildcard, and text indexes. You can explore the full range of options in the MongoDB Indexes documentation.

Single-field index

A single-field index is the most basic and widely used index. It targets a single field used frequently in query filters.

Example:

db.users.createIndex({ email: 1 })

This index allows queries like the following to return results quickly without scanning the entire collection:

db.users.find({ email: "tim@example.com" })

Use single-field indexes for fields with high cardinality (many unique values) that are often used in find() filters.

Compound index

A compound index includes multiple fields and is ideal when queries involve more than one condition. The order of the fields matters — queries need to follow that order to benefit.

Example:

db.orders.createIndex({ customerId: 1, orderDate: -1 })

This index supports:

  • Filtering by customerId.
  • Filtering by both customerId and orderDate.
  • Sorting by orderDate after filtering by customerId.

db.orders.find({ customerId: "12345" }).sort({ orderDate: -1 })

Queries must follow the prefix of the index to benefit. In the example above, MongoDB will not use the index efficiently if we only filter by orderDate.
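To illustrate the prefix rule with the index above (a sketch, worth confirming against your own explain output):

// Uses the index: customerId is the leading (prefix) field.
db.orders.find({ customerId: "12345", orderDate: { $gte: ISODate("2024-01-01") } })

// Skips the prefix: filtering on orderDate alone cannot use
// { customerId: 1, orderDate: -1 } efficiently and will typically fall back
// to a collection scan unless a separate orderDate index exists.
db.orders.find({ orderDate: { $gte: ISODate("2024-01-01") } })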

Partial index

A partial index includes only the documents that match a specific filter expression. This reduces index size and can speed up queries that always target a subset of data.

Example:

db.logs.createIndex(
  { createdAt: 1 },
  { partialFilterExpression: { severity: "error" } }
)

This index only includes documents where severity is “error”, making it ideal for alerting and monitoring systems.

Time-to-live (TTL) index

TTL indexes are used to automatically expire documents after a certain time. While they’re primarily for managing data retention, they also improve query performance by keeping the collection size in check.

Example:

db.sessions.createIndex(
  { lastAccessedAt: 1 },
  { expireAfterSeconds: 3600 }
)

This setup removes any session one hour after its lastAccessedAt timestamp. TTL indexes are niche, but helpful when we think in terms of hot, cold, and temporal data. Not all data deserves to live forever.

How to know if we need an index

Before adding a new index, test our query with .explain("executionStats"). For example:

db.users.find({ email: "tim@example.com" }).explain("executionStats")

Look at the executionStats section:

  • If totalDocsExamined is close to nReturned, our query is using an index efficiently.
  • If totalDocsExamined is much higher, we’re likely missing an index or using it ineffectively.
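In mongosh, you can pull just those two numbers out of the explain output to make the comparison obvious (a minimal sketch using the query above):

const stats = db.users.find({ email: "tim@example.com" })
  .explain("executionStats").executionStats;

// Healthy: totalDocsExamined is close to nReturned.
printjson({ nReturned: stats.nReturned, totalDocsExamined: stats.totalDocsExamined });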

Start with these most common index types:

  • Single-field for frequent equality lookups
  • Compound for multi-condition queries (respecting order)
  • Partial for filtered subsets
  • TTL for automatic cleanup

We can use indexes strategically based on our query patterns, and we should always validate their effectiveness using .explain().

Write overhead of indexes

Indexes can significantly accelerate read performance, but they don’t come free. Each index on a collection adds write-time overhead that can degrade performance — especially for high-throughput, write-heavy workloads.

What happens on writes?

For every write operation (insert, update, or delete), MongoDB must update the indexes associated with that collection:

  • Insert: MongoDB computes the index keys for the new document and inserts them into every index.
  • Delete: MongoDB removes the document’s keys from all indexes.
  • Update: MongoDB determines whether any indexed fields are affected:
      • If yes, it removes the old index keys and inserts the new ones.
      • If not, the update skips index modifications.

The cost is proportional to the number of indexes and their complexity (e.g., multikey or compound).

What can we do to minimize write overhead?

  • Limit the number of indexes: Avoid indexing fields that are not frequently used in read filters or sorts.
  • Avoid indexing volatile fields: Fields that change frequently (e.g., counters, timestamps) can lead to frequent index rewrites.
  • Use compound indexes over multiple single-field indexes if your queries consistently use the same combination of fields.
  • Avoid overly large multikey indexes: Changes to large arrays can be very expensive.
  • Use partial indexes to limit write cost to relevant documents only.

And on top of this, and probably most importantly, frequently audit indexes! Applications evolve over time, and index needs change along with our applications. We can’t just set it and forget it.
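One way to audit is the $indexStats aggregation stage, which reports how often each index has been used since the server last restarted. A sketch (run it per collection and review anything with near-zero usage before dropping it):

// Surface rarely used indexes on a collection, least used first.
db.orders.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, ops: "$accesses.ops", since: "$accesses.since" } },
  { $sort: { ops: 1 } }
])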

Optimize aggregation pipelines

If you’ve used the profiler or explain to identify slow-running aggregations, great! You’ve already spotted the problem. Now, it’s time to focus on the solution: writing pipelines that actually run faster.

MongoDB’s aggregation framework includes an internal optimization phase. Some optimizations happen automatically, but many require deliberate structuring of your pipeline. Understanding both lets you push performance much further.

How MongoDB optimizes pipelines (and why we should care)

MongoDB rewrites pipelines internally during the optimization phase:

  • Moves $match stages earlier to reduce document volume
  • Coalesces redundant stages ($match, $limit, $skip)
  • Combines $sort + $limit to reduce memory usage
  • Optimizes $lookup + $unwind to avoid building massive arrays
  • Uses the slot-based execution engine for some pipelines (since v5.2)

These optimizations aren’t just cosmetic — they directly affect how much memory, CPU, and I/O MongoDB consumes. We can inspect these changes by running:

db.collection.aggregate([…], { explain: true })

Still, MongoDB can only optimize within the structure you give it. If our pipeline is poorly ordered, overly verbose, or misaligned with indexes, the optimizer can only do so much. Writing a faster pipeline starts with structuring it correctly ourselves.

Filter early, not late

Filtering early is the single most impactful thing we can do for most pipelines.

Whenever possible:

  • Place $match immediately after the collection scan.
  • Push down filters that don’t depend on computed fields.
  • Leverage indexes with your early $match fields.

db.transactions.aggregate([
  { $match: { status: "completed", amount: { $gte: 100 } } },
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } }
]);

Note: MongoDB can automatically reorder a $match past a $project or $addFields if the filter doesn’t depend on computed values, but don’t rely on it blindly. Write it properly yourself and validate with explain.

Align with indexes

No index = slow pipeline. It’s that simple for most real-world workloads.

Make sure your $match, $sort, and even $group fields are indexed appropriately:

db.transactions.createIndex({ customer_id: 1, created_at: -1 });

db.transactions.aggregate([
  { $match: { customer_id: "C12345" } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);

Here:

  • The $match uses the customer_id prefix of the index.
  • The $sort uses the created_at portion of the same compound index.
  • Because both are served by one index, MongoDB can filter and sort with a single index scan and avoid an in-memory sort.

Always cross-check explain output for IXSCAN or DISTINCT_SCAN. If you see COLLSCAN, you’re leaving performance on the table.
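You can ask for that plan directly on the pipeline itself. A sketch, reusing the pipeline above (plan fields vary a little by server version):

// Explain the aggregation and check the winning plan for IXSCAN vs. COLLSCAN.
db.transactions.explain("executionStats").aggregate([
  { $match: { customer_id: "C12345" } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
])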

Control data volume with projection

MongoDB internally optimizes projections when it can, but explicit projection still matters for clarity and precision. Just like with our queries, we don’t want to waste precious bandwidth on unnecessary data. Explicit projection also makes it easier for developers to see what goes into and comes out of each stage of the aggregation pipeline.

Use $project at the right points:

  • To shape the output (at the end).
  • To explicitly control what gets carried forward (if needed for clarity).
  • But don’t assume early $project will help performance — MongoDB already prunes unused fields automatically, when it can.

db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $project: { _id: 0, customer_id: 1, amount: 1 } }
]);

If fields are large (embedded arrays, long text fields), confirm with explain that pruning is happening. You may still want an explicit $project for high-volume data sets for peace of mind.

Sequence matters: sort, skip, limit

MongoDB optimizes common patterns like $sort followed by $limit. This can prevent full in-memory sorts.

db.users.aggregate([
  { $sort: { signupDate: -1 } },
  { $limit: 100 }
]);

MongoDB automatically optimizes this internally to maintain only the top 100 documents in memory while sorting.

If you have:

{ $sort: … },
{ $skip: 10 },
{ $limit: 5 }

MongoDB rewrites it during the optimization phase:

{ $sort: { sortKey: { … }, limit: 15 } }, // limit = skip (10) + limit (5)
{ $skip: 10 }

This way, the sort only materializes the 15 top documents, not the whole dataset.

Rule of thumb:
Filter first, sort second, limit third.

Optimize joins: $lookup and $unwind

MongoDB supports joins using $lookup, which can be incredibly useful — but also expensive if misused. When you’re working with joins across collections, you’re effectively asking MongoDB to walk between large sets of documents. That means it’s crucial to give the server every chance to narrow things down as early as possible.

Here’s a version of an optimized $lookup that works well in many cases:

db.orders.aggregate([
{ $match: { status: "completed" } },
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "customer_id",
as: "customer"
}
},
{ $unwind: "$customer" }
]);

This example does three important things right.

First, it applies a $match on the orders collection before doing the join. That’s a huge win — if we only care about completed orders, we should eliminate the others right away so MongoDB doesn’t waste time and memory processing them in the join.

Second, it uses a simple equality join (localField/foreignField) rather than a pipeline-based $lookup. That means MongoDB can use indexes on the customers.customer_id field to look up matches efficiently, without scanning the whole collection. If that field is indexed — and it should be — then the join will be much faster.

Third, it immediately follows the $lookup with an $unwind. MongoDB optimizes this combo under the hood, so the output doesn’t balloon into large arrays that take up memory or require further processing. This coalescence of $lookup and $unwind is especially important when dealing with one-to-one or one-to-few relationships.

If you’re seeing performance issues with $lookup, check two things right away:

  1. Is the foreignField (the field you’re joining on in the other collection) indexed?
  2. Are you filtering both the input and the joined collections as early as possible?

Poor indexing or excessive use of pipeline-based lookups can cause unexpected CPU or memory spikes, especially as your collections grow. Keep your joins narrow, your inputs filtered, and your fields indexed — MongoDB will thank you.
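If the foreign-side join field isn’t indexed yet, a single-field index is usually the first fix (field names taken from the example above):

// Index the field the $lookup joins on, so each order's match
// is an index lookup in customers rather than a scan.
db.customers.createIndex({ customer_id: 1 })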

Slot-based execution: understand when it helps

Starting in version 5.2, MongoDB can run parts of aggregations using the slot-based execution engine (SBE), particularly:

  • $group
  • $lookup

If we see "stage": "GROUP" or "stage": "EQ_LOOKUP" in our explain output, we’re on SBE. This means MongoDB can efficiently use memory and CPU to optimize these stages automatically.

If our aggregation isn’t on SBE, it’s often because:

  • There are pipeline stages that can’t be optimized (e.g., $unwind with complex logic).
  • Our $lookup uses sub-pipelines or joins on complex types.

Slot-based execution doesn’t require code changes, but writing pipelines that are clean and index-friendly helps trigger it more consistently.
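A quick way to check from mongosh: explain output produced by the slot-based engine reports explainVersion "2", while the classic engine reports "1". A sketch, reusing the earlier grouping example:

const plan = db.transactions.explain().aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } }
]);

// "2" means the slot-based engine produced the plan; "1" means the classic engine.
print(plan.explainVersion);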

Build for coalescence

MongoDB’s optimizer coalesces neighboring stages whenever possible:

Example:

[
  { $match: { year: 2024 } },
  { $match: { status: "active" } }
]

is internally rewritten as:

{ $match: { $and: [ { year: 2024 }, { status: "active" } ] } }

Less overhead, faster query planning.

We don’t have to manually merge stages, but if we can logically simplify our pipeline, we can help MongoDB make better choices.

Aggregation optimization checklist

Before calling an aggregation done, run through this simplified, high-level checklist to make sure:

  • $match comes as early as possible.
  • All $match and $sort fields are indexed.
  • Explain output shows IXSCAN, not COLLSCAN.
  • Projection is applied only where necessary.
  • $sort + $limit are together if sorting large datasets.
  • $lookup uses indexed fields and simple joins.
  • Explain plan shows slot-based execution (GROUP, EQ_LOOKUP) where possible.
  • Large arrays or documents are pruned early to minimize memory pressure.

Optimize MongoDB Atlas Search queries

Atlas Search makes MongoDB way more powerful, but it’s not magic. If you don’t design your indexes and queries properly, you’ll end up with massive indexes, slow queries, and unpredictable scaling costs.

This section walks you through how to actually get fast, predictable MongoDB Atlas Search performance — from index design to query structure to scaling decisions.

Keep our indexes focused

By default, MongoDB Atlas Search indexes everything (dynamic: true), which sounds great until your disk space explodes and your queries slow down. Don’t just let MongoDB Atlas index whatever it finds. Be explicit.

  • Define exactly which fields you care about.
  • Set store: false unless you actually need to pull the field later after a search.

Example custom mapping:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": { "type": "string" },
      "publishedAt": { "type": "date" }
    }
  }
}

If you don’t control your mappings, you’re not controlling your performance.

Also: If your index could have more than 2.1 billion objects (lots of deeply nested documents, massive arrays), you either need to shard your collection or use numPartitions.

If you don’t, your indexes can go stale, and you’ll start getting stale query results without realizing it until it’s too late.
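numPartitions goes in the search index definition itself. A sketch of what that might look like (treat the partition count as illustrative; valid values and limits depend on your cluster tier):

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": { "type": "string" }
    }
  },
  "numPartitions": 2
}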

Know what blows up our index size

Some indexing features are far heavier than others, and dynamic mappings over documents with arbitrary keys are a common culprit. If you’re indexing fields like ruleBuilder: { key1: value, key2: value, … }, where the key names vary from document to document, restructure your documents into tuple arrays:

{
  "ruleBuilder": [
    { "name": "key1", "value": "value" },
    { "name": "key2", "value": "value" }
  ]
}

Otherwise, dynamic indexing can seriously eat into our memory.

Write smarter search queries

MongoDB Atlas Search is fast — but only if you make it easy for it to work.

Here’s what we need to do:

  • Filter inside $search, not afterwards with $match. If you stick filtering outside $search, MongoDB has to search way more docs first.

Bad practice:

[
  { $search: { text: { query: "MongoDB", path: "title" } } },
  { $match: { status: "published" } }
]

Better:

[
  {
    $search: {
      compound: {
        must: [
          { text: { query: "MongoDB", path: "title" } },
          { term: { query: "published", path: "status" } }
        ]
      }
    }
  }
]

  • Always $limit before $facet if you’re doing faceted search.
  • Use $search-level sort instead of adding a $sort later.
  • Paginate with searchAfter, not $skip, unless you love slow queries.
  • Use $searchMeta for counting instead of piping everything into $count.

Once you’re inside $search, keep your filtering, sorting, and scoring tight.
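For example, when all you need is a count, $searchMeta returns it without streaming every matching document through the pipeline. A sketch (the collection and index name "default" are placeholders, adjust them to your own setup):

// Count matching documents without returning them.
db.movies.aggregate([
  {
    $searchMeta: {
      index: "default",
      text: { query: "space", path: "title" },
      count: { type: "total" }
    }
  }
])
// Returns a single document shaped like: { count: { total: <number> } }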

Respect memory and resource limits

MongoDB Atlas Search runs mongot, which uses JVM heap and filesystem cache. If you don’t manage index size and query cost, you’ll run into memory pressure fast.

Symptoms of memory trouble:

  • Elevated page faults
  • Disk IOPS spikes
  • CPU saturation
  • Higher query latency
  • mongot crashes

If you see this happening:

  • Stop using dynamic mappings without strict field control.
  • Store only the fields you truly need with store: true.
  • Move to dedicated search nodes if your search workload is production critical.

The bottom line: MongoDB Atlas Search is powerful, but you have to right-size your memory and watch your mappings.

Expect index rebuilds

Atlas will rebuild indexes automatically if:

  • You update your index definition.
  • There’s a breaking change in a search engine upgrade.
  • Hardware issues like corruption happen.

Note: Index rebuilds need ~125% of the current index size in free disk space. If you don’t have enough disk space, the rebuild fails and you’ll have to manually scale up.

The good news: MongoDB Atlas supports no-downtime rebuilds — searches keep working while the new index builds in the background.

Why we should move away from $text and $regex

If we’re still building apps around $text or $regex for search, it’s time to rethink. MongoDB Atlas Search ($search) absolutely crushes them in real-world scenarios. Choose search if you need:

  • Large scale text search.
  • Fuzzy matching (typos).
  • Language awareness (stopwords, stemming).
  • Synonyms.
  • Case-insensitive search.
  • Faceted results.
  • Complex relevance boosting.

We only really need to stick to $regex if:

  • We need full ACID consistency (rare for search cases).
  • We’re on-prem (Atlas Search is Atlas-only).

Otherwise, you’re leaving serious performance on the table by not moving to $search.
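As a taste of what $search gives you that $regex can’t, fuzzy matching is a single option on the text operator. A sketch against the sample_mflix movies collection (the index name "default" is an assumption):

// Matches "space" in titles even when the search term arrives with a typo.
db.movies.aggregate([
  {
    $search: {
      index: "default",
      text: {
        query: "spcae", // a typo of "space"
        path: "title",
        fuzzy: { maxEdits: 2 }
      }
    }
  },
  { $limit: 5 },
  { $project: { title: 1, _id: 0 } }
])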

A glimpse into data modelling with MongoDB

It’s nearly impossible to talk about improving query performance in MongoDB without also talking about data modelling. MongoDB has a strong design philosophy: Data that is accessed together should be stored together.

This idea drives almost everything in how you should think about your schemas. Done right, it can make your queries blazing fast. Done wrong, no amount of indexes or query tuning will save you.

We’re only going to take a quick glance at the basics here — this is a deep topic with its own best practices, patterns, and trade-offs. If you really want to get good at this, I highly recommend going through the MongoDB schema design patterns when you get the chance.

For now, here’s what you should be thinking about when you model your data.

Store together what we access together

MongoDB is optimized for workloads where one document contains everything a query needs. Whenever possible, model your data so a read can grab all the relevant information in one fetch.

If a query routinely needs user details and their recent orders, it’s usually better to embed a few recent orders inside the user document, not split everything out into different collections that have to be joined later.

Understand when to embed vs. reference

One of the first decisions you’ll make in MongoDB schema design is whether to embed data inside a document or to reference it from another collection. There’s no one-size-fits-all answer — it’s all about understanding your data and how you access it.

Here’s a rule of thumb: Embed when the data is small, changes together, and is always accessed as a unit. Reference when the data grows large, changes independently, or is shared across multiple documents.

Let’s walk through both approaches using a common scenario: orders and order items.

Embed: Simple, tightly-coupled data

Let’s say each order contains a few items, and we always need all the item details when we fetch an order. The data isn’t huge and it doesn’t change after being written. This is a great candidate for embedding.

db.orders.insertOne({
  _id: 1,
  customer_id: "abc123",
  status: "completed",
  created_at: ISODate("2024-04-01T10:00:00Z"),
  items: [
    { name: "Coffee Mug", quantity: 2, price: 12.99 },
    { name: "Notebook", quantity: 1, price: 5.49 }
  ]
});

Now, when we fetch the order, we get everything in one document:

db.orders.find({ _id: 1 });

This pattern is ideal for small sets of data that are always used together. There’s no join, no extra query, and it’s easy to reason about. But it does have limits: If items grow to 500+ entries or start changing independently of the order (e.g., item price updates), it starts to break down.

Reference: Decoupled, flexible data

Now, imagine you’re building a product catalog. Items are reused across many orders, and their details can change over time. Embedding the entire item into every order would cause massive duplication and make updates painful. In this case, referencing makes more sense.

// Product catalog
db.products.insertMany([
  { _id: "p1", name: "Coffee Mug", price: 12.99 },
  { _id: "p2", name: "Notebook", price: 5.49 }
]);

// Order refers to product IDs
db.orders.insertOne({
  _id: 2,
  customer_id: "xyz789",
  status: "pending",
  created_at: ISODate("2024-04-01T10:00:00Z"),
  items: [
    { product_id: "p1", quantity: 2 },
    { product_id: "p2", quantity: 1 }
  ]
});

To display an order with item details, you’d need a $lookup:

db.orders.aggregate([
  { $match: { _id: 2 } },
  { $unwind: "$items" },
  {
    $lookup: {
      from: "products",
      localField: "items.product_id",
      foreignField: "_id",
      as: "itemDetails"
    }
  },
  { $unwind: "$itemDetails" },
  {
    $project: {
      _id: 0,
      product: "$itemDetails.name",
      quantity: "$items.quantity",
      price: "$itemDetails.price"
    }
  }
]);

This pattern supports:

  • Updates to product details without touching every order.
  • Reuse of product data across multiple orders.
  • Avoiding document bloat for large or growing item sets.

Yes, there’s more complexity, but it’s a better fit when your data is larger, shared, or changes independently.

Think about document size

MongoDB has a 16MB limit per document. We’re unlikely to hit it by accident, but if you’re embedding growing arrays (like user comments, logs, or versions), you need to be careful.

Good rule of thumb: If a list inside a document could grow unbounded, consider splitting it out into a separate collection.

Otherwise, you’re setting yourself up for weird bugs and slow queries later.

Model for our queries, not our “Objects”

Traditional relational database design encourages normalizing everything into neat, clean tables. MongoDB flips that: You model around your queries.

If, 90% of the time, you need to grab the user and their orders in one go, model it so one document can deliver it, even if it feels “denormalized.”

Avoid joins where we can

MongoDB has $lookup for joining collections, but joins are expensive compared to single-document reads.

Use $lookup carefully — ideally for low-cardinality lookups (e.g., adding a few fields from a small related collection).

The more you can answer queries from a single document, the better your application will scale.

Quick summary

When it comes to MongoDB schema design:

  • Store together what you read together.
  • Embed when it makes reads faster.
  • Reference when data changes independently or grows too large.
  • Model based on your application’s query patterns, not strict normalization rules.
  • Aim for single-document reads whenever possible.

Good data modelling isn’t just about clean design — it’s about making your queries faster, your app simpler, and your system easier to scale. And it’s worth investing the time to get it right early on.

Conclusion

At the end of the day, MongoDB rewards people who plan ahead. Good query shapes, the right indexes, clean aggregation pipelines, focused Atlas Search indexes, and a schema that matches your application’s real-world behavior — none of it is optional if you care about performance.

MongoDB will do a lot of work for you under the hood with optimizers and smarter defaults. But it can’t save you from bad structure. If the foundations aren’t there, no amount of explain plans, extra CPUs, or cluster upgrades will fix it.

Learn your query patterns. Model for the real world. Index deliberately. Profile and course-correct proactively, before it becomes a firefight. Do that, and MongoDB will scale far and fast.
