Interviewer: "Imagine a MySQL table with 10 million records. A query uses LIMIT 1000000,20. Why would this be slow? What's the specific execution flow, and how would you optimize it?"
This is a fantastic, practical question that hits on a common performance bottleneck in MySQL: deep pagination. When the offset in a LIMIT offset, count clause is very large, query performance can plummet dramatically. A query for LIMIT 0,20 might be lightning fast, while LIMIT 1000000,20 on the same 10-million-row table could take many seconds, or even minutes.
Let’s break down why this happens and explore effective solutions.
Why LIMIT 1000000,20 is Slow: The Execution Flow
The core reason for the slowdown is that MySQL, in most cases, needs to generate, order (if an ORDER BY clause exists), and then traverse through all offset + count rows before it can discard the offset rows and return the requested count rows.
So, for LIMIT 1000000,20, MySQL has to effectively process 1,000,020 rows. Here's a more detailed look at the typical execution flow, especially when an ORDER BY clause is present:
- Filtering (if a WHERE clause exists): MySQL first applies any WHERE conditions to select a subset of rows. Let's assume that for this deep pagination problem, a significant number of rows still qualify.
- Ordering (if an ORDER BY clause exists):
  - Using an Index for Ordering: If there's an index that matches the ORDER BY clause, MySQL will use it to retrieve rows in the correct order, reading 1,000,020 entries from the index. The hidden cost is bookmark lookups: if the query selects columns that are not part of the ordering index (e.g., SELECT col1, col2, col3 FROM ... ORDER BY indexed_col), then for each of those 1,000,020 index entries MySQL must perform a "bookmark lookup" (or "table lookup") to fetch the actual row data from the main table. This involves many random I/O operations, which are very slow, especially when repeated over a million times.
  - Not Using an Index for Ordering (Filesort): If there's no suitable index for the ORDER BY clause, MySQL must perform a filesort. It reads the qualifying rows, sorts them in memory (if they fit) or using temporary disk files (if they don't), and then scans through the sorted result. This sorting operation on potentially millions of rows is extremely resource-intensive.
- Row Traversal and Discarding: After obtaining the ordered set of (at least) 1,000,020 rows (either directly from an index or after a filesort), MySQL reads through them sequentially.
  - Discarding Offset Rows: It discards the first 1,000,000 rows.
  - Returning Count Rows: Finally, it returns the next 20 rows.
The main performance killers are:
- The sheer volume of rows processed (1,000,020).
- The numerous bookmark lookups if the ordering index is not a covering index for all selected columns.
- The potential for a costly filesort operation.
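To make these costs visible before optimizing, you can run EXPLAIN on the slow query. A minimal sketch follows, reusing the UserActions table introduced later in this article purely as an illustration; the key things to check are the rows estimate and the Extra column.

EXPLAIN
SELECT user_id, action_type, action_details, created_at
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20;
-- 'rows' shows roughly how many rows MySQL expects to examine.
-- 'Using filesort' in Extra means no suitable index served the ORDER BY,
-- so the server sorts the qualifying rows before applying the LIMIT.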
Optimization Strategies
We can combat this deep pagination slowdown with a couple of robust techniques:
1. Keyset Pagination (or “Seek Method” / Using a Starting ID)
This is the most efficient method for sequential pagination where you’re always fetching the “next” page. Instead of an offset, you use a condition based on the last seen value from the previous page, typically the primary key or an ordered column.
Suppose you are paginating through an Articles table ordered by publish_date (which is indexed and unique or nearly unique), and the last publish_date on the previous page was '2024-05-10 09:30:00' and its unique article_id was 12345.
-- For pages ordered by publish_date DESC, then article_id DESC
SELECT article_id, title, publish_date
FROM Articles
WHERE (publish_date < '2024-05-10 09:30:00') -- Previous page's last publish_date
OR (publish_date = '2024-05-10 09:30:00' AND article_id < 12345) -- Tie-breaker
ORDER BY publish_date DESC, article_id DESC
LIMIT 20;
If ordering by a unique key such as the primary key (here, article_id), it's simpler:
SELECT article_id, title, publish_date
FROM Articles
WHERE article_id > 1000000 -- Assuming previous page ended at article_id 1000000
ORDER BY article_id ASC
LIMIT 20;
Why it’s efficient:
MySQL can directly “seek” to the starting point in the index (e.g., article_id = 1000000) and then read the next 20 records by traversing the B+ tree leaf nodes’ linked list. There’s no scanning and discarding of a million rows.
In other words, each page picks up exactly where the previous one left off: the index seek lands on the last seen key and MySQL reads only the next 20 entries, no matter how deep into the data you are, so the efficiency stays high.
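The prerequisite is a composite index matching the ordering columns. A minimal sketch, assuming no such index exists yet (the index name is hypothetical):

-- Hypothetical composite index supporting the seek condition and
-- ORDER BY publish_date DESC, article_id DESC (scanned backward, or defined
-- with DESC key parts on MySQL 8.0)
CREATE INDEX idx_articles_publish_date_id ON Articles (publish_date, article_id);

-- The primary-key variant needs no extra index: the clustered primary key
-- on article_id already provides the ordered access path.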
Pros: Very fast for “next page” style pagination.
Cons: Doesn’t allow users to jump to arbitrary page numbers (e.g., page 1 to page 500).
2. Covering Index + Subquery
This is a powerful technique for optimizing deep pagination when arbitrary page jumps are needed.
Original (Potentially Slow) Query on the 10M-row UserActions table:
SELECT user_id, action_type, action_details, created_at
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20;
If created_at is indexed, but user_id, action_type, and action_details are not part of that index, this query will perform ~1,000,020 bookmark lookups.
Optimized Query:
The strategy is to first get the primary keys (id) of the desired 20 rows using a subquery that benefits from a covering index, and then join back to the main table.
SELECT ual1.user_id, ual1.action_type, ual1.action_details, ual1.created_at
FROM UserActions ual1
JOIN (
SELECT id -- Assuming 'id' is the primary key
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20
) AS ual2 ON ual1.id = ual2.id;
Why it’s faster:
- The Subquery (ual2):
  - Selects only id (the primary key) and orders by created_at.
  - Crucially, ensure you have a covering index on (created_at, id) for this table.
  - With this covering index, the subquery can satisfy the ORDER BY created_at, the LIMIT 1000000,20, and the SELECT id entirely from the index. It doesn't touch the main table data for the 1,000,020 rows it considers for the offset. Scanning a (relatively narrow) index is much faster and involves sequential I/O, and no bookmark lookups are done for these million-plus rows.
- The Outer Query:
  - The subquery returns only 20 id values.
  - The outer query then joins UserActions (ual1) with these 20 ids. Since id is the primary key, this join is extremely fast (20 efficient primary key lookups).
This technique dramatically reduces the number of expensive bookmark lookups from ~1,000,020 to just 20.
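A minimal sketch of the supporting index and a quick verification, assuming id is the InnoDB primary key (the index name is hypothetical). Because InnoDB secondary indexes implicitly append the primary key, an index on created_at alone already covers SELECT id ... ORDER BY created_at:

-- Hypothetical index; with InnoDB it effectively covers (created_at, id)
CREATE INDEX idx_useractions_created_at ON UserActions (created_at);

-- Verify the inner query is served from the index alone:
EXPLAIN
SELECT id
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20;
-- Extra should include 'Using index'
-- (and, on MySQL 8.0, possibly 'Backward index scan').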
What is a Covering Index?
A covering index contains all the columns a query needs (from its SELECT, WHERE, and ORDER BY clauses), so MySQL can satisfy the query from the index alone, without accessing the main table data. This eliminates costly bookmark lookups, significantly boosting performance.
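As a quick, hypothetical illustration (the Orders table, its columns, and the index name are made up for this example), the first query below can typically be answered from the index alone, while the second cannot:

-- Hypothetical index: covers status and created_at, plus the implicit primary key (id)
CREATE INDEX idx_orders_status_created ON Orders (status, created_at);

-- Covered: every referenced column lives in the index,
-- so EXPLAIN typically shows 'Using index' in Extra
SELECT id, created_at FROM Orders WHERE status = 'PAID' ORDER BY created_at;

-- Not covered: customer_name is not in the index, so each matching row
-- needs a bookmark lookup into the table
SELECT id, created_at, customer_name FROM Orders WHERE status = 'PAID' ORDER BY created_at;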
With this understanding of the execution flow and these optimization techniques, the performance issues associated with MySQL's deep pagination using LIMIT offset, count on large tables can be effectively addressed.
Streamline Your SQL Optimization with Chat2DB
Understanding MySQL’s execution flow and manually crafting optimized queries for deep pagination can be challenging. This is where intelligent database tools can significantly accelerate your workflow.
Chat2DB (https://chat2db.ai) is an AI-powered, versatile database client designed to enhance your productivity with databases like MySQL, PostgreSQL, Oracle, and many others.
Consider how Chat2DB can assist:
- AI-Powered Query Generation & Optimization: Get help writing complex queries or receive suggestions to optimize existing ones. Chat2DB’s AI can help you think through strategies like covering indexes or structuring subqueries.
- Simplified EXPLAIN Analysis: Easily execute EXPLAIN directly within Chat2DB to understand query plans. (Future enhancements might even offer visual interpretations!)
- Efficient Database Management: Connect to and manage multiple database instances and schemas with ease.