Interviewer: "Imagine a MySQL table with 10 million records. A query uses LIMIT 1000000,20. Why would this be slow? What's the specific execution flow, and how would you optimize it?"
This is a fantastic, practical question that hits on a common performance bottleneck in MySQL: deep pagination. When the offset in a LIMIT offset, count clause is very large, query performance can plummet dramatically. A query for LIMIT 0,20 might be lightning fast, while LIMIT 1000000,20 on the same 10-million-row table could take many seconds, or even minutes.
Let’s break down why this happens and explore effective solutions.
Why LIMIT 1000000,20 is Slow: The Execution Flow
The core reason for the slowdown is that MySQL, in most cases, needs to generate, order (if an ORDER BY clause exists), and then traverse through all offset + count rows before it can discard the offset rows and return the requested count rows.
So, for LIMIT 1000000,20, MySQL has to effectively process 1,000,020 rows. Here's a more detailed look at the typical execution flow, especially when an ORDER BY clause is present:
- Filtering (if a WHERE clause exists): MySQL first applies any WHERE conditions to select a subset of rows. Let's assume that for this deep pagination problem, a significant number of rows still qualify.
- Ordering (if an ORDER BY clause exists):
  - Using an Index for Ordering: If there's an index that matches the ORDER BY clause, MySQL will use it to retrieve rows in the correct order, reading 1,000,020 entries from the index. The hidden cost is bookmark lookups: if the query selects columns that are not part of the ordering index (e.g., SELECT col1, col2, col3 FROM ... ORDER BY indexed_col), then for each of those 1,000,020 index entries MySQL must perform a "bookmark lookup" (or "table lookup") to fetch the actual row data from the main table. This involves many random I/O operations, which are very slow, especially when repeated over a million times.
  - Not Using an Index for Ordering (Filesort): If there's no suitable index for the ORDER BY clause, MySQL must perform a filesort. It reads the qualifying rows, sorts them in memory (if they fit) or using temporary disk files (if they don't), and then scans through the sorted result. This sorting operation on potentially millions of rows is extremely resource-intensive.
- Row Traversal and Discarding: After obtaining the ordered set of (at least) 1,000,020 rows (either directly from an index or after a filesort), MySQL reads through them sequentially.
  - Discarding Offset Rows: It discards the first 1,000,000 rows.
  - Returning Count Rows: Finally, it returns the next 20 rows.
The main performance killers are:
- The sheer volume of rows processed (1,000,020).
- The numerous bookmark lookups if the ordering index is not a covering index for all selected columns.
- The potential for a costly filesort operation.
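To make these costs visible before optimizing, you can run EXPLAIN on the slow query. A minimal sketch follows, reusing the UserActions table introduced later in this article purely as an illustration; the key things to check are the rows estimate and the Extra column.

EXPLAIN
SELECT user_id, action_type, action_details, created_at
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20;
-- 'rows' shows roughly how many rows MySQL expects to examine.
-- 'Using filesort' in Extra means no suitable index served the ORDER BY,
-- so the server sorts the qualifying rows before applying the LIMIT.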
Optimization Strategies
We can combat this deep pagination slowdown with a couple of robust techniques:
1. Keyset Pagination (or “Seek Method” / Using a Starting ID)
This is the most efficient method for sequential pagination where you’re always fetching the “next” page. Instead of an offset, you use a condition based on the last seen value from the previous page, typically the primary key or an ordered column.
Suppose you are paginating through an Articles table ordered by publish_date (which is indexed and unique or nearly unique), and the last publish_date on the previous page was '2024-05-10 09:30:00' and its unique article_id was 12345.
-- For pages ordered by publish_date DESC, then article_id DESC
SELECT article_id, title, publish_date
FROM Articles
WHERE (publish_date < '2024-05-10 09:30:00') -- Previous page's last publish_date
OR (publish_date = '2024-05-10 09:30:00' AND article_id < 12345) -- Tie-breaker
ORDER BY publish_date DESC, article_id DESC
LIMIT 20;
If ordering by a unique key such as the primary key (here, article_id), it's simpler:
SELECT article_id, title, publish_date
FROM Articles
WHERE article_id > 1000000 -- Assuming previous page ended at article_id 1000000
ORDER BY article_id ASC
LIMIT 20;
Why it’s efficient:
MySQL can directly “seek” to the starting point in the index (e.g., article_id = 1000000) and then read the next 20 records by traversing the B+ tree leaf nodes’ linked list. There’s no scanning and discarding of a million rows.
In other words, each page picks up exactly where the previous one left off: the index seek lands on the last seen key and MySQL reads only the next 20 entries, no matter how deep into the data you are, so the efficiency stays high.
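The prerequisite is a composite index matching the ordering columns. A minimal sketch, assuming no such index exists yet (the index name is hypothetical):

-- Hypothetical composite index supporting the seek condition and
-- ORDER BY publish_date DESC, article_id DESC (scanned backward, or defined
-- with DESC key parts on MySQL 8.0)
CREATE INDEX idx_articles_publish_date_id ON Articles (publish_date, article_id);

-- The primary-key variant needs no extra index: the clustered primary key
-- on article_id already provides the ordered access path.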
Pros: Very fast for “next page” style pagination.
Cons: Doesn’t allow users to jump to arbitrary page numbers (e.g., page 1 to page 500).
2. Covering Index + Subquery
This is a powerful technique for optimizing deep pagination when arbitrary page jumps are needed.
Original (Potentially Slow) Query on the 10M-row UserActions table:
SELECT user_id, action_type, action_details, created_at
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20;
If created_at is indexed, but user_id, action_type, and action_details are not part of that index, this query will perform ~1,000,020 bookmark lookups.
Optimized Query:
The strategy is to first get the primary keys (id) of the desired 20 rows using a subquery that benefits from a covering index, and then join back to the main table.
SELECT ual1.user_id, ual1.action_type, ual1.action_details, ual1.created_at
FROM UserActions ual1
JOIN (
SELECT id -- Assuming 'id' is the primary key
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20
) AS ual2 ON ual1.id = ual2.id;
Why it’s faster:
- The Subquery (ual2):
  - Selects only id (the primary key) and orders by created_at.
  - Crucially, ensure you have a covering index on (created_at, id) for this table.
  - With this covering index, the subquery can satisfy the ORDER BY created_at, the LIMIT 1000000,20, and the SELECT id entirely from the index. It doesn't touch the main table data for the 1,000,020 rows it considers for the offset. Scanning a (relatively narrow) index is much faster and involves sequential I/O, and no bookmark lookups are done for these million-plus rows.
- The Outer Query:
  - The subquery returns only 20 id values.
  - The outer query then joins UserActions (ual1) with these 20 ids. Since id is the primary key, this join is extremely fast (20 efficient primary key lookups).
This technique dramatically reduces the number of expensive bookmark lookups from ~1,000,020 to just 20.
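A minimal sketch of the supporting index and a quick verification, assuming id is the InnoDB primary key (the index name is hypothetical). Because InnoDB secondary indexes implicitly append the primary key, an index on created_at alone already covers SELECT id ... ORDER BY created_at:

-- Hypothetical index; with InnoDB it effectively covers (created_at, id)
CREATE INDEX idx_useractions_created_at ON UserActions (created_at);

-- Verify the inner query is served from the index alone:
EXPLAIN
SELECT id
FROM UserActions
ORDER BY created_at DESC
LIMIT 1000000, 20;
-- Extra should include 'Using index'
-- (and, on MySQL 8.0, possibly 'Backward index scan').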
What is a Covering Index?
A covering index contains all the columns a query needs (from its SELECT, WHERE, and ORDER BY clauses), so MySQL can satisfy the query from the index alone, without accessing the main table data. This eliminates costly bookmark lookups, significantly boosting performance.
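As a quick, hypothetical illustration (the Orders table, its columns, and the index name are made up for this example), the first query below can typically be answered from the index alone, while the second cannot:

-- Hypothetical index: covers status and created_at, plus the implicit primary key (id)
CREATE INDEX idx_orders_status_created ON Orders (status, created_at);

-- Covered: every referenced column lives in the index,
-- so EXPLAIN typically shows 'Using index' in Extra
SELECT id, created_at FROM Orders WHERE status = 'PAID' ORDER BY created_at;

-- Not covered: customer_name is not in the index, so each matching row
-- needs a bookmark lookup into the table
SELECT id, created_at, customer_name FROM Orders WHERE status = 'PAID' ORDER BY created_at;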
With this understanding of the execution flow and these optimization techniques, the performance issues associated with MySQL's deep pagination using LIMIT offset, count on large tables can be effectively addressed.
Streamline Your SQL Optimization with Chat2DB
Understanding MySQL’s execution flow and manually crafting optimized queries for deep pagination can be challenging. This is where intelligent database tools can significantly accelerate your workflow.
Chat2DB (https://chat2db.ai) is an AI-powered, versatile database client designed to enhance your productivity with databases like MySQL, PostgreSQL, Oracle, and many others.
Consider how Chat2DB can assist:
- AI-Powered Query Generation & Optimization: Get help writing complex queries or receive suggestions to optimize existing ones. Chat2DB’s AI can help you think through strategies like covering indexes or structuring subqueries.
- Simplified EXPLAIN Analysis: Easily execute EXPLAIN directly within Chat2DB to understand query plans. (Future enhancements might even offer visual interpretations!)
- Efficient Database Management: Connect to and manage multiple database instances and schemas with ease.