
This is my table schema

Column       |          Type          |                      Modifiers                      
-------------+------------------------+------------------------------------------------------
id           | integer                | not null default nextval('message_id_seq'::regclass)
date_created | bigint                 |
content      | text                   |
user_name    | character varying(128) |
user_id      | character varying(128) |
user_type    | character varying(8)   |
user_ip      | character varying(128) |
user_avatar  | character varying(128) |
chatbox_id   | integer                | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE

This is the query

SELECT * 
FROM message 
WHERE chatbox_id=$1 
ORDER BY date_created 
OFFSET 0 
LIMIT 20;

($1 will be replaced by the actual ID)
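(For illustration only: one way to run the same query with a bound parameter from psql, using PREPARE/EXECUTE. The statement name recent_messages is made up; in the application the driver binds $1.)

PREPARE recent_messages (integer) AS
    SELECT *
    FROM message
    WHERE chatbox_id = $1
    ORDER BY date_created
    OFFSET 0
    LIMIT 20;

EXECUTE recent_messages(25065);   -- 25065 is the chatbox id that appears in the EXPLAIN output below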

It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM and then the whole system goes down. I have to temporarily back up all the current messages and truncate that table. I am not sure what is going on, because everything is OK when I have about 2 million records.

I am using PostgreSQL Server 9.1.5 with default options.


Update: the output of EXPLAIN ANALYZE

 Limit  (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
   ->  Index Scan Backward using indx_date_created on message  (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
         Filter: (chatbox_id = 25065)
 Total runtime: 0.376 ms
(4 rows)

Update: server specification

Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB

Final solution

Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested and also create a new index as A.H. suggested.
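Roughly, the combination looks like this (a sketch only: the index name is a placeholder, and the configuration values follow the suggestions in the comments and answers below, scaled to this 12GB machine):

-- New composite index (the name is just a placeholder):
create index idx_message_chatbox_id_date_created on message(chatbox_id, date_created);

-- postgresql.conf changes (example values; reload the server afterwards):
--   effective_cache_size = 9GB     -- roughly 3/4 of the 12GB RAM
--   work_mem = 10MB
--   random_page_cost = 1.5         -- SSD storage

-- Refresh the statistics afterwards:
VACUUM ANALYZE message;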

  • Can you add EXPLAIN ANALYZE of the query? Commented Nov 24, 2012 at 16:26
  • I am using PostgreSQL Server 9.1.5 with default options. You could start by setting effective_cache_size to 3/4 of your available RAM, setting work_mem to ~10M, and maybe setting random_page_cost to 1.5, then run VACUUM ANALYZE on all relevant tables. Commented Nov 24, 2012 at 16:30
  • Have you run an ANALYZE on the table recently? Commented Nov 24, 2012 at 17:12
  • I have updated the output of EXPLAIN ANALYZE Commented Nov 24, 2012 at 17:38
  • @laoshanlung: The EXPLAIN output seems to be OK. So I guess it is the good case, not the bad case. Of course the bad case is the interesting thing. Commented Nov 24, 2012 at 18:36

2 Answers


You could try to give PostgreSQL a better index for that query. I propose something like this:

create index invent_suitable_name on message(chatbox_id, date_created);

or

 create index invent_suitable_name on message(chatbox_id, date_created desc);

5 Comments

You do not need to specify desc for index fields in Postgres. Postgres can scan indexes in both directions.
@IgorRomanchenko While true, creating the index sorted in the correct direction will make the index scan considerably faster. Backward index scans are quite a bit slower.
I posted my table schema; I already have indexes on chatbox_id and date_created.
@CraigRinger Thank you, I didn't know about that (as always, not everything is in the manual).
@laoshanlung: Currently you have two indexes, each containing one field. My proposal creates one index covering both fields. From a performance point of view this is very different (see the sketch just after these comments).
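To make the direction point concrete, here is a rough sketch (the index names are invented, and 25065 is just the chatbox id from the question's EXPLAIN output) of how each variant lines up with a query:

-- Ascending composite index: the original query (ORDER BY date_created)
-- is served by a forward index scan.
create index message_chatbox_date on message(chatbox_id, date_created);
select * from message where chatbox_id = 25065 order by date_created limit 20;

-- Descending variant: useful if the application really wants the newest
-- messages first, since ORDER BY date_created DESC then also gets a forward scan.
create index message_chatbox_date_desc on message(chatbox_id, date_created desc);
select * from message where chatbox_id = 25065 order by date_created desc limit 20;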

Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.

For the case when postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration postgres normally doesn't consume much RAM).

UPD: My guess at the reason for the bad performance:

At some point the table becomes too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:

  1. An index scan on chatbox_id;
  2. Sorting the returned records to get the top 20.

Because of the default configuration and the large number of records returned in step 1, postgres was forced to sort in files on disk. As a result: bad performance.
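If that is what happens, it should show up in the plan of a slow run. A rough way to check (25065 is just the id from the question's EXPLAIN; substitute a chatbox id that is actually slow):

-- EXPLAIN (ANALYZE, BUFFERS) is available on 9.1; a line like
-- "Sort Method: external merge  Disk: ..." in its output would confirm
-- that the sort spills to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
OFFSET 0
LIMIT 20;

-- For comparison, give this session more per-sort memory and run it again
-- (the default work_mem on 9.1 is only 1MB):
SET work_mem = '10MB';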

UPD2: EXPLAIN ANALYZE shows a 0.376 ms run time and a good plan. Can you give details about a case with bad performance?

2 Comments

I updated my server specification. I think it is powerful enough to handle 3 million records, isn't it?
@laoshanlung With the default configuration postgresql will use something like 10% of the server's power, and a bad query plan can slow down any server.
