
This is my table schema

Column       |          Type          |                      Modifiers                      
-------------+------------------------+------------------------------------------------------
id           | integer                | not null default nextval('message_id_seq'::regclass)
date_created | bigint                 |
content      | text                   |
user_name    | character varying(128) |
user_id      | character varying(128) |
user_type    | character varying(8)   |
user_ip      | character varying(128) |
user_avatar  | character varying(128) |
chatbox_id   | integer                | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE

This is the query

SELECT * 
FROM message 
WHERE chatbox_id=$1 
ORDER BY date_created 
OFFSET 0 
LIMIT 20;

($1 will be replaced by the actual ID)
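(For illustration only: one way to run the same query with a bound parameter from psql, using PREPARE/EXECUTE. The statement name recent_messages is made up; in the application the driver binds $1.)

PREPARE recent_messages (integer) AS
    SELECT *
    FROM message
    WHERE chatbox_id = $1
    ORDER BY date_created
    OFFSET 0
    LIMIT 20;

EXECUTE recent_messages(25065);   -- 25065 is the chatbox id that appears in the EXPLAIN output below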

It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM and then the whole system goes down. I have to temporarily back up all the current messages and truncate that table. I am not sure what is going on, because everything is OK when I have about 2 million records.

I am using PostgreSQL Server 9.1.5 with default options.


Update: the output of EXPLAIN ANALYZE

 Limit  (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
   ->  Index Scan Backward using indx_date_created on message  (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
         Filter: (chatbox_id = 25065)
 Total runtime: 0.376 ms
(4 rows)

Update: server specification

Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB

Final solution

Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested and also create a new index as A.H. suggested.
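Roughly, the combination looks like this (a sketch only: the index name is a placeholder, and the configuration values follow the suggestions in the comments and answers below, scaled to this 12GB machine):

-- New composite index (the name is just a placeholder):
create index idx_message_chatbox_id_date_created on message(chatbox_id, date_created);

-- postgresql.conf changes (example values; reload the server afterwards):
--   effective_cache_size = 9GB     -- roughly 3/4 of the 12GB RAM
--   work_mem = 10MB
--   random_page_cost = 1.5         -- SSD storage

-- Refresh the statistics afterwards:
VACUUM ANALYZE message;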

  • Can you add EXPLAIN ANALYZE of the query? Commented Nov 24, 2012 at 16:26
  • I am using PostgreSQL Server 9.1.5 with default options. You could start by setting effective_cache_size to 3/4 of your available RAM, setting work_mem to ~10M, and maybe setting random_page_cost to 1.5, then run VACUUM ANALYZE on all relevant tables. Commented Nov 24, 2012 at 16:30
  • Have you run an ANALYZE on the table recently? Commented Nov 24, 2012 at 17:12
  • I have updated the output of EXPLAIN ANALYZE Commented Nov 24, 2012 at 17:38
  • @laoshanlung: The EXPLAIN output seems to be OK. So I guess it is the good case, not the bad case. Of course the bad case is the interesting thing. Commented Nov 24, 2012 at 18:36

2 Answers


You could try to give PostgreSQL a better index for that query. I propose something like this:

create index invent_suitable_name on message(chatbox_id, date_created);

or

 create index invent_suitable_name on message(chatbox_id, date_created desc);

5 Comments

You do not need to specify desc for index fields in Postgres. Postgres can scan indexes in both directions.
@IgorRomanchenko While true, creating the index sorted in the correct direction will make the index scan considerably faster. Backward index scans are quite a bit slower.
I posted my table schema; I already have indexes on chatbox_id and date_created.
@CraigRinger Thank you, I didn't know about that (as always, not everything is in the manual).
@laoshanlung: Currently you have two indexes, each containing one field. My proposal creates one index covering both fields. From a performance point of view this is very different (see the sketch just after these comments).
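To make the direction point concrete, here is a rough sketch (the index names are invented, and 25065 is just the chatbox id from the question's EXPLAIN output) of how each variant lines up with a query:

-- Ascending composite index: the original query (ORDER BY date_created)
-- is served by a forward index scan.
create index message_chatbox_date on message(chatbox_id, date_created);
select * from message where chatbox_id = 25065 order by date_created limit 20;

-- Descending variant: useful if the application really wants the newest
-- messages first, since ORDER BY date_created DESC then also gets a forward scan.
create index message_chatbox_date_desc on message(chatbox_id, date_created desc);
select * from message where chatbox_id = 25065 order by date_created desc limit 20;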

Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.

For the case when postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration postgres normally doesn't consume much RAM).

UPD: My guess at the reason for the bad performance:

At some point the table becomes too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:

  1. An index scan on chatbox_id;
  2. Sorting the returned records to get the top 20.

Because of the default configuration and the large number of records returned in step 1, postgres was forced to sort in files on disk. As a result: bad performance.
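If that is what happens, it should show up in the plan of a slow run. A rough way to check (25065 is just the id from the question's EXPLAIN; substitute a chatbox id that is actually slow):

-- EXPLAIN (ANALYZE, BUFFERS) is available on 9.1; a line like
-- "Sort Method: external merge  Disk: ..." in its output would confirm
-- that the sort spills to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
OFFSET 0
LIMIT 20;

-- For comparison, give this session more per-sort memory and run it again
-- (the default work_mem on 9.1 is only 1MB):
SET work_mem = '10MB';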

UPD2: EXPLAIN ANALYZE shows a 0.376 ms run time and a good plan. Can you give details about a case with bad performance?

2 Comments

I updated my server specification. I think it is powerful enough to handle 3 million records, isn't it?
@laoshanlung With the default configuration postgresql will use something like 10% of the server's power, and a bad query plan can slow down any server.
