
I have data that corresponds to 400 million rows in a table, and it will certainly keep increasing. I would like to know what I can do so that such a table in PostgreSQL still supports complex queries. In other words, what should I do to keep all the data queryable in the most performant way?

  • It depends on what a complex query is for you. For example, you can use inheritance and partition the data by day. Commented Mar 28, 2017 at 17:56
  • Any kind of query, from one using multiple joins and regexes to one using simple filters and aggregates. A friend of mine suggested partitioning, but I don't know if it fits my case, since I receive 2 million rows per day: if I partition by month each partition would still be big (about 60 million rows), and by day I would end up with a huge number of tables. Commented Mar 28, 2017 at 18:00
  • Again, it depends on your requirements. For example, I get 4 million rows per day, run my calculations, and delete the old data. Then I only query the consolidated data, not the raw data. Commented Mar 28, 2017 at 18:02
  • For example, I have a table of e-mails sent by me and I would like to cross-reference it with customer information to know whether a specific customer has received a specific e-mail. With 80 million records this query is already painfully slow. Commented Mar 28, 2017 at 18:08
  • I will vote to close because your question is too vague. Please read How to Ask; it is a great place to start learning how to improve your question quality and get better answers. Commented Mar 28, 2017 at 18:11

1 Answer

Try to find a way to split your data into partitions (e.g. by day/month/week/year).

In Postgres, it is implemented using inheritance.
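For example, a minimal sketch of inheritance-based partitioning by month could look like the following (the emails table, its columns, and the date ranges are invented for illustration):

```sql
-- Parent table that queries and inserts go through.
CREATE TABLE emails (
    id          bigserial,
    customer_id bigint      NOT NULL,
    sent_at     timestamptz NOT NULL,
    subject     text
);

-- One child table per month; the CHECK constraint lets the planner
-- skip partitions that cannot match a query (constraint exclusion).
CREATE TABLE emails_2017_03 (
    CHECK (sent_at >= '2017-03-01' AND sent_at < '2017-04-01')
) INHERITS (emails);

CREATE TABLE emails_2017_04 (
    CHECK (sent_at >= '2017-04-01' AND sent_at < '2017-05-01')
) INHERITS (emails);

-- Indexes are created per child table.
CREATE INDEX ON emails_2017_03 (customer_id, sent_at);
CREATE INDEX ON emails_2017_04 (customer_id, sent_at);

-- Route inserts on the parent into the correct child table.
CREATE OR REPLACE FUNCTION emails_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.sent_at >= '2017-03-01' AND NEW.sent_at < '2017-04-01' THEN
        INSERT INTO emails_2017_03 VALUES (NEW.*);
    ELSIF NEW.sent_at >= '2017-04-01' AND NEW.sent_at < '2017-05-01' THEN
        INSERT INTO emails_2017_04 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No partition for sent_at = %', NEW.sent_at;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER emails_insert
    BEFORE INSERT ON emails
    FOR EACH ROW EXECUTE PROCEDURE emails_insert_trigger();
```

With constraint exclusion enabled (the default setting of constraint_exclusion is partition), a query that filters on sent_at, e.g. WHERE sent_at >= '2017-03-01' AND sent_at < '2017-04-01', only scans the matching child table instead of the whole data set.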

This way, if your queries only need to touch certain partitions, you'll handle less data at a time (e.g. read less data from disk).

You'll have to design your tables/indexes/partitions together with your queries - their structure will depend on how you want to use them.

Also, you could have overnight jobs that prepare materialised views based on historical data. This way you don't have to delete your old data, and you can work with an aggregated view plus only the most recent raw data.
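As a rough sketch (again using the hypothetical emails table from above), an overnight job could maintain something like:

```sql
-- Hypothetical nightly aggregate: e-mails sent per customer per day.
CREATE MATERIALIZED VIEW daily_emails_per_customer AS
SELECT customer_id,
       date_trunc('day', sent_at) AS day,
       count(*)                   AS emails_sent
FROM emails
GROUP BY customer_id, date_trunc('day', sent_at);

-- A unique index is required for REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX ON daily_emails_per_customer (customer_id, day);

-- Run from an overnight cron job; CONCURRENTLY keeps the view readable
-- while it is being rebuilt.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_emails_per_customer;
```

Reporting queries then hit the comparatively small aggregated view, and only queries about very recent data need to touch the raw partitions.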


2 Comments

This seems to be my only alternative. I'm just afraid that over time it will become slow, because right now we have over 60 million rows per month and it's still increasing. I will have to create child tables every month, right? Sorry, I'm still learning about this whole process.
@MarcusVinícius Yes, if you go for partitioning by month, then you'll need a separate process/script that keeps creating new partitions. If you can aggregate your data overnight into a materialised view, then it shouldn't become slower as you accumulate more data.
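A hypothetical sketch of such a monthly partition-creation step (reusing the table names from the answer's example):

```sql
-- Run by cron shortly before each new month begins (names are illustrative).
CREATE TABLE emails_2017_05 (
    CHECK (sent_at >= '2017-05-01' AND sent_at < '2017-06-01')
) INHERITS (emails);

CREATE INDEX ON emails_2017_05 (customer_id, sent_at);

-- The insert-routing trigger function also has to be extended (or regenerated)
-- to cover the new date range.
```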
