I have a table with 400 million rows, and it will certainly keep growing. I would like to know what I can do so that such a table in PostgreSQL still supports complex queries. In other words, what should I do to keep all the data queryable in the most performant way?
-
It depends on what a complex query means to you. For example, you can use inheritance and partition the data by day. – Juan Carlos Oropeza, Mar 28, 2017 at 17:56
-
Any kind of query, from queries using multiple joins and regexes to queries using simple filters and aggregates. A friend of mine suggested partitioning, but I don't know if it fits my case, where I receive 2 million rows per day: if I partition by month the partitions would still be big (about 60 million rows), and by day I would have a huge number of tables. – Marcus Vinícius, Mar 28, 2017 at 18:00
-
Again, it depends on your requirements. For example, I have 4 million rows per day; I just run my calculations and delete the old data, then only query the consolidated data, not the raw data. – Juan Carlos Oropeza, Mar 28, 2017 at 18:02
-
For example, I have a table of e-mails sent by me, and I would like to cross-reference customer information to know whether a specific customer has received a specific e-mail. With 80 million records this query is already painfully slow. – Marcus Vinícius, Mar 28, 2017 at 18:08
-
I will vote to close because your question is too vague. Please read How-to-Ask, and here is a great place to start learning how to improve your question quality and get better answers. – Juan Carlos Oropeza, Mar 28, 2017 at 18:11
1 Answer
Try to find a way to split your data into partitions (e.g. by day, week, month, or year).
In Postgres, partitioning is implemented using inheritance.
This way, if your queries can target just certain partitions, you have to handle less data at a time (e.g. read less data from disk). A minimal sketch follows below.
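Here is a minimal sketch of inheritance-based partitioning, assuming a hypothetical emails table partitioned by month (the table, column, and trigger names are illustrative, not taken from your actual schema):

-- Parent table: queries go here; data lives in the children.
CREATE TABLE emails (
    id          bigserial,
    customer_id bigint      NOT NULL,
    sent_at     timestamptz NOT NULL,
    subject     text
);

-- One child table per month; the CHECK constraint lets the planner
-- skip partitions that cannot match the WHERE clause
-- (constraint exclusion).
CREATE TABLE emails_2017_03 (
    CHECK (sent_at >= '2017-03-01' AND sent_at < '2017-04-01')
) INHERITS (emails);

CREATE TABLE emails_2017_04 (
    CHECK (sent_at >= '2017-04-01' AND sent_at < '2017-05-01')
) INHERITS (emails);

-- A trigger routes rows inserted into the parent to the right child.
CREATE OR REPLACE FUNCTION emails_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.sent_at >= '2017-03-01' AND NEW.sent_at < '2017-04-01' THEN
        INSERT INTO emails_2017_03 VALUES (NEW.*);
    ELSIF NEW.sent_at >= '2017-04-01' AND NEW.sent_at < '2017-05-01' THEN
        INSERT INTO emails_2017_04 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'sent_at out of range for existing partitions';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER emails_partition_insert
    BEFORE INSERT ON emails
    FOR EACH ROW EXECUTE PROCEDURE emails_insert_trigger();

Creating the next month's child table (and extending the trigger) is then a routine maintenance job; dropping an old partition is a cheap DROP TABLE instead of a massive DELETE.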
You'll have to design your tables/indexes/partitions together with your queries - their structure will depend on how you want to use them.
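For instance, for the query from the comments ("has customer X received e-mail Y?"), a composite index on each partition lets the planner touch a single partition and answer with one index scan. This reuses the hypothetical emails schema from the sketch above:

-- Index each child; indexes on the parent are not inherited.
CREATE INDEX ON emails_2017_03 (customer_id, sent_at);
CREATE INDEX ON emails_2017_04 (customer_id, sent_at);

-- With constraint exclusion enabled, only emails_2017_03 is scanned:
SELECT 1
FROM emails
WHERE customer_id = 12345
  AND sent_at >= '2017-03-01' AND sent_at < '2017-04-01'
LIMIT 1;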
Also, you could have overnight jobs preparing materialised views based on historical data. This way you don't have to delete your old data, and you can work with an aggregated view plus only the most recent raw data.
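A sketch of such a job, again using the hypothetical emails table from above - a materialised view of daily per-customer counts that a nightly cron job refreshes:

-- Pre-aggregate the historical data once, overnight.
CREATE MATERIALIZED VIEW emails_daily_stats AS
SELECT date_trunc('day', sent_at) AS day,
       customer_id,
       count(*) AS emails_sent
FROM emails
GROUP BY 1, 2;

-- Index the view so lookups against it stay fast.
CREATE INDEX ON emails_daily_stats (customer_id, day);

-- Run nightly, e.g. from cron:
REFRESH MATERIALIZED VIEW emails_daily_stats;

Daytime queries then hit the small aggregated view for history and only touch the raw table for today's rows.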