Timeline for Database vs Flat files for rarely accessed data

Current License: CC BY-SA 3.0

15 events

when	what		by	license	comment
Nov 4, 2020 at 14:52	comment	added	Stack Exchange Broke The Law		Sanity check: you have an index on `data_timestamp`, right?
Nov 21, 2017 at 23:28	comment	added	Frank Hileman		Having written similar tools in the past, I would never use a database to store such data. Files are the way to go.
Nov 21, 2017 at 22:37	answer	added	Vince		timeline score: 1
Nov 21, 2017 at 10:28	history	tweeted			twitter.com/StackSoftEng/status/932918538318336001
Nov 20, 2017 at 18:55	comment	added	kdgregory		MySQL provides partitioning for just this type of use-case. While there are definitely cases where it makes sense to split the data into different repositories, I suspect this isn't one of them.
Nov 19, 2017 at 15:17	comment	added	Doc Brown		Before guessing what will happen when you reach 2 billion rows, I would recommend running a test: generate that much data artificially and profile it. I would not be astonished if the test showed no notable performance degradation for those queries. However, you should also keep backup/recovery times in mind; those will definitely increase linearly with the database size.
Nov 19, 2017 at 14:45	answer	added	Ralf Kleberhoff		timeline score: 4
Nov 19, 2017 at 13:59	answer	added	Christophe		timeline score: 8
Nov 19, 2017 at 11:11	comment	added	Patrick		Have you tried using a time-series database for your time-series data? e.g. blog.timescale.com/timescaledb-vs-6a696248104e
Nov 19, 2017 at 9:34	answer	added	Martin Maat		timeline score: 1
Nov 19, 2017 at 9:31	comment	added	GrandmasterB		Database size shouldn't impact your query performance if the tables are indexed properly.
Nov 19, 2017 at 9:09	answer	added	Martin Maat		timeline score: 2
Nov 19, 2017 at 7:07	comment	added	Phil Helix		I wouldn't store it all in the same database, let alone the same table. Could you set up two database catalogues with identical structures, one for querying current data and another for historical data? You could have a job that moves current data into the historical catalogue after 30 days or so. This should improve query performance for your most queried data.
Nov 19, 2017 at 5:13	review	First posts		Nov 20, 2017 at 9:24
Nov 19, 2017 at 5:11	history	asked	Ananth	CC BY-SA 3.0