0

Similar to Improving performance with a common low-cardinality field

We get a large dataset and we load it by source. Let's say team_id. We currently have our data partitioned by team_id and then by the timestamps of events. However, this does mean the team_id ends up being repeated over each monthly (for e.g.) partition for that team. Is there a way to save space both in the tables and in any indexes we use that would include team_id?

5
  • How much space to you expect to save? Commented Sep 8 at 15:35
  • 1
    How many teams do you have? A smallint supports up to 32767 teams, integer up to 2147483647 teams, and bigint up to 9223372036854775807. Now you can choose between 2, 4, and 8 bytes per record. This represents roughly 2, 4, and 8 GB per one billion records. How many billion records do you have or expect? Commented Sep 8 at 16:34
  • 1
    Short answer: unless your data set is in the tens of TB or more, it's probably not worth your time to overthink this, except as a learning exercise. Commented Sep 8 at 22:07
  • The dataset is about 300-400 GB right now, we get about 80 M records per year. Commented Sep 9 at 14:56
  • 1
    80 million records per year translates to a few MB in storage per year. It's a waste of time to try to optimize storage. Commented Sep 10 at 15:57

2 Answers 2

2

For a very small, mostly constant number of teams and large cardinalities of otherwise narrow rows, it can make sense to have a separate table per team - and no team_id column at all, which mainly benefits index size (and performance). Especially if each team table is, in turn, partitioned (by time) like in your case.

I would only consider the added overhead if the bulk of your queries targets a single team. Else you create a lot of overhead when addressing multiple or all teams.

You do save some overhead by removing one level of partitioning in your case.

2

No, you have to keep the team_id in every row of every partition, even if it is the same for all rows in a partition.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.