3

I am designing a program and trying to decide between a Dictionary and a List. Since I know I have unique values per item, I imagined a Dictionary/List that looks like:

Dictionary<(int k1, string k2, int k3), myDataClassForDic> myDict = ...;
return myDict[(target_key1, target_key2, target_key3)];

// vs.

List<myDataClassForList> myList;
foreach (var v in myList)
{
    if (v.k1 == target_key1 && v.k2 == target_key2 && v.k3 == target_key3)
    {
        return v;
    }
}

My Dictionary/List size ranges from 1 to 100 items, but is more commonly on the lower end (< 20). I do a decent number of retrievals based on the dictionary keys (sometimes not ALL keys, so I sometimes have to loop through the dictionary to collect all matching items). I am also constantly adding and removing items (no replacements) from the dictionary/List, staying within whatever its max size is for the situation (FIFO), always with a unique combination of k1, k2, k3.
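
To make the "sometimes not ALL keys" case concrete, this is roughly what I mean (a minimal sketch using my hypothetical myDataClassForDic and the (k1, k2, k3) key tuple): when only k1 is known, I end up scanning the dictionary just like I would scan the list, so that path is O(n) either way.

// Minimal sketch of a partial-key lookup over the dictionary. The types are
// the hypothetical ones from the snippet above; only k1 is known here, so
// every entry has to be examined.
using System.Collections.Generic;
using System.Linq;

public class myDataClassForDic { /* payload fields */ }

public static class PartialLookup
{
    public static List<myDataClassForDic> ByK1(
        Dictionary<(int k1, string k2, int k3), myDataClassForDic> myDict,
        int targetK1)
    {
        return myDict
            .Where(kv => kv.Key.k1 == targetK1)   // only k1 is known, so scan
            .Select(kv => kv.Value)
            .ToList();
    }
}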

I was told that Dictionary adding/removing has a costly overhead for the hash map, more so than a List, and was asked whether I had considered that. I also did a bit of Googling and asked Copilot, and found that a Dictionary is typically O(1) and therefore better than a List, but I also have some incomplete lookups that become O(n) as I have to scan for my target(s). There is also a hashing cost, as the 3 values must be combined into a key before I can get to the value.

Lists are typically much lighter and faster to loop through, but I lose the O(1) direct key lookups, and removals from the head are costly (I never add anywhere but the end).

Copilot says the memory overhead cost is 50%~100% more than a List and the CPU overhead cost is 10%~20% more, but offset by O(1) complexity. 50%~100% sounds like a LOT, which tilts me toward the List.

My use case also means I will probably have many more of these Lists/Dictionaries at smaller sizes than at bigger sizes, as I am tracking something that moves through various processes; when it finishes, I output a log and delete the tracked data.

If I have loads of small instances and only some larger ones, will the List be better for BOTH memory and speed/CPU costs? Or is the extra memory cost negligible for the Dictionary, given the O(1) benefit as well as the ability to handle sizes of up to 100 items much more easily?

7
  • 21
    Do not optimize small data structures until you have established, by measuring, that (a) you have a performance problem and (b) it is caused by your underlying data structure. Commented Oct 7, 2024 at 7:01
  • @KilianFoth in that case, should I start with the simpler List then? Or should I start with the Dict and then change to a List if problems start? Or does it not matter what I start with? Commented Oct 7, 2024 at 7:23
  • 4
    I'd probably start with the list and revisit when it becomes necessary. Remember that order-of-magnitude expressions are accurate only in the limit. As long as you have small absolute values, the hashing overhead in a dict often outweighs the actual lookup, so that O(1) may not be as good as it seems. Commented Oct 7, 2024 at 7:34
  • 2
    If you think this will be critical, define your own class that is either derived from or a proxy object for a List/Dictionary, then you don't have to change quite so much code when you decide to change it. But I'll repeat @KilianFoth's statement: do not micro optimize without benchmarking the whole program. Commented Oct 7, 2024 at 8:52
  • 2
    Literally every data structure weighs more than the corresponding list/array. Why does this matter? The only question you should ask is: is that overhead big enough for me to care about? And in this particular case the answer is: of course not. With 100 items, we are talking about wasting a few KB at most. Come on. Fast lookup is almost always more important. Commented Oct 7, 2024 at 10:23

5 Answers

3

Unfortunately, you can't use big-O analysis to make guesses about real-world performance for a given number of operations.

Big-O does not tell you what operations are fastest for a given n. It just tells you there is some threshold n above which O(1) will be faster than O(n) - but this could be 1 element, 5 elements or a hundred elements. Typically this threshold is not documented anywhere.

The only way to get a reasonable idea about performance for a given amount of data is to measure.

Big-O is only useful for reasoning about the effect of growth in input. If an operation is O(n), you know that it will at worst be twice as slow if the data size doubles, while an O(1) operation will not take any longer regardless of growth in input size. This is useful information if you are writing generic tools and algorithms which should scale reasonably with input size. But for an application where you already have a reasonable idea about the input size (as you seem to have), this is not the relevant concern. The relevant concern is whether performance is acceptable for the expected amount of data.
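
If you want to see where that threshold actually lies for your sizes, a micro-benchmark is cheap to write. Here is a minimal sketch using BenchmarkDotNet (the Item record and key shape are hypothetical stand-ins for your classes) that compares a keyed dictionary lookup with a linear list scan at the sizes you mention:

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Item is a hypothetical stand-in for the data classes in the question.
public record Item(int K1, string K2, int K3);

public class LookupBenchmark
{
    [Params(5, 20, 100)]   // collection sizes from the question
    public int N;

    private Dictionary<(int, string, int), Item> _dict = new();
    private List<Item> _list = new();
    private (int, string, int) _target;

    [GlobalSetup]
    public void Setup()
    {
        _list = Enumerable.Range(0, N)
                          .Select(i => new Item(i, "k" + i, i * 7))
                          .ToList();
        _dict = _list.ToDictionary(x => (x.K1, x.K2, x.K3));

        var last = _list[N - 1];              // worst case for the list scan
        _target = (last.K1, last.K2, last.K3);
    }

    [Benchmark]
    public Item DictLookup() => _dict[_target];

    [Benchmark]
    public Item ListScan() =>
        _list.First(v => v.K1 == _target.Item1
                      && v.K2 == _target.Item2
                      && v.K3 == _target.Item3);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<LookupBenchmark>();
}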

Copilot says the memory overhead [for a Dictionary] cost is 50%~100% more than a List and the CPU overhead cost is 10%~20% more, but offset by O(1) complexity. 50%~100% sounds like a LOT, which tilts me toward the List.

Copilot is correct that a dictionary trades memory for performance, but if you are working with reference types, the overhead is only in the space for references, not for the actual objects. The size of 100 references is completely insignificant on a modern computer: on a 64-bit runtime that is 100 × 8 bytes ≈ 800 bytes, less than 1 KB. A 100% overhead on an insignificant amount is still an insignificant amount, unless you are working in an extremely memory-constrained environment; but then you probably shouldn't be using .NET in the first place.

Furthermore, it seems you assume the lookup and add/remove of elements in collections are the performance bottleneck in your application. But presumably your application does a lot more than that? For example, if the application reads from a file, that cost might easily dwarf operations on in-memory collections.

Overall, it seems you are worrying about factors which might be insignificant for the performance of your application. It is seldom obvious where the bottleneck will be in a complex application, which is why the general advice is to measure before you optimize.

General advice for optimization:

  • Unless you know you have performance problems, focus on writing the simplest and most maintainable code. In your example, it is clearly the dictionary which is simplest, since you avoid a loop and a conditional (see the short sketch after this list).
  • When you know you have a performance problem, use profiling to pinpoint precisely where the bottleneck is. It might be different from where you expect. If you optimize something which isn't a bottleneck it might be wasted effort, or worse, you might make the code less maintainable for no benefit.
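
For the first point, here is a minimal sketch (again with a hypothetical stand-in type) of how the dictionary version reads once you also handle the missing-key case; there is no loop and no chained conditional:

using System.Collections.Generic;

// Stand-in for the question's data class.
public class myDataClassForDic { /* payload fields */ }

public static class SimplestLookup
{
    // Either the matching item or null; no loop, no conditional chain.
    public static myDataClassForDic? Find(
        Dictionary<(int k1, string k2, int k3), myDataClassForDic> myDict,
        int k1, string k2, int k3)
    {
        return myDict.TryGetValue((k1, k2, k3), out var value) ? value : null;
    }
}
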
5

I debated posting this answer because Ewan has good advice, but here goes anyways.

You are over-thinking this.

This is such a small data set that you just need to go with whichever structure meets your needs at a functional level and is easiest to implement. At this point, you could choose a list when a dictionary would have been more appropriate and I bet it makes zero difference in the grand scheme of things. If you still cannot decide, literally flip a coin and go with it.

You have far more interesting problems to solve than this micro optimization.

1
  • For example, macOS / iOS will always store small dictionaries in a plain array, and lookup is done by scanning the list of keys. This saves lots of space because no hash table is needed, usually saves time because no hashing is needed, and the array is only as large as required, unlike hash tables, which usually have unused entries. Commented Oct 8, 2024 at 16:23
5

At small sizes of data structures, readability and maintainability of your code is often more important than a minuscule improvement in runtime performance or storage space.

Performance or storage optimization should (in my opinion) only be done when

  • There is a clear problem regarding time or size needed for the data structure.
  • The alternative choice provides a clear improvement.
  • Readability and maintainability aren't worsened substantially.
2
  • As far as I can see, there could be lots of these small collections around (up to 400?) of varying sizes, constantly being added to and removed from, so scalability is sort of important. But a single data structure will hold about 1~100 items. Earlier in the comments someone said that the overhead for small dictionaries far outweighs their benefits, so it seems as though the list is the way to go, even if I have to loop through it each time to find the item I want, and also need to remove from the head constantly (as this is FIFO)? Commented Oct 9, 2024 at 0:00
  • 2
    As your update pattern for these structures is (mostly or exclusively?) FIFO, a dictionary would apparently be the wrong data structure. If you exclusively add at one end and remove at the other end, a deque or ring buffer would be the right data structure (see the sketch below). These are often provided in libraries for your language. Their random access behavior is O(n) if you have no knowledge about the keys, but for example if your keys have a monotonically increasing component (such as a timestamp) you might be able to improve access time by using binary search or something similar. Commented Oct 9, 2024 at 5:35
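
A minimal sketch of the FIFO shape described in that comment, using .NET's built-in Queue<T> (there is no deque in the BCL, but Queue<T> covers the add-at-tail / remove-at-head case; the Item record is a hypothetical stand-in for the tracked data):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the tracked data, keyed by (K1, K2, K3).
public record Item(int K1, string K2, int K3);

public static class FifoExample
{
    public static void Main()
    {
        var queue = new Queue<Item>();

        // Add at the tail as items enter the process.
        queue.Enqueue(new Item(1, "a", 10));
        queue.Enqueue(new Item(2, "b", 20));

        // Remove from the head when the oldest item finishes (FIFO).
        Item oldest = queue.Dequeue();
        Console.WriteLine($"Finished: {oldest}");

        // Keyed lookups are still a linear scan, i.e. O(n).
        Item? match = queue.FirstOrDefault(v => v.K1 == 2 && v.K2 == "b" && v.K3 == 20);
        Console.WriteLine(match is null ? "Not found" : $"Found: {match}");
    }
}
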
4

If you really care about optimising then you have to measure.

BUT! If you have cases where you do multiple lookups from a list, and it's noticeably slow, using a dictionary instead is a common, obvious, and EASY TO TRY fix.

If you haven't tried it because you also have to optimise memory and worry about missed keys etc., then you should already realise that there isn't a one-size-fits-all solution.

1

I'd say go with whatever semantically matches your domain the best. Unique values, random/directed access? Sounds like a Dictionary to me.

Using a list would silently allow illegal duplicates unless you explicitly add code to check. Why would you?
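
To illustrate (with a hypothetical Item record standing in for the real class): a Dictionary keyed on (k1, k2, k3) rejects a duplicate key outright, while a List accepts it silently unless you write the check yourself.

using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real data class.
public record Item(int K1, string K2, int K3);

public static class DuplicateDemo
{
    public static void Main()
    {
        var dict = new Dictionary<(int, string, int), Item>();
        dict.Add((1, "a", 2), new Item(1, "a", 2));
        try
        {
            // Same composite key again: the Dictionary refuses it.
            dict.Add((1, "a", 2), new Item(1, "a", 2));
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Dictionary rejected the duplicate key.");
        }

        // The List happily stores both; uniqueness is now your problem.
        var list = new List<Item>
        {
            new Item(1, "a", 2),
            new Item(1, "a", 2)
        };
        Console.WriteLine($"List now holds {list.Count} items with the same key.");
    }
}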

All the other comments about optimisation also stand.
