
What kind of compiler are we talking about, and what sort of "memory"? In your example, even assuming a reasonable optimizer, the result of a+b generally has to be held in a register (a form of memory) before any further arithmetic can be done with it.

So if we're talking about a dumb compiler that encounters a+b twice, it's going to allocate more registers (memory) in your second example, whereas your first example might just store that expression once in a single register mapped to the local variable. But we're talking about very silly compilers at this point. The exception would be another type of silly compiler, one that spills every single variable to the stack, in which case the first example might cause it more grief to optimize than the second.

On reflection, I want to scratch that last part: I think the second one is still likely to use more memory with a dumb compiler, even one prone to stack spills, because it might end up allocating three registers for a+b and spilling a and b more often. With the most primitive optimizer, capturing a+b in s will probably "help" it use fewer registers and fewer stack spills.

All of this is extremely speculative, in rather silly ways, absent measurements or disassembly. Even in the worst-case scenarios this is not a "memory vs. performance" case, because even with the worst optimizers I can think of, the only memory involved is temporary (stack or registers). It's purely a "performance" case at best, and with any reasonable optimizer the two are equivalent.
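
To make the comparison concrete, here is a minimal sketch of the two shapes being discussed. Only a, b and s come from the discussion above; the function names and the range check itself are hypothetical stand-ins, so treat this as an illustration rather than the code from the question.

```c
#include <assert.h>
#include <stdbool.h>

/* First shape: the sum is captured once in the local s. */
bool is_sum_in_range_local(int a, int b)
{
    int s = a + b;
    return s <= 1000 && s >= -1000;
}

/* Second shape: a + b is written out twice. */
bool is_sum_in_range_inline(int a, int b)
{
    return a + b <= 1000 && a + b >= -1000;
}

/* Sanity check: both shapes must give the same answer for the same inputs. */
void check_variants_agree(int a, int b)
{
    assert(is_sum_in_range_local(a, b) == is_sum_in_range_inline(a, b));
}
```

The way to settle the question for a particular compiler is to look at what it emits, for example with `gcc -O2 -S` or `clang -O2 -S`, and compare the two functions. I'd expect the output to match with optimizations on, but that is exactly the kind of claim that should be checked rather than assumed.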

When to optimize for memory vs performance speed for a method?

As for this question, if I can tackle it more broadly: I often don't find the two diametrically opposed. Especially if your access patterns are sequential, and given the speed of the CPU cache, a reduction in the number of bytes processed sequentially for non-trivial inputs often translates (up to a point) into plowing through that data faster. Of course there are breaking points: if the data is made much, much smaller at the cost of way, way more instructions, it might be faster to process it sequentially in its larger form with fewer instructions.
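
As a rough illustration of "fewer bytes processed sequentially", the sketch below sums the same logical values stored in two widths. The function names and the choice of element types are assumptions made for the example, and it deliberately makes no timing claim: it only shows that the wide layout forces a sequential pass to read eight times as many bytes for the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n values stored one per byte. */
uint64_t sum_compact(const uint8_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Sum the same logical values stored in eight bytes each. */
uint64_t sum_wide(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Bytes a sequential pass has to read for n elements of the given size. */
size_t bytes_streamed(size_t n, size_t element_size)
{
    assert(element_size > 0);
    return n * element_size;
}
```

Both loops execute roughly the same number of instructions per element, so any speed difference on large inputs would come from memory traffic. Whether it shows up in practice depends on the input size and the hardware, so measure before relying on it.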

But I've found that many devs underestimate how much a reduction in memory use in these cases can translate into a proportional reduction in processing time. It's very humanly intuitive to think of performance costs in terms of instructions rather than memory access, to the point of reaching for big LUTs (lookup tables) in a vain attempt to speed up some small computation, only to find performance degraded by the additional memory access.
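
Here is a hedged sketch of that LUT trade-off, using a bit count over 16-bit values as a stand-in for "some small computation". All names are hypothetical. The table is 64 KiB, which is larger than many L1 data caches (that depends on the CPU), while the direct version is a handful of arithmetic instructions with no table access at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64 KiB table: one byte per possible 16-bit input. */
static uint8_t bit_count_lut[65536];
static bool bit_count_lut_ready = false;

/* Direct computation: a few shifts, masks and adds, no memory lookup. */
unsigned bit_count_compute(uint16_t x)
{
    unsigned v = x;
    v = v - ((v >> 1) & 0x5555u);             /* 2-bit partial counts */
    v = (v & 0x3333u) + ((v >> 2) & 0x3333u); /* 4-bit partial counts */
    v = (v + (v >> 4)) & 0x0F0Fu;             /* 8-bit partial counts */
    return (v + (v >> 8)) & 0x1Fu;            /* final count, 0..16 */
}

/* Fill the table once, using the direct computation. */
void bit_count_lut_init(void)
{
    for (unsigned i = 0; i < 65536u; ++i)
        bit_count_lut[i] = (uint8_t)bit_count_compute((uint16_t)i);
    bit_count_lut_ready = true;
}

/* Table version: one load per value, but from a 64 KiB table. */
unsigned bit_count_lookup(uint16_t x)
{
    assert(bit_count_lut_ready);
    return bit_count_lut[x];
}

unsigned long total_bits_compute(const uint16_t *data, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += bit_count_compute(data[i]);
    return total;
}

unsigned long total_bits_lookup(const uint16_t *data, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += bit_count_lookup(data[i]);
    return total;
}
```

Both totals are identical by construction; which one is faster is not something the code can tell you. With inputs that hit the table at scattered indices, the lookup version competes with the data itself for cache space, which is the kind of cost that instruction counting misses. Profile both on the target machine before picking one.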

user321630