I'm investigating some bytecode optimizations in Python 3.12 related to loop performance and local variable lookups (LOAD_FAST vs. LOAD_DEREF). I noticed a bizarre performance discrepancy when running dynamically evaluated code inside a local function scope versus a standard loop.
Consider these two setups. Both run a tight loop that updates a single variable, but Setup B uses exec() to run the loop dynamically against an explicit locals dictionary. On Python 3.12.2, Setup B runs roughly 3x to 4x slower than Setup A.
import timeit

# Setup A: Standard local loop
def test_standard():
    x = 0
    for _ in range(10_000_000):
        x += 1
    return x

# Setup B: Executing loop logic dynamically
def test_dynamic():
    x = 0
    local_vars = {'x': x}
    # Running the exact same loop structure inside exec
    exec("""
for _ in range(10_000_000):
    x += 1
""", globals(), local_vars)
    return local_vars['x']

print("Standard:", timeit.timeit(test_standard, number=1))
print("Dynamic (exec):", timeit.timeit(test_dynamic, number=1))
I ran dis.dis on both to see what the compiler is doing under the hood. For test_standard, the compiler resolves x at compile time to a slot in the frame's fast-locals array, so the loop body uses only LOAD_FAST and STORE_FAST:
  6          22 LOAD_FAST                0 (x)
             24 LOAD_CONST               2 (1)
             26 BINARY_OP                0 (+)
             30 STORE_FAST               0 (x)
However, the code object compiled from the string passed to exec() uses LOAD_NAME and STORE_NAME instead, even though exec() is given a dedicated local_vars dictionary.
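To make the comparison reproducible without running the 10-million-iteration loops, here is a minimal sketch that collects the opcode names from both code objects. It assumes that compile() in "exec" mode produces the same kind of code object that exec() builds from a string; the names LOOP_SRC, exec_code, func_ops, and exec_ops are mine and purely illustrative.

```python
import dis

def test_standard():
    x = 0
    for _ in range(10_000_000):
        x += 1
    return x

# Same loop source that Setup B passes to exec()
LOOP_SRC = """
for _ in range(10_000_000):
    x += 1
"""

# Assumption: compiling in "exec" mode mirrors what exec() does with a string
exec_code = compile(LOOP_SRC, "<dynamic>", "exec")

# Collect the set of opcode names used by each code object (nothing is executed)
func_ops = {ins.opname for ins in dis.get_instructions(test_standard)}
exec_ops = {ins.opname for ins in dis.get_instructions(exec_code)}

# Show only the variable-access opcodes of interest
print("function:", sorted(op for op in func_ops if "FAST" in op or "NAME" in op))
print("exec:    ", sorted(op for op in exec_ops if "FAST" in op or "NAME" in op))
```

The exact opcode names vary between CPython versions, so the output is best read as "which family of opcodes appears" rather than as a fixed listing.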
Questions:
1. Since Python 3.11 introduced the Specializing Adaptive Interpreter, why doesn't it optimize LOAD_NAME to a localized fast path inside an exec() code block once it detects that the type/dictionary structure isn't changing?
2. Is there a strict architectural reason why exec() code objects are fundamentally barred from using LOAD_FAST optimization mechanics, even when provided an explicit, isolated locals dictionary?
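One way to probe the second question is to check whether the restriction applies to exec() itself or only to the top-level code of the executed string. The sketch below wraps the same loop in a function definition inside the dynamic source, on the assumption that the compiler then treats x as an ordinary function local. The helper name _run, the namespace dictionary, and the reduced iteration count are hypothetical choices for illustration, not part of the original benchmark.

```python
import dis

# Hypothetical variant: same loop body, but inside a def within the dynamic source
SRC = """
def _run(x):
    for _ in range(100_000):
        x += 1
    return x
"""

namespace = {}
exec(SRC, namespace)      # defines _run inside the namespace dict
run = namespace["_run"]

# Inspect the opcodes of the dynamically created function
opnames = {ins.opname for ins in dis.get_instructions(run)}
result = run(0)

print(result)
print(sorted(op for op in opnames if "FAST" in op or "NAME" in op))
```

If the function body shows the FAST family of opcodes and no LOAD_NAME or STORE_NAME, that would suggest the difference comes from the scope in which the code is compiled rather than from exec() as such. I have not benchmarked this variant against Setup A.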