I'm investigating some bytecode optimizations in Python 3.12 related to loop performance and local variable lookups (LOAD_FAST vs. LOAD_DEREF). I noticed a bizarre performance discrepancy when running dynamically evaluated code inside a local function scope versus a standard loop.

Consider these two setups. Both are trying to execute a tight loop where a local variable is updated, but Setup B uses exec() to dynamically run the inner logic within the same local context. On Python 3.12.2, Setup B runs roughly 3x to 4x slower than Setup A.

import timeit

# Setup A: Standard local loop
def test_standard():
    x = 0
    for _ in range(10_000_000):
        x += 1
    return x

# Setup B: Executing loop logic dynamically 
def test_dynamic():
    x = 0
    local_vars = {'x': x}
    # Running the exact same loop structure inside exec
    exec("""
for _ in range(10_000_000):
    x += 1
""", globals(), local_vars)
    return local_vars['x']

print("Standard:", timeit.timeit(test_standard, number=1))
print("Dynamic (exec):", timeit.timeit(test_dynamic, number=1))

I ran dis.dis on both to see what the compiler is doing under the hood. For test_standard, Python completely optimizes the loop using LOAD_FAST and STORE_FAST opcodes because it maps x strictly to the local namespace array:

 6           22 LOAD_FAST                0 (x)
             24 LOAD_CONST               2 (1)
             26 BINARY_OP               13 (+=)
             30 STORE_FAST               0 (x)

However, when inspecting the code object generated inside exec(), even though it is passed a dedicated local_vars dictionary, it defaults to using LOAD_NAME and STORE_NAME.

Questions:

  1. Since Python 3.11+ introduced the Specializing Adaptive Interpreter, why doesn't the adaptive interpreter optimize LOAD_NAME to a localized fast-path inside an exec() code block once it detects the type/dictionary structure isn't changing?

  2. Is there a strict architectural reason why exec() code objects are fundamentally barred from utilizing LOAD_FAST optimization mechanics, even when provided an explicit, isolated locals dictionary?

  • You certainly can have LOAD_FAST used inside exec code: just create a full function definition inside the exec body and call it there (of course, only variables local to that function can use LOAD_FAST). I suppose the accepted answer below goes through it in more detail. Commented 8 hours ago

1 Answer

Why LOAD_FAST is not used

In answer to question 2, root-level exec-ed code cannot use the LOAD_FAST and STORE_FAST instructions. It is compiled and executed as if it were the body of a module, so it always uses LOAD_NAME and STORE_NAME.

LOAD_FAST and STORE_FAST cannot serve as adaptive forms of the *_NAME instructions because they operate on different data structures: *_FAST indexes into an array of local slots, whereas *_NAME looks names up in mappings. To specialise in this way, you would need to change the backing store of the namespace from a mapping to an array. You would also need to alter every function that uses that namespace as its globals; otherwise they would try to access the array as if it were a mapping, which would at best cause a crash and at worst carry on with a silent bug. However, since the module might no longer hold a reference to those functions, this is impossible. So the code must stick to the *_NAME instructions.
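
As a rough illustration of the array-versus-mapping split, the sketch below compiles the same loop once as module-style code (the way exec() compiles a string) and once as a function body, then lists the opcodes each one uses. The names SRC, as_function and opnames are made up for this example, and the exact opcode spellings can differ between CPython versions (newer releases have fused or renamed variants of LOAD_FAST), so the check only looks at name prefixes.

```python
import dis

SRC = "x = 0\nfor _ in range(3):\n    x += 1\n"

# Module-style code object: this is how exec() compiles a source string.
module_code = compile(SRC, "<exec-src>", "exec")


def as_function():
    # Same loop, but compiled as a function body.
    x = 0
    for _ in range(3):
        x += 1
    return x


def opnames(code):
    """Return the base (non-specialised) opcode names of a code object."""
    return [ins.opname for ins in dis.get_instructions(code)]


module_ops = opnames(module_code)
function_ops = opnames(as_function.__code__)

# Module-style code refers to x by name (looked up in mappings at run time).
print("co_names:   ", module_code.co_names)
# Function code gives x a fixed slot in the locals array.
print("co_varnames:", as_function.__code__.co_varnames)

print("module uses *_NAME:  ", sorted({op for op in module_ops if op.endswith("_NAME")}))
print("function uses *_FAST:", sorted({op for op in function_ops if "_FAST" in op}))
```

The module-style code lists x in co_names, while the function lists it in co_varnames, which is the compile-time decision that fixes which instruction family can be used.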

The speedup you are looking for is also trivially available by wrapping your code in a function. When you do this, the interpreter automatically uses the *_FAST instructions, without requiring complex logic to adapt the *_NAME instructions. For example:

def test_dynamic_fast():
    locals_ = {}
    exec("""
def f():  # <-- code wrapped by function f
    x = 0
    for _ in range(10_000_000):
        x += 1
    return x
""", globals(), locals_)  # positional: exec() only accepts keyword arguments from Python 3.13
    # extract function from namespace and execute it
    return locals_['f']()
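
To compare the two approaches on your own machine, a small timing harness along these lines may help. It is only a sketch: N, run_top_level and run_wrapped are hypothetical names, a single dictionary is used as the exec namespace for simplicity, and the loop is much shorter than the one in the question so that it finishes quickly. No particular ratio is claimed, since timings depend on the interpreter version and hardware.

```python
import timeit

N = 200_000  # far smaller than the question's 10_000_000, to keep the run short


def run_top_level():
    # Loop runs as module-style code: x is read and written via *_NAME.
    ns = {"x": 0, "N": N}
    exec("for _ in range(N):\n    x += 1\n", ns)
    return ns["x"]


def run_wrapped():
    # Loop runs inside a function defined by exec: x lives in a fast local slot.
    ns = {"N": N}
    exec(
        "def f():\n"
        "    x = 0\n"
        "    for _ in range(N):\n"
        "        x += 1\n"
        "    return x\n",
        ns,
    )
    return ns["f"]()


# Both variants must compute the same result before timing means anything.
assert run_top_level() == run_wrapped() == N

print("top-level exec:", timeit.timeit(run_top_level, number=5))
print("wrapped in def:", timeit.timeit(run_wrapped, number=5))
```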

Why you are not seeing adaptive instructions

In answer to question 1, adaptive instructions are specialised versions of a given instruction; they cannot change the type of instruction. For example, LOAD_GLOBAL can be specialised to LOAD_GLOBAL_BUILTIN, but not to LOAD_FAST (I use LOAD_GLOBAL here as there are no adaptive instructions for LOAD_NAME). This is because LOAD_GLOBAL_BUILTIN handles a subset of the inputs that LOAD_GLOBAL handles (i.e. it skips the lookup in the module's globals and goes straight to the builtins module). LOAD_FAST, however, works on a different kind of input (namely, arrays rather than mappings).

If you want to see which adaptive instructions are being used, you need to do two things. First, you need a long-lived code object. Instructions cannot be specialised until they have been run at least a few times. When you exec a string, exec compiles the string to a code object, executes it, and then discards it, so you cannot inspect the code object to see which optimisations were made. Second, you need to ask dis to show the adaptive instructions. By default, it shows the standard, non-specialised instructions. Example:

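import dis

x = 0
src = 'x += 1'
# create a long-lived code object, so we can see what adaptations are made
code = compile(src, '<string>', mode='exec')

# code not yet run, so no adaptations have been made
# (opcode names vary by CPython version; LOAD_SMALL_INT and
# LOAD_CONST_IMMORTAL shown below do not exist in 3.12)
dis.dis(code, adaptive=True)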
#  0           RESUME                   0         <-- standard
#
#  1           LOAD_NAME                0 (x)
#              LOAD_SMALL_INT           1
#              BINARY_OP               13 (+=)    <-- standard
#              STORE_NAME               0 (x)
#              LOAD_CONST               1 (None)  <-- standard
#              RETURN_VALUE

# run code a few times
for _ in range(10):
    exec(code)
assert x == 10  # check code really did run ten times

# see what adaptations have been made
dis.dis(code, adaptive=True)
#  0           RESUME_CHECK             0         <-- adaptive
#
#  1           LOAD_NAME                0 (x)
#              LOAD_SMALL_INT           1
#              BINARY_OP_ADD_INT       13 (+=)    <-- adaptive
#              STORE_NAME               0 (x)
#              LOAD_CONST_IMMORTAL      1 (None)  <-- adaptive
#              RETURN_VALUE

In the example, you can see that x += 1 (BINARY_OP) has been specialised to BINARY_OP_ADD_INT, because all it ever does is add integers together.
