Description
Bug report
Code handed to the optimizer may not include instrumentation. If instrumentation is added later, the executor does not see it.
We remove all ENTER_EXECUTORS
when instrumenting, but that doesn't fix the problem of executors that are still running.
Example
A loop that calls foo
:
while large_number > 0:
foo()
large_number -= 1
The loop gets turned into an executor, and sometime before large_number
reaches zero, a debugger gets attached and in some callee of foo
turns on monitoring of calls. We would expect all subsequent calls to foo()
to be monitored, but they will not be, as the executor knows nothing about the call to foo()
being instrumented.
Let's not rely on the executor/optimizer to handle this.
We could add a complex de-optimization strategy, so that executors are invalidated when instrumentation occurs, but that is the sort of thing we want to be doing for maximum performance, not for correctness.
It is much safer, and ultimately no slower (once we implement fancy de-optimization) to add extra checks for instrumentation, so that unoptimized traces are correct.
The solution
The solution is quite simple, add a check for instrumentation after every call.
We can make this less slow (and less simple) by combining the eval-breaker check and instrumentation check into one. This makes the check slightly more complex, but should speed up RESUME
by less than it slows down every call as every call already has an eval-breaker check.
Combining the two checks reducing the number of bits for versions from 64 to 24 (we need the other bits for eval-breaker, gc, async exceptions, etc). A 64 bit number never overflows (computers don't last long enough), but 24 bit numbers do.
This will have three main effects, beyond the hopefully very small performance impact for all code:
- It removes the need for a complete call stack scan of all threads when instrumenting.
- Tools that set or change monitoring many times will see significant performance improvements as we no longer need to traverse all stacks to re-instrument the whole call stack whenever monitoring changes
- Tools that set or change monitoring many millions of times will break, as we run out of versions. It is likely that these tools were already broken or had performance so bad as to be totally unusable.
Making the check explicit in the bytecode.
We can simplify all the call instructions by removing the CHECK_EVAL_BREAKER
check at the end and adding an explicit RESUME
instruction after every CALL
Although this has the disadvantage of making the bytecode larger and adding dispatch overhead, it does have the following advantages:
- Allows tier 1 to optimize the
RESUME
toRESUME_CHECK
which might be cancel out the additional dispatch overhead. - Makes the check explicit, making it feasible for tier 2 optimizations to remove it.
Linked PRs
- GH-109369: Merge all eval-breaker flags and monitoring version into one word. #109846
- GH-109369: Add machinery for deoptimizing tier2 executors, both individually and globally. #110358
- GH-109369: Add machinery for deoptimizing tier2 executors, both individually and globally. #110384
- GH-109369 Add vectorcall to
PyLong_Type
#111642 - GH-109369: Exit tier 2 if executor is invalid #111657
Metadata
Metadata
Assignees
Projects
Status