
Consider the Python bytecode instruction BINARY_SUBSCR, which takes two arguments and performs a subscript operation on arg1 indexed by arg2. At runtime, is there a way to see what the values of arg1 and arg2 will be, perhaps by reading them off the evaluation stack? I know about modules such as inspect and dis. They let me read things like passed arguments and local variables, but I need more fine-grained control.

I have tried backtracking through the previous bytecode instructions to guess what the values of arg1 and arg2 would be, but that gets messy quickly. Instead, it would be great if I could access the top of the evaluation stack to see the result of the previous bytecode instruction, which becomes an argument to the next one.

  • Are you talking about modifying the bytecode instructions? Just before the BINARY_SUBSCR, you could insert a DUP_TOP_TWO to copy the parameters, then a function call to do something with those values, and finally a POP_TOP to discard the function's return value. Commented May 31, 2022 at 12:52
  • BINARY_SUBSCR was just an example of a bytecode op that takes two args. If the user has list subscript code like List[a + b + c], I want to get the top-of-stack value so I can see what a + b + c evaluated to. Commented May 31, 2022 at 12:57
  • You didn't answer my question - are you wanting to do this via bytecode modification? The same basic idea (duplicate values, call function, pop result) would apply to any opcode you want to instrument. Commented May 31, 2022 at 13:05
  • If you have any suggestions other than bytecode modification, that would probably be my go to, but bytecode manipulation is fine if there appears to be no cleaner way. Commented May 31, 2022 at 13:10
  • This question is a duplicate of https://stackoverflow.com/questions/57142762/debug-the-cpython-opcode-stack. Commented Jun 6, 2022 at 7:12
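
If instrumenting the source is acceptable, the "duplicate values, call function, pop result" idea from the comments can be approximated without touching bytecode. The sketch below is one possible approach, not something proposed in this thread. It assumes Python 3.9+ (where the slice of an ast.Subscript node is a plain expression) and that the source of the code under inspection is available. The names SubscriptLogger, _log_subscript and observed are hypothetical and exist only for illustration.

```python
import ast
import textwrap

observed = []  # (container, index) pairs recorded at runtime


def _log_subscript(container, index):
    """Hypothetical helper: record both operands, then do the real subscript."""
    observed.append((container, index))
    return container[index]


class SubscriptLogger(ast.NodeTransformer):
    """Rewrite every read `x[i]` into `_log_subscript(x, i)`."""

    def visit_Subscript(self, node):
        self.generic_visit(node)  # instrument nested subscripts first
        if not isinstance(node.ctx, ast.Load):
            return node  # leave assignments and deletions alone
        index = node.slice
        has_slice = isinstance(index, ast.Slice) or (
            isinstance(index, ast.Tuple)
            and any(isinstance(elt, ast.Slice) for elt in index.elts)
        )
        if has_slice:
            return node  # `x[1:2]` is not a valid call argument, so skip it
        call = ast.Call(
            func=ast.Name(id="_log_subscript", ctx=ast.Load()),
            args=[node.value, index],
            keywords=[],
        )
        return ast.copy_location(call, node)


source = textwrap.dedent("""
    def foo(a, b, c):
        return a[b + c]
""")

tree = SubscriptLogger().visit(ast.parse(source))
ast.fix_missing_locations(tree)
namespace = {"_log_subscript": _log_subscript}
exec(compile(tree, "<instrumented>", "exec"), namespace)

result = namespace["foo"]([10, 11, 12], 1, 1)
print(result, observed)

# The operands are recorded before the subscript runs, so they are
# still available when the operation itself raises.
try:
    namespace["foo"]([10, 11, 12], 1, 2)
except IndexError:
    print("failing operands:", observed[-1])
```

Because the helper receives the already-evaluated operands, it sees the same two values that the subscript instruction would pop from the evaluation stack. It only covers code that can be recompiled from source, and it adds one extra frame to the call stack for each subscript.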

1 Answer

trepanxpy is a Python bytecode debugger. Like pdb, it can step through code; the difference is that trepanxpy steps between bytecode instructions and can show the evaluation stack. trepanxpy needs x-python to work; x-python is a bytecode interpreter written in Python. Both can be installed with pip.

# main.py
def foo(a, b, c):
    return a[b + c]


# simulate off by 1 error 1 + 2 > len(a) - 1
print(foo([10, 11, 12], 1, 2))

Below I use trepanxpy to debug the code. I have added some comments and removed some output for readability and privacy.

PS > trepan-xpy main.py
Running x-python main.py with ()
(main.py:2): <module>
-> 2 def foo(a, b, c):
(trepan-xpy) stepi
  <# ~15 step instructions omitted, up to the relevant bytecode #>
(trepan-xpy) stepi
(main.py:3 @6): foo
.. 3     return a[b + c]
       @  6: BINARY_ADD (1, 2)
(trepan-xpy) info stack
 0: <class 'int'> 2
 1: <class 'int'> 1
 2: <class 'list'> [10, 11, 12]
(trepan-xpy) stepi
(main.py:3 @8): foo
.. 3     return a[b + c]
       @  8: BINARY_SUBSCR ([10, 11, 12], 3)
(trepan-xpy) info stack
  <# note that 3 is the index and the list has only 3 elements #>
 0: <class 'int'> 3
 1: <class 'list'> [10, 11, 12]
(trepan-xpy) __import__("dis").dis(foo)
  <# disassembled bytecode of foo #>
  3           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 LOAD_FAST                2 (c)
              6 BINARY_ADD
              8 BINARY_SUBSCR
             10 RETURN_VALUE
(trepan-xpy) stepi
  <# errors omitted #>
IndexError: list index out of range
  <# stack trace omitted #>
trepan-xpy: That's all, folks...
  <# note that the debugger shuts down after the IndexError was raised #>

As you can see, it is possible to step through bytecode. But most of the time you can achieve similar results more easily with pdb. For example:

PS > python -m pdb main.py
> main.py(2)<module>()
-> def foo(a, b, c):
(Pdb) continue
 <# traceback omitted #>
IndexError: list index out of range
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> main.py(3)foo()
-> return a[b + c]
(Pdb) p dir()
['a', 'b', 'c']
(Pdb) p (a, b, c)
([10, 11, 12], 1, 2)
(Pdb) p b + c
3
(Pdb) p len(a)
3
(Pdb) exit()
Post mortem debugger finished. The main.py will be restarted
> main.py(2)<module>()
-> def foo(a, b, c):
(Pdb) q

So, to answer your question:

You can use a bytecode debugger to get access to Python bytecode arguments at runtime.

Sadly, trepanxpy lacks many of the features that make pdb usable, so debugging non-trivial pieces of code might be infeasible. I am missing features such as breakpoints and the ability to inspect anything after an exception is thrown.

I would recommend using pdb over trepanxpy most of the time. But trepanxpy can be useful for stepping through a very minimal reproducible piece of code.

4 Comments

Thanks for the example, but using a different interpreter is not feasible for this project. I am looking for a programmatic approach rather than using a debugger. As far as I know, pdb is not very friendly to use programmatically, especially for bytecode-level tracing. Plus, using pdb clutters the stack trace, which is not desirable here.
Could you add some details on what you are trying to achieve, then? From your question it seemed like you had used static analysis to find the values of bytecode arguments and were now looking to find them dynamically. A debugger would help to find the values dynamically. But now it seems like you want to access them programmatically? Do you want to change the input of bytecode at runtime?
I need a way to dynamically access and see what arguments an opcode will be evaluated with during runtime. I haven't given pdb a try, so I am not sure if it can be called dynamically during runtime, but I do know that integrating a debugger will clutter the call stack, which I am using to keep track of a few things related to function calls. I believe my only solution would be to do something like @jasonharper suggested in his original comment, but I wanted to keep the question open for him or someone else to possibly offer a "cleaner" way.
In addition, please correct me if I am wrong, but I think pdb steps through the code line by line, which would skip multiple bytecode instructions on each advancement.
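
The point about line granularity can be checked with the standard dis module. The snippet below is a minimal sketch that lists the instructions CPython compiles for the one-line body of foo from the answer. Exact opcode names and counts differ between CPython versions, so none are hard-coded.

```python
import dis


def foo(a, b, c):
    return a[b + c]


# Collect every bytecode instruction compiled for foo.
instructions = list(dis.get_instructions(foo))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# A line-level tracer stops once for the `return` line, while the
# interpreter executes all of the instructions listed above.
print(len(opnames), "instructions for a one-line function body")
```

A debugger that advances by source line stops once on `return a[b + c]`, so the intermediate result of `b + c` is never shown as a separate step.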
