4
\$\begingroup\$

I want to store the information returned by the dis function of the dis module in a structured way: a dict that maps each source line number to the list of instructions (mnemonics) generated for that line.

Note: Instruction is a subclass of _Instruction, defined in the dis module as a namedtuple.

Output:

disassembled_dict = 
{
 147: [ Instruction(opname='LOAD_FAST', opcode=124, arg=0, arg1=0, arg2=0, argval='x', argrepr='x', offset=0, starts_line=147, is_jump_target=False),
        Instruction(opname='LOAD_CONST', opcode=100, arg=1, arg1=1, arg2=0, argval=1, argrepr='1', offset=3, starts_line=None, is_jump_target=False), 
        Instruction(opname='INPLACE_ADD', opcode=55, arg=None, arg1=None, arg2=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False), 
        Instruction(opname='STORE_FAST', opcode=125, arg=0, arg1=0, arg2=0, argval='x', argrepr='x', offset=7, starts_line=None, is_jump_target=False)
      ],

 148: [ Instruction(opname='LOAD_FAST', opcode=124, arg=0, arg1=0, arg2=0, argval='x', argrepr='x', offset=10, starts_line=148, is_jump_target=False),
        Instruction(opname='LOAD_CONST', opcode=100, arg=2, arg1=2, arg2=0, argval=2, argrepr='2', offset=13, starts_line=None, is_jump_target=False), 
        Instruction(opname='BINARY_POWER', opcode=19, arg=None, arg1=None, arg2=None, argval=None, argrepr='', offset=16, starts_line=None, is_jump_target=False), 
        Instruction(opname='RETURN_VALUE', opcode=83, arg=None, arg1=None, arg2=None, argval=None, argrepr='', offset=17, starts_line=None, is_jump_target=False)
      ]
}


Code:

import dis

def preatty_dis(function):
    fcode = function.__code__
    ## call dis._get_instructions_bytes just as the original dis function does
    disassembled_raw = [instruction for instruction in
                        dis._get_instructions_bytes(fcode.co_code, fcode.co_varnames,
                                                    fcode.co_names, fcode.co_consts,
                                                    fcode.co_cellvars + fcode.co_freevars,
                                                    dict(dis.findlinestarts(fcode)))]
    iter_instructions = iter(disassembled_raw)
    disassembled_dict = {}
    line_pack = []
    while True:
        try:
            if not(line_pack):
                instruction = next(iter_instructions)
                line_pack.append(instruction)
            else:
                instruction = line_pack[0]

            if(instruction.starts_line):
                instruction = next(iter_instructions)
                while(instruction.starts_line is None):
                    line_pack.append(instruction)
                    instruction = next(iter_instructions)
                else:
                    ## line_pack[0] is the first mnemonic of the code line
                    ## line_pack[0].starts_line is the number of the code line
                    disassembled_dict.update({line_pack[0].starts_line : (line_pack)})
                    line_pack = [instruction]
            else:
                disassembled_dict.update({line_pack[0].starts_line : (line_pack)})
                line_pack = []
        except StopIteration:
            ## append the last group
            print(line_pack)
            disassembled_dict.update({line_pack[0].starts_line : (line_pack)})
            line_pack = []
            break
    return disassembled_dict

The code works on Python 3.5, but I'm sure there is plenty of room to make it more idiomatic, cleaner and more readable...

Suggestions?

\$\endgroup\$
1
  • \$\begingroup\$ I didn't understand one point he made; maybe you can clear it up for me. He suggested defining a local function for disassembled_dict.update({line_pack[0].starts_line : (line_pack)}), but I think that expression is atomic and I can't split it up... \$\endgroup\$ Commented Mar 3, 2016 at 20:58

2 Answers

1
\$\begingroup\$

The list comprehension [instruction for instruction in dis._get_instructions_bytes(…)] is superfluous: the call already returns an iterator, so you can iterate over its result directly (or wrap it in list() if you really need a list).

As indicated by the _ prefix, dis._get_instructions_bytes() is an undocumented and unsupported function. If you choose to write code that relies on undocumented behaviour, you have a duty to point it out with an obvious apologetic comment.

I'm not convinced that you need to use an undocumented function, though. dis.get_instructions() seems to work just fine.
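
For illustration, here is a minimal sketch of iterating over the public dis.get_instructions() API. The function sample is a hypothetical stand-in and is not part of the original code; the exact opcodes printed depend on the Python version.

```python
import dis


def sample(x):
    # Hypothetical function to disassemble; any Python function works here.
    x += 1
    return x ** 2


# dis.get_instructions() accepts a function directly and yields
# Instruction namedtuples, so no private helper is needed.
instructions = list(dis.get_instructions(sample))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)
```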

The use of next() to iterate through disassembled_raw is awkward, and I'm sure that is what prompted you to post this question. Fundamentally, what you want to achieve is a "group by" operation, so you want to use itertools.groupby(). The only tricky thing is that you want instruction.starts_line being None to mean that that instruction should be lumped with the previous batch. For that, you could use a hack: make the key function stateful.

You seem to have a stray print() call at the end, left over from debugging.

Suggested solution

import dis
import itertools

def pretty_dis(function):
    last_seen_starts_line = None
    def stateful_starts_line(instruction):
        """Extract line number for the instruction, which is either
           instruction.starts_line if there is one, or the previously
           seen instruction.starts_line if there isn't."""
        nonlocal last_seen_starts_line
        last_seen_starts_line = instruction.starts_line or last_seen_starts_line
        return last_seen_starts_line

    return {
        line_num: list(instructions)
        for line_num, instructions in itertools.groupby(
            dis.get_instructions(function), key=stateful_starts_line
        )
    }
\$\endgroup\$
1
  • \$\begingroup\$ I've chosen dis.get_instructions() as you suggested. \$\endgroup\$ Commented Mar 3, 2016 at 21:52
1
\$\begingroup\$
  • The function name is presumably meant to be spelled pretty_dis.
  • Instead of breaking you can put the return disassembled_dict right there since that's the only thing you do after the loop anyway.
  • A few of the parentheses aren't needed, namely after while, after not, and around line_pack.
  • The expression disassembled_dict.update({line_pack[0].starts_line : (line_pack)}) appears three times - time to put that into a (local) function. You could also fit in the line_pack reset with some thought.

Otherwise looks good IMO.

\$\endgroup\$
2
  • \$\begingroup\$ update is a built-in method of the dict type, right? \$\endgroup\$ Commented Mar 3, 2016 at 20:55
  • \$\begingroup\$ Yes, I meant the whole expression is repeated three times, so it should be extracted into a separate function. \$\endgroup\$ Commented Mar 4, 2016 at 11:03
