user7043

This idea is... less crazy than I'd have said at first glance. To cast it in more sober language, this would mean:

  1. Identifying common patterns in the interpreted code
  2. Writing or generating native code equivalent to those patterns
  3. Replacing instances of these patterns with calls to the native code
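A toy version of those three steps, with a list of strings standing in for bytecode and a tagged name standing in for generated native code (all the operation names here are invented for illustration):

```python
from collections import Counter

# Toy "bytecode": each element names one interpreted operation.
program = ["load", "add", "store", "load", "add", "load", "add", "print"]

# Step 1: find the most common adjacent pair of operations.
pairs = Counter(zip(program, program[1:]))
(a, b), _ = pairs.most_common(1)[0]

# Step 2: in a real system this would be generated native code;
# here a tagged name stands in for the fused handler.
fused = f"fused:{a}+{b}"

# Step 3: rewrite the program, replacing each occurrence of the
# pair with the single fused operation.
rewritten = []
i = 0
while i < len(program):
    if program[i : i + 2] == [a, b]:
        rewritten.append(fused)
        i += 2
    else:
        rewritten.append(program[i])
        i += 1

print(rewritten)
```

A real implementation would of course operate on actual bytecode and emit machine code, but the shape of the pipeline is the same.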

Minus the third step, which is often left to users of the language, this is a very popular way to make dynamic languages faster. Granted, these projects don't grab random N-grams out of the code; they choose meaningful operations for which a separate function makes semantic sense. But the principle stands: people speed up (e.g.) Python code by writing common operations in C and calling them from their Python programs.
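The size of the win from that prior art is easy to observe by comparing a hand-written Python loop against the C-implemented built-in sum (the timings depend on your machine, so no specific speedup is claimed here):

```python
import timeit

def py_sum(xs):
    # Every iteration here goes through the interpreter's
    # dynamic dispatch machinery.
    total = 0
    for x in xs:
        total += x
    return total

data = list(range(100_000))
assert py_sum(data) == sum(data)  # same result either way

# sum() is one call into C code instead of 100,000 interpreted steps.
t_py = timeit.timeit(lambda: py_sum(data), number=20)
t_c = timeit.timeit(lambda: sum(data), number=20)
print(f"pure Python: {t_py:.3f}s  built-in sum: {t_c:.3f}s")
```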

With this prior art in mind, your concerns seem unfounded to me. A replaced pattern would be just like any other natively implemented function, and calling those is usually quite efficient. There is a tiny bit of overhead from the jump (but keep in mind that the interpreter makes at least one jump per bytecode instruction anyway), and some more from icache pressure, but if the function was worth optimizing in the first place, the speedup will far outweigh those costs.

But as always, the devil is in the details. Here are just a few:

  • You probably don't want to go by which patterns occur most often, but by which patterns take the most time, which puts you into the realm of profile-guided optimizations. Certainly doable, but quite a hassle in practice.
  • If the pattern writes to variables used outside of the pattern or does complicated control flow like returning from the surrounding function, you either need to include the surrounding code in the pattern (and lose opportunities for applying it) or do complicated, problem-specific rewriting of the surrounding code.
  • Unless you do the second step manually, the translation to native code can probably not do much better than simply chaining together the dynamically-typed, late-bound operations that the interpreter would perform while interpreting the original code. This often gives a nice speedup, but it's rarely anywhere near as drastic as going from an interpreted loop to memset.
  • It may be quite hard to choose the appropriate boundaries for the patterns automatically. Maybe you want to assume some inputs are constant, maybe you want to generalize other parts to apply the optimization more generally, etc.
  • Dynamic languages usually have quite extensive capabilities to break optimizations.
  • Oh my god don't even get me started on how hard it is to statically optimize dynamic languages.
  • Really, don't.
  • It's so hard you wouldn't believe.
  • At the very least you'd have to assume all values involved have one concrete type (ideally a built-in one that can't be modified) and bail out of the pattern if you find yourself with different types. And even that assumption can break in languages like Ruby, where you can redefine even arithmetic on numbers.
    • (Of course you could stomp your foot and declare that these things are not permitted, but then it's not the same language any more and you could get much greater wins across the board by optimizing the interpreter for this new restriction, or perhaps even make it a static compiler.)
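The bail-out in that last point can be sketched as a guarded fast path. This is a deliberately simplified Python sketch: the guard checks for exactly the built-in type, and the fallback below merely defers to normal dispatch rather than reimplementing it:

```python
import operator

def add_fast(x, y):
    # Fast path: only valid when both operands are exactly built-in
    # ints. `type(x) is int` deliberately rejects subclasses, whose
    # __add__ may have been redefined.
    if type(x) is int and type(y) is int:
        return x + y  # stands in for generated native code
    # Bail out: fall back to the language's normal dynamic dispatch.
    return operator.add(x, y)

class Weird(int):
    def __add__(self, other):
        return "surprise"

print(add_fast(2, 3))         # fast path taken
print(add_fast(Weird(2), 3))  # guard fails, original semantics preserved
```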

Maybe you can fix some of these problems by taking more inspiration from JIT compilers, specifically tracing ones: run the program, let the JIT compiler identify hot code paths (along with the assumptions made while optimizing them), and statically insert these optimized traces into the program. Tracing JIT compilers already know how to handle all of these issues, so the rest would mostly be an engineering challenge (producing data structures and machine code suitable for a static binary).
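As a very loose analogy in pure Python, with a call counter standing in for trace detection, a specialized function standing in for a compiled trace, and a type check standing in for the trace's recorded guards:

```python
THRESHOLD = 5
calls = 0

def generic_sum(xs):
    # Baseline: dynamic dispatch on every iteration.
    total = 0
    for x in xs:
        total = total + x
    return total

def specialized_sum(xs):
    # "Compiled trace": only valid under the recorded assumption
    # that all elements are built-in ints.
    if not all(type(x) is int for x in xs):
        return generic_sum(xs)  # guard failed: deoptimize
    return sum(xs)              # stands in for the native trace

current = generic_sum

def hot_sum(xs):
    global calls, current
    calls += 1
    if calls > THRESHOLD:
        # The "static insertion": swap in the optimized trace once
        # the path is known to be hot.
        current = specialized_sum
    return current(xs)

for _ in range(10):
    print(hot_sum([1, 2, 3]))
```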

Another variant on the same idea is to give up on automating the second and third steps: use tools to find patterns that might benefit from optimization, then optimize them manually. Less shiny, but also more likely to actually work.
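The tool-assisted variant can start with nothing fancier than the standard profiler: find where the time actually goes, then hand-optimize those spots (the function names below are invented for illustration):

```python
import cProfile
import io
import pstats

def hot(n):
    # Deliberately slow pure-Python work: a candidate for
    # rewriting as a native function.
    total = 0
    for i in range(n):
        total += i * i
    return total

def cold(n):
    return n * 2

def main():
    for _ in range(200):
        hot(1000)
    cold(3)

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()

# The top entries by cumulative time are the manual-optimization
# candidates.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```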

Conclusion: Nice idea for a thesis, but I wouldn't bet on it working out in its current form.