The Wayback Machine - https://web.archive.org/web/20250501213511/https://github.com/github/codeql/pull/12964
Skip to content

Ruby: Remove canonical return nodes #12964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 8, 2023

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Apr 28, 2023

This PR is about porting the join-order improvement from #12979 to the type tracking library shared between Ruby and Python. The motivation for Ruby is to be able to get rid of canonical return nodes, which were only introduced to avoid bad joins during type tracking (and dually, to have steps from parameter nodes to their corresponding SSA node be simple local flow steps instead of call steps).

The join-order from #12979 is implemented in the first commit, and it is about getting exactly this join-order in the various step relations:

TypeTracker step(TypeTracker t, LocalSourceNode nodeFrom, LocalSourceNode nodeTo) {
    exists(StepSummary summary |
      step(nodeFrom, _, summary) and
      result = t.append(summary) and
      step(nodeFrom, nodeTo, summary) // important to only perform this join after `nodeFrom` and `summary` have been restricted above
    )
  }

The above, however, does not work when type tracking is used in mutual recursion with the call graph construction, because step becomes a recursive call (or, actually the subset of step that is stepCall becomes recursive), which means that we will have a conjunct with 3 recursive calls (the two stepCall calls and the recursive type tracking call itself). The solution is to split this non-linear recursion into two non-linear predicates: one that first joins with the projected stepCall relation (e.g. 13ada1e#diff-26e3a30a43f4e4f576971a851f3a70fd7c796624c81c5387d6cf6b2db3ed58edR506-R507), followed by a predicate that joins with the full stepCall relation (e.g. 13ada1e#diff-26e3a30a43f4e4f576971a851f3a70fd7c796624c81c5387d6cf6b2db3ed58edR493).

The second commit does the above for Ruby, and additionally (1) gets rid of canonical return nodes (SynthReturnNode), (2) no longer treats SSA definitions for parameters as local source nodes, (3) treats the step from parameter to SSA definition as a normal local flow step in type tracking, as in normal data flow, and (4) exposes SelfParameterNode publicly, for use in queries/libraries.

The third commit applies the same call graph join-order improvements to Python, as mentioned above.

CC: @aschackmull

@github-actions github-actions bot added the Ruby label Apr 28, 2023
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch from 5a3b101 to 7dc4956 Compare April 28, 2023 09:29
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch from 7dc4956 to 1696d7e Compare April 28, 2023 11:17
@hvitved hvitved changed the title Ruby: Remove synthetic return nodes Ruby: Remove canonical return nodes May 1, 2023
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch 4 times, most recently from a7a4417 to 37efc63 Compare May 4, 2023 11:29
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch from 37efc63 to fbc34d2 Compare May 17, 2023 09:14
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch from fbc34d2 to 1788c54 Compare May 24, 2023 09:13
@hvitved hvitved added the no-change-note-required This PR does not need a change note label May 25, 2023
@hvitved hvitved marked this pull request as ready for review May 25, 2023 12:00
@hvitved hvitved requested review from a team as code owners May 25, 2023 12:00
@asgerf
Copy link
Contributor

asgerf commented May 26, 2023

There's quite a few lost results in the Ruby evaluation.

In particular, there are projects that didn't lose any call edges or taint sources, but where we lost a bunch of tainted nodes. That would imply that we lost some steps somehow, but it's not clear why. Have you investigated the underlying reason?

Given the changes to type-tracking, one possible explanation could be that API graphs can no longer track use/def nodes to the same extent it did before, and so some summaries no longer apply to the same call sites.

For now I've opened #13299 which adds some meta-queries for measuring calls to summarized callables. Seems useful to have in general, and perhaps this could help narrow down the root cause of these changes.

Otherwise the code LGTM although it's honestly a bit hard to review. There seems to be an unfortunate increase in complexity overall, but it seems to be internal, while the public API is actually getting simplified so I'm ok with that.

yoff
yoff previously approved these changes May 26, 2023
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python changes look fine, and the Python evaluation looks like a small speed-up, even :-)
I agree that the code looks a bit more complicated now, but the rewrite is quite systematic and I wonder if it calls for being packed into parameterised modules (although not necessarily in this PR).

In particular the first commit seems to call for a module for folding over a relation and get the right join order, while the two following commits suggest a module for recursive type tracking.

@hvitved
Copy link
Contributor Author

hvitved commented Jun 6, 2023

There's quite a few lost results in the Ruby evaluation.

In particular, there are projects that didn't lose any call edges or taint sources, but where we lost a bunch of tainted nodes. That would imply that we lost some steps somehow, but it's not clear why. Have you investigated the underlying reason?

I checked the changes to Hard-coded credentials on fastlane, and it is a result of the (buggy) "field flow branch limit" implementation, where each actual return node will now contribute to the join predicate, instead of having just a single contribution from the canonical return node.

@hvitved
Copy link
Contributor Author

hvitved commented Jun 6, 2023

I agree that the code looks a bit more complicated now, but the rewrite is quite systematic and I wonder if it calls for being packed into parameterised modules (although not necessarily in this PR).

I'll do this. Putting this PR back into draft, until I see whether the approach works.

@hvitved hvitved marked this pull request as draft June 6, 2023 13:42
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch from 025471d to 3569561 Compare June 6, 2023 19:51
@hvitved hvitved force-pushed the ruby/remove-synth-returns branch from 3569561 to 48ac3e5 Compare June 7, 2023 07:04
@hvitved hvitved marked this pull request as ready for review June 7, 2023 07:42
@hvitved
Copy link
Contributor Author

hvitved commented Jun 7, 2023

PR is ready for review again; only the last three commits are new.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra packaging work, the diff now looks extremely nice!
I really like how we can just use the interface in the Python code base and have the join-order logic happen for free behind the scenes.

Comment on lines +233 to +247
pragma[nomagic]
private predicate stepProj(TypeTrackingNode nodeFrom, StepSummary summary) {
step(nodeFrom, _, summary)
}

bindingset[nodeFrom, t]
pragma[inline_late]
pragma[noopt]
private TypeTracker stepInlineLate(TypeTracker t, TypeTrackingNode nodeFrom, TypeTrackingNode nodeTo) {
exists(StepSummary summary |
stepProj(nodeFrom, summary) and
result = t.append(summary) and
step(nodeFrom, nodeTo, summary)
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern looks like it could also be hidden in a parameterised module. I think of it as one of the micro modules that @jbj envisioned when he spoke in Nice about "the future with parameterised modules".

It does not have to happen in this PR, though :-)

Comment on lines +238 to +240
bindingset[nodeFrom, t]
pragma[inline_late]
pragma[noopt]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Jamen, det er jo hele tre ting på én gang! Det går da virkelig ikke."

(With apologies to those in the audience who did not grow up with Danish commercials for Kinder eggs.)

@hvitved
Copy link
Contributor Author

hvitved commented Jun 8, 2023

@asgerf : Would you like another look before merge?

Copy link
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! LGTM and thanks for investigating the result changes.

@hvitved hvitved merged commit cee7088 into github:main Jun 8, 2023
@hvitved hvitved deleted the ruby/remove-synth-returns branch June 8, 2023 08:07

pragma[nomagic]
private Node track(Input::State state, TypeTracker t) {
t.start() and Input::start(result, state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this PR is merged already, but shouldn't the start have had the filter applied as well? @hvitved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will not work in the Ruby use-case, but I realize that in principle we ought to restrict all the Python start predicates using ignoreForCallGraph, yet as soon as we take a single step things will be filtered away, so I don't think it will matter in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python Ruby
5 participants