Shared: Support flow summaries from ReturnValues#22061
Conversation
… from the return value of a call.
There was a problem hiding this comment.
Pull request overview
Extends the shared flow-summary infrastructure to support “reverse-flow” style summaries where writes through an indirect ReturnValue[*] can be modeled as flowing back into parameters/receiver state (e.g., C++ std::string::operator[]-style patterns). This enables replacing legacy reverse-flow models with flow summaries like ReturnValue[*] -> Argument[this].
Changes:
- Adds shared support for flow summaries originating from return kinds via
FlowSummaryCallBase, plus plumbing to map return-kinds to synthetic parameter positions. - Updates multiple language-specific flow-summary integrations to the new
summaryLocalStep(Node pred, SummaryNode succ, ...)API and addsStepsInputSig.getSummaryNode. - Adds/updates C++ external-model tests and models to exercise reverse-flow via
ReturnValue[*].
Show a summary per file
| File | Description |
|---|---|
| swift/ql/lib/codeql/swift/dataflow/internal/TaintTrackingPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| swift/ql/lib/codeql/swift/dataflow/internal/FlowSummaryImpl.qll | Implements StepsInputSig.getSummaryNode adapter for Swift flow summaries. |
| swift/ql/lib/codeql/swift/dataflow/internal/DataFlowPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| shared/dataflow/codeql/dataflow/internal/FlowSummaryImpl.qll | Core shared implementation: adds call base + return-kind-as-input support and new summary local-step plumbing. |
| rust/ql/lib/codeql/rust/dataflow/internal/TaintTrackingImpl.qll | Updates flow-summary local step hookup to pass Node directly. |
| rust/ql/lib/codeql/rust/dataflow/internal/FlowSummaryImpl.qll | Adds FlowSummaryCallBase and implements getSummaryNode for Rust integration. |
| rust/ql/lib/codeql/rust/dataflow/internal/DataFlowImpl.qll | Updates flow-summary local step hookup to pass Node directly. |
| ruby/ql/lib/codeql/ruby/dataflow/internal/TaintTrackingPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| ruby/ql/lib/codeql/ruby/dataflow/internal/FlowSummaryImpl.qll | Adds FlowSummaryCallBase and implements getSummaryNode for Ruby integration. |
| ruby/ql/lib/codeql/ruby/dataflow/internal/DataFlowPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| python/ql/lib/semmle/python/dataflow/new/internal/TaintTrackingPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| python/ql/lib/semmle/python/dataflow/new/internal/FlowSummaryImpl.qll | Adds FlowSummaryCallBase and implements getSummaryNode for Python integration. |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| javascript/ql/lib/semmle/javascript/dataflow/internal/TaintTrackingPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| javascript/ql/lib/semmle/javascript/dataflow/internal/sharedlib/DataFlowArg.qll | Adds FlowSummaryCallBase type alias for JS integration. |
| javascript/ql/lib/semmle/javascript/dataflow/internal/FlowSummaryPrivate.qll | Implements StepsInputSig.getSummaryNode for JS flow-summary steps. |
| javascript/ql/lib/semmle/javascript/dataflow/internal/DataFlowPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| java/ql/lib/semmle/code/java/dataflow/internal/TaintTrackingUtil.qll | Updates flow-summary local step hookup to pass Node directly. |
| java/ql/lib/semmle/code/java/dataflow/internal/FlowSummaryImpl.qll | Adds FlowSummaryCallBase and implements getSummaryNode for Java integration. |
| java/ql/lib/semmle/code/java/dataflow/internal/DataFlowUtil.qll | Updates flow-summary local step hookup to pass Node directly. |
| go/ql/lib/semmle/go/dataflow/internal/TaintTrackingUtil.qll | Updates flow-summary local step hookup to pass Node directly. |
| go/ql/lib/semmle/go/dataflow/internal/FlowSummaryImpl.qll | Adds FlowSummaryCallBase and implements getSummaryNode for Go integration. |
| go/ql/lib/semmle/go/dataflow/internal/DataFlowUtil.qll | Updates flow-summary local step hookup to pass Node directly. |
| csharp/ql/lib/semmle/code/csharp/dataflow/internal/TaintTrackingPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| csharp/ql/lib/semmle/code/csharp/dataflow/internal/FlowSummaryImpl.qll | Adds FlowSummaryCallBase and implements getSummaryNode for C# integration. |
| csharp/ql/lib/semmle/code/csharp/dataflow/internal/DataFlowPrivate.qll | Updates flow-summary local step hookup to pass Node directly. |
| cpp/ql/test/library-tests/dataflow/external-models/test.cpp | Adds test cases exercising reverse-flow via assignments through reference returns. |
| cpp/ql/test/library-tests/dataflow/external-models/sources.expected | Updates golden source results for added reverse-flow test cases. |
| cpp/ql/test/library-tests/dataflow/external-models/sinks.expected | Updates golden sink results for added reverse-flow test cases. |
| cpp/ql/test/library-tests/dataflow/external-models/flow.ext.yml | Adds external flow-summary models using ReturnValue[*] for reverse-flow scenarios. |
| cpp/ql/test/library-tests/dataflow/external-models/flow.expected | Updates golden flow edges/nodes to reflect new summary-node behavior. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/TaintTrackingUtil.qll | Updates flow-summary local step hookup to pass Node directly. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll | Updates flow-summary local step hookup to pass Node directly. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll | Adds a synthetic argument node/position for flow summaries representing indirect-return writes. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowNodes.qll | Routes enclosing-callable resolution through shared flow-summary infrastructure. |
| cpp/ql/lib/semmle/code/cpp/dataflow/internal/FlowSummaryImpl.qll | Implements C/C++ instantiation of new shared hooks (call base, return-kind parameter position, out-node mapping). |
Review details
- Files reviewed: 36/36 changed files
- Comments generated: 1
- Review effort level: Low
| - ["", "TemplateClass1", True, "templateFunction2<U,V>", "(U,V)", "", "Argument[1]", "ReturnValue", "value", "manual"] | ||
| - ["", "TemplateClass2<T,U>", True, "function", "(U,T)", "", "Argument[1]", "ReturnValue", "value", "manual"] No newline at end of file | ||
| - ["", "TemplateClass2<T,U>", True, "function", "(U,T)", "", "Argument[1]", "ReturnValue", "value", "manual"] | ||
| - ["", "ReverseFlow", True, "get_ptr", "", "", "ReturnValue[*]", "Argument[-1].Field[value]", "value", "manual"] |
There was a problem hiding this comment.
I take it that the model one would naturally write is
- ["", "ReverseFlow", True, "get_ptr", "", "", "Argument[-1].Field[value]", "ReturnValue[*]", "value", "manual"]but it doesn't automatically induce a reverse step because ReturnValue[*] means it is not a getter, but a getter+setter?
There was a problem hiding this comment.
Oh, that's a good point. The "natural model" (i.e., Argument[-1].Field[value] -> ReturnValue[*]) actually does induce a reverse step from the dataflow library since it's a getter for a single field. So maybe the example wasn't super well chosen.
| /** | ||
| * A base class of calls that are candidates for flow summary modeling. | ||
| */ | ||
| class FlowSummaryCallBase { | ||
| string toString(); | ||
| } | ||
|
|
||
| /** Gets a call that targets summarized callable `sc`. */ | ||
| default FlowSummaryCallBase getASourceCall(SummarizedCallableBase sc) { none() } | ||
|
|
||
| /** Gets the callable corresponding to summarized callable `c`. */ | ||
| default Lang::DataFlowCallable getSummarizedCallableAsDataFlowCallable(SummarizedCallableBase c) { | ||
| none() | ||
| } | ||
|
|
||
| /** Gets the enclosing callable of `call`. */ | ||
| default Lang::DataFlowCallable getSourceCallEnclosingCallable(FlowSummaryCallBase call) { none() } |
There was a problem hiding this comment.
Can all of these be replaced with DataFlowCall getACall(SummarizedCallable sc);, moving it from StepsInputSig to here?
There was a problem hiding this comment.
Unfortunately that introduces a non-monotonic recursion issue (since getACall returns a DataFlowCall and the summary call branch for DataFlowCall is defined using a FlowSummaryImpl::Private::SummaryNode).
I've pushed what I believe you suggest here in case I've misunderstood your suggestion: MathiasVP@b3f695d
There was a problem hiding this comment.
The following appears to fix it:
- Make
DataFlowCall.getStaticCallSourceTargetabstract. - Add an
override Declaration getStaticCallSourceTarget() { none() }toSummaryCall.
There was a problem hiding this comment.
This works for C++ (and Rust and Swift), but all other languages seem to have non-monotonic recursion problems: https://github.com/MathiasVP/ql/commits/mad-write-through-model-simplification/
The
std::stringclass in C++ is mutable so you can do:This is just a regular assignment where the left-hand side is a reference to the internal buffer. Historically, to support this in C++ taint-tracking we've specified "reverse flows":
codeql/cpp/ql/lib/semmle/code/cpp/models/implementations/StdString.qll
Lines 384 to 386 in b8c78fd
So when an assignment flows from the right-hand side to the left-hand side (i.e., to the returned reference (actually to the indirections of the reference, but nevermind)) the "reverse flow" model transfers flow to the qualifier.
We'd obviously like to get rid of all these old models and replace them with flow summaries. However, flow summaries do not allow for these reverse flow summaries. This PR fixes that so that we can model this using a summary such as
ReturnValue[*] -> Argument[this].Most of the work is in bb2ec12. We add a new summary node (which is expected to be lifted to a proper
ArgumentNodeby each language) along with a new parameter/argument position to transfer the value into the summarized callable. 662f522 shows how the new predicates are instantiated for C/C++.Commit-by-commit review recommended.
This PR doesn't actually add any real models to C/C++. However, I have a local branch where I've checked on DCA that the performance of these changes are fine.