Python: Improve some flow summaries by hvitved · Pull Request #22101 · github/codeql

hvitved · 2026-07-01T09:56:37Z

This PR rewrites some Python flow summaries into equivalent* but much more performant summaries. Some summaries, like builtins.enumerate, could be rewritten directly by making use of existing ContentSets, while others, such as builtins.dict, required adding support for "with content".

*The flow summary for builtins.zip has been generalized by removing the restriction that only the first two arguments are taken into account.
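
To make the generalization concrete, here is a small plain-Python sketch of the runtime behavior that a summary for builtins.zip has to model: an element of the i-th argument ends up at index i of each tuple in the result, for any number of arguments. The variable names are illustrative and are not taken from the PR's tests.

```python
# Runtime behavior that a flow summary for builtins.zip has to model.
# All names below are illustrative only; they do not come from the PR.
first = ["a0", "a1"]
second = ["b0", "b1"]
third = ["c0", "c1"]  # a third argument, which a two-argument-only model would miss

zipped = list(zip(first, second, third))

# An element of argument i ends up at tuple index i of each list element.
for row in zipped:
    assert row[0] in first
    assert row[1] in second
    assert row[2] in third
```

A summary restricted to the first two arguments would have no way to describe the flow from `third` into `row[2]`.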

DCA looks excellent: we not only reduce analysis time on dask__dask by 95%, but also recover the performance on saltstack__salt that was originally lost in #21888 (note that the performance lost in that PR for ytdl-org__youtube-dl was already recovered in #21941).

owen-mc · 2026-07-01T11:25:14Z

-        preservesValue = true
-      )
+      input = "Argument[0].WithAnyDictionaryElement" and
+      output = "ReturnValue" and


This seems to change the meaning of the flow summary - previously it went from Argument[0].DictionaryElement[x] to ReturnValue.DictionaryElement[x] for all keys x, and now it goes from Argument[0].WithAnyDictionaryElement to ReturnValue. Can you explain?

hvitved · 2026-07-01

Argument[0].WithAnyDictionaryElement says that the data must be stored inside a dictionary value under some key. This has exactly the same effect as the original flow summary, except that it doesn't compile down to a bunch of read-steps followed by a bunch of identical store-steps, and hence it is much more performant.
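
The contrast between the two shapes can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions, not how CodeQL implements summaries: access paths are modeled as tuples of content labels (outermost first), `keys` stands for the full universe of dictionary keys, and the function names are hypothetical.

```python
# Toy model only: access paths are tuples of content labels, outermost first.
# This is NOT CodeQL's implementation; it just contrasts the two summary shapes.
from typing import List, Optional, Tuple

AccessPath = Tuple[str, ...]


def is_dict_element(content: str) -> bool:
    """True if the content label denotes a dictionary element under some key."""
    return content.startswith("DictionaryElement[")


def read_then_store(path: AccessPath, keys: List[str]) -> Optional[AccessPath]:
    """Old shape: for every key x, read DictionaryElement[x], then store it again."""
    for key in keys:
        content = f"DictionaryElement[{key}]"
        if path and path[0] == content:
            tail = path[1:]            # read step: pop the content off the head
            return (content,) + tail   # store step: push the same content back
    return None


def with_any_dict_element(path: AccessPath) -> Optional[AccessPath]:
    """New shape: keep the path unchanged if its head is a dictionary element."""
    if path and is_dict_element(path[0]):
        return path
    return None


keys = ["a", "b"]
in_dict = ("DictionaryElement[a]", "ListElement")
in_list = ("ListElement",)
```

In this toy model both functions return the same access path for `in_dict` and reject `in_list`, but the first one does per-key work while the second performs a single check.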

owen-mc · 2026-07-01

Ah, I think I get it now. I've never come across this mechanism before - Go seems to be one of the few languages not using it. So, if I've understood correctly: if the input is <input>.WithAnyDictionaryElement and the output is <output>, then the MaD machinery will create flow from <input> to <output> as long as there is flow to <input>.AnyDictionaryElement? It's a little bit like a lookahead in regexes, which checks for the existence of something without consuming it.
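
The regex analogy can be illustrated with Python's `re` module. This is a sketch of the analogy only, not of the dataflow machinery; the strings are made-up stand-ins for access paths.

```python
import re

# (?=.*DictionaryElement) only checks that the marker occurs somewhere ahead;
# it consumes nothing, so the following (.*) still captures the whole string.
pattern = re.compile(r"(?=.*DictionaryElement)(.*)")

match = pattern.match("value.DictionaryElement[k]")
no_match = pattern.match("value.ListElement")

# The lookahead acted as a filter: the matching input is passed through intact.
captured = match.group(1) if match else None
```

Here the lookahead plays the role of the "with content" check: it decides whether anything passes, but what passes is the unmodified input.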

Copilot

Pull request overview

This PR updates Python dataflow flow summaries to be (mostly) equivalent while significantly improving performance by relying more on existing ContentSet encodings and adding support for summaries that depend on “with content”. It also broadens the builtins.zip modeling by removing the prior limitation that only the first two arguments were considered.

Changes:

Rework several stdlib flow summaries to use Any*Element / WithAny*Element-style encodings instead of per-index/per-key expansion.
Add infrastructure to encode and consume “with content” in flow summaries and plumb “expects content” through the dataflow internals.
Update Python library tests to reflect the new (generalized) behavior, including new expected and known-spurious flows.

Show a summary per file

File	Description
python/ql/test/library-tests/frameworks/django-orm/testapp/orm_tests.py	Updates ORM test expectations to reflect newly modeled flow through `in_bulk().values()`.
python/ql/test/library-tests/dataflow/coverage/test_builtins.py	Adjusts `zip` tuple test expectations for generalized argument handling and records a known spurious flow.
python/ql/lib/semmle/python/frameworks/Stdlib.qll	Rewrites several stdlib flow summaries to use broader content encodings and “with content” forms.
python/ql/lib/semmle/python/dataflow/new/internal/FlowSummaryImpl.qll	Adds encoding helper for “with content” in flow summary representations.
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll	Hooks `expectsContent` up to flow-summary-specific content expectations.

Review details

Files reviewed: 4/5 changed files
Comments generated: 0
Review effort level: Low

github-actions Bot added the Python label Jul 1, 2026

Python: Improve some flow summaries

a5444b5

hvitved force-pushed the python/flow-summaries-improvements branch from 34db1c7 to a5444b5 Compare July 1, 2026 10:06

Python: Update inline test expectations

2bf6031

owen-mc reviewed Jul 1, 2026


hvitved added the no-change-note-required This PR does not need a change note label Jul 1, 2026

hvitved marked this pull request as ready for review July 1, 2026 12:32

hvitved requested a review from a team as a code owner July 1, 2026 12:32

Copilot AI review requested due to automatic review settings July 1, 2026 12:32

Copilot started reviewing on behalf of hvitved July 1, 2026 12:32

Copilot AI reviewed Jul 1, 2026


owen-mc approved these changes Jul 1, 2026


hvitved merged commit 6c3c5ea into github:main Jul 1, 2026
20 checks passed

hvitved deleted the python/flow-summaries-improvements branch July 1, 2026 17:43

hvitved merged 2 commits into github:main from hvitved:python/flow-summaries-improvements
