Kotlin extractor: fix K2 regressions without expected churn#22076
Draft
andersfugmann wants to merge 6 commits into
Draft
Kotlin extractor: fix K2 regressions without expected churn#22076andersfugmann wants to merge 6 commits into
andersfugmann wants to merge 6 commits into
Conversation
831e87b to
6f3f2fc
Compare
Why this is needed: - Under K2, top-level declarations from external binaries are attached directly to IrExternalPackageFragment rather than to an IrClass file-class parent. - That bypassed the normal class-source location path, so some external file-class entities ended up without stable binary file locations. - Missing/unstable locations caused drift in tests that depend on external file class member resolution and location facts. What this changes: - Resolve binary paths from IrMemberWithContainerSource (JvmPackagePartSource) via a dedicated getContainerSourceBinaryPath helper. - In KotlinUsesExtractor, when extracting top-level external declarations, attach file-class location from container-source binary path when available. - Track external file classes whose locations were emitted to avoid duplicate hasLocation facts. This targets the K2 external file-class location gap (for example file_classes and external-property-overloads parity). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4303989 to
8157a86
Compare
Why this is needed: - With Kotlin 2.0 analysis, some local-variable locations resolve to a wider declaration span than before. - The previous extractor logic used provider-based ranges that can cover type, annotations, and modifiers, which shifts expected variable location facts. - This caused parity drift in tests that expect the location to point at the variable name token itself. What this changes: - Cache current source text per file during extraction. - Derive variable-name offsets by scanning the declaration slice and locating the declared identifier token. - Emit local-variable declaration/expr locations from that identifier span, with fallback to the previous provider when source offsets are unavailable. This restores stable name-anchored variable locations under Kotlin 2.0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why this is needed: - Under K2, binary Java symbols are represented differently from K1: JavaSourceElement metadata is often absent and sources are exposed through VirtualFileBasedSourceElement. - Without recovery logic, callable matching can miss declared Java methods, callable labels can drift (primitive vs boxed reference types), and external Java declaration stubs can gain wildcard noise when Java signatures are not available. - These differences produced Kotlin 2.0 parity drift in tests that rely on stable Java/Kotlin cross-extractor callable identity. What this changes: - Add K2-aware Java binary inspection helpers (ASM-based fallback) to detect declared methods and parameter/return reference-vs-primitive shape when JavaSourceElement metadata is unavailable. - Recover Java callables more reliably in KotlinUsesExtractor, including a binary-class fallback path. - Normalise callable labels and call result typing to boxed Java classes when K2 enhanced reference types appear as Kotlin primitives. - Accept K2's `Any` form for Object.equals(Object) and keep binary declaration checks stable. - Suppress default wildcard insertion for external Java declaration stubs when no Java callable metadata is available, preventing synthetic wildcard drift. This commit restores Java interop parity for Kotlin 2.0 extraction paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why this is needed: - The extractor compatibility fixes now preserve the information these Kotlin1-era tests were protecting, even when compiled with Kotlin 2.4 and `-language-version 2.0`. - Keeping mixed legacy language-version wiring in individual tests is no longer necessary and obscures the intended steady-state execution mode. What this changes: - Update all affected Kotlin1 compatibility integration tests to run with `-language-version 2.0` directly. - Keep the expected extraction signal aligned for extractor information output. - Remove the obsolete CODEOWNERS entry for the retired `java/ql/test-kotlin1/` path. This consolidates the language-version transition into a single test rollup commit, as requested. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why this is needed: - library-tests/exprs/DB-CHECK was failing with INVALID_KEY and INVALID_KEY_SET in params for kotlin.jvm.internal.Intrinsics.areEqual. - The Java binary probing code matched methods by name plus arity and used the first match, which is ambiguous when both primitive and boxed overloads exist. - Under that ambiguity, callable labels could be boxed while extracted params remained primitive (or vice versa), creating conflicting rows for the same key. What changed: - For both parameter and return-type probing, gather all matching overloads and compute classifier-vs-primitive from the full candidate set. - Return a concrete answer only when all matches agree; return null when matches disagree. - Apply the same unambiguous matching rule in both K1 metadata and K2 ASM fallback paths. Effect: - The boxing fallback now activates only when the Java binary evidence is deterministic, preventing callable-label collisions and restoring DB integrity in the affected Kotlin2 dataset check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why this is needed: - library-tests/multiple_files/method_accesses.ql regressed because receiver class locations for external file-class members became concrete file paths. - For stdlib-style unresolved container-source paths, forcing a concrete location changed stable output from synthetic unknown location to external path-based locations. What changed: - Added shouldUseConcreteExternalFileClassLocation to distinguish reliable concrete paths from unresolved placeholders. - In external package-fragment parent handling, only write an external file-class location when the normalized path is concrete and stable. - If no reliable path is available, keep prior synthetic behaviour by not forcing a concrete location. Effect: - Restores stable receiver-location output for method_accesses while preserving concrete locations when we have trustworthy binary-path information. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
8157a86 to
b82ec0e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR groups the Kotlin extractor fixes needed to keep the migrated Kotlin1 and Kotlin2 integration tests stable when compiled with Kotlin 2.4 and
-language-version 2.0. The objective is parity with the previous extraction output: no loss of extracted information, and no expected-output churn that is not justified by a genuine behavioural difference between K1 and K2.What changed
Extractor changes, in dependency order:
Restore external file-class locations under K2. Under K2, top-level declarations from external binaries attach to
IrExternalPackageFragmentrather than to anIrClassfile-class parent, which bypassed the normal class-source location path. Binary paths are now resolved from the container source (JvmPackagePartSource) so external file-class entities keep stable binary file locations.Anchor local variable locations to the identifier. K2 resolves some local-variable locations to a wider declaration span (covering type, annotations, and modifiers). The variable-name offset is now derived by scanning the declaration slice, restoring name-anchored locations, with a fallback to the previous provider when source offsets are unavailable.
Reconcile Java binary signatures under K2. K2 represents binary Java symbols differently from K1 (
VirtualFileBasedSourceElementrather thanJavaSourceElement, with metadata often absent). ASM-based fallback inspection keeps callable matching, callable labels (primitive versus boxed), and external Java stubs stable.Roll Kotlin1 suites forward to language-version 2.0. With the extractor fixes in place, the migrated Kotlin1 tests run directly under
-language-version 2.0. This also removes the obsolete CODEOWNERS entry for the retiredjava/ql/test-kotlin1/path.Disambiguate binary overload probing. Fixes a DB-CHECK
INVALID_KEYfailure caused by ambiguous name-plus-arity matching when both primitive and boxed overloads exist. The probe now gathers all candidates and returns a concrete answer only when they agree, otherwise it returns null.Keep synthetic locations for unresolved file classes. Avoids forcing concrete external paths for stdlib-style unresolved container-source paths, preserving stable receiver-location output for
multiple_files/method_accesses.Scope Object-method redeclaration recovery. Constrains the
equals(Object)recovery to non-generic parents, skips it when a concrete sibling declaration already exists, and treats Kotlin binary classes (carryingkotlin.Metadata) as non-Java for redeclaration recovery.Notes on the approach
Most of these are genuine K2 adaptations (1, 2, 3, 5). Two are conservative stability measures rather than behavioural improvements: the synthetic-location handling (6) preserves prior output for unresolved paths, and the Object-method scoping (7) keeps K1-era expectations intact. Both are candidates for a follow-up that aligns K1 and K2 behaviour uniformly and updates the expectations instead.
The standalone 2.4.0 extractor target builds cleanly under
-Werror.This addresses investigation needed for Kotlin1 tests noted in https://github.com/github/codeql-kotlin-team/issues/196