Add AvxIfma + AvxIfma.V512 hardware intrinsics #130079
Adds the managed AvxIfma class (V128/V256 VEX form) and nested AvxIfma.V512 (EVEX form) for VPMADD52LUQ / VPMADD52HUQ. Structural changes only (no tests yet, no build validation yet):
- hwintrinsiclistxarch.h: new AVXIFMA range + AVX512v2 IFMA entries
- hwintrinsicxarch.cpp: V512VersionOfIsa AVXIFMA/AVX512v2 case, lookupInstructionSet Ifma dispatch
- hwintrinsiccodegenxarch.cpp: dispatch cases for MultiplyAdd52Low/High
- lsraxarch.cpp: RMW cases
- lowerxarch.cpp: default-branch assert extension
- CorInfoInstructionSet.cs: nested type lookup + LookupPlatformIntrinsicTypes
- InstructionSetDesc.txt: nested type alias AvxIfma_V512
- InstructionSetHelpers.cs: optimistic support
- projitems: include new files
- AvxIfma.cs + AvxIfma.PlatformNotSupported.cs: managed surface
- System.Runtime.Intrinsics.cs: ref assembly

Closes dotnet#98833, closes dotnet#96476.
Adds sample tests mirroring the AvxVnni.V512 test shape:
- X86_Avx/AvxIfma/: V128/V256 VEX form
- X86_Avx/AvxIfma_V512/: V512 EVEX form

NativeAOT smoke test: uncomments AvxIfma / AvxIfma.X64 checks (were pre-approved but not yet wired) and adds AvxIfma.V512 check. AvxIfma.V512 folds under AVX512v2 (Avx512Vbmi is a representative sibling).
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR adds the managed System.Runtime.Intrinsics.X86.AvxIfma surface (VEX AVX-IFMA for Vector128/256<ulong>) plus a nested AvxIfma.V512 surface (EVEX AVX-512-IFMA for Vector512<ulong>), and wires the corresponding JIT/AOT instruction-set plumbing and sample tests.
Changes:
- Add AvxIfma/AvxIfma.V512 public API surface (ref + CoreLib implementation + PNSE stubs) and include it in CoreLib build items.
- Extend JIT HWIntrinsic tables and codegen/LSRA/lowering handling for VPMADD52LUQ/HUQ (VEX + EVEX forms).
- Add JIT and NativeAOT smoke tests covering the new intrinsics.
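
For readers less familiar with the instructions, the per-lane arithmetic behind MultiplyAdd52Low/MultiplyAdd52High can be modelled in scalar C++. This is a minimal sketch of the documented VPMADD52LUQ/VPMADD52HUQ semantics for a single 64-bit lane, not code from this PR: the function names are illustrative, and `unsigned __int128` is assumed to be available (a GCC/Clang extension).

```cpp
#include <cstdint>

// Illustrative scalar model of one 64-bit lane; names are hypothetical
// and do not correspond to any JIT or CoreLib function.
constexpr std::uint64_t kMask52 = (std::uint64_t{1} << 52) - 1;

// 104-bit product of the low 52 bits of each multiplicand.
// Assumes the compiler provides unsigned __int128 (GCC/Clang).
inline unsigned __int128 product52(std::uint64_t left, std::uint64_t right)
{
    return static_cast<unsigned __int128>(left & kMask52) * (right & kMask52);
}

// VPMADD52LUQ: add the low 52 bits of the product to the addend
// (64-bit wraparound addition).
inline std::uint64_t multiplyAdd52Low(std::uint64_t addend, std::uint64_t left, std::uint64_t right)
{
    return addend + static_cast<std::uint64_t>(product52(left, right) & kMask52);
}

// VPMADD52HUQ: add the high 52 bits (bits 52..103) of the product to the addend.
inline std::uint64_t multiplyAdd52High(std::uint64_t addend, std::uint64_t left, std::uint64_t right)
{
    return addend + static_cast<std::uint64_t>(product52(left, right) >> 52);
}
```

The vector intrinsics apply the same operation independently to each ulong element, with the first argument acting as the accumulator.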
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxIfma.cs | Adds managed intrinsic declarations for AVX-IFMA + nested V512 surface. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxIfma.PlatformNotSupported.cs | Adds PNSE stubs for unsupported platforms/hardware. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new CoreLib source files into the build. |
| src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs | Adds public ref-assembly API surface for AvxIfma / AvxIfma.V512. |
| src/coreclr/jit/hwintrinsicxarch.cpp | Updates ISA lookup / V512 ISA mapping for the new intrinsics. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Adds AVXIFMA and AVX512v2 intrinsic table entries for MultiplyAdd52*. |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Adds codegen dispatch coverage for the new intrinsic IDs. |
| src/coreclr/jit/lsraxarch.cpp | Adds LSRA handling for the new RMW-style 3-operand intrinsics. |
| src/coreclr/jit/lowerxarch.cpp | Extends containment asserts/logic to include the new intrinsic IDs. |
| src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs | Adds type/ISA mapping entries for nested AvxIfma forms and AVX512v2 aliasing. |
| src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt | Adds AvxIfma_V512 alias mapping for the Avx512Ifma row. |
| src/coreclr/tools/Common/InstructionSetHelpers.cs | Adds avxifma_v512 to the optimistic instruction-set set. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma/Program.AvxIfma.cs | Adds test program stub/namespace for AVXIFMA tests. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma/AvxIfmaSampleTest.cs | Adds V128/V256 semantic sample tests for MultiplyAdd52*. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma/*.csproj | Adds project files to run the AVXIFMA JIT tests in r/ro variants. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma_V512/Program.AvxIfma_V512.cs | Adds test program stub/namespace for AVXIFMA V512 tests. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma_V512/AvxIfma_V512SampleTest.cs | Adds V512 semantic sample tests for MultiplyAdd52*. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma_V512/*.csproj | Adds project files to run the AVXIFMA V512 JIT tests in r/ro variants. |
| src/tests/nativeaot/SmokeTests/HardwareIntrinsics/Program.cs | Enables NativeAOT smoke checks for AvxIfma and adds AvxIfma.V512. |

HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52Low, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52luq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52High, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52huq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
Good catch — but the fix was actually the ordering of the HARDWARE_INTRINSIC entries, not the simdSize itself. binarySearchId uses strcmp (case-sensitive ASCII) to locate intrinsics within an ISA range, so entries must be alphabetically sorted. My initial ordering had MultiplyAdd52Low before MultiplyAdd52High, but 'H' (0x48) < 'L' (0x4C), so the search couldn't find High and ILC failed with "Code generation failed" on the NativeAOT builds.
Fixed in add8927e21d: swapped so High precedes Low in both the AVXIFMA range and the AVX512v2 range, and moved the pair to the correct alphabetical position between MultiShift and PermuteVar16x8.
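
The ordering requirement can be reproduced with a small standalone binary search. This is a sketch under assumptions: `findByName` is a hypothetical stand-in and does not match the real binarySearchId signature, and the three-entry tables only borrow intrinsic names mentioned in this thread. Which entry a misordered table loses depends on the table's size and layout; in this layout it is High.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a strcmp-based lookup over one ISA range.
// Returns the index of target, or -1 if the search does not find it.
inline int findByName(const char* const* names, std::size_t count, const char* target)
{
    std::size_t lo = 0;
    std::size_t hi = count;
    while (lo < hi)
    {
        std::size_t mid = lo + (hi - lo) / 2;
        int cmp = std::strcmp(target, names[mid]);
        if (cmp == 0)
        {
            return static_cast<int>(mid);
        }
        if (cmp < 0)
        {
            hi = mid; // target sorts before names[mid]
        }
        else
        {
            lo = mid + 1; // target sorts after names[mid]
        }
    }
    return -1;
}

// Low before High violates strcmp order, since 'H' (0x48) < 'L' (0x4C).
static const char* const kMisordered[] = {"MultiShift", "MultiplyAdd52Low", "MultiplyAdd52High"};

// High before Low is the order the search assumes.
static const char* const kSorted[] = {"MultiShift", "MultiplyAdd52High", "MultiplyAdd52Low"};
```

With these tables, `findByName(kMisordered, 3, "MultiplyAdd52High")` returns -1 even though the entry is present, while every name in `kSorted` is found.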
On the fixed-simdSize=64 concern for AVX512v2 specifically: this mirrors the AvxVnni.V512 pattern in #128365 exactly (NI_AVX512v3_MultiplyWideningAndAdd is also simdSize=64). The class-name dispatch in lookupInstructionSet chooses AVXIFMA vs AVX512v2 based on compSupportsHWIntrinsic — for a call site using V128/V256 args, the class-name lookup returns AVXIFMA (V128/V256, simdSize=-1) whenever the AVXIFMA CPUID bit is available (which is true across all AOT targets in optimisticInstructionSetSupport).
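
The dispatch rule described here can be sketched in a few lines. The types and function names below are hypothetical and only model the stated behaviour (prefer AVXIFMA when it is supported, otherwise fall back to AVX512v2); the real lookupInstructionSet works on class names and compSupportsHWIntrinsic rather than on a struct of flags.

```cpp
// Hypothetical model of the two ISAs the "Ifma" class name can resolve to.
enum class Isa
{
    AVXIFMA,  // VEX form, V128/V256
    AVX512v2, // EVEX form, V512
};

// Hypothetical stand-in for the per-ISA support query.
struct CpuSupport
{
    bool avxIfma;
    bool avx512v2;
};

inline bool supports(const CpuSupport& cpu, Isa isa)
{
    return (isa == Isa::AVXIFMA) ? cpu.avxIfma : cpu.avx512v2;
}

// Prefer AVXIFMA when it is supported; otherwise resolve to AVX512v2.
inline Isa resolveIfmaClass(const CpuSupport& cpu)
{
    return supports(cpu, Isa::AVXIFMA) ? Isa::AVXIFMA : Isa::AVX512v2;
}
```

Under this model, a machine with only the EVEX form resolves the class to AVX512v2, and a support check against the resolved ISA still succeeds.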
Update: the real root cause is different — the simdSize=64 mismatch is a theoretical issue but wasn't what fired here. The actual failure was a KeyNotFoundException in HardwareIntrinsicILProvider.GetMethodIL.
InstructionSetDesc.txt had AvxIfma in the R2R-name column (col 4) instead of the managed-name column (col 3), so the generated ArchitectureToValidInstructionSets emitted the avxifma entry with an empty ManagedName. HardwareIntrinsicILProvider skips empty ManagedNames when building its class-name→InstructionSet dictionary — so ILC's _instructionSetMap["AvxIfma"] threw:
System.Collections.Generic.KeyNotFoundException: The given key 'AvxIfma' was not present in the dictionary.
at ILCompiler.HardwareIntrinsicILProvider.GetMethodIL(...) HardwareIntrinsicILProvider.cs:line 44
Fixed in ff40b908575 — moved AvxIfma from col 4 to col 3 in InstructionSetDesc.txt and updated the already-generated CorInfoInstructionSet.cs accordingly.
public abstract class AvxIfma : System.Runtime.Intrinsics.X86.Avx2
{
    internal AvxIfma() { }
    public static new bool IsSupported { get { throw null; } }
    public static System.Runtime.Intrinsics.Vector128<ulong> MultiplyAdd52Low(System.Runtime.Intrinsics.Vector128<ulong> addend, System.Runtime.Intrinsics.Vector128<ulong> left, System.Runtime.Intrinsics.Vector128<ulong> right) { throw null; }
    public static System.Runtime.Intrinsics.Vector128<ulong> MultiplyAdd52High(System.Runtime.Intrinsics.Vector128<ulong> addend, System.Runtime.Intrinsics.Vector128<ulong> left, System.Runtime.Intrinsics.Vector128<ulong> right) { throw null; }
    public static System.Runtime.Intrinsics.Vector256<ulong> MultiplyAdd52Low(System.Runtime.Intrinsics.Vector256<ulong> addend, System.Runtime.Intrinsics.Vector256<ulong> left, System.Runtime.Intrinsics.Vector256<ulong> right) { throw null; }
    public static System.Runtime.Intrinsics.Vector256<ulong> MultiplyAdd52High(System.Runtime.Intrinsics.Vector256<ulong> addend, System.Runtime.Intrinsics.Vector256<ulong> left, System.Runtime.Intrinsics.Vector256<ulong> right) { throw null; }
    public new abstract partial class X64 : System.Runtime.Intrinsics.X86.Avx2.X64
    {
        internal X64() { }
        public static new bool IsSupported { get { throw null; } }
    }
    public abstract partial class V512
    {
        internal V512() { }
        public static bool IsSupported { get { throw null; } }
        public static System.Runtime.Intrinsics.Vector512<ulong> MultiplyAdd52Low(System.Runtime.Intrinsics.Vector512<ulong> addend, System.Runtime.Intrinsics.Vector512<ulong> left, System.Runtime.Intrinsics.Vector512<ulong> right) { throw null; }
        public static System.Runtime.Intrinsics.Vector512<ulong> MultiplyAdd52High(System.Runtime.Intrinsics.Vector512<ulong> addend, System.Runtime.Intrinsics.Vector512<ulong> left, System.Runtime.Intrinsics.Vector512<ulong> right) { throw null; }
    }
Tanner explicitly authorised the AvxIfma.V512 nested-class shape (as opposed to a separate Avx512Ifma type) in his 2026-06-30 reply on #98833:
"@jamesburton, feel free to pick this up. #96476 should be done simultaneously and should have the same fixup we've had approved and done for the other ISAs, where its a nested V512 class."
The nested-V512 pattern mirrors the AvxVnni.V512 restructure that landed in #128365 (which itself replaced an original Avx512Vnni proposal on the same grounds). @saucecontrol also pre-proposed this exact shape on #96476 on 2024-11-06. So the shape carries prior consensus even though the individual comment approving the original Avx512Ifma still exists in the issue history.
…tical binarySearchId uses strcmp (case-sensitive ASCII order) to find intrinsics within an ISA range. My initial ordering put MultiplyAdd52Low before MultiplyAdd52High, but 'H' (0x48) < 'L' (0x4C), so the binary search would fail on High. Same issue for both AVXIFMA and AVX512v2 ranges. Also reorders MultiplyAdd52*/PermuteVar* so Multiply* precedes PermuteVar* (M < P). This surfaced as ILC 'Code generation failed' on the AvxIfma_handwritten_r NativeAOT tests.
The LookupPlatformIntrinsicTypes switch was missing cases for InstructionSet.X64_AVXIFMA/AVXIFMA_X64 and X86_AVXIFMA. When the AOT compiler tried to resolve AvxIfma against an AVXIFMA-only target (e.g. NativeAOT smoke tests X64Baseline / X64Avx2), the type wasn't found and ILC failed with 'Code generation failed for AvxIfma.get_IsSupported()'. Mirrors the existing X64_AVXVNNI/X86_AVXVNNI cases.
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52High, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52huq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52Low, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52luq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
InstructionSetDesc.txt had AvxIfma in the R2R-name column (col 4) instead of the managed-name column (col 3), so the generated ArchitectureToValidInstructionSets emitted 'avxifma' with an empty ManagedName. HardwareIntrinsicILProvider builds its class-name -> InstructionSet dictionary from ArchitectureToValidInstructionSets, skipping entries with empty ManagedName, so lookups for 'AvxIfma' threw KeyNotFoundException at ILC time. Fixed the .txt source-of-truth (so any regeneration is consistent) and the already-generated CorInfoInstructionSet.cs to set ManagedName='AvxIfma'.
Summary
Adds the managed AvxIfma surface for the AVX-IFMA (VEX) and AVX-512-IFMA (EVEX) ISAs, exposing VPMADD52LUQ/VPMADD52HUQ on Vector128<ulong>/Vector256<ulong>/Vector512<ulong>.

Closes #98833 (AVX-IFMA: VEX, V128/V256)
Closes #96476 (AVX-512-IFMA: EVEX, V512)
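Both handled together per Tanner's direction on #98833:
"@jamesburton, feel free to pick this up. #96476 should be done simultaneously and should have the same fixup we've had approved and done for the other ISAs, where its a nested V512 class."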
Shape
Mirrors the AvxVnni/AvxVnni.V512 pattern that landed in #128365, and the shape @saucecontrol pre-proposed for IFMA on 2024-11-06: a nested V512 class inside AvxIfma.

AvxIfma.V512 folds under the existing AVX512v2 JIT instruction set (same relationship as AvxVnni.V512 → AVX512v3 in #128365). The class-name dispatch in lookupInstructionSet chooses AVXIFMA when that CPUID bit is present, otherwise AVX512v2, so on a hypothetical machine with AVX-512-IFMA but not VEX AVX-IFMA, AvxIfma.V512.IsSupported (and AvxIfma.IsSupported) still report true.

What was already wired
Significant pre-existing plumbing meant this PR is small:
- INS_vpmadd52luq/INS_vpmadd52huq already in instrsxarch.h (FIRST_AVXIFMA_INSTRUCTION..LAST_AVXIFMA_INSTRUCTION)
- emitxarch.cpp already gates VEX form on InstructionSet_AVXIFMA (line 353) and EVEX form on InstructionSet_AVX512v2 (line 430)
- Is3OpRmwInstruction already returns true via the AVXIFMA range check
- InstructionSet_AVXIFMA/AVXIFMA_X64 enums, R2R IDs (AvxIfma=66, Avx512Ifma=76), EnableAVXIFMA config, and codeman.cpp CPUID gate: all present

What this PR adds
JIT wire-up:
- hwintrinsiclistxarch.h: new AVXIFMA range (MultiplyAdd52Low/MultiplyAdd52High at ULONG slot with HW_Flag_NoEvexSemantics) + AVX512v2 V512 entries
- hwintrinsicxarch.cpp: V512VersionOfIsa maps AVXIFMA/AVX512v2 → AVX512v2; lookupInstructionSet dispatches "Ifma" class between AVXIFMA and AVX512v2 by compSupportsHWIntrinsic
- hwintrinsiccodegenxarch.cpp: dispatch cases for the 4 new NIs alongside the existing VNNI/AVX512v3 group
- lsraxarch.cpp: RMW handling for the 4 new NIs
- lowerxarch.cpp: default-branch assert extended for the new NIs + AVXIFMA range

R2R + AOT metadata:
- CorInfoInstructionSet.cs: nested-type lookup for AvxIfma (X64/V512/V512_X64) on X64 and X86 + LookupPlatformIntrinsicTypes for the AVX512v2 case + new InstructionSetInfo("avx512v2", "AvxIfma_V512", ...) entries
- InstructionSetDesc.txt: adds AvxIfma_V512 alias to the Avx512Ifma row
- InstructionSetHelpers.cs: adds avxifma_v512 to the optimistic set

Managed surface:
- AvxIfma.cs + AvxIfma.PlatformNotSupported.cs (new files)
- .Shared.projitems: includes both
- System.Runtime.Intrinsics.cs (ref assembly): full API surface

Tests:
- X86_Avx/AvxIfma/: V128/V256 VEX form sample test
- X86_Avx/AvxIfma_V512/: V512 EVEX form sample test
- NativeAOT smoke test: uncomments AvxIfma/AvxIfma.X64 checks (were pre-approved stubs) and adds AvxIfma.V512 check

Validation
- build.cmd clr.jit+clr.runtime+libs -c Release -arch x64: 0 errors / 0 warnings
- ifmaprobe/ probe DLL swap deferred to CI: locally the packs pipeline is currently broken by a WiX installer failure, so the standard RestoreAdditionalProjectSources → Shipping approach doesn't refresh. Runtime verification will land via the CI test legs.

Compatibility note
The R2R ID Avx512Ifma = 76 retains its numeric value; existing R2R images continue to work. The InstructionSetDesc.txt row now records AvxIfma_V512 as the nested-type alias, matching the pattern PR #128365 used for Avx512Vnni → AvxVnni_V512.

🤖 Generated with Claude Code