Add AvxIfma + AvxIfma.V512 hardware intrinsics #130079

Open
jamesburton wants to merge 5 commits into dotnet:main from jamesburton:feature/avxifma

@jamesburton

Contributor

Summary

Adds the managed AvxIfma surface for the AVX-IFMA (VEX) and AVX-512-IFMA (EVEX) ISAs, exposing VPMADD52LUQ / VPMADD52HUQ on Vector128<ulong> / Vector256<ulong> / Vector512<ulong>.

Closes #98833 (AVX-IFMA — VEX, V128/V256)
Closes #96476 (AVX-512-IFMA — EVEX, V512)

Both handled together per Tanner's direction on #98833:

"@jamesburton, feel free to pick this up. #96476 should be done simultaneously and should have the same fixup we've had approved and done for the other ISAs, where its a nested V512 class."

Shape

Mirrors the AvxVnni / AvxVnni.V512 pattern that landed in #128365, and the shape @saucecontrol pre-proposed for IFMA on 2024-11-06:

public abstract class AvxIfma : Avx2
{
    public static new bool IsSupported { get; }

    public static Vector128<ulong> MultiplyAdd52Low (Vector128<ulong> addend, Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> MultiplyAdd52High(Vector128<ulong> addend, Vector128<ulong> left, Vector128<ulong> right);
    public static Vector256<ulong> MultiplyAdd52Low (Vector256<ulong> addend, Vector256<ulong> left, Vector256<ulong> right);
    public static Vector256<ulong> MultiplyAdd52High(Vector256<ulong> addend, Vector256<ulong> left, Vector256<ulong> right);

    public new abstract class X64 : Avx2.X64 { public static new bool IsSupported { get; } }

    public abstract class V512
    {
        public static bool IsSupported { get; }
        public static Vector512<ulong> MultiplyAdd52Low (Vector512<ulong> addend, Vector512<ulong> left, Vector512<ulong> right);
        public static Vector512<ulong> MultiplyAdd52High(Vector512<ulong> addend, Vector512<ulong> left, Vector512<ulong> right);
    }
}
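
For readers unfamiliar with the two instructions, the following is a minimal scalar sketch of what one 64-bit lane of VPMADD52LUQ / VPMADD52HUQ computes: the low 52 bits of `left` and `right` are multiplied into a 104-bit product, and either its low or its high 52 bits are added to `addend` with ordinary 64-bit wraparound. The function and type names are illustrative assumptions that only echo the proposed managed names; this is not code from the PR.

```cpp
#include <cstdint>

// Illustrative scalar model of a single 64-bit lane; names are hypothetical.
constexpr std::uint64_t kMask52 = (1ULL << 52) - 1;
constexpr std::uint64_t kMask26 = (1ULL << 26) - 1;

// The 104-bit product of two 52-bit values, split into two 52-bit halves.
struct Product104 {
    std::uint64_t low52;
    std::uint64_t high52;
};

// Multiplies the low 52 bits of each operand using 26-bit limbs so that
// every intermediate value fits in 64 bits (no 128-bit type required).
inline Product104 multiply52(std::uint64_t left, std::uint64_t right) {
    const std::uint64_t a = left & kMask52;   // upper 12 bits are ignored
    const std::uint64_t b = right & kMask52;
    const std::uint64_t al = a & kMask26, ah = a >> 26;
    const std::uint64_t bl = b & kMask26, bh = b >> 26;

    const std::uint64_t mid = ah * bl + al * bh;                     // < 2^53
    const std::uint64_t lowSum = al * bl + ((mid & kMask26) << 26);  // < 2^53
    const std::uint64_t high = ah * bh + (mid >> 26) + (lowSum >> 52);
    return {lowSum & kMask52, high};
}

// addend + low 52 bits of the product (the VPMADD52LUQ lane behaviour).
inline std::uint64_t multiplyAdd52Low(std::uint64_t addend, std::uint64_t left, std::uint64_t right) {
    return addend + multiply52(left, right).low52;  // wraps modulo 2^64
}

// addend + high 52 bits of the product (the VPMADD52HUQ lane behaviour).
inline std::uint64_t multiplyAdd52High(std::uint64_t addend, std::uint64_t left, std::uint64_t right) {
    return addend + multiply52(left, right).high52;  // wraps modulo 2^64
}
```

In this model, `multiplyAdd52Low(10, 3, 5)` yields 25, while `multiplyAdd52High(10, 3, 5)` leaves the addend at 10 because the product does not reach bit 52.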

AvxIfma.V512 folds under the existing AVX512v2 JIT instruction set (same relationship as AvxVnni.V512 → AVX512v3 in #128365). The class-name dispatch in lookupInstructionSet chooses AVXIFMA when that CPUID bit is present, otherwise AVX512v2, so on a hypothetical machine with AVX-512-IFMA but not VEX AVX-IFMA, AvxIfma.V512.IsSupported (and AvxIfma.IsSupported) still report true.
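
The dispatch rule described above can be modelled in a few lines. This is only a sketch of the behaviour as the description states it, not the JIT's lookupInstructionSet code; `Isa`, `CpuModel`, and the three functions are hypothetical names introduced for illustration.

```cpp
// Hypothetical model of the class-name dispatch described in the PR text.
enum class Isa { AVXIFMA, AVX512v2 };

// Which of the two instruction sets the modelled machine supports.
struct CpuModel {
    bool avxIfma;   // VEX AVX-IFMA available
    bool avx512v2;  // EVEX AVX-512-IFMA available (folded under AVX512v2)
};

// Prefer AVXIFMA when it is supported, otherwise fall back to AVX512v2.
inline Isa pickIsaForIfmaClass(const CpuModel& cpu) {
    return cpu.avxIfma ? Isa::AVXIFMA : Isa::AVX512v2;
}

// Support check for one instruction set on the modelled machine.
inline bool isSupported(const CpuModel& cpu, Isa isa) {
    return isa == Isa::AVXIFMA ? cpu.avxIfma : cpu.avx512v2;
}

// What the class-level IsSupported would report under this dispatch rule.
inline bool avxIfmaClassIsSupported(const CpuModel& cpu) {
    return isSupported(cpu, pickIsaForIfmaClass(cpu));
}
```

Under this model a machine with only AVX-512-IFMA (`CpuModel{false, true}`) reports true, which is the consequence the description calls out.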

What was already wired

Significant pre-existing plumbing meant this PR is small:

  • INS_vpmadd52luq / INS_vpmadd52huq already in instrsxarch.h (FIRST_AVXIFMA_INSTRUCTION..LAST_AVXIFMA_INSTRUCTION)
  • emitxarch.cpp already gates VEX form on InstructionSet_AVXIFMA (line 353) and EVEX form on InstructionSet_AVX512v2 (line 430)
  • Is3OpRmwInstruction already returns true via the AVXIFMA range check
  • InstructionSet_AVXIFMA / AVXIFMA_X64 enums, R2R IDs (AvxIfma=66, Avx512Ifma=76), EnableAVXIFMA config, and codeman.cpp CPUID gate — all present

What this PR adds

JIT wire-up:

  • hwintrinsiclistxarch.h: new AVXIFMA range (MultiplyAdd52Low/MultiplyAdd52High at ULONG slot with HW_Flag_NoEvexSemantics) + AVX512v2 V512 entries
  • hwintrinsicxarch.cpp: V512VersionOfIsa maps AVXIFMA/AVX512v2 → AVX512v2; lookupInstructionSet dispatches "Ifma" class between AVXIFMA and AVX512v2 by compSupportsHWIntrinsic
  • hwintrinsiccodegenxarch.cpp: dispatch cases for the 4 new NIs alongside the existing VNNI/AVX512v3 group
  • lsraxarch.cpp: RMW handling for the 4 new NIs
  • lowerxarch.cpp: default-branch assert extended for the new NIs + AVXIFMA range

R2R + AOT metadata:

  • CorInfoInstructionSet.cs: nested-type lookup for AvxIfma (X64/V512/V512_X64) on X64 and X86 + LookupPlatformIntrinsicTypes for the AVX512v2 case + new InstructionSetInfo("avx512v2", "AvxIfma_V512", ...) entries
  • InstructionSetDesc.txt: adds AvxIfma_V512 alias to the Avx512Ifma row
  • InstructionSetHelpers.cs: adds avxifma_v512 to the optimistic set

Managed surface:

  • AvxIfma.cs + AvxIfma.PlatformNotSupported.cs (new files)
  • .Shared.projitems: includes both
  • System.Runtime.Intrinsics.cs (ref assembly): full API surface

Tests:

  • X86_Avx/AvxIfma/ — V128/V256 VEX form sample test
  • X86_Avx/AvxIfma_V512/ — V512 EVEX form sample test
  • NativeAOT smoke: uncomments AvxIfma / AvxIfma.X64 checks (were pre-approved stubs) and adds AvxIfma.V512 check

Validation

  • build.cmd clr.jit+clr.runtime+libs -c Release -arch x64 — 0 errors / 0 warnings
  • Local runtime verification via ifmaprobe / probe DLL swap deferred to CI: locally the packs pipeline is currently broken by a WiX installer failure, so the standard RestoreAdditionalProjectSources → Shipping approach doesn't refresh. Runtime verification will land via the CI test legs.

Compatibility note

The R2R ID Avx512Ifma = 76 retains its numeric value; existing R2R images continue to work. The InstructionSetDesc.txt row now records AvxIfma_V512 as the nested-type alias, matching the pattern PR #128365 used for Avx512Vnni → AvxVnni_V512.

🤖 Generated with Claude Code

Adds the managed AvxIfma class (V128/V256 VEX form) and nested AvxIfma.V512 (EVEX form) for VPMADD52LUQ / VPMADD52HUQ.

Structural changes only (no tests yet, no build validation yet):
- hwintrinsiclistxarch.h: new AVXIFMA range + AVX512v2 IFMA entries
- hwintrinsicxarch.cpp: V512VersionOfIsa AVXIFMA/AVX512v2 case, lookupInstructionSet Ifma dispatch
- hwintrinsiccodegenxarch.cpp: dispatch cases for MultiplyAdd52Low/High
- lsraxarch.cpp: RMW cases
- lowerxarch.cpp: default-branch assert extension
- CorInfoInstructionSet.cs: nested type lookup + LookupPlatformIntrinsicTypes
- InstructionSetDesc.txt: nested type alias AvxIfma_V512
- InstructionSetHelpers.cs: optimistic support
- projitems: include new files
- AvxIfma.cs + AvxIfma.PlatformNotSupported.cs: managed surface
- System.Runtime.Intrinsics.cs: ref assembly

Closes dotnet#98833, closes dotnet#96476.

Adds sample tests mirroring the AvxVnni.V512 test shape:
- X86_Avx/AvxIfma/ — V128/V256 VEX form
- X86_Avx/AvxIfma_V512/ — V512 EVEX form

NativeAOT smoke test: uncomments AvxIfma / AvxIfma.X64 checks (were pre-approved but not yet wired) and adds AvxIfma.V512 check. AvxIfma.V512 folds under AVX512v2 (Avx512Vbmi is a representative sibling).

Copilot AI review requested due to automatic review settings July 1, 2026 12:46
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 1, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Jul 1, 2026
@dotnet-policy-service

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Contributor

Pull request overview

This PR adds the managed System.Runtime.Intrinsics.X86.AvxIfma surface (VEX AVX-IFMA for Vector128/256<ulong>) plus a nested AvxIfma.V512 surface (EVEX AVX-512-IFMA for Vector512<ulong>), and wires the corresponding JIT/AOT instruction-set plumbing and sample tests.

Changes:

  • Add AvxIfma / AvxIfma.V512 public API surface (ref + CoreLib implementation + PNSE stubs) and include it in CoreLib build items.
  • Extend JIT HWIntrinsic tables and codegen/LSRA/lowering handling for VPMADD52LUQ/HUQ (VEX + EVEX forms).
  • Add JIT and NativeAOT smoke tests covering the new intrinsics.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxIfma.cs | Adds managed intrinsic declarations for AVX-IFMA + nested V512 surface. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxIfma.PlatformNotSupported.cs | Adds PNSE stubs for unsupported platforms/hardware. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new CoreLib source files into the build. |
| src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs | Adds public ref-assembly API surface for AvxIfma / AvxIfma.V512. |
| src/coreclr/jit/hwintrinsicxarch.cpp | Updates ISA lookup / V512 ISA mapping for the new intrinsics. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Adds AVXIFMA and AVX512v2 intrinsic table entries for MultiplyAdd52*. |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Adds codegen dispatch coverage for the new intrinsic IDs. |
| src/coreclr/jit/lsraxarch.cpp | Adds LSRA handling for the new RMW-style 3-operand intrinsics. |
| src/coreclr/jit/lowerxarch.cpp | Extends containment asserts/logic to include the new intrinsic IDs. |
| src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs | Adds type/ISA mapping entries for nested AvxIfma forms and AVX512v2 aliasing. |
| src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt | Adds AvxIfma_V512 alias mapping for the Avx512Ifma row. |
| src/coreclr/tools/Common/InstructionSetHelpers.cs | Adds avxifma_v512 to the optimistic instruction-set set. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma/Program.AvxIfma.cs | Adds test program stub/namespace for AVXIFMA tests. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma/AvxIfmaSampleTest.cs | Adds V128/V256 semantic sample tests for MultiplyAdd52*. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma/*.csproj | Adds project files to run the AVXIFMA JIT tests in r/ro variants. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma_V512/Program.AvxIfma_V512.cs | Adds test program stub/namespace for AVXIFMA V512 tests. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma_V512/AvxIfma_V512SampleTest.cs | Adds V512 semantic sample tests for MultiplyAdd52*. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxIfma_V512/*.csproj | Adds project files to run the AVXIFMA V512 JIT tests in r/ro variants. |
| src/tests/nativeaot/SmokeTests/HardwareIntrinsics/Program.cs | Enables NativeAOT smoke checks for AvxIfma and adds AvxIfma.V512. |

Comment thread src/coreclr/jit/hwintrinsiclistxarch.h Outdated
Comment on lines +1099 to +1100
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52Low, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52luq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52High, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52huq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)

Contributor Author

Good catch — but the fix was actually the ordering of the HARDWARE_INTRINSIC entries, not the simdSize itself. binarySearchId uses strcmp (case-sensitive ASCII) to locate intrinsics within an ISA range, so entries must be alphabetically sorted. My initial ordering had MultiplyAdd52Low before MultiplyAdd52High, but 'H' (0x48) < 'L' (0x4C), so the search couldn't find High and ILC failed with "Code generation failed" on the NativeAOT builds.

Fixed in add8927e21d: swapped so High precedes Low in both the AVXIFMA range and the AVX512v2 range, and moved the pair to the correct alphabetical position between MultiShift and PermuteVar16x8.

On the fixed-simdSize=64 concern for AVX512v2 specifically: this mirrors the AvxVnni.V512 pattern in #128365 exactly (NI_AVX512v3_MultiplyWideningAndAdd is also simdSize=64). The class-name dispatch in lookupInstructionSet chooses AVXIFMA vs AVX512v2 based on compSupportsHWIntrinsic — for a call site using V128/V256 args, the class-name lookup returns AVXIFMA (V128/V256, simdSize=-1) whenever the AVXIFMA CPUID bit is available (which is true across all AOT targets in optimisticInstructionSetSupport).
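
The ordering requirement described in this reply can be illustrated with a small, self-contained sketch. It is a generic strcmp-based binary search over a toy name table, not the JIT's binarySearchId; `findIntrinsicByName` and the two arrays are hypothetical, and the four names are simply the ones mentioned in the reply.

```cpp
#include <cstring>

// Toy binary search over intrinsic names, comparing with strcmp.
// Returns the index of `target`, or -1 if the search does not reach it.
inline int findIntrinsicByName(const char* const* names, int count, const char* target) {
    int lo = 0;
    int hi = count - 1;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        const int cmp = std::strcmp(target, names[mid]);
        if (cmp == 0) {
            return mid;
        }
        if (cmp < 0) {
            hi = mid - 1;  // target sorts before names[mid]
        } else {
            lo = mid + 1;  // target sorts after names[mid]
        }
    }
    return -1;
}

// strcmp order: 'S' (0x53) < 'p' (0x70), and 'H' (0x48) < 'L' (0x4C).
static const char* const kSortedNames[] = {
    "MultiShift", "MultiplyAdd52High", "MultiplyAdd52Low", "PermuteVar16x8"};

// Same entries with Low placed before High, as in the initial ordering.
static const char* const kMisorderedNames[] = {
    "MultiShift", "MultiplyAdd52Low", "MultiplyAdd52High", "PermuteVar16x8"};
```

With `kSortedNames`, both MultiplyAdd52 entries are found. With `kMisorderedNames`, the search for "MultiplyAdd52High" probes "MultiplyAdd52Low" first, moves left because 'H' < 'L', and returns -1 even though the entry is present in the table.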

Contributor Author

Update: the real root cause is different — the simdSize=64 mismatch is a theoretical issue but wasn't what fired here. The actual failure was a KeyNotFoundException in HardwareIntrinsicILProvider.GetMethodIL.

InstructionSetDesc.txt had AvxIfma in the R2R-name column (col 4) instead of the managed-name column (col 3), so the generated ArchitectureToValidInstructionSets emitted the avxifma entry with an empty ManagedName. HardwareIntrinsicILProvider skips empty ManagedNames when building its class-name→InstructionSet dictionary — so ILC's _instructionSetMap["AvxIfma"] threw:

System.Collections.Generic.KeyNotFoundException: The given key 'AvxIfma' was not present in the dictionary.
   at ILCompiler.HardwareIntrinsicILProvider.GetMethodIL(...) HardwareIntrinsicILProvider.cs:line 44

Fixed in ff40b908575 — moved AvxIfma from col 4 to col 3 in InstructionSetDesc.txt and updated the already-generated CorInfoInstructionSet.cs accordingly.
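
The failure mode described in this update (an entry with an empty managed name being skipped, so a later lookup by class name finds nothing) can be sketched as follows. This is a C++ analogy of the behaviour as described, not the ILCompiler code, which is C#; `InstructionSetRow`, `buildManagedNameMap`, and the sample "avxvnni" row are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one generated instruction-set entry.
struct InstructionSetRow {
    std::string jitName;      // e.g. "avxifma"
    std::string managedName;  // e.g. "AvxIfma"; empty when the column is misplaced
};

// Builds a managed-class-name -> instruction-set-name map, skipping rows
// whose managed name is empty, mirroring the behaviour described above.
inline std::map<std::string, std::string> buildManagedNameMap(
    const std::vector<InstructionSetRow>& rows) {
    std::map<std::string, std::string> byManagedName;
    for (const InstructionSetRow& row : rows) {
        if (row.managedName.empty()) {
            continue;  // no key is ever created for this row
        }
        byManagedName[row.managedName] = row.jitName;
    }
    return byManagedName;
}
```

If the "avxifma" row carries an empty managed name, the resulting map has no "AvxIfma" key, and a throwing lookup such as `std::map::at` would fail in the same spirit as the KeyNotFoundException quoted above. Once the managed name is populated, the lookup resolves to "avxifma".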

Comment on lines +10573 to +10592
public abstract class AvxIfma : System.Runtime.Intrinsics.X86.Avx2
{
internal AvxIfma() { }
public static new bool IsSupported { get { throw null; } }
public static System.Runtime.Intrinsics.Vector128<ulong> MultiplyAdd52Low(System.Runtime.Intrinsics.Vector128<ulong> addend, System.Runtime.Intrinsics.Vector128<ulong> left, System.Runtime.Intrinsics.Vector128<ulong> right) { throw null; }
public static System.Runtime.Intrinsics.Vector128<ulong> MultiplyAdd52High(System.Runtime.Intrinsics.Vector128<ulong> addend, System.Runtime.Intrinsics.Vector128<ulong> left, System.Runtime.Intrinsics.Vector128<ulong> right) { throw null; }
public static System.Runtime.Intrinsics.Vector256<ulong> MultiplyAdd52Low(System.Runtime.Intrinsics.Vector256<ulong> addend, System.Runtime.Intrinsics.Vector256<ulong> left, System.Runtime.Intrinsics.Vector256<ulong> right) { throw null; }
public static System.Runtime.Intrinsics.Vector256<ulong> MultiplyAdd52High(System.Runtime.Intrinsics.Vector256<ulong> addend, System.Runtime.Intrinsics.Vector256<ulong> left, System.Runtime.Intrinsics.Vector256<ulong> right) { throw null; }
public new abstract partial class X64 : System.Runtime.Intrinsics.X86.Avx2.X64
{
internal X64() { }
public static new bool IsSupported { get { throw null; } }
}
public abstract partial class V512
{
internal V512() { }
public static bool IsSupported { get { throw null; } }
public static System.Runtime.Intrinsics.Vector512<ulong> MultiplyAdd52Low(System.Runtime.Intrinsics.Vector512<ulong> addend, System.Runtime.Intrinsics.Vector512<ulong> left, System.Runtime.Intrinsics.Vector512<ulong> right) { throw null; }
public static System.Runtime.Intrinsics.Vector512<ulong> MultiplyAdd52High(System.Runtime.Intrinsics.Vector512<ulong> addend, System.Runtime.Intrinsics.Vector512<ulong> left, System.Runtime.Intrinsics.Vector512<ulong> right) { throw null; }
}

Contributor Author

Tanner explicitly authorised the AvxIfma.V512 nested-class shape (as opposed to a separate Avx512Ifma type) in his 2026-06-30 reply on #98833:

"@jamesburton, feel free to pick this up. #96476 should be done simultaneously and should have the same fixup we've had approved and done for the other ISAs, where its a nested V512 class."

The nested-V512 pattern mirrors the AvxVnni.V512 restructure that landed in #128365 (which itself replaced an original Avx512Vnni proposal on the same grounds). @saucecontrol also pre-proposed this exact shape on #96476 on 2024-11-06. So the shape carries prior consensus even though the individual comment approving the original Avx512Ifma still exists in the issue history.

…tical

binarySearchId uses strcmp (case-sensitive ASCII order) to find intrinsics within an ISA range. My initial ordering put MultiplyAdd52Low before MultiplyAdd52High, but 'H' (0x48) < 'L' (0x4C), so the binary search would fail on High. Same issue for both AVXIFMA and AVX512v2 ranges. Also reorders MultiplyAdd52*/PermuteVar* so Multiply* precedes PermuteVar* (M < P).

This surfaced as ILC 'Code generation failed' on the AvxIfma_handwritten_r NativeAOT tests.

The LookupPlatformIntrinsicTypes switch was missing cases for InstructionSet.X64_AVXIFMA/AVXIFMA_X64 and X86_AVXIFMA. When the AOT compiler tried to resolve AvxIfma against an AVXIFMA-only target (e.g. NativeAOT smoke tests X64Baseline / X64Avx2), the type wasn't found and ILC failed with 'Code generation failed for AvxIfma.get_IsSupported()'. Mirrors the existing X64_AVXVNNI/X86_AVXVNNI cases.
Copilot AI review requested due to automatic review settings July 1, 2026 15:38

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

Comment on lines +1093 to +1094
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52High, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52huq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(AVX512v2, MultiplyAdd52Low, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmadd52luq, INS_invalid, INS_invalid}, 5, -1, HW_Category_SimpleSIMD, HW_Flag_NoFlag)

InstructionSetDesc.txt had AvxIfma in the R2R-name column (col 4) instead of the managed-name column (col 3), so the generated ArchitectureToValidInstructionSets emitted 'avxifma' with an empty ManagedName. HardwareIntrinsicILProvider builds its class-name -> InstructionSet dictionary from ArchitectureToValidInstructionSets, skipping entries with empty ManagedName, so lookups for 'AvxIfma' threw KeyNotFoundException at ILC time.

Fixed the .txt source-of-truth (so any regeneration is consistent) and the already-generated CorInfoInstructionSet.cs to set ManagedName='AvxIfma'.
Development

Successfully merging this pull request may close these issues.

  • [API Proposal]: : AVX-IFMA Intrinsics
  • [API Proposal]: AVX-512 IFMA Intrinsics

2 participants