Releases · riscv/integrated-matrix-extension

Release riscv-isa-release-fa55752-2026-05-04

Release riscv-isa-release-eb2e37e-2026-05-03

Release riscv-isa-release-d694183-2026-05-03

Release riscv-isa-release-01076b3-2026-05-03

Release riscv-isa-release-e9ce69a-2026-05-02

Release riscv-isa-release-ca78f49-2026-05-01

Version for committee approval vote

Arithmetic model

  • Floating-point multiply-accumulate semantics defined by three implementation-defined parameters (G, psm, rnd),
    disclosed per (SEW, W, lambda).
    • G is the number of sub-dot-products combined into a partial sum.
    • psm selects between exact internal computation and a bulk-normalized (RVBNA) representation.
    • rnd controls an optional round-to-odd step before accumulation into C.
    • Final accumulation into C uses the dynamic rounding mode from frm; exception flags accumulate into fflags.

Interrupt and execution model

  • vstart = 0 required for all 11 GEMM multiply-accumulate instructions (illegal-instruction if non-zero). Tile
    load/store instructions continue to honor vstart through their body loops.

Dimensional consistency

  • N_tile divisor unified to lambda × LMUL across all integer and floating-point GEMM instructions.
  • MUL_C requirements tightened: lambda^2 must divide VLEN/SEW, and the C register group index must be a
    multiple of MUL_C.
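
The two MUL_C constraints above can be checked mechanically. The following is a minimal sketch, not the specification's own check: the function name is hypothetical, and MUL_C is taken as an input because these notes do not state how it is derived.

```python
def check_c_group(vlen: int, sew: int, lam: int, mul_c: int, c_index: int) -> list[str]:
    """Return the violated constraints (an empty list means the check passes).

    mul_c is supplied by the caller: its derivation is not given in the
    release notes, so this sketch does not attempt to compute it.
    """
    problems = []
    elems_per_reg = vlen // sew          # VLEN/SEW
    if elems_per_reg % (lam * lam) != 0:
        problems.append("lambda^2 does not divide VLEN/SEW")
    if c_index % mul_c != 0:
        problems.append("C register group index is not a multiple of MUL_C")
    return problems


if __name__ == "__main__":
    # VLEN=256, SEW=32 gives VLEN/SEW = 8; lambda=2 gives lambda^2 = 4.
    print(check_c_group(256, 32, 2, 2, 4))
    print(check_c_group(256, 32, 4, 2, 3))
```

An empty list means both constraints hold; otherwise each string names one violated rule.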

Terminology

  • "Widening" (arithmetic) and "packing" (storage layout) distinguished consistently, with both coupled through
    parameter W.

Format support

  • Mixed-format floating-point inputs permitted on the non-widening vfmmacc.vv (E4M3 × E5M2, FP16 × BF16); prior
    restriction removed.
  • Mixed OFP8 inputs permitted on all four floating-point multiply-accumulate instructions (vfmmacc.vv,
    vfwmmacc.vv, vfqmmacc.vv, vf8wmmacc.vv).

Editorial

  • vsetivli mentioned alongside vsetvli where altfmt_A/altfmt_B/bs sit outside the immediate field.
  • Assorted normative cross-references added; consistent use of "non-widening"/"widening" throughout.
  • SAIL tile load/store: LD declared mutable (var) since it is reassigned within the body.
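
To illustrate why altfmt_A, altfmt_B, and bs fall outside the vsetvli immediate, here is a small sketch that places the three bits at the positions given in the 2026-04-13 release notes (vtype[XLEN-5], vtype[XLEN-6], vtype[XLEN-7]). XLEN = 64, the helper names, and the treatment of each field as a single raw bit are assumptions for illustration; the 11-bit immediate width is that of vsetvli's zimm[10:0] in the base V extension.

```python
XLEN = 64                  # assumption: RV64
ALTFMT_A_BIT = XLEN - 5
ALTFMT_B_BIT = XLEN - 6
BS_BIT = XLEN - 7
VSETVLI_IMM_BITS = 11      # vsetvli carries zimm[10:0]


def pack_ime_fields(base_vtype: int, altfmt_a: int, altfmt_b: int, bs: int) -> int:
    """Return base_vtype with the three IME bits replaced (hypothetical helper)."""
    for name, value in (("altfmt_a", altfmt_a), ("altfmt_b", altfmt_b), ("bs", bs)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
    mask = (1 << ALTFMT_A_BIT) | (1 << ALTFMT_B_BIT) | (1 << BS_BIT)
    vtype = base_vtype & ~mask
    vtype |= altfmt_a << ALTFMT_A_BIT
    vtype |= altfmt_b << ALTFMT_B_BIT
    vtype |= bs << BS_BIT
    return vtype


def fits_vsetvli_immediate(vtype: int) -> bool:
    """True if every set bit of vtype lies within zimm[10:0]."""
    return 0 <= vtype < (1 << VSETVLI_IMM_BITS)


if __name__ == "__main__":
    v = pack_ime_fields(0, altfmt_a=1, altfmt_b=0, bs=1)
    print(hex(v), fits_vsetvli_immediate(v))
```

Any value with one of these bits set exceeds the immediate range, so it has to be supplied through a register operand instead.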

Release riscv-isa-release-44e4b76-2026-04-13

Zvvm specification for Internal Review #2 (Mar 30)

This release was created by: ptomsich
Release of RISC-V ISA, built from commit c37ab37, is now available.

IME Specification Release Notes

Changes since 2026-03-09

New instructions

  • v8wmmacc.vv (funct6=0x3b, OPIVV): Integer 8× widening matrix multiply-accumulate. Inputs at SEW/8, accumulator at SEW. Only SEW ∈ {32, 64} is valid. Fills the encoding gap for Zvvi4i32mm (Int4→Int32) and Zvvi8i64mm (Int8→Int64).

  • vf8wmmacc.vv (funct6=0x17, OPFVV): Floating-point 8× widening matrix multiply-accumulate. Supports microscaling via vm=0 with E8M0 block scales from v0. Fills the gap for Zvvofp4fp32mm (OFP4→FP32) and Zvvofp8fp64mm (OFP8→FP64).

  • vf8wimmacc.vv (funct6=0x3b, OPIVV, vm=0): Integer-input 8× widening FP-accumulate with E8M0 microscaling. Shares its opcode with v8wmmacc.vv; vm=0 selects the MX form. Supports Zvvxi4fp32mm (MXINT4→FP32) and Zvvxi8fp64mm (MXINT8→FP64).
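
For readers unfamiliar with E8M0 block scales, the sketch below decodes one scale byte the way the OCP MX v1.0 format defines it (an unsigned exponent with bias 127, where 0xFF encodes NaN) and applies it to a single block. How the instructions combine the A-side and B-side scales is not described in these notes; multiplying the block's dot product by both decoded scales is an assumption, and the function names are hypothetical.

```python
import math


def e8m0_to_float(byte: int) -> float:
    """Decode an E8M0 scale: 2**(byte - 127), with 0xFF meaning NaN."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("E8M0 scale must fit in 8 bits")
    if byte == 0xFF:
        return math.nan
    return math.ldexp(1.0, byte - 127)


def scaled_block_dot(a: list[int], b: list[int], scale_a: int, scale_b: int) -> float:
    """Dot product of one block, scaled by both E8M0 factors (assumed semantics)."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    return dot * e8m0_to_float(scale_a) * e8m0_to_float(scale_b)


if __name__ == "__main__":
    # Scales 128 and 126 decode to 2.0 and 0.5 respectively.
    print(scaled_block_dot([1, -2], [3, 4], 128, 126))
```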

Normative changes

  • vtype field positions finalised: altfmt_A moved to vtype[XLEN-5], altfmt_B to vtype[XLEN-6], bs to vtype[XLEN-7], immediately below lambda[2:0]. All three fields are now outside the vsetvli immediate range and require vsetvl or vsetivli to configure. The provisional editorial notes about future relocation have been removed.

  • Subextension dependencies relaxed: The blanket Zve64d dependency has been replaced with the minimum Zve subset per subextension. Integer-only subextensions with accumulators ≤ 32-bit now depend on Zve32x; 64-bit integer accumulators on Zve64x; FP subextensions with accumulators ≤ 32-bit on Zve32f. Only FP64-accumulator subextensions retain Zve64d.

  • Round-to-odd for partial sums: The accumulation rounding model now specifies that the optional rounding of the partial sum S must use round-to-odd (RTO) mode: C ← round_frm(C + round_rto(S)). The final accumulation into C continues to use the dynamic rounding mode from frm.

  • vfmmacc.vv microscaling removed: Microscaling (vm=0) is no longer supported for non-widening FP multiply-accumulate. vm=0 is reserved for vfmmacc.vv (W=1). The contradictory "When vm=0" exception clauses and the dead microscaling SAIL code have been removed.

  • λ terminology clarified: λ is now described as a "tile-layout parameter" rather than "the K dimension." Two occurrences of the conflation have been corrected; K_eff = λ × W × LMUL is consistently presented as the derived effective K dimension.

  • MXINT4 defined: MXINT4 (E8M0-scaled signed 4-bit integer) is now explicitly defined by this specification as analogous to OCP MX's MXINT8 but with 4-bit elements. Corresponding MX subextensions Zvvxi4fp32mm and Zvvxni4fp32mm have been added to the microscaling subextension table.

  • OCP MX citation: Added a proper normative reference to the OCP Microscaling Formats (MX) v1.0 Specification with URL.

  • Zvvmttls separated: Transposing tile load/store instructions (vmttl.v, vmtts.v) are now in their own sub-extension Zvvmttls, separate from the order-preserving Zvvmtls.

  • Missing subextension restored: Zvvofp8fp32mm (OFP8 × OFP8 → FP32) added to the computational subextension table.
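
The round-to-odd rule in the list above, C ← round_frm(C + round_rto(S)), can be modeled with exact rationals. The sketch below is illustrative only: the significand widths are arbitrary, the exponent range is unbounded, frm is assumed to be round-to-nearest-even, and all function names are hypothetical. It also shows the well-known motivation for round-to-odd as an intermediate step: it keeps the information that a value was inexact, which avoids the double-rounding error an intermediate round-to-nearest can introduce.

```python
from fractions import Fraction


def _split(x: Fraction, p: int):
    """For x > 0 return (m, rem, e) with x = (m + rem) * 2**e and 2**(p-1) <= m < 2**p."""
    e = 0
    while x / Fraction(2) ** e >= 2 ** p:
        e += 1
    while x / Fraction(2) ** e < 2 ** (p - 1):
        e -= 1
    scaled = x / Fraction(2) ** e
    m = scaled.numerator // scaled.denominator
    return m, scaled - m, e


def round_to_odd(x: Fraction, p: int) -> Fraction:
    """Truncate to a p-bit significand; if inexact, force the last bit to 1."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m, rem, e = _split(abs(x), p)
    if rem != 0:
        m |= 1
    return sign * m * Fraction(2) ** e


def round_nearest_even(x: Fraction, p: int) -> Fraction:
    """Round to a p-bit significand, ties to even."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m, rem, e = _split(abs(x), p)
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and m % 2 == 1):
        m += 1
    return sign * m * Fraction(2) ** e


def accumulate(c: Fraction, s: Fraction, p_sum: int, p_acc: int) -> Fraction:
    """C <- round_frm(C + round_rto(S)), with frm assumed to be nearest-even."""
    return round_nearest_even(c + round_to_odd(s, p_sum), p_acc)


if __name__ == "__main__":
    s = Fraction(69, 64)                                   # 1.000101 in binary
    direct = round_nearest_even(s, 4)                      # single rounding
    double = round_nearest_even(round_nearest_even(s, 5), 4)
    via_rto = accumulate(Fraction(0), s, p_sum=6, p_acc=4)
    print(direct, double, via_rto)
```

With a 4-bit accumulator significand, rounding 69/64 to nearest at 5 bits and then again at 4 bits lands on 1, whereas rounding to odd at 6 bits first gives 9/8, the same answer as a single direct rounding.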

Encoding map updates

  • Added FP encoding map entries for vf8wmmacc.vv (W=8): SEW=32 (OFP4→FP32) and SEW=64 (OFP8→FP64), with all altfmt combinations and microscaling columns.
  • Added integer encoding map entries for v8wmmacc.vv (W=8): SEW=32 (Int4→Int32) and SEW=64 (Int8→Int64).
  • Added integer MX encoding map entries for vf8wimmacc.vv (W=8, vm=0): SEW=32 (MXINT4→FP32) and SEW=64 (MXINT8→FP64).
    ...

Zvvm release for Internal Review (IME TG)

Release Notes — Zvvm Family of Integrated Matrix Extensions

Scope: All changes to src/integrated-matrix.adoc since the repository
baseline at f83dff03.


New specification: Zvvm Family of Integrated Matrix Extensions

This release introduces the complete specification for the Zvvm family of
Integrated Matrix Extensions, a set of RISC-V ISA extensions that add matrix
multiply-accumulate instructions operating entirely within the standard V
register file, with tile geometry derived algebraically from VLEN, the
existing vtype fields SEW and LMUL, and a new aspect-ratio field λ.


Specification content

Core integer multiply-accumulate (Zvvmm)

  • vmmacc.vv (W=1), vwmmacc.vv (W=2), vqwmmacc.vv (W=4) covering the
    full integer subextension family from Zvvmmi4b (Int4×Int4→Int8) through
    Zvvmmd (Int64×Int64→Int64).
  • Independent altfmt_A / altfmt_B fields in vtype select signed or
    unsigned interpretation of each input independently; the accumulator is
    always signed.
  • Vector masking (vm=0) is explicitly reserved for the integer family; all
    integer multiply-accumulate instructions require vm=1.
  • LMUL is restricted to integer values (1, 2, 4, 8); fractional LMUL is not
    permitted.
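
As an illustration of the independent signedness selection described above, the sketch below interprets two raw input elements as signed or unsigned separately and adds their product to a signed accumulator. The notes do not state which altfmt value selects which interpretation, so the sketch uses plain booleans; wraparound on accumulator overflow is also an assumption, and the function names are hypothetical.

```python
def to_int(raw: int, bits: int, signed: bool) -> int:
    """Interpret the low `bits` of raw as a signed or unsigned integer."""
    raw &= (1 << bits) - 1
    if signed and raw >> (bits - 1):
        raw -= 1 << bits
    return raw


def mmacc_step(acc: int, a_raw: int, b_raw: int, in_bits: int,
               a_signed: bool, b_signed: bool, acc_bits: int) -> int:
    """One multiply-accumulate into a signed accumulator (wraparound assumed)."""
    product = to_int(a_raw, in_bits, a_signed) * to_int(b_raw, in_bits, b_signed)
    return to_int(acc + product, acc_bits, signed=True)


if __name__ == "__main__":
    # The same bit pattern 0xFF is -1 when signed and 255 when unsigned.
    print(mmacc_step(0, 0xFF, 0x02, 8, True, False, 32))
    print(mmacc_step(0, 0xFF, 0x02, 8, False, False, 32))
```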

Core floating-point multiply-accumulate (Zvvfmm)

  • vfmmacc.vv (W=1), vfwmmacc.vv (W=2), vfqwmmacc.vv (W=4) covering
    OFP4, OFP8, FP16, BF16, FP32, and FP64 input/accumulator combinations.
  • Mixed-format inputs (altfmt_A ≠ altfmt_B) are permitted for widening
    instructions; vfmmacc.vv with mixed FP16/BF16 inputs raises an
    illegal-instruction exception.
  • Mixed OFP8 inputs (E4M3 × E5M2) are permitted only with widening
    instructions.

Tile load/store (Zvvmtls)

  • vmtl.v (row- or column-major tile load), vmttl.v (transposing tile
    load), vmts.v (tile store) with leading-dimension stride operand.
  • Tail and mask policy aligned with base V; partial-VL semantics specified for
    embedded streaming use cases.

Microscaling support (Zvvfmm MX variants)

  • vm=0 on FP multiply-accumulate opcodes selects microscaling decode: E8M0 block scale factors in v0, block size selected by bs field in vtype (32 or 16 elements), input formats selected by altfmt_A/altfmt_B.
  • vfwimmacc.vv and vfqwimmacc.vv: integer (MXINT) inputs with FP accumulator, microscaling only; altfmt_A=1 and altfmt_B=1 are reserved (MXINT inputs are always signed).
  • Full subextension coverage: MXFP4, MXFP8, MXINT8, MXINT4 with BS=32 (Zvvfmmmx*) and BS=16 (Zvvfmmnx*) variants.
  • Scale data layout in v0 is tile-strided with row stride R = λ × SEW ÷ 16; M × R = VLEN ÷ 16 always fills exactly one register.
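
The scale-layout arithmetic in the last bullet can be worked through numerically. This sketch computes the row stride R = λ × SEW ÷ 16 and then derives M from the stated identity M × R = VLEN ÷ 16; it assumes nothing about how the specification itself defines M, and the function name is hypothetical.

```python
def scale_layout(vlen: int, sew: int, lam: int) -> tuple[int, int]:
    """Return (M, R) with R = lam * sew / 16 and M * R = vlen / 16."""
    r, rem = divmod(lam * sew, 16)
    if rem != 0 or r == 0:
        raise ValueError("lambda * SEW must be a positive multiple of 16")
    total = vlen // 16
    if total % r != 0:
        raise ValueError("VLEN / 16 is not a multiple of the row stride R")
    return total // r, r


if __name__ == "__main__":
    for vlen, sew, lam in [(256, 32, 2), (256, 8, 2), (512, 16, 4)]:
        m, r = scale_layout(vlen, sew, lam)
        print(f"VLEN={vlen} SEW={sew} lambda={lam}: M={m} R={r} M*R={m * r}")
```

Under that identity M equals VLEN ÷ (λ × SEW), so a larger λ or SEW trades rows for a longer row stride while the product stays fixed.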

W=8 octal-widening carve-out

  • Subextension names defined for future W=8 instructions: Zvvmmbd (Int8→Int64), Zvvfmmofp4f (OFP4→FP32), Zvvfmmofp8d (OFP8→FP64), and MX counterparts for MXFP4→FP32, MXFP8→FP64, and MXINT8→FP64.
  • Encoding space reserved: funct6 = 0x3b in OPIVV (following vmmacc/vwmmacc/vqwmmacc at 0x38–0x3a) and funct6 = 0x17 in OPFVV (following vfmmacc/vfwmmacc/vfqwmmacc at 0x14–0x16).

Supporting material

  • C-language intrinsics: Proto-intrinsic API covering all (SEW, λ, LMUL) combinations for integer, widening, FP, and microscaling variants, using vtype.h to expose the full vtype control surface without requiring compiler support.
  • GEMM pseudocode: Illustrative tiling loop for both FP and integer GEMM using vmtl.v / vfmmacc.vv / vmts.v.
  • Arithmetic considerations: Analysis of rounding, exception flag accumulation, and precision requirements for widening and microscaling paths.
  • Figures: Diagrams for tile load/store geometry (row-major vs. column-major, partial-VL), microscaling v0 layout, and lane-local scale alignment.
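
The specification's GEMM pseudocode is not reproduced in these notes. As a rough picture of what a load / multiply-accumulate / store tiling loop looks like, here is a scalar reference sketch; the tile sizes are arbitrary parameters rather than values derived from vtype, and the comments only indicate which instruction each step loosely corresponds to.

```python
def tiled_gemm(a, b, c, tm, tn, tk):
    """Compute C += A @ B tile by tile on lists of lists; returns c."""
    m, k, n = len(a), len(b), len(b[0])
    for i0 in range(0, m, tm):
        for j0 in range(0, n, tn):
            # Load the C tile (loosely: vmtl.v).
            acc = [row[j0:j0 + tn] for row in c[i0:i0 + tm]]
            for k0 in range(0, k, tk):
                # Load the A and B tiles for this K step (loosely: vmtl.v).
                a_tile = [row[k0:k0 + tk] for row in a[i0:i0 + tm]]
                b_tile = [row[j0:j0 + tn] for row in b[k0:k0 + tk]]
                # Multiply-accumulate into the C tile (loosely: vfmmacc.vv).
                for i, a_row in enumerate(a_tile):
                    for j in range(len(acc[i])):
                        acc[i][j] += sum(a_row[kk] * b_tile[kk][j]
                                         for kk in range(len(b_tile)))
            # Store the C tile back (loosely: vmts.v).
            for i, row in enumerate(acc):
                c[i0 + i][j0:j0 + len(row)] = row
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    c = [[0, 0], [0, 0]]
    print(tiled_gemm(a, b, c, tm=1, tn=2, tk=2))
```

Edge tiles are handled by slicing, so the matrix dimensions need not be multiples of the tile sizes.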

Encoding

Instruction                    funct6   funct3   vm
vmmacc.vv                      0x38     OPIVV    1
vwmmacc.vv / vfwimmacc.vv      0x39     OPIVV    1 / 0
vqwmmacc.vv / vfqwimmacc.vv    0x3a     OPIVV    1 / 0
(W=8 reserved)                 0x3b     OPIVV
vfmmacc.vv                     0x14     OPFVV    1 (vm=0: MX)
vfwmmacc.vv                    0x15     OPFVV    1 (vm=0: MX)
vfqwmmacc.vv                   0x16     OPFVV    1 (vm=0: MX)
(W=8 reserved)                 0x17     OPFVV
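
The encoding table for this release can be exercised with a small lookup decoder. This is a sketch that mirrors the table exactly as printed here, with the W=8 slots still reserved; it is not a complete decoder. The field positions (funct6 in bits 31:26, vm in bit 25, funct3 in bits 14:12, major opcode OP-V = 0x57, OPIVV = 0b000, OPFVV = 0b001) are those of the base V extension's vector-vector format; the "[MX]" suffix and the function names are illustrative.

```python
OP_V = 0x57
OPIVV, OPFVV = 0b000, 0b001

# (funct3, funct6, vm) -> mnemonic, taken from the table above.
IME_TABLE = {
    (OPIVV, 0x38, 1): "vmmacc.vv",
    (OPIVV, 0x39, 1): "vwmmacc.vv",
    (OPIVV, 0x39, 0): "vfwimmacc.vv",
    (OPIVV, 0x3A, 1): "vqwmmacc.vv",
    (OPIVV, 0x3A, 0): "vfqwimmacc.vv",
    (OPFVV, 0x14, 1): "vfmmacc.vv",
    (OPFVV, 0x14, 0): "vfmmacc.vv [MX]",
    (OPFVV, 0x15, 1): "vfwmmacc.vv",
    (OPFVV, 0x15, 0): "vfwmmacc.vv [MX]",
    (OPFVV, 0x16, 1): "vfqwmmacc.vv",
    (OPFVV, 0x16, 0): "vfqwmmacc.vv [MX]",
}


def encode_vv(funct6: int, vm: int, vs2: int, vs1: int, funct3: int, vd: int) -> int:
    """Assemble a 32-bit OP-V vector-vector instruction word."""
    return ((funct6 << 26) | (vm << 25) | (vs2 << 20) | (vs1 << 15)
            | (funct3 << 12) | (vd << 7) | OP_V)


def decode_ime(insn: int):
    """Return the mnemonic from the table, or None if not listed or reserved."""
    if insn & 0x7F != OP_V:
        return None
    funct3 = (insn >> 12) & 0x7
    vm = (insn >> 25) & 0x1
    funct6 = (insn >> 26) & 0x3F
    return IME_TABLE.get((funct3, funct6, vm))


if __name__ == "__main__":
    print(decode_ime(encode_vv(0x39, 0, 2, 1, OPIVV, 8)))
    print(decode_ime(encode_vv(0x38, 0, 2, 1, OPIVV, 8)))  # vm=0 is reserved here
```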