Releases · riscv/integrated-matrix-extension

Release riscv-isa-release-fa55752-2026-05-04

Release riscv-isa-release-eb2e37e-2026-05-03

Release riscv-isa-release-d694183-2026-05-03

Release riscv-isa-release-01076b3-2026-05-03

Release riscv-isa-release-e9ce69a-2026-05-02

Release riscv-isa-release-ca78f49-2026-05-01

Version for committee approval vote

Arithmetic model

Floating-point multiply-accumulate semantics defined by three implementation-defined parameters (G, psm, rnd),
disclosed per (SEW, W, lambda).
- G is the number of sub-dot-products combined into a partial sum.
- psm selects between exact internal computation and a bulk-normalized (RVBNA) representation.
- rnd controls an optional round-to-odd step before accumulation into C.
- Final accumulation into C uses the dynamic rounding mode from frm; exception flags accumulate into fflags.
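
The two-step rounding described above (optional round-to-odd of the partial sum, then a final rounding into C) can be illustrated with exact rationals. This is a minimal sketch under loud assumptions: the precision `p` is a toy significand width rather than a real format, exponent range and exception flags are ignored, and the final step models only round-to-nearest-even as one possible frm setting. All function names are hypothetical and do not come from the specification.

```python
from fractions import Fraction

def _split(x: Fraction, p: int):
    """For x > 0, return (n, rem, e) with x == (n + rem) * 2**e,
    2**(p-1) <= n < 2**p and 0 <= rem < 1."""
    k = x.numerator.bit_length() - x.denominator.bit_length()
    floor_log2 = k if x >= Fraction(2) ** k else k - 1
    e = floor_log2 - (p - 1)
    scaled = x / Fraction(2) ** e
    n = scaled.numerator // scaled.denominator
    return n, scaled - n, e

def round_rto(x: Fraction, p: int) -> Fraction:
    """Round-to-odd: truncate to p significant bits and, if any bits
    were discarded, force the least significant kept bit to 1."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    n, rem, e = _split(abs(x), p)
    if rem != 0:
        n |= 1
    return sign * n * Fraction(2) ** e

def round_rne(x: Fraction, p: int) -> Fraction:
    """Round-to-nearest-even to p significant bits (stand-in for one
    possible frm setting; other modes are not modelled here)."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    n, rem, e = _split(abs(x), p)
    half = Fraction(1, 2)
    if rem > half or (rem == half and n % 2 == 1):
        n += 1
    return sign * n * Fraction(2) ** e

def accumulate(c: Fraction, s: Fraction, p_acc: int, p_partial: int) -> Fraction:
    """C <- round_frm(C + round_rto(S)), with frm assumed to be RNE."""
    return round_rne(c + round_rto(s, p_partial), p_acc)
```

With `p = 3`, the partial sum 9/8 is inexact and rounds to odd as 5/4, while an exactly representable value such as 3/4 passes through unchanged.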

Interrupt and execution model

vstart = 0 required for all 11 GEMM multiply-accumulate instructions (illegal-instruction exception if vstart is
non-zero). Tile load/store instructions continue to honor vstart through their body loops.
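
The vstart rule can be written as a small legality check. This is a sketch, not the specification's SAIL: the set of eleven multiply-accumulate mnemonics is assumed to be the eleven names that appear elsewhere in these notes, and the function name and return strings are hypothetical.

```python
# Assumption: these are the 11 GEMM multiply-accumulate mnemonics,
# collected from the names used elsewhere in these release notes.
GEMM_MAC = frozenset({
    "vmmacc.vv", "vwmmacc.vv", "vqwmmacc.vv", "v8wmmacc.vv",
    "vfmmacc.vv", "vfwmmacc.vv", "vfqwmmacc.vv", "vf8wmmacc.vv",
    "vfwimmacc.vv", "vfqwimmacc.vv", "vf8wimmacc.vv",
})
TILE_LOAD_STORE = frozenset({"vmtl.v", "vmttl.v", "vmts.v", "vmtts.v"})

def vstart_check(mnemonic: str, vstart: int) -> str:
    """Return 'ok' or 'illegal-instruction' for the given vstart value."""
    if mnemonic in GEMM_MAC:
        # Multiply-accumulate instructions require vstart == 0.
        return "ok" if vstart == 0 else "illegal-instruction"
    if mnemonic in TILE_LOAD_STORE:
        # Tile load/store resumes its body loop from vstart.
        return "ok"
    raise ValueError(f"not an instruction covered by this sketch: {mnemonic}")
```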

Dimensional consistency

N_tile divisor unified to lambda × LMUL across all integer and floating-point GEMM instructions.
MUL_C requirements tightened: lambda^2 must divide VLEN/SEW, and the C register group index must be a
multiple of MUL_C.
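
The two tightened MUL_C requirements are plain divisibility tests. The sketch below takes MUL_C as an input because its derivation is not given in these notes; the function and parameter names are hypothetical.

```python
def c_group_ok(vlen: int, sew: int, lam: int, mul_c: int, c_reg: int) -> bool:
    """Check the two requirements stated above:
    lambda^2 divides VLEN/SEW, and the C register group index
    is a multiple of MUL_C (MUL_C supplied by the caller)."""
    elems_per_reg = vlen // sew
    lambda_sq_divides = elems_per_reg % (lam * lam) == 0
    c_aligned = c_reg % mul_c == 0
    return lambda_sq_divides and c_aligned
```

For example, VLEN=256 and SEW=32 give 8 elements per register, so λ=2 passes the first test (4 divides 8) while λ=4 does not (16 does not divide 8).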

Terminology

"Widening" (arithmetic) and "packing" (storage layout) distinguished consistently, with both coupled through
parameter W.

Format support

Mixed-format floating-point inputs permitted on the non-widening vfmmacc.vv (E4M3 × E5M2, FP16 × BF16); prior
restriction removed.
Mixed OFP8 inputs permitted on all four floating-point multiply-accumulate instructions (vfmmacc.vv,
vfwmmacc.vv, vfqwmmacc.vv, vf8wmmacc.vv).

Editorial

vsetivli mentioned alongside vsetvli where altfmt_A/altfmt_B/bs sit outside the immediate field.
Assorted normative cross-references added; consistent use of "non-widening"/"widening" throughout.
SAIL tile load/store: LD declared mutable (var) since it is reassigned within the body.

Release riscv-isa-release-44e4b76-2026-04-13

Zvvm specification for Internal Review #2 (Mar 30)

This release was created by: ptomsich
Release of RISC-V ISA, built from commit c37ab37, is now available.

IME Specification Release Notes

Changes since 2026-03-09

New instructions

v8wmmacc.vv (funct6=0x3b, OPIVV): Integer 8× widening matrix multiply-accumulate. Inputs at SEW/8, accumulator at SEW. Only SEW ∈ {32, 64} is valid. Fills the encoding gap for Zvvi4i32mm (Int4→Int32) and Zvvi8i64mm (Int8→Int64).
vf8wmmacc.vv (funct6=0x17, OPFVV): Floating-point 8× widening matrix multiply-accumulate. Supports microscaling via vm=0 with E8M0 block scales from v0. Fills the gap for Zvvofp4fp32mm (OFP4→FP32) and Zvvofp8fp64mm (OFP8→FP64).
vf8wimmacc.vv (funct6=0x3b, OPIVV, vm=0): Integer-input 8× widening FP-accumulate with E8M0 microscaling. Shares its opcode with v8wmmacc.vv; vm=0 selects the MX form. Supports Zvvxi4fp32mm (MXINT4→FP32) and Zvvxi8fp64mm (MXINT8→FP64).
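
For the 8× widening forms above, the input element width follows directly from SEW. A minimal sketch, with a hypothetical helper name:

```python
def w8_input_bits(sew: int) -> int:
    """Input element width for the 8x widening instructions:
    inputs at SEW/8, accumulator at SEW, only SEW in {32, 64} valid."""
    if sew not in (32, 64):
        raise ValueError(f"SEW={sew} is not valid for W=8")
    return sew // 8
```

SEW=32 yields 4-bit inputs (the Int4→Int32 and OFP4→FP32 cases) and SEW=64 yields 8-bit inputs (Int8→Int64 and OFP8→FP64).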

Normative changes

vtype field positions finalised: altfmt_A moved to vtype[XLEN-5], altfmt_B to vtype[XLEN-6], bs to vtype[XLEN-7], immediately below lambda[2:0]. All three fields are now outside the vsetvli immediate range and require vsetvl or vsetivli to configure. The provisional editorial notes about future relocation have been removed.
Subextension dependencies relaxed: The blanket Zve64d dependency has been replaced with the minimum Zve subset per subextension. Integer-only subextensions with accumulators ≤ 32-bit now depend on Zve32x; 64-bit integer accumulators on Zve64x; FP subextensions with accumulators ≤ 32-bit on Zve32f. Only FP64-accumulator subextensions retain Zve64d.
Round-to-odd for partial sums: The accumulation rounding model now specifies that the optional rounding of the partial sum S must use round-to-odd (RTO) mode: C ← round_frm(C + round_rto(S)). The final accumulation into C continues to use the dynamic rounding mode from frm.
vfmmacc.vv microscaling removed: Microscaling (vm=0) is no longer supported for non-widening FP multiply-accumulate. vm=0 is reserved for vfmmacc.vv (W=1). The contradictory "When vm=0" exception clauses and the dead microscaling SAIL code have been removed.
λ terminology clarified: λ is now described as a "tile-layout parameter" rather than "the K dimension." Two occurrences of the conflation have been corrected; K_eff = λ × W × LMUL is consistently presented as the derived effective K dimension.
MXINT4 defined: MXINT4 (E8M0-scaled signed 4-bit integer) is now explicitly defined by this specification as analogous to OCP MX's MXINT8 but with 4-bit elements. Corresponding MX subextensions Zvvxi4fp32mm and Zvvxni4fp32mm have been added to the microscaling subextension table.
OCP MX citation: Added a proper normative reference to the OCP Microscaling Formats (MX) v1.0 Specification with URL.
Zvvmttls separated: Transposing tile load/store instructions (vmttl.v, vmtts.v) are now in their own sub-extension Zvvmttls, separate from the order-preserving Zvvmtls.
Missing subextension restored: Zvvofp8fp32mm (OFP8 × OFP8 → FP32) added to the computational subextension table.
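
Two of the normative changes above are mechanical enough to sketch: the finalised vtype bit positions and the relaxed Zve dependency rule. The helper names are hypothetical, bs is treated as a raw single bit because its value-to-block-size mapping is not given in these notes, and the 11-bit width of the vsetvli immediate is taken from the base V extension.

```python
VSETVLI_ZIMM_BITS = 11  # vsetvli carries vtype bits [10:0] in its immediate

def matrix_vtype_bits(xlen: int, altfmt_a: int, altfmt_b: int, bs: int) -> int:
    """Place altfmt_A, altfmt_B and bs at vtype[XLEN-5], [XLEN-6], [XLEN-7]."""
    for bit in (altfmt_a, altfmt_b, bs):
        if bit not in (0, 1):
            raise ValueError("each field is modelled as a single bit")
    return (altfmt_a << (xlen - 5)) | (altfmt_b << (xlen - 6)) | (bs << (xlen - 7))

def reachable_by_vsetvli(xlen: int) -> bool:
    """True if the highest of the three fields fits in the vsetvli immediate."""
    return (xlen - 5) < VSETVLI_ZIMM_BITS

def min_zve(fp: bool, acc_bits: int) -> str:
    """Minimum Zve subset per the relaxed dependency rule."""
    if fp:
        return "Zve32f" if acc_bits <= 32 else "Zve64d"
    return "Zve32x" if acc_bits <= 32 else "Zve64x"
```

On XLEN=64 the three fields land at bits 59, 58 and 57, well above the immediate range.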

Encoding map updates

Added FP encoding map entries for vf8wmmacc.vv (W=8): SEW=32 (OFP4→FP32) and SEW=64 (OFP8→FP64), with all altfmt combinations and microscaling columns.
Added integer encoding map entries for v8wmmacc.vv (W=8): SEW=32 (Int4→Int32) and SEW=64 (Int8→Int64).
Added integer MX encoding map entries for vf8wimmacc.vv (W=8, vm=0): SEW=32 (MXINT4→FP32) and SEW=64 (MXINT8→FP64).
...

Release Notes — Zvvm Family of Integrated Matrix Extensions

Scope: All changes to src/integrated-matrix.adoc since the repository
baseline at f83dff03.

New specification: Zvvm Family of Integrated Matrix Extensions

This release introduces the complete specification for the Zvvm family of
Integrated Matrix Extensions, a set of RISC-V ISA extensions that add matrix
multiply-accumulate instructions operating entirely within the standard V
register file, with tile geometry derived algebraically from existing vtype
fields (VLEN, SEW, LMUL) and a new aspect-ratio field λ.

Specification content

Core integer multiply-accumulate (Zvvmm)

vmmacc.vv (W=1), vwmmacc.vv (W=2), vqwmmacc.vv (W=4) covering the
full integer subextension family from Zvvmmi4b (Int4×Int4→Int8) through
Zvvmmd (Int64×Int64→Int64).
Independent altfmt_A / altfmt_B fields in vtype select signed or
unsigned interpretation of each input independently; the accumulator is
always signed.
Vector masking (vm=0) is explicitly reserved for the integer family; all
integer multiply-accumulate instructions require vm=1.
LMUL is restricted to integer values (1, 2, 4, 8); fractional LMUL is not
permitted.

Core floating-point multiply-accumulate (Zvvfmm)

vfmmacc.vv (W=1), vfwmmacc.vv (W=2), vfqwmmacc.vv (W=4) covering
OFP4, OFP8, FP16, BF16, FP32, and FP64 input/accumulator combinations.
Mixed-format inputs (altfmt_A ≠ altfmt_B) are permitted for widening
instructions; vfmmacc.vv with mixed FP16/BF16 inputs raises an
illegal-instruction exception.
Mixed OFP8 inputs (E4M3 × E5M2) are permitted only with widening
instructions.

Tile load/store (Zvvmtls)

vmtl.v (row- or column-major tile load), vmttl.v (transposing tile
load), vmts.v (tile store) with leading-dimension stride operand.
Tail and mask policy aligned with base V; partial-VL semantics specified for
embedded streaming use cases.

Microscaling support (Zvvfmm MX variants)

vm=0 on FP multiply-accumulate opcodes selects microscaling decode: E8M0 block scale factors in v0, block size selected by bs field in vtype (32 or 16 elements), input formats selected by altfmt_A/altfmt_B.
vfwimmacc.vv and vfqwimmacc.vv: integer (MXINT) inputs with FP accumulator, microscaling only; altfmt_A=1 and altfmt_B=1 are reserved (MXINT inputs are always signed).
Full subextension coverage: MXFP4, MXFP8, MXINT8, MXINT4 with BS=32 (Zvvfmmmx*) and BS=16 (Zvvfmmnx*) variants.
Scale data layout in v0 is tile-strided with row stride R = λ × SEW ÷ 16; M × R = VLEN ÷ 16 always fills exactly one register.
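
The scale-layout identity above can be checked numerically. This sketch derives M from the stated relation M × R = VLEN ÷ 16 rather than from any tile-geometry definition, and rejects parameter choices where R would not be an integer. The E8M0 decoder follows the OCP MX v1.0 definition (8-bit exponent, bias 127, 0xFF encodes NaN). All helper names are hypothetical.

```python
def scale_layout(vlen: int, sew: int, lam: int):
    """Return (R, M) with R = lambda * SEW / 16 and M * R = VLEN / 16."""
    if (lam * sew) % 16 != 0:
        raise ValueError("lambda * SEW is not a multiple of 16")
    r = lam * sew // 16
    total = vlen // 16
    if total % r != 0:
        raise ValueError("VLEN / 16 is not a multiple of R")
    return r, total // r

def e8m0_to_scale(byte: int) -> float:
    """Decode one E8M0 block scale: 2**(e - 127), with 0xFF meaning NaN."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("E8M0 is an 8-bit encoding")
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)
```

For VLEN=256, SEW=32 and λ=2 this gives R=4 and M=4, so M × R = 16 = VLEN ÷ 16.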

W=8 octal-widening carve-out

Subextension names defined for future W=8 instructions: Zvvmmbd (Int8→Int64), Zvvfmmofp4f (OFP4→FP32), Zvvfmmofp8d (OFP8→FP64), and MX counterparts for MXFP4→FP32, MXFP8→FP64, and MXINT8→FP64.
Encoding space reserved: funct6 = 0x3b in OPIVV (following vmmacc/vwmmacc/vqwmmacc at 0x38–0x3a) and funct6 = 0x17 in OPFVV (following vfmmacc/vfwmmacc/vfqwmmacc at 0x14–0x16).

Supporting material

C-language intrinsics: Proto-intrinsic API covering all (SEW, λ, LMUL) combinations for integer, widening, FP, and microscaling variants, using vtype.h to expose the full vtype control surface without requiring compiler support.
GEMM pseudocode: Illustrative tiling loop for both FP and integer GEMM using vmtl.v / vfmmacc.vv / vmts.v.
Arithmetic considerations: Analysis of rounding, exception flag accumulation, and precision requirements for widening and microscaling paths.
Figures: Diagrams for tile load/store geometry (row-major vs. column-major, partial-VL), microscaling v0 layout, and lane-local scale alignment.

Encoding

Instruction	funct6	funct3	vm
`vmmacc.vv`	0x38	OPIVV	1
`vwmmacc.vv` / `vfwimmacc.vv`	0x39	OPIVV	1 / 0
`vqwmmacc.vv` / `vfqwimmacc.vv`	0x3a	OPIVV	1 / 0
(W=8 reserved)	0x3b	OPIVV	—
`vfmmacc.vv`	0x14	OPFVV	1 (vm=0: MX)
`vfwmmacc.vv`	0x15	OPFVV	1 (vm=0: MX)
`vfqwmmacc.vv`	0x16	OPFVV	1 (vm=0: MX)
(W=8 reserved)	0x17	OPFVV	—
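
The encoding table above can be read as a lookup keyed on the funct3 category, funct6 and vm. The sketch below mirrors this baseline table only; the later releases listed further up assign 0x3b and 0x17 to the W=8 instructions and reserve vm=0 for vfmmacc.vv, which this sketch does not reflect. The function name and the "(MX)" suffix convention are hypothetical.

```python
# (category, funct6) -> {vm: mnemonic}; missing entries are reserved.
BASELINE_ENCODING = {
    ("OPIVV", 0x38): {1: "vmmacc.vv"},
    ("OPIVV", 0x39): {1: "vwmmacc.vv", 0: "vfwimmacc.vv"},
    ("OPIVV", 0x3A): {1: "vqwmmacc.vv", 0: "vfqwimmacc.vv"},
    ("OPFVV", 0x14): {1: "vfmmacc.vv", 0: "vfmmacc.vv (MX)"},
    ("OPFVV", 0x15): {1: "vfwmmacc.vv", 0: "vfwmmacc.vv (MX)"},
    ("OPFVV", 0x16): {1: "vfqwmmacc.vv", 0: "vfqwmmacc.vv (MX)"},
}

def decode(category: str, funct6: int, vm: int):
    """Return the mnemonic for a baseline encoding, or None if reserved."""
    return BASELINE_ENCODING.get((category, funct6), {}).get(vm)
```

Calling `decode("OPIVV", 0x39, 0)` selects the integer-input microscaling form, while `decode("OPIVV", 0x38, 0)` returns `None` because vm=0 is reserved for the integer family.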