Releases · riscv/integrated-matrix-extension

Zvvm release for Internal Review (IME TG)

Release Notes — Zvvm Family of Integrated Matrix Extensions

Scope: All changes to src/integrated-matrix.adoc since the repository
baseline at f83dff03.


New specification: Zvvm Family of Integrated Matrix Extensions

This release introduces the complete specification for the Zvvm family of
Integrated Matrix Extensions, a set of RISC-V ISA extensions that add matrix
multiply-accumulate instructions operating entirely within the standard V
register file, with tile geometry derived algebraically from VLEN, the
existing vtype fields SEW and LMUL, and a new aspect-ratio field λ.


Specification content

Core integer multiply-accumulate (Zvvmm)

  • vmmacc.vv (W=1), vwmmacc.vv (W=2), vqwmmacc.vv (W=4) covering the
    full integer subextension family from Zvvmmi4b (Int4×Int4→Int8) through
    Zvvmmd (Int64×Int64→Int64).
  • Independent altfmt_A / altfmt_B fields in vtype select signed or
    unsigned interpretation of each input independently; the accumulator is
    always signed.
  • Vector masking (vm=0) is explicitly reserved for the integer family; all
    integer multiply-accumulate instructions require vm=1.
  • LMUL is restricted to integer values (1, 2, 4, 8); fractional LMUL is not
    permitted.
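
The signedness rule above can be illustrated with a small scalar model of one
accumulator element. This is a minimal sketch, not the specification's
definition: the function names are hypothetical, the per-input signedness is
passed as plain booleans rather than as altfmt_A / altfmt_B encodings, and
two's-complement wraparound of the accumulator on overflow is an assumption
that these notes do not confirm.

```python
def to_int(bits: int, width: int, unsigned: bool) -> int:
    """Interpret the low `width` bits as an unsigned or two's-complement value."""
    bits &= (1 << width) - 1
    if unsigned or bits < (1 << (width - 1)):
        return bits
    return bits - (1 << width)


def mac_element(acc, a_bits, b_bits, sew, w, a_unsigned, b_unsigned):
    """Accumulate a dot product of raw SEW-bit inputs into a signed (w * sew)-bit value.

    `w` is the widening factor (1, 2 or 4). Overflow wraps, which is an
    assumption of this sketch.
    """
    total = acc
    for a, b in zip(a_bits, b_bits):
        total += to_int(a, sew, a_unsigned) * to_int(b, sew, b_unsigned)
    # The accumulator is always signed, whatever the input interpretation.
    return to_int(total, w * sew, unsigned=False)
```

With 8-bit inputs and W=4, the bit pattern 0xFF contributes -1 when the A input
is treated as signed and 255 when it is treated as unsigned, so the same
register contents give different sums depending on the per-input selection.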

Core floating-point multiply-accumulate (Zvvfmm)

  • vfmmacc.vv (W=1), vfwmmacc.vv (W=2), vfqwmmacc.vv (W=4) covering
    OFP4, OFP8, FP16, BF16, FP32, and FP64 input/accumulator combinations.
  • Mixed-format inputs (altfmt_A ≠ altfmt_B) are permitted for widening
    instructions; vfmmacc.vv with mixed FP16/BF16 inputs raises an
    illegal-instruction exception.
  • Mixed OFP8 inputs (E4M3 × E5M2) are permitted only with widening
    instructions.

Tile load/store (Zvvmtls)

  • vmtl.v (row- or column-major tile load), vmttl.v (transposing tile
    load), vmts.v (tile store) with leading-dimension stride operand.
  • Tail and mask policy aligned with base V; partial-VL semantics specified for
    embedded streaming use cases.

Microscaling support (Zvvfmm MX variants)

  • vm=0 on FP multiply-accumulate opcodes selects microscaling decode: E8M0 block scale factors in v0, block size selected by bs field in vtype (32 or 16 elements), input formats selected by altfmt_A/altfmt_B.
  • vfwimmacc.vv and vfqwimmacc.vv: integer (MXINT) inputs with FP accumulator, microscaling only; altfmt_A=1 and altfmt_B=1 are reserved (MXINT inputs are always signed).
  • Full subextension coverage: MXFP4, MXFP8, MXINT8, MXINT4 with BS=32 (Zvvfmmmx*) and BS=16 (Zvvfmmnx*) variants.
  • Scale data layout in v0 is tile-strided with row stride R = λ × SEW ÷ 16; M × R = VLEN ÷ 16 always fills exactly one register.
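
The two scale-layout relations above can be checked numerically. The sketch
below takes R = λ × SEW ÷ 16 as stated and derives M from the identity
M × R = VLEN ÷ 16, which gives M = VLEN ÷ (λ × SEW). The function name is
hypothetical, and rejecting combinations where R or M would not be an integer
is a choice of this sketch, not a rule taken from the specification.

```python
def scale_layout(vlen: int, sew: int, lam: int) -> tuple[int, int]:
    """Return (M, R) for the v0 scale layout.

    R is the row stride from R = lam * sew / 16; M follows from
    M * R = vlen / 16. Non-integer results are rejected (sketch assumption).
    """
    if (lam * sew) % 16 != 0:
        raise ValueError("lam * sew must be a multiple of 16 in this sketch")
    if vlen % (lam * sew) != 0:
        raise ValueError("vlen must be a multiple of lam * sew in this sketch")
    r = (lam * sew) // 16
    m = vlen // (lam * sew)
    return m, r


# Example: VLEN=256, SEW=8, lam=4 gives R=2 and M=8, and M * R equals VLEN / 16.
m, r = scale_layout(256, 8, 4)
assert m * r == 256 // 16
```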

W=8 octal-widening carve-out

  • Subextension names defined for future W=8 instructions: Zvvmmbd (Int8→Int64), Zvvfmmofp4f (OFP4→FP32), Zvvfmmofp8d (OFP8→FP64), and MX counterparts for MXFP4→FP32, MXFP8→FP64, and MXINT8→FP64.
  • Encoding space reserved: funct6 = 0x3b in OPIVV (following vmmacc/vwmmacc/vqwmmacc at 0x38–0x3a) and funct6 = 0x17 in OPFVV (following vfmmacc/vfwmmacc/vfqwmmacc at 0x14–0x16).

Supporting material

  • C-language intrinsics: Proto-intrinsic API covering all (SEW, λ, LMUL) combinations for integer, widening, FP, and microscaling variants, using vtype.h to expose the full vtype control surface without requiring compiler support.
  • GEMM pseudocode: Illustrative tiling loop for both FP and integer GEMM using vmtl.v / vfmmacc.vv / vmts.v.
  • Arithmetic considerations: Analysis of rounding, exception flag accumulation, and precision requirements for widening and microscaling paths.
  • Figures: Diagrams for tile load/store geometry (row-major vs. column-major, partial-VL), microscaling v0 layout, and lane-local scale alignment.
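
As a rough picture of the load / multiply-accumulate / store tiling pattern
named in the GEMM item, the following plain-Python model may help. It is not
the specification's pseudocode: load_tile, tile_mac and store_tile are
hypothetical stand-ins for the roles of vmtl.v, vfmmacc.vv and vmts.v, the tile
sizes tm, tn and tk are free parameters rather than values derived from vtype,
and matrices are assumed to be row-major flat lists addressed with a
leading-dimension stride.

```python
def load_tile(mem, base, ld, rows, cols):
    """Strided tile load: row i of the tile starts at mem[base + i * ld]."""
    return [[mem[base + i * ld + j] for j in range(cols)] for i in range(rows)]


def store_tile(mem, base, ld, tile):
    """Strided tile store, the inverse of load_tile."""
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            mem[base + i * ld + j] = value


def tile_mac(c, a, b):
    """In-place tile multiply-accumulate: c[i][j] += sum_k a[i][k] * b[k][j]."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))


def gemm(a, b, c, m, n, k, tm, tn, tk):
    """C (m x n) += A (m x k) * B (k x n), all row-major flat lists."""
    for i0 in range(0, m, tm):
        rows = min(tm, m - i0)            # partial tile at the bottom edge
        for j0 in range(0, n, tn):
            cols = min(tn, n - j0)        # partial tile at the right edge
            acc = load_tile(c, i0 * n + j0, n, rows, cols)
            for k0 in range(0, k, tk):
                depth = min(tk, k - k0)   # partial tile along the K dimension
                a_tile = load_tile(a, i0 * k + k0, k, rows, depth)
                b_tile = load_tile(b, k0 * n + j0, n, depth, cols)
                tile_mac(acc, a_tile, b_tile)
            store_tile(c, i0 * n + j0, n, acc)
```

The accumulator tile stays live across the whole K loop and is written back
once per output tile, which is the usual reason for structuring the loop nest
this way.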

Encoding

Instruction                   funct6  funct3  vm
vmmacc.vv                     0x38    OPIVV   1
vwmmacc.vv / vfwimmacc.vv     0x39    OPIVV   1 / 0
vqwmmacc.vv / vfqwimmacc.vv   0x3a    OPIVV   1 / 0
(W=8 reserved)                0x3b    OPIVV
vfmmacc.vv                    0x14    OPFVV   1 (vm=0: MX)
vfwmmacc.vv                   0x15    OPFVV   1 (vm=0: MX)
vfqwmmacc.vv                  0x16    OPFVV   1 (vm=0: MX)
(W=8 reserved)                0x17    OPFVV
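
To show how the funct6, funct3 and vm values listed above could combine into a
32-bit instruction word, here is a minimal sketch. It assumes these
instructions use the same field layout and OP-V major opcode (0x57) as the
arithmetic instructions of the base V extension, with OPIVV as funct3 = 000
and OPFVV as funct3 = 001. The encode helper is hypothetical, and these notes
do not say which matrix operand is carried in vs1 or vs2.

```python
OP_V = 0x57                                   # base V arithmetic major opcode (assumed to apply here)
FUNCT3 = {"OPIVV": 0b000, "OPFVV": 0b001}     # funct3 categories from the base V extension


def encode(funct6: int, category: str, vm: int, vs2: int, vs1: int, vd: int) -> int:
    """Pack the fields as funct6 | vm | vs2 | vs1 | funct3 | vd | opcode."""
    fields = (("funct6", funct6, 6), ("vm", vm, 1),
              ("vs2", vs2, 5), ("vs1", vs1, 5), ("vd", vd, 5))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((funct6 << 26) | (vm << 25) | (vs2 << 20) | (vs1 << 15)
            | (FUNCT3[category] << 12) | (vd << 7) | OP_V)


# vmmacc.vv (funct6 0x38, OPIVV, vm=1) with vd=v0, vs1=v1, vs2=v2
word_int = encode(0x38, "OPIVV", 1, 2, 1, 0)
# vfmmacc.vv (funct6 0x14, OPFVV) with vm=0, which selects the microscaling decode
word_mx = encode(0x14, "OPFVV", 0, 2, 1, 0)
```

Under these assumptions the two example words are 0xE2208057 and 0x50209057;
only bit 25 (vm) separates the plain and microscaling forms of an FP opcode.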