Zvvm release for Internal Review (IME TG)
Release Notes — Zvvm Family of Integrated Matrix Extensions
Scope: All changes to src/integrated-matrix.adoc since the repository
baseline at f83dff03.
New specification: Zvvm Family of Integrated Matrix Extensions
This release introduces the complete specification for the Zvvm family of
Integrated Matrix Extensions, a set of RISC-V ISA extensions that add matrix
multiply-accumulate instructions operating entirely within the standard V
register file. Tile geometry is derived algebraically from VLEN together with
the existing `vtype` fields (SEW, LMUL) and a new aspect-ratio field λ.
Specification content
Core integer multiply-accumulate (Zvvmm)
- `vmmacc.vv` (W=1), `vwmmacc.vv` (W=2), `vqwmmacc.vv` (W=4) covering the full integer subextension family from `Zvvmmi4b` (Int4×Int4→Int8) through `Zvvmmd` (Int64×Int64→Int64).
- Independent `altfmt_A`/`altfmt_B` fields in `vtype` select signed or unsigned interpretation of each input independently; the accumulator is always signed.
- Vector masking (`vm=0`) is explicitly reserved for the integer family; all integer multiply-accumulate instructions require `vm=1`.
- LMUL is restricted to integer values (1, 2, 4, 8); fractional LMUL is not permitted.
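The per-input signedness rule can be illustrated with a scalar reference model. This is a minimal sketch, not the specification's pseudocode. It makes three assumptions. First, `altfmt = 1` is taken to select unsigned, which is inferred from the microscaling note that `altfmt_A=1`/`altfmt_B=1` are reserved because MXINT inputs are always signed. Second, it models only the quad-widening Int8→Int32 case. Third, it uses hypothetical row-major arrays with explicit `m`, `n`, `k` in place of the `vtype`-derived tile geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a raw 8-bit element as signed (altfmt = 0) or unsigned
 * (altfmt = 1). The altfmt polarity is an assumption of this sketch. */
static int32_t int8_elem(uint8_t raw, int altfmt)
{
    if (altfmt)
        return (int32_t)raw;                              /* 0 .. 255   */
    return raw < 128 ? (int32_t)raw : (int32_t)raw - 256; /* -128 .. 127 */
}

/* Hypothetical scalar model of a quad-widening integer MMA:
 * C[i][j] += sum_p A[i][p] * B[p][j], with an always-signed Int32
 * accumulator. Arrays are row-major; m, n, k stand in for tile geometry. */
static void qwmmacc_ref(int32_t *c, const uint8_t *a, const uint8_t *b,
                        int m, int n, int k, int altfmt_a, int altfmt_b)
{
    assert(m > 0 && n > 0 && k > 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            /* Sum in 64 bits; the sketch assumes the result fits in Int32. */
            int64_t acc = c[i * n + j];
            for (int p = 0; p < k; p++)
                acc += (int64_t)int8_elem(a[i * k + p], altfmt_a) *
                       int8_elem(b[p * n + j], altfmt_b);
            c[i * n + j] = (int32_t)acc;
        }
    }
}
```

With `altfmt_a = 1` and `altfmt_b = 0`, the same raw bytes model an unsigned×signed product. The accumulator stays signed in every combination.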
Core floating-point multiply-accumulate (Zvvfmm)
- `vfmmacc.vv` (W=1), `vfwmmacc.vv` (W=2), `vfqwmmacc.vv` (W=4) covering OFP4, OFP8, FP16, BF16, FP32, and FP64 input/accumulator combinations.
- Mixed-format inputs (`altfmt_A` ≠ `altfmt_B`) are permitted for widening instructions; `vfmmacc.vv` with mixed FP16/BF16 inputs raises an illegal-instruction exception.
- Mixed OFP8 inputs (E4M3 × E5M2) are permitted only with widening instructions.
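The mixing rule reduces to a single decode check. The sketch below models only that rule; any other format restrictions are out of scope. The function name is hypothetical, and a `false` result stands for the illegal-instruction case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decode check for the FP MMA family (not spec pseudocode).
 * w is the widening factor: 1 (vfmmacc.vv), 2 (vfwmmacc.vv),
 * 4 (vfqwmmacc.vv). altfmt_a/altfmt_b select each operand's input format
 * at the current SEW, e.g. FP16 vs. BF16 or E4M3 vs. E5M2. */
static bool fp_mma_formats_legal(int w, int altfmt_a, int altfmt_b)
{
    assert(w == 1 || w == 2 || w == 4);
    if (altfmt_a == altfmt_b)
        return true;   /* same format for both inputs */
    return w > 1;      /* mixed formats: widening forms only */
}
```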
Tile load/store (Zvvmtls)
- `vmtl.v` (row- or column-major tile load), `vmttl.v` (transposing tile load), `vmts.v` (tile store) with leading-dimension stride operand.
- Tail and mask policy aligned with base V; partial-VL semantics specified for embedded streaming use cases.
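A leading-dimension stride works here the same way it does in BLAS. The sketch below is a scalar model of row- or column-major tile addressing, under the following assumptions. The tile is always held row-major on the register side. `tile_load_ref` and `tile_store_ref` are hypothetical names. Neither transposing loads nor partial-VL behavior is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of leading-dimension tile addressing: a
 * rows x cols tile at src, with ld elements between consecutive rows
 * (row-major) or consecutive columns (column-major) in memory. */
static void tile_load_ref(float *dst, const float *src,
                          size_t rows, size_t cols, size_t ld,
                          int col_major)
{
    assert(ld >= (col_major ? rows : cols));   /* lines must not overlap */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[i * cols + j] = col_major ? src[j * ld + i]
                                          : src[i * ld + j];
}

/* Inverse of tile_load_ref: write a row-major tile back to memory. */
static void tile_store_ref(float *dst, const float *src,
                           size_t rows, size_t cols, size_t ld,
                           int col_major)
{
    assert(ld >= (col_major ? rows : cols));
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++) {
            float v = src[i * cols + j];
            if (col_major)
                dst[j * ld + i] = v;
            else
                dst[i * ld + j] = v;
        }
}
```

For example, passing `src + 5` with `ld = 4` and a 2×2 tile selects rows 1 and 2, columns 1 and 2, of a row-major 3×4 matrix.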
Microscaling support (Zvvfmm MX variants)
- `vm=0` on FP multiply-accumulate opcodes selects microscaling decode: E8M0 block scale factors in `v0`, block size selected by the `bs` field in `vtype` (32 or 16 elements), and input formats selected by `altfmt_A`/`altfmt_B`.
- `vfwimmacc.vv` and `vfqwimmacc.vv`: integer (MXINT) inputs with FP accumulator, microscaling only; `altfmt_A=1` and `altfmt_B=1` are reserved (MXINT inputs are always signed).
- Full subextension coverage: MXFP4, MXFP8, MXINT8, MXINT4 with BS=32 (`Zvvfmmmx*`) and BS=16 (`Zvvfmmnx*`) variants.
- Scale data layout in `v0` is tile-strided with row stride R = λ × SEW ÷ 16; M × R = VLEN ÷ 16 always fills exactly one register.
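The layout relations and the block-scaling rule can be checked numerically. This is a minimal sketch with several assumptions. E8M0 decoding follows the OCP Microscaling convention, where the value is 2^(e - 127) and 0xFF encodes NaN. Element values are assumed to be already decoded to `float`. Accumulation uses plain binary32 arithmetic rather than the spec's rounding rules. M is solved from M × R = VLEN ÷ 16. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* E8M0 scale: 8-bit biased exponent, value 2^(e - 127); 0xFF is NaN.
 * The result is built directly from binary32 bit patterns. */
static float e8m0_to_float(uint8_t e)
{
    uint32_t bits;
    if (e == 0xFF)
        bits = 0x7FC00000u;            /* quiet NaN */
    else if (e == 0)
        bits = 0x00400000u;            /* 2^-127 is subnormal in binary32 */
    else
        bits = (uint32_t)e << 23;      /* exponent field e, zero mantissa */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Row stride of the scale layout in v0: R = lambda * SEW / 16. */
static unsigned mx_scale_row_stride(unsigned lambda, unsigned sew)
{
    assert((lambda * sew) % 16 == 0);
    return lambda * sew / 16;
}

/* M solved from M * R = VLEN / 16. */
static unsigned mx_scale_rows(unsigned vlen, unsigned lambda, unsigned sew)
{
    unsigned r = mx_scale_row_stride(lambda, sew);
    assert((vlen / 16) % r == 0);
    return (vlen / 16) / r;
}

/* Block-scaled dot product over n decoded elements: each run of bs
 * elements (bs = 32 or 16) shares one E8M0 scale per operand. */
static float mx_block_dot(const float *a, const float *b,
                          const uint8_t *scale_a, const uint8_t *scale_b,
                          int n, int bs)
{
    assert((bs == 32 || bs == 16) && n % bs == 0);
    float sum = 0.0f;
    for (int blk = 0; blk < n / bs; blk++) {
        float partial = 0.0f;
        for (int i = blk * bs; i < (blk + 1) * bs; i++)
            partial += a[i] * b[i];
        sum += partial * e8m0_to_float(scale_a[blk]) *
               e8m0_to_float(scale_b[blk]);
    }
    return sum;
}
```

With illustrative values λ = 4, SEW = 8, and VLEN = 256, the sketch gives R = 2 and M = 8, so M × R = 16 = VLEN ÷ 16.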
W=8 octal-widening carve-out
- Subextension names defined for future W=8 instructions: `Zvvmmbd` (Int8→Int64), `Zvvfmmofp4f` (OFP4→FP32), `Zvvfmmofp8d` (OFP8→FP64), and MX counterparts for MXFP4→FP32, MXFP8→FP64, and MXINT8→FP64.
- Encoding space reserved: funct6 = 0x3b in OPIVV (following `vmmacc`/`vwmmacc`/`vqwmmacc` at 0x38–0x3a) and funct6 = 0x17 in OPFVV (following `vfmmacc`/`vfwmmacc`/`vfqwmmacc` at 0x14–0x16).
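The reserved slots follow directly from the existing assignments: funct6 is the family base plus log2(W). The helper below is hypothetical and only restates the values listed for the integer (0x38) and FP (0x14) families.

```c
#include <assert.h>

/* funct6 = base + log2(W): W = 1, 2, 4, 8 maps to base + 0, 1, 2, 3,
 * so W = 8 lands on the reserved 0x3b (OPIVV) or 0x17 (OPFVV). */
static unsigned mma_funct6(int is_fp, unsigned w)
{
    unsigned base = is_fp ? 0x14u : 0x38u;
    unsigned log2w = 0;
    assert(w == 1 || w == 2 || w == 4 || w == 8);
    while ((1u << log2w) < w)
        log2w++;
    return base + log2w;
}
```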
Supporting material
- C-language intrinsics: Proto-intrinsic API covering all (SEW, λ, LMUL) combinations for integer, widening, FP, and microscaling variants, using `vtype.h` to expose the full `vtype` control surface without requiring compiler support.
- GEMM pseudocode: Illustrative tiling loop for both FP and integer GEMM using `vmtl.v`/`vfmmacc.vv`/`vmts.v`.
- Arithmetic considerations: Analysis of rounding, exception flag accumulation, and precision requirements for widening and microscaling paths.
- Figures: Diagrams for tile load/store geometry (row-major vs. column-major, partial-VL), microscaling `v0` layout, and lane-local scale alignment.
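The structure of a GEMM tiling loop can be shown without any vector code. The sketch below is a scalar C model, not the spec's pseudocode. `TM`, `TN`, `TK`, and `gemm_tiled` are hypothetical. In the real loop the tile sizes would come from the `vtype`-derived geometry. The comments mark where `vmtl.v`, `vfmmacc.vv` (or an integer MMA), and `vmts.v` would sit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile sizes; real values derive from VLEN, SEW, LMUL, λ. */
enum { TM = 4, TN = 4, TK = 8 };

/* C (m x n) += A (m x k) * B (k x n), row-major with leading dims. */
static void gemm_tiled(size_t m, size_t n, size_t k,
                       const float *a, size_t lda,
                       const float *b, size_t ldb,
                       float *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (size_t i0 = 0; i0 < m; i0 += TM) {
        for (size_t j0 = 0; j0 < n; j0 += TN) {
            size_t tm = m - i0 < TM ? m - i0 : TM;   /* partial edge tiles */
            size_t tn = n - j0 < TN ? n - j0 : TN;
            float acc[TM][TN];
            /* Accumulator tile load from C (vmtl.v in the vector loop). */
            for (size_t i = 0; i < tm; i++)
                for (size_t j = 0; j < tn; j++)
                    acc[i][j] = c[(i0 + i) * ldc + j0 + j];
            for (size_t p0 = 0; p0 < k; p0 += TK) {
                size_t tk = k - p0 < TK ? k - p0 : TK;
                /* A and B tiles read directly here (vmtl.v in the vector
                 * loop), then one multiply-accumulate step (vfmmacc.vv). */
                for (size_t i = 0; i < tm; i++)
                    for (size_t j = 0; j < tn; j++)
                        for (size_t p = 0; p < tk; p++)
                            acc[i][j] += a[(i0 + i) * lda + p0 + p] *
                                         b[(p0 + p) * ldb + j0 + j];
            }
            /* Accumulator tile store back to C (vmts.v in the vector loop). */
            for (size_t i = 0; i < tm; i++)
                for (size_t j = 0; j < tn; j++)
                    c[(i0 + i) * ldc + j0 + j] = acc[i][j];
        }
    }
}
```

Keeping the C tile resident across the whole K loop means each C tile is loaded and stored only once.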
Encoding
| Instruction | funct6 | funct3 | vm |
|---|---|---|---|
| `vmmacc.vv` | 0x38 | OPIVV | 1 |
| `vwmmacc.vv` / `vfwimmacc.vv` | 0x39 | OPIVV | 1 / 0 |
| `vqwmmacc.vv` / `vfqwimmacc.vv` | 0x3a | OPIVV | 1 / 0 |
| (W=8 reserved) | 0x3b | OPIVV | — |
| `vfmmacc.vv` | 0x14 | OPFVV | 1 (vm=0: MX) |
| `vfwmmacc.vv` | 0x15 | OPFVV | 1 (vm=0: MX) |
| `vfqwmmacc.vv` | 0x16 | OPFVV | 1 (vm=0: MX) |
| (W=8 reserved) | 0x17 | OPFVV | — |
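To cross-check the table, the funct6, funct3, and vm values can be packed into a 32-bit instruction word. This sketch assumes the matrix instructions reuse the base V OP-V vector-vector layout: major opcode 0x57, funct3 000 for OPIVV, and funct3 001 for OPFVV. `encode_opv` and the register numbers in the test are hypothetical. The roles of vd, vs1, and vs2 for the MMA instructions are not modeled.

```c
#include <assert.h>
#include <stdint.h>

enum { OPV_OPCODE = 0x57, FUNCT3_OPIVV = 0x0, FUNCT3_OPFVV = 0x1 };

/* Pack an OP-V vector-vector word using the base V field positions:
 * funct6[31:26] vm[25] vs2[24:20] vs1[19:15] funct3[14:12] vd[11:7]. */
static uint32_t encode_opv(unsigned funct6, unsigned vm, unsigned vs2,
                           unsigned vs1, unsigned funct3, unsigned vd)
{
    assert(funct6 < 64 && vm < 2 && vs2 < 32 && vs1 < 32);
    assert(funct3 < 8 && vd < 32);
    return ((uint32_t)funct6 << 26) | ((uint32_t)vm << 25) |
           ((uint32_t)vs2 << 20) | ((uint32_t)vs1 << 15) |
           ((uint32_t)funct3 << 12) | ((uint32_t)vd << 7) |
           (uint32_t)OPV_OPCODE;
}
```

Under this assumption, the `vmmacc.vv` row corresponds to `encode_opv(0x38, 1, vs2, vs1, FUNCT3_OPIVV, vd)`. Clearing vm on 0x14 selects the microscaling decode of `vfmmacc.vv`.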