Release riscv-isa-release-fa55752-2026-05-04
Release riscv-isa-release-eb2e37e-2026-05-03
Release riscv-isa-release-d694183-2026-05-03
Release riscv-isa-release-01076b3-2026-05-03
Release riscv-isa-release-e9ce69a-2026-05-02
Release riscv-isa-release-ca78f49-2026-05-01
Version for committee approval vote
Arithmetic model
- Floating-point multiply-accumulate semantics defined by three implementation-defined parameters (`G`, `psm`, `rnd`), disclosed per (SEW, W, lambda). `G` is the number of sub-dot-products combined into a partial sum. `psm` selects between exact internal computation and a bulk-normalized (RVBNA) representation. `rnd` controls an optional round-to-odd step before accumulation into C.
- Final accumulation into C uses the dynamic rounding mode from `frm`; exception flags accumulate into `fflags`.
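The motivation for a round-to-odd step before the final rounding can be illustrated with a small exact-arithmetic model. The sketch below is not taken from the specification: the helper names, the toy precisions (a 6-bit partial sum and a 4-bit accumulator), and the choice of round-to-nearest-even as the `frm` mode are all assumptions made for illustration.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int, mode: str) -> Fraction:
    """Round x to a p-bit significand with unbounded exponent.

    mode is 'rne' (round-to-nearest-even) or 'rto' (round-to-odd).
    Hypothetical helper, not part of any specification.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e such that 2**(p-1) <= mag / 2**e < 2**p.
    e = 0
    while mag / Fraction(2) ** e >= 2 ** p:
        e += 1
    while mag / Fraction(2) ** e < 2 ** (p - 1):
        e -= 1
    scaled = mag / Fraction(2) ** e
    lo = scaled.numerator // scaled.denominator  # truncated significand
    rem = scaled - lo
    if rem == 0:
        m = lo  # exactly representable, no rounding needed
    elif mode == "rto":
        m = lo if lo % 2 == 1 else lo + 1  # inexact: pick the odd neighbour
    elif mode == "rne":
        half = Fraction(1, 2)
        round_up = rem > half or (rem == half and lo % 2 == 1)
        m = lo + 1 if round_up else lo
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return sign * m * Fraction(2) ** e


def accumulate(c: Fraction, s: Fraction, p_acc: int, p_sum: int) -> Fraction:
    """Model C <- round_frm(C + round_rto(S)) with frm assumed to be RNE."""
    return round_to_precision(c + round_to_precision(s, p_sum, "rto"), p_acc, "rne")


# A partial sum just above the midpoint between 8 and 9 on the 4-bit grid.
s = Fraction(545, 64)  # 8.515625
direct = round_to_precision(s, 4, "rne")
double_rne = round_to_precision(round_to_precision(s, 6, "rne"), 4, "rne")
via_rto = accumulate(Fraction(0), s, p_acc=4, p_sum=6)
print(direct, double_rne, via_rto)
```

In this toy case, rounding the partial sum to nearest first lands exactly on a tie and the second rounding then goes the wrong way, whereas the round-to-odd intermediate keeps enough information for the final rounding to match a single direct rounding.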
Interrupt and execution model
- `vstart = 0` required for all 11 GEMM multiply-accumulate instructions (illegal-instruction exception if non-zero). Tile load/store instructions continue to honor `vstart` through their body loops.
Dimensional consistency
- `N_tile` divisor unified to `lambda × LMUL` across all integer and floating-point GEMM instructions.
- `MUL_C` requirements tightened: `lambda^2` must divide `VLEN/SEW`, and the C register group index must be a multiple of `MUL_C`.
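The two tightened `MUL_C` conditions can be expressed as a simple predicate. This is a minimal sketch only: the function name is hypothetical, `MUL_C` is taken as an input because these notes do not say how it is derived, and VLEN is assumed to be a multiple of SEW.

```python
def c_operand_is_legal(vlen: int, sew: int, lam: int, mul_c: int, c_reg: int) -> bool:
    """Check the two conditions stated above for the C operand.

    Hypothetical helper: mul_c is supplied by the caller, since its
    derivation is not given in these release notes.
    """
    elems_per_reg = vlen // sew  # assumes VLEN is a multiple of SEW
    lambda_sq_divides = elems_per_reg % (lam * lam) == 0
    c_group_aligned = c_reg % mul_c == 0
    return lambda_sq_divides and c_group_aligned


# VLEN=256, SEW=32 gives 8 elements per register.
print(c_operand_is_legal(256, 32, lam=2, mul_c=4, c_reg=8))  # lambda^2 = 4 divides 8
print(c_operand_is_legal(256, 32, lam=4, mul_c=4, c_reg=8))  # lambda^2 = 16 does not
print(c_operand_is_legal(256, 32, lam=2, mul_c=4, c_reg=6))  # v6 is not 4-aligned
```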
Terminology
- "Widening" (arithmetic) and "packing" (storage layout) distinguished consistently, with both coupled through parameter `W`.
Format support
- Mixed-format floating-point inputs permitted on the non-widening `vfmmacc.vv` (E4M3 × E5M2, FP16 × BF16); prior restriction removed.
- Mixed OFP8 inputs permitted on all four floating-point multiply-accumulate instructions (`vfmmacc.vv`, `vfwmmacc.vv`, `vfqmmacc.vv`, `vf8wmmacc.vv`).
Editorial
- `vsetivli` mentioned alongside `vsetvli` where `altfmt_A`/`altfmt_B`/`bs` sit outside the immediate field.
- Assorted normative cross-references added; consistent use of "non-widening"/"widening" throughout.
- SAIL tile load/store: `LD` declared mutable (`var`) since it is reassigned within the body.
Release riscv-isa-release-44e4b76-2026-04-13
Zvvm specification for Internal Review #2 (Mar 30)
This release was created by: ptomsich
Release of RISC-V ISA, built from commit c37ab37, is now available.
IME Specification Release Notes
Changes since 2026-03-09
New instructions
- `v8wmmacc.vv` (funct6=0x3b, OPIVV): Integer 8× widening matrix multiply-accumulate. Inputs at SEW/8, accumulator at SEW. Only SEW ∈ {32, 64} is valid. Fills the encoding gap for Zvvi4i32mm (Int4→Int32) and Zvvi8i64mm (Int8→Int64).
- `vf8wmmacc.vv` (funct6=0x17, OPFVV): Floating-point 8× widening matrix multiply-accumulate. Supports microscaling via `vm=0` with E8M0 block scales from `v0`. Fills the gap for Zvvofp4fp32mm (OFP4→FP32) and Zvvofp8fp64mm (OFP8→FP64).
- `vf8wimmacc.vv` (funct6=0x3b, OPIVV, vm=0): Integer-input 8× widening FP-accumulate with E8M0 microscaling. Shares its opcode with `v8wmmacc.vv`; `vm=0` selects the MX form. Supports Zvvxi4fp32mm (MXINT4→FP32) and Zvvxi8fp64mm (MXINT8→FP64).
Normative changes
- `vtype` field positions finalised: `altfmt_A` moved to `vtype[XLEN-5]`, `altfmt_B` to `vtype[XLEN-6]`, `bs` to `vtype[XLEN-7]`, immediately below `lambda[2:0]`. All three fields are now outside the `vsetvli` immediate range and require `vsetvl` or `vsetivli` to configure. The provisional editorial notes about future relocation have been removed.
- Subextension dependencies relaxed: The blanket `Zve64d` dependency has been replaced with the minimum `Zve` subset per subextension. Integer-only subextensions with accumulators ≤ 32-bit now depend on `Zve32x`; 64-bit integer accumulators on `Zve64x`; FP subextensions with accumulators ≤ 32-bit on `Zve32f`. Only FP64-accumulator subextensions retain `Zve64d`.
- Round-to-odd for partial sums: The accumulation rounding model now specifies that the optional rounding of the partial sum S must use round-to-odd (RTO) mode: C ← round_frm(C + round_rto(S)). The final accumulation into C continues to use the dynamic rounding mode from `frm`.
- `vfmmacc.vv` microscaling removed: Microscaling (`vm=0`) is no longer supported for non-widening FP multiply-accumulate. `vm=0` is reserved for `vfmmacc.vv` (W=1). The contradictory "When vm=0" exception clauses and the dead microscaling SAIL code have been removed.
- λ terminology clarified: λ is now described as a "tile-layout parameter" rather than "the K dimension." Two occurrences of the conflation have been corrected; K_eff = λ × W × LMUL is consistently presented as the derived effective K dimension.
- MXINT4 defined: MXINT4 (E8M0-scaled signed 4-bit integer) is now explicitly defined by this specification as analogous to OCP MX's MXINT8 but with 4-bit elements. Corresponding MX subextensions Zvvxi4fp32mm and Zvvxni4fp32mm have been added to the microscaling subextension table.
- OCP MX citation: Added a proper normative reference to the OCP Microscaling Formats (MX) v1.0 Specification with URL.
- Zvvmttls separated: Transposing tile load/store instructions (`vmttl.v`, `vmtts.v`) are now in their own sub-extension `Zvvmttls`, separate from the order-preserving `Zvvmtls`.
- Missing subextension restored: Zvvofp8fp32mm (OFP8 × OFP8 → FP32) added to the computational subextension table.
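The finalised `vtype` bit positions listed above can be turned into a small helper that builds a `vtype` value for use with `vsetvl`. This is a sketch under stated assumptions: the function names are hypothetical, each of the three fields is treated as a single bit (as the single-index notation suggests), and the 11-bit `vsetvli` immediate width comes from the base V extension rather than from these notes.

```python
VSETVLI_ZIMM_BITS = 11  # base V: vsetvli encodes vtype[10:0] in its immediate


def matrix_field_bits(xlen: int) -> dict:
    """Bit positions of the three fields, as listed in the notes above."""
    return {"altfmt_A": xlen - 5, "altfmt_B": xlen - 6, "bs": xlen - 7}


def set_matrix_fields(vtype: int, xlen: int, altfmt_a: int, altfmt_b: int, bs: int) -> int:
    """Return vtype with the three single-bit fields overwritten (hypothetical helper)."""
    bits = matrix_field_bits(xlen)
    for name, value in (("altfmt_A", altfmt_a), ("altfmt_B", altfmt_b), ("bs", bs)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
        pos = bits[name]
        vtype = (vtype & ~(1 << pos)) | (value << pos)
    return vtype


for xlen in (32, 64):
    positions = matrix_field_bits(xlen)
    # Every field sits above the bits that a vsetvli immediate can reach.
    print(xlen, positions, all(p >= VSETVLI_ZIMM_BITS for p in positions.values()))

print(hex(set_matrix_fields(0, 64, altfmt_a=1, altfmt_b=0, bs=1)))
```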
Encoding map updates
- Added FP encoding map entries for `vf8wmmacc.vv` (W=8): SEW=32 (OFP4→FP32) and SEW=64 (OFP8→FP64), with all altfmt combinations and microscaling columns.
- Added integer encoding map entries for `v8wmmacc.vv` (W=8): SEW=32 (Int4→Int32) and SEW=64 (Int8→Int64).
- Added integer MX encoding map entries for `vf8wimmacc.vv` (W=8, vm=0): SEW=32 (MXINT4→FP32) and SEW=64 (MXINT8→FP64).
...
Zvvm release for Internal Review (IME TG)
Release Notes — Zvvm Family of Integrated Matrix Extensions
Scope: All changes to src/integrated-matrix.adoc since the repository
baseline at f83dff03.
New specification: Zvvm Family of Integrated Matrix Extensions
This release introduces the complete specification for the Zvvm family of
Integrated Matrix Extensions, a set of RISC-V ISA extensions that add matrix
multiply-accumulate instructions operating entirely within the standard V
register file, with tile geometry derived algebraically from existing vtype
fields (VLEN, SEW, LMUL) and a new aspect-ratio field λ.
Specification content
Core integer multiply-accumulate (Zvvmm)
- `vmmacc.vv` (W=1), `vwmmacc.vv` (W=2), `vqwmmacc.vv` (W=4) covering the full integer subextension family from `Zvvmmi4b` (Int4×Int4→Int8) through `Zvvmmd` (Int64×Int64→Int64).
- Independent `altfmt_A`/`altfmt_B` fields in `vtype` select signed or unsigned interpretation of each input independently; the accumulator is always signed.
- Vector masking (`vm=0`) is explicitly reserved for the integer family; all integer multiply-accumulate instructions require `vm=1`.
- LMUL is restricted to integer values (1, 2, 4, 8); fractional LMUL is not permitted.
Core floating-point multiply-accumulate (Zvvfmm)
- `vfmmacc.vv` (W=1), `vfwmmacc.vv` (W=2), `vfqwmmacc.vv` (W=4) covering OFP4, OFP8, FP16, BF16, FP32, and FP64 input/accumulator combinations.
- Mixed-format inputs (`altfmt_A ≠ altfmt_B`) are permitted for widening instructions; `vfmmacc.vv` with mixed FP16/BF16 inputs raises an illegal-instruction exception.
- Mixed OFP8 inputs (E4M3 × E5M2) are permitted only with widening instructions.
Tile load/store (Zvvmtls)
- `vmtl.v` (row- or column-major tile load), `vmttl.v` (transposing tile load), `vmts.v` (tile store) with leading-dimension stride operand.
- Tail and mask policy aligned with base V; partial-VL semantics specified for embedded streaming use cases.
Microscaling support (Zvvfmm MX variants)
- `vm=0` on FP multiply-accumulate opcodes selects microscaling decode: E8M0 block scale factors in `v0`, block size selected by `bs` field in `vtype` (32 or 16 elements), input formats selected by `altfmt_A`/`altfmt_B`.
- `vfwimmacc.vv` and `vfqwimmacc.vv`: integer (MXINT) inputs with FP accumulator, microscaling only; `altfmt_A=1` and `altfmt_B=1` are reserved (MXINT inputs are always signed).
- Full subextension coverage: MXFP4, MXFP8, MXINT8, MXINT4 with BS=32 (`Zvvfmmmx*`) and BS=16 (`Zvvfmmnx*`) variants.
- Scale data layout in `v0` is tile-strided with row stride R = λ × SEW ÷ 16; M × R = VLEN ÷ 16 always fills exactly one register.
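The scale-layout identity in the last bullet can be checked numerically. In this sketch the function name is hypothetical, and M is derived from the stated identity as M = VLEN ÷ (λ × SEW) rather than taken from a definition in these notes; configurations where the divisions are not exact are rejected as a simplification of the sketch, not as a statement about what the specification allows.

```python
def scale_layout(vlen: int, sew: int, lam: int) -> tuple:
    """Return (R, M) with R = lam * SEW / 16 and M * R = VLEN / 16.

    Hypothetical helper; M is solved from the identity M * R = VLEN / 16.
    """
    if (lam * sew) % 16 != 0 or vlen % (lam * sew) != 0:
        raise ValueError("sketch only handles exactly divisible configurations")
    r = (lam * sew) // 16
    m = vlen // (lam * sew)
    return r, m


for vlen, sew, lam in ((256, 32, 2), (512, 16, 4), (128, 64, 1)):
    r, m = scale_layout(vlen, sew, lam)
    # The product M * R should equal VLEN / 16 for every configuration.
    print(vlen, sew, lam, "R =", r, "M =", m, "M*R =", m * r, "VLEN/16 =", vlen // 16)
```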
W=8 octal-widening carve-out
- Subextension names defined for future W=8 instructions: `Zvvmmbd` (Int8→Int64), `Zvvfmmofp4f` (OFP4→FP32), `Zvvfmmofp8d` (OFP8→FP64), and MX counterparts for MXFP4→FP32, MXFP8→FP64, and MXINT8→FP64.
- Encoding space reserved: funct6 = 0x3b in OPIVV (following `vmmacc`/`vwmmacc`/`vqwmmacc` at 0x38–0x3a) and funct6 = 0x17 in OPFVV (following `vfmmacc`/`vfwmmacc`/`vfqwmmacc` at 0x14–0x16).
Supporting material
- C-language intrinsics: Proto-intrinsic API covering all (SEW, λ, LMUL) combinations for integer, widening, FP, and microscaling variants, using `vtype.h` to expose the full `vtype` control surface without requiring compiler support.
- GEMM pseudocode: Illustrative tiling loop for both FP and integer GEMM using `vmtl.v` / `vfmmacc.vv` / `vmts.v`.
- Arithmetic considerations: Analysis of rounding, exception flag accumulation, and precision requirements for widening and microscaling paths.
- Figures: Diagrams for tile load/store geometry (row-major vs. column-major, partial-VL), microscaling `v0` layout, and lane-local scale alignment.
Encoding
| Instruction | funct6 | funct3 | vm |
|---|---|---|---|
| `vmmacc.vv` | 0x38 | OPIVV | 1 |
| `vwmmacc.vv` / `vfwimmacc.vv` | 0x39 | OPIVV | 1 / 0 |
| `vqwmmacc.vv` / `vfqwimmacc.vv` | 0x3a | OPIVV | 1 / 0 |
| (W=8 reserved) | 0x3b | OPIVV | - |
| `vfmmacc.vv` | 0x14 | OPFVV | 1 (vm=0: MX) |
| `vfwmmacc.vv` | 0x15 | OPFVV | 1 (vm=0: MX) |
| `vfqwmmacc.vv` | 0x16 | OPFVV | 1 (vm=0: MX) |
| (W=8 reserved) | 0x17 | OPFVV | - |
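The table above can be read as a lookup keyed on the operand category, `funct6`, and `vm`. The sketch below mirrors only this table as published in this release (the W=8 slots are still reserved here); the dictionary layout, the function name, and the "(MX)" suffix used to mark the microscaling decode are illustrative choices, not part of the specification.

```python
# (funct3 category, funct6) -> {vm: mnemonic}; entries copied from the table above.
ENCODING = {
    ("OPIVV", 0x38): {1: "vmmacc.vv"},
    ("OPIVV", 0x39): {1: "vwmmacc.vv", 0: "vfwimmacc.vv"},
    ("OPIVV", 0x3A): {1: "vqwmmacc.vv", 0: "vfqwimmacc.vv"},
    ("OPFVV", 0x14): {1: "vfmmacc.vv", 0: "vfmmacc.vv (MX)"},
    ("OPFVV", 0x15): {1: "vfwmmacc.vv", 0: "vfwmmacc.vv (MX)"},
    ("OPFVV", 0x16): {1: "vfqwmmacc.vv", 0: "vfqwmmacc.vv (MX)"},
}


def decode(category: str, funct6: int, vm: int):
    """Return the mnemonic for an encoding, or None if reserved in this table."""
    return ENCODING.get((category, funct6), {}).get(vm)


print(decode("OPIVV", 0x39, 1))  # integer 2x widening form
print(decode("OPIVV", 0x39, 0))  # MXINT-input form sharing the same funct6
print(decode("OPIVV", 0x3B, 1))  # W=8 slot, reserved in this release
```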