GitHub - schachmadinejad/erxi


EXI 1.0 (W3C Second Edition) implementation in Rust. Complete spec coverage as both a library and a CLI tool.

EXI (Efficient XML Interchange) is a binary XML format that encodes XML documents compactly and quickly. The output is typically smaller than gzip-compressed XML and faster to parse than text XML.

Features

  • Complete EXI 1.0 Second Edition implementation
  • Schema-less and schema-informed encoding/decoding
  • All alignment modes: BitPacked, ByteAlignment, PreCompression, Compression (DEFLATE)
  • Strict mode and Fragment mode
  • Fidelity options: Comments, PIs, DTD, Prefixes, Lexical Values
  • String Table with configurable partitions
  • Self-Contained elements
  • Datatype Representation Mapping
  • Streaming encode for large files (no intermediate Vec)
  • Iterator-based decoding
  • XSD schema parser with xs:import support
  • Parallel DEFLATE compression for large streams
  • High test coverage (unit + interop + cross-RTT)

Installation

# Build from the repository
cargo build --release

# Library-only (no CLI)
cargo build --release --no-default-features

Cargo Features

| Feature | Default | Description |
|---|---|---|
| cli | yes | CLI binary (erxi encode / erxi decode) |
| cli-fast | yes | Faster CLI: zlib-ng + mimalloc |
| fast-deflate | yes | zlib-ng backend for DEFLATE |
| fast-alloc | yes | mimalloc allocator |

CLI

Encode

# Simple encoding (BitPacked, default)
erxi encode -i document.xml -o document.exi

# With schema
erxi encode -i document.xml -o document.exi -s schema.xsd

# With compression
erxi encode -i document.xml -o document.exi --compression

# Byte-aligned with options in header
erxi encode -i document.xml -o document.exi --byte-aligned --include-options

# All fidelity options
erxi encode -i document.xml -o document.exi \
  --preserve-comments --preserve-pis --preserve-dtd --preserve-prefixes

# Large files: parallel DEFLATE
erxi encode -i large.xml -o large.exi --compression --parallel-deflate

# Read from stdin
cat document.xml | erxi encode -i - -o document.exi

# Auto output (document.xml -> document.exi)
erxi encode -i document.xml

# To stdout
erxi encode -i document.xml -o -

Decode

# Auto output (document.exi -> document.xml)
erxi decode -i document.exi

# To file
erxi decode -i document.exi -o document.xml

# To stdout
erxi decode -i document.exi -o -

# With schema (when options are not in the header)
erxi decode -i document.exi -o document.xml -s schema.xsd

# Pretty-printed XML output
erxi decode -i document.exi --pretty
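
The auto-output rule used by encode and decode (document.xml -> document.exi and back) amounts to swapping the file extension. The sketch below illustrates that rule with the standard library only. `derive_output` is a hypothetical helper, not a function from erxi, and how erxi itself treats stdin input or names without an extension is not covered here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the output path when `-o` is omitted.
/// Replaces the input's extension with `new_ext` (for example "exi" when
/// encoding, "xml" when decoding). A name without an extension gains one.
fn derive_output(input: &Path, new_ext: &str) -> PathBuf {
    input.with_extension(new_ext)
}

fn main() {
    let encoded = derive_output(Path::new("document.xml"), "exi");
    let decoded = derive_output(Path::new("document.exi"), "xml");
    println!("{}", encoded.display());
    println!("{}", decoded.display());
}
```

Only the final extension is replaced, so an input such as `archive.tar.xml` would map to `archive.tar.exi` under this rule.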

JSON (EXI4JSON)

# Encode JSON -> EXI4JSON (output auto: input.json -> input.exi)
erxi json encode -i data.json

# Encode JSON -> EXI4JSON to stdout
erxi json encode -i data.json -o -

# Decode EXI4JSON -> JSON (output auto: input.exi -> input.json)
erxi json decode -i data.exi

# Decode EXI4JSON -> JSON to stdout
erxi json decode -i data.exi -o -

# Pretty-printed JSON output
erxi json decode -i data.exi --pretty

# Enable EXI4JSON <other> heuristics (date/time/base64/integer/decimal)
erxi json encode -i data.json --exi4json-other

All CLI Options

| Option | Description |
|---|---|
| -i, --input <FILE> | Input file (- for stdin) |
| -o, --output <FILE> | Output file (optional; without -o auto-derived, -o - = stdout) |
| -s, --schema <FILE> | XSD schema file |
| --schema-id <ID> | Schema ID in EXI header |
| --pretty | Pretty-printed output (only decode / json decode) |
| --schema-id-none | Schema ID = None (xsi:nil=true) |
| --schema-id-builtin | Schema ID = BuiltinOnly (empty string) |
| --byte-aligned | Byte alignment |
| --pre-compression | Pre-compression alignment |
| --compression | DEFLATE compression |
| --strict | Strict mode |
| --fragment | Fragment mode |
| --preserve-comments | Preserve comment events |
| --preserve-pis | Preserve processing instructions |
| --preserve-dtd | Preserve DOCTYPE / entity reference events |
| --preserve-prefixes | Preserve namespace prefixes |
| --preserve-lexical | Preserve lexical values |
| --preserve-whitespace | Preserve insignificant whitespace |
| --self-contained | Enable self-contained fragments |
| --self-contained-qname <URI> <LOCAL> | Self-contained only for specific elements (repeatable) |
| --include-options | Write options in EXI header |
| --include-cookie | Write "$EXI" cookie |
| --parallel-deflate | Parallel DEFLATE compression |
| --block-size <N> | Compression block size (default: 1,000,000) |
| --value-max-length <N> | String Table max value length |
| --value-capacity <N> | String Table partition capacity |
| --dtrm <TYPE_URI> <TYPE_LOCAL> <REP_URI> <REP_LOCAL> | Datatype Representation Map entry (repeatable) |
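
The --include-cookie and --include-options flags both change the start of an EXI stream. In the EXI 1.0 header layout, an optional "$EXI" cookie is followed by the distinguishing bits `10`, a presence bit for the options document, and the format version (a preview bit plus a 4-bit field). The sketch below inspects those leading bits. It is a standalone illustration, not erxi's header.rs, and `HeaderStart` and `parse_header_start` are hypothetical names.

```rust
/// Fields readable from the first header byte (after an optional cookie).
#[derive(Debug, PartialEq)]
struct HeaderStart {
    has_cookie: bool,
    options_present: bool,
    preview: bool,
    /// Raw 4-bit version field: 0 encodes version 1; 15 means the
    /// version continues in a further 4-bit group (not handled here).
    version_nibble: u8,
}

/// Hypothetical helper: returns None if the data does not start like an
/// EXI stream (missing distinguishing bits or no byte after the cookie).
fn parse_header_start(data: &[u8]) -> Option<HeaderStart> {
    const COOKIE: &[u8] = b"$EXI";
    let has_cookie = data.starts_with(COOKIE);
    let rest = if has_cookie { &data[COOKIE.len()..] } else { data };
    let first = *rest.first()?;
    // Distinguishing bits: the top two bits must be `10`.
    if (first >> 6) != 0b10 {
        return None;
    }
    Some(HeaderStart {
        has_cookie,
        options_present: (first & 0b0010_0000) != 0,
        preview: (first & 0b0001_0000) != 0,
        version_nibble: first & 0x0F,
    })
}

fn main() {
    // Final version 1 without options, then with cookie and options.
    println!("{:?}", parse_header_start(&[0x80]));
    println!("{:?}", parse_header_start(b"$EXI\xA0"));
    // Text XML does not carry the distinguishing bits.
    println!("{:?}", parse_header_start(b"<?xml"));
}
```

A stream written without --include-options carries no options document, which is why the decoder may need the same settings (and schema) supplied out of band.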

Large Files

  • Encoding without Compression or PreCompression streams the output and flushes it periodically, so no output buffer grows in memory.
  • Decoding from a file uses memory mapping (feature mmap); input from stdin is read fully into memory.
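
The streaming behaviour described above (write as you go, flush periodically, keep no growing output buffer) can be sketched generically with the standard library. This is an illustration of the pattern only, not erxi's encoder: `write_streaming`, the chunk iterator, and the `flush_every` threshold are all assumptions made for the example.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sketch: write encoded chunks as they are produced and
/// flush after at least `flush_every` bytes, so memory use is bounded by
/// the writer's internal buffer rather than by the total output size.
/// Returns the number of bytes written.
fn write_streaming<W: Write>(
    chunks: impl IntoIterator<Item = Vec<u8>>,
    sink: W,
    flush_every: usize,
) -> io::Result<u64> {
    let mut out = BufWriter::new(sink);
    let mut since_flush = 0usize;
    let mut total = 0u64;
    for chunk in chunks {
        out.write_all(&chunk)?;
        since_flush += chunk.len();
        total += chunk.len() as u64;
        if since_flush >= flush_every {
            out.flush()?;
            since_flush = 0;
        }
    }
    // Push out whatever remains in the buffer.
    out.flush()?;
    Ok(total)
}

fn main() -> io::Result<()> {
    // Stand-in for encoder output: three small chunks written to stdout.
    let chunks = vec![b"abc".to_vec(), b"def".to_vec(), b"\n".to_vec()];
    let written = write_streaming(chunks, io::stdout().lock(), 4)?;
    eprintln!("wrote {written} bytes");
    Ok(())
}
```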

Spec Coverage

erxi implements the full EXI 1.0 Second Edition specification:

| Spec Section | Module | Description |
|---|---|---|
| 4 | event.rs | EXI event model (12 event types) |
| 5 | header.rs, options.rs, options_codec.rs | EXI header and options |
| 6 | encoder/, decoder/ | Encoding/decoding EXI streams |
| 7.1 | bitstream.rs, string.rs, integer.rs, ... | Built-in EXI datatypes |
| 7.2 | enumeration.rs | Enumerations |
| 7.3 | string_table.rs | String Table |
| 8.1-8.4 | grammar.rs, event_code.rs | Built-in XML grammars |
| 8.5 | grammar.rs, proto_grammar.rs, xsd/ | Schema-informed grammars |
| 9 | encoder/compression.rs, decoder/compression.rs | EXI compression |
| 10 | tests/conformance.rs | Conformance |
| Appendix B | examples/full_cross_test.rs (infoset suite) | Infoset mapping |
| Appendix C | options_codec.rs | Options header schema |
| Appendix D | string_table.rs | Initial String Table entries |
| Appendix E | rcs.rs | Restricted Character Sets |
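
As a concrete instance of the built-in datatypes in section 7.1, the EXI Unsigned Integer is a sequence of octets that each carry 7 value bits, least significant group first, with the high bit set on every octet except the last. The sketch below implements that representation for `u64` on whole bytes. It is a standalone illustration, not code from erxi's integer.rs, and the function names are hypothetical.

```rust
/// Encode `value` as 7-bit groups, least significant group first.
/// The high bit of an octet is 1 when more octets follow.
fn encode_unsigned(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let group = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(group); // last octet: continuation bit clear
            return out;
        }
        out.push(group | 0x80);
    }
}

/// Decode one unsigned integer from the front of `bytes`.
/// Returns the value and the number of octets consumed, or None if the
/// input is truncated or does not fit in 64 bits.
fn decode_unsigned(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate() {
        let shift = 7 * (i as u32);
        let group = u64::from(b & 0x7F);
        // Reject groups that would overflow a u64.
        if shift >= 64 || (shift == 63 && group > 1) {
            return None;
        }
        value |= group << shift;
        if (b & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the terminating octet
}

fn main() {
    let encoded = encode_unsigned(300);
    println!("{:02X?}", encoded);
    println!("{:?}", decode_unsigned(&encoded));
}
```

In bit-packed streams the same octets are written without byte alignment, so a real codec reads them through a bit-level reader rather than a byte slice.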

Known deviations: docs/interop-deviations.md

Tests

# Unit and integration tests
cargo test

# Setup + full matrix + verify
./scripts/complete.sh

# Setup only (exports EXI_TESTSUITE_DIR/EXIFICIENT_JAR)
source ./scripts/setup.sh

# Run full matrix (requires EXI_TESTSUITE_DIR/EXIFICIENT_JAR)
./scripts/run.sh

# Verify results against expectations
./scripts/verify.sh

# Individual suites
cargo test --test conformance       # Spec 10 conformance
cargo test --test cli_e2e           # CLI end-to-end tests
cargo test --test full_cross_matrix # Full cross matrix (via scripts/run.sh)

Test Prerequisites

  • W3C EXI Test Suite: EXI_TESTSUITE_DIR must be set (the setup script downloads the suite by default).
  • Exificient (Java): the setup script builds a fat JAR and compiles tools/ExifBatch.java.

Cross-Implementation Tests

erxi is tested against Exificient (Java reference implementation). The cross-matrix checks 8030 combinations of fixtures, alignments, and encode/decode directions.

Expected results: tests/cross_matrix_expectations.tsv

Benchmarks

Benchmarks and profiling live in the separate erxi-benchmark repo.

Documentation

| Document | Description |
|---|---|
| docs/architecture.md | Layer model, pipeline, module overview |
| docs/interop-deviations.md | Intentional spec deviations for Exificient interop |
| docs/infoset-mapping.md | XML infoset mapping (Appendix B) |
| docs/byte-diff-analyse.md | Byte-difference analysis vs Exificient |

License

PolyForm Noncommercial 1.0.0 -- free for non-commercial use (research, education, hobby, public institutions). Commercial use requires a separate license.