GitHub - StratCraftsAI/NexusFix: A zero-alloc, compile-time hardened FIX engine built for sub-100ns execution.

Ultra-Low Latency FIX Protocol Engine for High-Frequency Trading

Modern C++23 | Zero-Copy | SIMD-Accelerated | 3x Faster than QuickFIX

Why NexusFIX?

NexusFIX is a high-performance FIX (Financial Information eXchange) protocol engine built for ultra-low latency quantitative trading, sub-microsecond algorithmic execution, and high-frequency trading (HFT) systems. It targets the main bottlenecks of traditional FIX engines (per-message heap allocation, map-based field lookup, and byte-by-byte parsing) with hardware-aware C++: zero-copy views, compile-time lookup tables, and SIMD scanning.

NexusFIX serves as a modern, faster alternative to QuickFIX with zero heap allocations on the critical path.

"If you're building a low-latency trading system and QuickFIX is your bottleneck, NexusFIX is your solution."

Performance

NexusFIX vs QuickFIX Benchmark

Tested on Linux with GCC 13.3, 100,000 iterations:

Metric	QuickFIX	NexusFIX	Improvement
ExecutionReport Parse	730 ns	246 ns	3.0x faster
NewOrderSingle Parse	661 ns	229 ns	2.9x faster
Field Lookup (O(1) post-parse, 4 fields)	31 ns	11 ns	2.8x faster
Parse Throughput	1.19M msg/sec	4.17M msg/sec	3.5x higher
P99 Parse Latency	784 ns	258 ns	3.0x lower

Why is NexusFIX Faster?

Technique	QuickFIX	NexusFIX
Memory	Heap allocation per message	Zero-copy `std::span` views
Field Lookup	O(log n) `std::map`	O(1) direct array indexing
Parsing	Byte-by-byte scanning	AVX2 SIMD vectorized
Field Offsets	Runtime calculation	`consteval` compile-time
Enum/Type Conversion	Runtime switch chains (~300 branches)	22 compile-time lookup tables (55-97% faster)
Error Handling	Exceptions	`std::expected` (no throw)

Zero Allocation Proof

Parsing a NewOrderSingle message on the hot path:

Operation	QuickFIX	NexusFIX
Heap Allocations	~12 (`std::string`, `std::map` nodes)	0
Field Storage	`std::map<int, std::string>` copies	`std::span` views into original buffer
Parsing Logic	Runtime map insertion	Compile-time offset table
Memory Footprint	Dynamic, unpredictable	Static, pre-allocated PMR pool
Destructor Overhead	~12 `std::string` destructors	0 (no owned memory)

Verified via custom allocator instrumentation. See Optimization Diary.
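
The project's own instrumentation is described in the Optimization Diary. As a rough illustration of the idea only, the sketch below counts allocation requests that escape a stack-backed `std::pmr::monotonic_buffer_resource`. `CountingResource` and `upstream_allocations` are hypothetical names invented for this example. It observes only allocations routed through the PMR resource, not calls to the global `operator new`.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Upstream resource that counts every allocation request reaching it.
class CountingResource final : public std::pmr::memory_resource {
public:
    std::size_t allocations() const noexcept { return count_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++count_;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    std::size_t count_ = 0;
};

// Fills a pmr vector from a 1 KiB stack arena and reports how many requests
// fell through to the counting upstream (0 means the arena was sufficient).
inline std::size_t upstream_allocations(std::size_t element_count) {
    CountingResource upstream;
    std::array<std::byte, 1024> stack_buffer;
    std::pmr::monotonic_buffer_resource arena(stack_buffer.data(),
                                              stack_buffer.size(), &upstream);
    std::pmr::vector<int> fields(&arena);
    fields.reserve(element_count);
    for (std::size_t i = 0; i < element_count; ++i) {
        fields.push_back(static_cast<int>(i));
    }
    return upstream.allocations();
}
```

A small workload such as `upstream_allocations(16)` stays inside the stack arena, while a request larger than the arena is forwarded upstream and shows up in the counter.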

For kernel bypass (DPDK/AF_XDP) and FPGA acceleration, see Roadmap.

Architecture Influences

NexusFIX stands on the shoulders of giants. We systematically studied 11 industry-leading Modern C++ libraries and applied their techniques to ultra-low latency FIX processing. Below is our learning journey: what we learned, what we built, and what improvement we measured.

Learning → Implementation → Verification

Source Library	Engineering Evaluation	What We Changed	Benchmark Result
hffix	O(n) iterator-based field lookup is suboptimal for dense FIX packets; lacks compile-time optimization and type safety	`[Optimized]` `consteval` field offsets + `std::span` zero-copy views + O(1) direct indexing	14ns field access vs ~50ns iterator scan
Abseil	Swiss Tables offer SIMD-accelerated probing with 7-bit H2 fingerprints; superior cache locality for session maps	`[Adopted]` `absl::flat_hash_map` for session store	31% faster (20ns → 15ns)
Quill	Lock-free SPSC queue with deferred formatting; only viable approach for hot-path logging without blocking	`[Adopted]` Quill as logging backend	8ns median latency; zero blocking
NanoLog	Binary encoding + background thread achieves 7ns; compile-time format validation essential for type safety	`[Synthesized]` `DeferredProcessor<T>` with static type-safe binary serialization	84% reduction (75ns → 12ns)
liburing	`DEFER_TASKRUN` defers completion to userspace, eliminating kernel task wakeups; registered buffers avoid per-op mapping	`[Adopted]` io_uring + DEFER_TASKRUN + registered buffers + multishot	7-27% faster; ~30% fewer syscalls
Highway	Portable SIMD abstraction across AVX2/AVX-512/NEON/SVE; slight overhead vs direct intrinsics	`[Evaluated]` Retained hand-tuned intrinsics for FIX-specific patterns	13x throughput; Highway deferred for ARM
Seastar	Share-nothing reactor optimal for high-concurrency I/O; high abstraction overhead for single-threaded tick-to-trade paths	`[Influenced]` Extracted core-pinning + lock-free pipelining without framework	8% P99 improvement (18.8ns → 17.3ns)
Folly	Advanced memory fencing patterns and lock-free primitives; `folly::Function` overhead acceptable for cold path only	`[Influenced]` Native SPSC queue + bit-masking for tag validation	Comparable performance; zero dependency
Rigtorp	Cache-line padding (`alignas(64)`) eliminates false sharing; simplest correct SPSC implementation	`[Synthesized]` Native `SPSCQueue` with identical techniques	88M ops/sec; 11ns median
xsimd	Generic SIMD wrappers useful for math, but FIX parsing requires byte-level shuffle control	`[Evaluated]` Direct Intel intrinsics for SOH/delimiter scanning	2x faster than generic wrappers
Boost.PMR	Standard allocators induce non-deterministic jitter; monotonic buffer enables arena allocation per message	`[Adopted]` `std::pmr::monotonic_buffer_resource`	Zero heap allocation on hot path
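
The SOH/delimiter scanning mentioned in the Highway and xsimd rows can be sketched as follows. This is a simplified illustration with a scalar fallback, not the hand-tuned NexusFIX scanner: the function names are hypothetical, the AVX2 path assumes GCC or Clang on x86 (it relies on the `target` attribute and `__builtin_cpu_supports`), and other platforms use the scalar loop.

```cpp
#include <cstddef>
#include <string_view>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAS_AVX2_PATH 1
#else
#define SKETCH_HAS_AVX2_PATH 0
#endif

constexpr char kSoh = '\x01';  // FIX field delimiter (Start of Header)

// Portable baseline: byte-by-byte scan. Returns msg.size() if no SOH is found.
inline std::size_t find_soh_scalar(std::string_view msg, std::size_t from = 0) {
    for (std::size_t i = from; i < msg.size(); ++i) {
        if (msg[i] == kSoh) return i;
    }
    return msg.size();
}

#if SKETCH_HAS_AVX2_PATH
// Compares 32 bytes per iteration against SOH; the lowest set bit of the
// resulting mask is the offset of the first match inside the chunk.
__attribute__((target("avx2")))
inline std::size_t find_soh_avx2(std::string_view msg, std::size_t from = 0) {
    const __m256i needle = _mm256_set1_epi8(kSoh);
    std::size_t i = from;
    for (; i + 32 <= msg.size(); i += 32) {
        const __m256i chunk =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(msg.data() + i));
        const unsigned mask = static_cast<unsigned>(
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle)));
        if (mask != 0) return i + static_cast<std::size_t>(__builtin_ctz(mask));
    }
    return find_soh_scalar(msg, i);  // tail shorter than one vector
}
#endif

// Chooses the AVX2 path at runtime when the CPU supports it.
inline std::size_t find_soh(std::string_view msg, std::size_t from = 0) {
#if SKETCH_HAS_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) return find_soh_avx2(msg, from);
#endif
    return find_soh_scalar(msg, from);
}
```

A production scanner would typically collect all delimiter positions of a message in one pass rather than returning after the first match; the single-match form is used here to keep the sketch short.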

What We Built

Component	Inspired By	Implementation
`TagOffsetMap`	hffix	Compile-time generated O(1) field lookup table
`DeferredProcessor<T>`	NanoLog	SPSC queue + background thread for async processing
`ThreadLocalPool<T>`	NanoLog, Folly	Per-thread object pool, zero lock contention
`SPSCQueue<T>`	Rigtorp, Folly	Cache-line aligned lock-free queue
`simd_scanner`	xsimd (concept)	Hand-tuned AVX2/AVX-512 SOH and delimiter scanning
`IoUringTransport`	liburing	DEFER_TASKRUN + registered buffers + multishot recv
`CpuAffinity`	Seastar	Thread-to-core pinning utility
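
For readers unfamiliar with the cache-line aligned SPSC pattern listed above, here is a minimal sketch of a bounded single-producer/single-consumer ring. `SpscRing` is a hypothetical name and this is not the NexusFIX `SPSCQueue<T>` source; it assumes a 64-byte cache line and an element type that is default-constructible and copyable.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Bounded single-producer/single-consumer ring buffer. Capacity must be a
// power of two so that index wrapping is a bit mask rather than a modulo.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;
        slots_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Each index gets its own (assumed 64-byte) cache line so the producer
    // and the consumer do not invalidate each other's line on every update.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};
```

The indices grow monotonically and are masked only when addressing a slot, which keeps the full and empty checks to one subtraction or comparison each.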

Cumulative Impact

Metric	Before	After	Improvement
ExecutionReport Parse	730 ns	246 ns	3.0x faster
Hot Path Latency	361 ns	213 ns	41% reduction
SIMD SOH Scan	~150 ns	11.8 ns	~13x faster
Hash Map Lookup	20 ns	15 ns	31% faster
P99 Tail Latency	784 ns	258 ns	3.0x lower

Detailed benchmarks: Optimization Summary

Attribution

NexusFIX is MIT licensed. We gratefully acknowledge these open source projects:

Dependency	License	Usage
Abseil	Apache 2.0	`flat_hash_map` for session lookups
Quill	MIT	Async logging infrastructure
liburing	MIT/LGPL	io_uring C wrapper

Features

Core Capabilities

Zero-Copy Parsing - std::span<const char> views into original buffer, no memcpy
Message Encoding - Builder pattern with constexpr serializer for constructing FIX messages
SIMD Acceleration - AVX2/AVX-512 instructions for delimiter scanning
Compile-Time Optimization - consteval field offsets, 22 lookup tables for enum/type conversion, ~300 runtime branches eliminated
O(1) Field Lookup - Pre-indexed lookup table by FIX tag number (post-parse)
Zero Heap Allocation - PMR pools and stack allocation on hot path
Session Management - Full session lifecycle: Logon, Logout, Heartbeat, sequence number tracking, reconnect logic
Type-Safe API - Strong types for Price, Quantity, Side, OrdType

Modern C++23

std::expected for error handling (no exceptions on hot path)
std::span for zero-copy data views
Concepts for compile-time interface validation
consteval for compile-time computation
[[likely]] / [[unlikely]] branch hints

Supported FIX Versions

Version	Status	Notes
FIX 4.4	Full Support	Most common in production
FIX 5.0 + FIXT 1.1	Full Support	Only 2% overhead vs 4.4

Supported Message Types

MsgType	Name	Category
A	Logon	Session
5	Logout	Session
0	Heartbeat	Session
D	NewOrderSingle	Order Entry
F	OrderCancelRequest	Order Entry
8	ExecutionReport	Order Entry
V	MarketDataRequest	Market Data
W	MarketDataSnapshotFullRefresh	Market Data
X	MarketDataIncrementalRefresh	Market Data

Optimization Guide

How we achieved sub-300ns latency with Modern C++23:

Optimization Diary - Step-by-step journey from 730ns to 246ns
Modern C++ Quant Techniques - Cache-line alignment, SIMD, PMR strategies, branch hints

Quick Start

Installation

git clone https://github.com/StratCraftsAI/NexusFIX.git
cd NexusFIX
./start.sh build

Requirements

C++23 compiler: GCC 13+ or Clang 17+
CMake: 3.20+
OS: Linux (io_uring optional), macOS, Windows

Basic Usage

#include <nexusfix/nexusfix.hpp>

using namespace nfx;
using namespace nfx::fix44;

// Connect to broker
TcpTransport transport;
transport.connect("fix.broker.com", 9876);

// Configure session
SessionConfig config{
    .sender_comp_id = "MY_CLIENT",
    .target_comp_id = "BROKER",
    .heartbeat_interval = 30
};
SessionManager session{transport, config};
session.initiate_logon();

// Send order (zero allocation)
MessageAssembler asm_;
NewOrderSingle::Builder order;
auto msg = order
    .cl_ord_id("ORD001")
    .symbol("AAPL")
    .side(Side::Buy)
    .order_qty(Qty::from_int(100))
    .ord_type(OrdType::Limit)
    .price(FixedPrice::from_double(150.00))
    .build(asm_);
transport.send(msg);
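
Any FIX encoder, including a builder like the one above, has to finish the message with the standard CheckSum trailer (tag 10): the sum of all preceding bytes modulo 256, written as three zero-padded digits. The sketch below shows that rule in isolation; `fix_checksum` and `checksum_digits` are hypothetical helper names and do not describe how NexusFIX implements its serializer.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

// Sum of every byte that precedes the CheckSum field, modulo 256.
constexpr std::uint8_t fix_checksum(std::string_view bytes) noexcept {
    unsigned sum = 0;
    for (const char c : bytes) {
        sum += static_cast<unsigned char>(c);
    }
    return static_cast<std::uint8_t>(sum % 256);
}

// Renders the checksum as the three zero-padded digits that follow "10=".
constexpr std::array<char, 3> checksum_digits(std::uint8_t value) noexcept {
    return {static_cast<char>('0' + value / 100),
            static_cast<char>('0' + (value / 10) % 10),
            static_cast<char>('0' + value % 10)};
}

static_assert(fix_checksum("AB") == 131);  // 'A' (65) + 'B' (66)
static_assert(checksum_digits(7)[2] == '7');
```

Both helpers are `constexpr`, so the same code can validate fixed message fragments at compile time and compute trailers at runtime without allocating.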

Parse Incoming Messages

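// Zero-copy parsing: raw_buffer holds the received bytes and must outlive
// the parsed message, because field values are views into it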
FixParser parser;
auto result = parser.parse(raw_buffer);

if (result) {
    auto& msg = *result;
    auto order_id = msg.get_string(Tag::OrderID);    // O(1) lookup
    auto exec_type = msg.get_char(Tag::ExecType);    // No allocation
    auto fill_qty = msg.get_qty(Tag::LastQty);       // Type-safe
}

Documentation

CHANGELOG.md for release history and upgrade notes
BENCHMARK_REPRODUCTION.md for reproducing published measurements
CONTRIBUTING.md for contribution boundaries and code standards
SECURITY.md for coordinated vulnerability disclosure
SUPPORT.md for bug reports, usage questions, and response expectations
ROADMAP.md for near-term and mid-term open-source priorities
docs/COVERAGE_LIMITATIONS.md for coverage-build caveats and usage boundaries
docs/compare/ for benchmark reports and optimization writeups
docs/design/ for architecture notes and design tickets that are public

Community

Support: SUPPORT.md
Contributing: CONTRIBUTING.md
Security: SECURITY.md
Code of Conduct: CODE_OF_CONDUCT.md

Build Options

CMake Option	Default	Description
`NFX_ENABLE_SIMD`	ON	AVX2/AVX-512 SIMD acceleration
`NFX_ENABLE_IO_URING`	OFF	Linux io_uring transport
`NFX_BUILD_BENCHMARKS`	ON	Build benchmark suite
`NFX_BUILD_TESTS`	ON	Build unit tests
`NFX_BUILD_EXAMPLES`	ON	Build examples
`NFX_ENABLE_COVERAGE`	OFF	Coverage instrumentation for CI/local test analysis only; not for production or benchmarks

# Build with all optimizations
cmake -B build -DCMAKE_BUILD_TYPE=Release -DNFX_ENABLE_SIMD=ON
cmake --build build -j

# Run benchmarks
./start.sh bench 100000

# Compare with QuickFIX
./start.sh compare 100000

Benchmarking

Verify performance claims by running benchmarks yourself.

Quick Start

# Run parser and session benchmarks
./start.sh bench 100000

# Example output:
# [BENCHMARK] ExecutionReport Parse
#   Iterations: 100000
#   Mean: 246 ns
#   P50:  245 ns
#   P99:  258 ns
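
To help interpret the Mean/P50/P99 lines in that output, here is a small sketch of how per-call samples and nearest-rank percentiles can be computed. It is not the NexusFIX benchmark harness; `measure_ns` and `percentile` are hypothetical names. Timer overhead from `std::chrono::steady_clock` can be significant relative to an operation of a few hundred nanoseconds, so numbers from a naive loop like this are an upper bound at best.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile for an integer percent in [1, 100].
// Takes the samples by value because it sorts them.
inline std::int64_t percentile(std::vector<std::int64_t> samples, unsigned percent) {
    assert(!samples.empty() && percent >= 1 && percent <= 100);
    std::sort(samples.begin(), samples.end());
    // rank = ceil(percent / 100 * N), computed in integers, 1-based.
    const std::size_t rank = (percent * samples.size() + 99) / 100;
    return samples[rank - 1];
}

// Times each call of fn individually and returns the durations in nanoseconds.
template <typename Fn>
std::vector<std::int64_t> measure_ns(Fn&& fn, std::size_t iterations) {
    std::vector<std::int64_t> samples;
    samples.reserve(iterations);  // allocate before timing starts
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    return samples;
}
```

With 100,000 samples, P99 is the value below which 99,000 of the measurements fall, which is why it is more sensitive to scheduler noise and cache misses than the mean.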

QuickFIX Comparison

Compare NexusFIX against QuickFIX (requires QuickFIX installed):

# Install QuickFIX first
# Ubuntu: sudo apt install libquickfix-dev
# Or build from source: https://github.com/quickfix/quickfix

# Run comparison
./start.sh compare 100000

Full Reproduction Guide

For detailed instructions on reproducing benchmark results, including:

Environment setup (CPU governor, pinning, priority)
Build configuration options
Interpreting results
Troubleshooting

See BENCHMARK_REPRODUCTION.md

Technical References

API Reference - Complete API documentation
Implementation Guide - Architecture overview
Benchmark Report - Detailed performance analysis
Modern C++ Techniques - Optimization techniques used

Project Structure

nexusfix/
├── include/nexusfix/
│   ├── parser/           # Zero-copy FIX parser (SIMD)
│   ├── session/          # Session state machine
│   ├── transport/        # TCP / io_uring / Winsock transport
│   ├── platform/         # Cross-platform abstraction
│   ├── types/            # Strong types (Price, Qty, Side)
│   ├── memory/           # PMR buffer pools
│   ├── store/            # Message store (PMR-optimized)
│   ├── serializer/       # Message serialization
│   ├── util/             # Utilities (diagnostics, formatting)
│   ├── messages/fix44/   # FIX 4.4 message builders
│   └── interfaces/       # Concepts and interfaces
├── benchmarks/           # Performance benchmarks
├── tests/                # Unit tests
├── examples/             # Example programs
└── docs/                 # Documentation

FAQ

How does NexusFIX achieve zero-copy parsing?

NexusFIX uses std::span<const char> to create views into the original network buffer. Field values are never copied: the parser returns spans pointing to the exact byte range in the source buffer, so reading a field involves no memcpy and no heap allocation. The spans are only valid while that buffer is alive and unmodified.

Is NexusFIX compatible with QuickFIX?

NexusFIX implements the same FIX 4.4/5.0 protocol standards but with a different API optimized for performance. It is wire-compatible with any FIX counterparty, including systems using QuickFIX.

What latency can I expect in production?

In our benchmarks, ExecutionReport parsing takes about 246 ns on average and 258 ns at P99. These figures cover parsing only; actual production latency also depends on network, kernel configuration, and hardware. NexusFIX is designed to minimize the application-layer share of that latency.

Does NexusFIX support FIX Repeating Groups?

Yes. Repeating groups are parsed with the same zero-copy approach. Group iteration is O(1) per entry.

Use Cases

NexusFIX is designed for:

High-Frequency Trading (HFT) - Sub-microsecond message processing
Algorithmic Trading Systems - Low-latency order routing
Market Making - High-throughput quote updates
Smart Order Routing (SOR) - Multi-venue connectivity
Trading Infrastructure - FIX gateways and bridges

Contact

For questions or collaboration: nonagonal.portal@gmail.com

Development

Built with Modern C++23. Optimized via hardware-aware high-performance patterns including cache-line alignment, SIMD vectorization, and zero-copy memory design. Verified through rigorous benchmarking and AI-assisted static analysis.

For technical deep-dives on our optimization journey, see Optimization Diary.

Contributing

This project is maintained by StratCraftsAI.

Issues & Discussions: Welcome for bug reports, performance questions, and feature discussions
Pull Requests: Bug fixes and performance optimizations welcome (see CONTRIBUTING.md)
Feature PRs require prior discussion in Issues
Performance PRs must include benchmark data (before/after)

All contributions must follow:

C++23 standards
Zero allocation on hot paths
Benchmarks included for performance changes

License

MIT License - See LICENSE file.

Built with Modern C++23 for ultra-low latency quantitative trading