Ultra-Low Latency FIX Protocol Engine for High-Frequency Trading
Modern C++23 | Zero-Copy | SIMD-Accelerated | 3x Faster than QuickFIX
Why NexusFIX?
NexusFIX is a high-performance FIX protocol (Financial Information eXchange) engine built for ultra-low latency quantitative trading, sub-microsecond algorithmic execution, and high-frequency trading (HFT) systems. It solves the performance bottlenecks of traditional FIX engines by utilizing hardware-aware C++ programming.
NexusFIX serves as a modern, faster alternative to QuickFIX with zero heap allocations on the critical path.
"If you're building a low-latency trading system and QuickFIX is your bottleneck, NexusFIX is your solution."
Performance
NexusFIX vs QuickFIX Benchmark
Tested on Linux with GCC 13.3, 100,000 iterations:
| Metric | QuickFIX | NexusFIX | Improvement |
|---|---|---|---|
| ExecutionReport Parse | 730 ns | 246 ns | 3.0x faster |
| NewOrderSingle Parse | 661 ns | 229 ns | 2.9x faster |
| Field Lookup (O(1) post-parse, 4 fields) | 31 ns | 11 ns | 2.9x faster |
| Parse Throughput | 1.19M msg/sec | 4.17M msg/sec | 3.5x higher |
| P99 Parse Latency | 784 ns | 258 ns | 3.0x lower |
Why is NexusFIX Faster?
| Technique | QuickFIX | NexusFIX |
|---|---|---|
| Memory | Heap allocation per message | Zero-copy std::span views |
| Field Lookup | O(log n) std::map | O(1) direct array indexing |
| Parsing | Byte-by-byte scanning | AVX2 SIMD vectorized |
| Field Offsets | Runtime calculation | consteval compile-time |
| Enum/Type Conversion | Runtime switch chains (~300 branches) | 22 compile-time lookup tables (55-97% faster) |
| Error Handling | Exceptions | std::expected (no throw) |
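The "compile-time lookup tables" row can be illustrated with a minimal sketch. It is not NexusFIX's actual table: the `Side` enum, `make_side_table`, and `side_from_char` are hypothetical names, and the only protocol fact assumed is that FIX tag 54 (Side) encodes Buy as `'1'` and Sell as `'2'`. The table is built during compilation, so a conversion at runtime is a single indexed load rather than a chain of comparisons.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical enum for illustration; FIX tag 54 uses '1' = Buy, '2' = Sell.
enum class Side : std::uint8_t { Unknown = 0, Buy, Sell };

// Build a 256-entry table at compile time, indexed by the raw wire byte.
constexpr std::array<Side, 256> make_side_table() {
    std::array<Side, 256> table{};  // value-initialised: every slot is Side::Unknown
    table['1'] = Side::Buy;
    table['2'] = Side::Sell;
    return table;
}

inline constexpr std::array<Side, 256> kSideTable = make_side_table();

// One indexed load; no branch depends on the value being converted.
constexpr Side side_from_char(char c) {
    return kSideTable[static_cast<unsigned char>(c)];
}

// The conversions are usable in constant expressions.
static_assert(side_from_char('1') == Side::Buy);
static_assert(side_from_char('x') == Side::Unknown);
```

Unrecognised bytes map to `Side::Unknown`, so the caller decides how to report an invalid value.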
Zero Allocation Proof
Parsing a NewOrderSingle message on the hot path:
| Operation | QuickFIX | NexusFIX |
|---|---|---|
| Heap Allocations | ~12 (std::string, std::map nodes) | 0 |
| Field Storage | std::map<int, std::string> copies | std::span views into original buffer |
| Parsing Logic | Runtime map insertion | Compile-time offset table |
| Memory Footprint | Dynamic, unpredictable | Static, pre-allocated PMR pool |
| Destructor Overhead | ~12 std::string destructors | 0 (no owned memory) |
Verified via custom allocator instrumentation. See Optimization Diary.
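One way to check a zero-allocation claim like this is to count every request that escapes a pre-allocated arena. The sketch below is an assumption about how such instrumentation could look, not the project's own tooling: `CountingResource` and `upstream_allocations` are hypothetical names, and the 1024-byte arena size is arbitrary. It uses only standard `std::pmr` facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical instrumentation: counts every allocation that reaches the heap.
class CountingResource final : public std::pmr::memory_resource {
public:
    std::size_t allocations() const noexcept { return allocations_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations_;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    std::size_t allocations_ = 0;
};

// Stores `field_count` ints in a stack-backed arena and reports how many
// times the arena had to fall back to the heap.
inline std::size_t upstream_allocations(std::size_t field_count) {
    CountingResource heap;
    alignas(std::max_align_t) std::byte arena[1024];  // arbitrary size for the sketch
    {
        std::pmr::monotonic_buffer_resource pool{arena, sizeof(arena), &heap};
        std::pmr::vector<int> tags{&pool};
        tags.reserve(field_count);
        for (std::size_t i = 0; i < field_count; ++i) {
            tags.push_back(static_cast<int>(i));
        }
    }  // vector and pool are destroyed here, before `heap`
    return heap.allocations();
}
```

With these sizes, `upstream_allocations(16)` stays inside the arena and returns 0, while a request too large for the arena (for example 1000 ints) is counted as at least one heap allocation.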
For kernel bypass (DPDK/AF_XDP) and FPGA acceleration, see Roadmap.
Architecture Influences
NexusFIX stands on the shoulders of giants. We systematically studied 11 industry-leading Modern C++ libraries and applied their techniques to ultra-low latency FIX processing. Below is our learning journey: what we learned, what we built, and what improvement we measured.
Learning → Implementation → Verification
| Source Library | Engineering Evaluation | What We Changed | Benchmark Result |
|---|---|---|---|
| hffix | O(n) iterator-based field lookup is suboptimal for dense FIX packets; lacks compile-time optimization and type safety | [Optimized] consteval field offsets + std::span zero-copy views + O(1) direct indexing | 14ns field access vs ~50ns iterator scan |
| Abseil | Swiss Tables offer SIMD-accelerated probing with 7-bit H2 fingerprints; superior cache locality for session maps | [Adopted] absl::flat_hash_map for session store | 31% faster (20ns → 15ns) |
| Quill | Lock-free SPSC queue with deferred formatting; only viable approach for hot-path logging without blocking | [Adopted] Quill as logging backend | 8ns median latency; zero blocking |
| NanoLog | Binary encoding + background thread achieves 7ns; compile-time format validation essential for type safety | [Synthesized] DeferredProcessor<T> with static type-safe binary serialization | 84% reduction (75ns → 12ns) |
| liburing | DEFER_TASKRUN defers completion to userspace, eliminating kernel task wakeups; registered buffers avoid per-op mapping | [Adopted] io_uring + DEFER_TASKRUN + registered buffers + multishot | 7-27% faster; ~30% fewer syscalls |
| Highway | Portable SIMD abstraction across AVX2/AVX-512/NEON/SVE; slight overhead vs direct intrinsics | [Evaluated] Retained hand-tuned intrinsics for FIX-specific patterns | 13x throughput; Highway deferred for ARM |
| Seastar | Share-nothing reactor optimal for high-concurrency I/O; high abstraction overhead for single-threaded tick-to-trade paths | [Influenced] Extracted core-pinning + lock-free pipelining without framework | 8% P99 improvement (18.8ns → 17.3ns) |
| Folly | Advanced memory fencing patterns and lock-free primitives; folly::Function overhead acceptable for cold path only | [Influenced] Native SPSC queue + bit-masking for tag validation | Comparable performance; zero dependency |
| Rigtorp | Cache-line padding (alignas(64)) eliminates false sharing; simplest correct SPSC implementation | [Synthesized] Native SPSCQueue with identical techniques | 88M ops/sec; 11ns median |
| xsimd | Generic SIMD wrappers useful for math, but FIX parsing requires byte-level shuffle control | [Evaluated] Direct Intel intrinsics for SOH/delimiter scanning | 2x faster than generic wrappers |
| Boost.PMR | Standard allocators induce non-deterministic jitter; monotonic buffer enables arena allocation per message | [Adopted] std::pmr::monotonic_buffer_resource | Zero heap allocation on hot path |
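The Highway and xsimd rows both concern scanning for the SOH (0x01) field delimiter several bytes at a time. The sketch below shows the idea with a portable stand-in: it is SWAR (SIMD within a register) over 64-bit words, not AVX2 and not NexusFIX's scanner. `find_soh` is a hypothetical name. It tests eight bytes per iteration with the classic "has a zero byte" bit trick and falls back to a byte loop to locate the exact position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

constexpr char kSoh = '\x01';  // FIX field delimiter

// Returns the index of the first SOH byte in `buf`, or buf.size() if none.
inline std::size_t find_soh(std::string_view buf) {
    constexpr std::uint64_t kOnes  = 0x0101010101010101ULL;  // SOH in every byte
    constexpr std::uint64_t kHighs = 0x8080808080808080ULL;
    std::size_t i = 0;
    for (; i + 8 <= buf.size(); i += 8) {
        std::uint64_t word;
        std::memcpy(&word, buf.data() + i, 8);  // alignment-safe 8-byte load
        const std::uint64_t x = word ^ kOnes;   // SOH bytes become 0x00
        // Non-zero exactly when at least one byte of x is zero.
        if (((x - kOnes) & ~x & kHighs) != 0) break;
    }
    // Locate the exact byte within the flagged word, or scan the tail.
    for (; i < buf.size(); ++i) {
        if (buf[i] == kSoh) return i;
    }
    return buf.size();
}
```

Because the word test only answers "is there an SOH in these eight bytes", the function behaves the same on little-endian and big-endian hosts. An AVX2 version would apply the same structure to 32 bytes per step.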
What We Built
| Component | Inspired By | Implementation |
|---|---|---|
| TagOffsetMap | hffix | Compile-time generated O(1) field lookup table |
| DeferredProcessor<T> | NanoLog | SPSC queue + background thread for async processing |
| ThreadLocalPool<T> | NanoLog, Folly | Per-thread object pool, zero lock contention |
| SPSCQueue<T> | Rigtorp, Folly | Cache-line aligned lock-free queue |
| simd_scanner | xsimd (concept) | Hand-tuned AVX2/AVX-512 SOH and delimiter scanning |
| IoUringTransport | liburing | DEFER_TASKRUN + registered buffers + multishot recv |
| CpuAffinity | Seastar | Thread-to-core pinning utility |
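To make the "cache-line aligned lock-free queue" entry concrete, here is a minimal single-producer/single-consumer ring in the style the table describes. It is a sketch under stated assumptions, not the project's `SPSCQueue<T>`: the name `SpscRing`, the power-of-two capacity requirement, the 64-byte cache-line size, and the requirement that `T` be default-constructible and copyable are all choices made for this example.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};
```

The indices grow monotonically and are masked on access, so "full" is simply `head - tail == Capacity` and no slot is wasted to distinguish full from empty.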
Cumulative Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| ExecutionReport Parse | 730 ns | 246 ns | 3.0x faster |
| Hot Path Latency | 361 ns | 213 ns | 41% reduction |
| SIMD SOH Scan | ~150 ns | 11.8 ns | ~13x faster |
| Hash Map Lookup | 20 ns | 15 ns | 31% faster |
| P99 Tail Latency | 784 ns | 258 ns | 3.0x lower |
Detailed benchmarks: Optimization Summary
Attribution
NexusFIX is MIT licensed. We gratefully acknowledge these open source projects:
| Dependency | License | Usage |
|---|---|---|
| Abseil | Apache 2.0 | flat_hash_map for session lookups |
| Quill | MIT | Async logging infrastructure |
| liburing | MIT/LGPL | io_uring C wrapper |
Features
Core Capabilities
- Zero-Copy Parsing - std::span<const char> views into original buffer, no memcpy
- Message Encoding - Builder pattern with constexpr serializer for constructing FIX messages
- SIMD Acceleration - AVX2/AVX-512 instructions for delimiter scanning
- Compile-Time Optimization - consteval field offsets, 22 lookup tables for enum/type conversion, ~300 runtime branches eliminated
- O(1) Field Lookup - Pre-indexed lookup table by FIX tag number (post-parse)
- Zero Heap Allocation - PMR pools and stack allocation on hot path
- Session Management - Full session lifecycle: Logon, Logout, Heartbeat, sequence number tracking, reconnect logic
- Type-Safe API - Strong types for Price, Quantity, Side, OrdType
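The value of strong types is easiest to see in a small example. The sketch below is illustrative only: the `Price` and `Quantity` classes, the `from_ticks` factory, and the scale of four decimal places are assumptions for this example and do not reproduce the NexusFIX types. Prices are stored as scaled integers, so arithmetic is exact, and a function that takes a `Price` and a `Quantity` cannot be called with the arguments swapped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point price: an integer count of 1/10'000 units.
class Price {
public:
    static constexpr std::int64_t kScale = 10'000;  // assumed: four decimal places
    static constexpr Price from_ticks(std::int64_t ticks) { return Price{ticks}; }
    constexpr std::int64_t ticks() const { return ticks_; }

    friend constexpr bool operator==(Price a, Price b) { return a.ticks_ == b.ticks_; }
    friend constexpr Price operator+(Price a, Price b) { return Price{a.ticks_ + b.ticks_}; }

private:
    explicit constexpr Price(std::int64_t t) : ticks_{t} {}
    std::int64_t ticks_;
};

// Hypothetical quantity: explicit constructor blocks implicit int conversion.
class Quantity {
public:
    explicit constexpr Quantity(std::int64_t units) : units_{units} {}
    constexpr std::int64_t units() const { return units_; }

private:
    std::int64_t units_;
};

// Notional value in scaled ticks. Passing (Quantity, Price) does not compile.
constexpr std::int64_t notional_ticks(Price p, Quantity q) {
    return p.ticks() * q.units();
}

// 150.0000 (1'500'000 ticks) times 100 units.
static_assert(notional_ticks(Price::from_ticks(1'500'000), Quantity{100}) == 150'000'000);
```

A bare `notional_ticks(150, 100)` is rejected at compile time, which is the class of mistake strong types are meant to catch.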
Modern C++23
- std::expected for error handling (no exceptions on hot path)
- std::span for zero-copy data views
- Concepts for compile-time interface validation
- consteval for compile-time computation
- [[likely]]/[[unlikely]] branch hints
Supported FIX Versions
| Version | Status | Notes |
|---|---|---|
| FIX 4.4 | Full Support | Most common in production |
| FIX 5.0 + FIXT 1.1 | Full Support | Only 2% overhead vs 4.4 |
Supported Message Types
| MsgType | Name | Category |
|---|---|---|
| A | Logon | Session |
| 5 | Logout | Session |
| 0 | Heartbeat | Session |
| D | NewOrderSingle | Order Entry |
| F | OrderCancelRequest | Order Entry |
| 8 | ExecutionReport | Order Entry |
| V | MarketDataRequest | Market Data |
| W | MarketDataSnapshotFullRefresh | Market Data |
| X | MarketDataIncrementalRefresh | Market Data |
Optimization Guide
How we achieved sub-300ns latency with Modern C++23:
- Optimization Diary - Step-by-step journey from 730ns to 246ns
- Modern C++ Quant Techniques - Cache-line alignment, SIMD, PMR strategies, branch hints
Quick Start
Installation
    git clone https://github.com/StratCraftsAI/NexusFIX.git
    cd NexusFIX
    ./start.sh build

Requirements
- C++23 compiler: GCC 13+ or Clang 17+
- CMake: 3.20+
- OS: Linux (io_uring optional), macOS, Windows
Basic Usage
    #include <nexusfix/nexusfix.hpp>

    using namespace nfx;
    using namespace nfx::fix44;

    // Connect to broker
    TcpTransport transport;
    transport.connect("fix.broker.com", 9876);

    // Configure session
    SessionConfig config{
        .sender_comp_id = "MY_CLIENT",
        .target_comp_id = "BROKER",
        .heartbeat_interval = 30
    };
    SessionManager session{transport, config};
    session.initiate_logon();

    // Send order (zero allocation)
    MessageAssembler asm_;
    NewOrderSingle::Builder order;
    auto msg = order
        .cl_ord_id("ORD001")
        .symbol("AAPL")
        .side(Side::Buy)
        .order_qty(Qty::from_int(100))
        .ord_type(OrdType::Limit)
        .price(FixedPrice::from_double(150.00))
        .build(asm_);
    transport.send(msg);
Parse Incoming Messages
    // Zero-copy parsing
    FixParser parser;
    auto result = parser.parse(raw_buffer);
    if (result) {
        auto& msg = *result;
        auto order_id  = msg.get_string(Tag::OrderID);  // O(1) lookup
        auto exec_type = msg.get_char(Tag::ExecType);   // No allocation
        auto fill_qty  = msg.get_qty(Tag::LastQty);     // Type-safe
    }
Documentation
- CHANGELOG.md for release history and upgrade notes
- BENCHMARK_REPRODUCTION.md for reproducing published measurements
- CONTRIBUTING.md for contribution boundaries and code standards
- SECURITY.md for coordinated vulnerability disclosure
- SUPPORT.md for bug reports, usage questions, and response expectations
- ROADMAP.md for near-term and mid-term open-source priorities
- docs/COVERAGE_LIMITATIONS.md for coverage-build caveats and usage boundaries
- docs/compare/ for benchmark reports and optimization writeups
- docs/design/ for architecture notes and design tickets that are public
Community
- Support: SUPPORT.md
- Contributing: CONTRIBUTING.md
- Security: SECURITY.md
- Code of Conduct: CODE_OF_CONDUCT.md
Build Options
| CMake Option | Default | Description |
|---|---|---|
| NFX_ENABLE_SIMD | ON | AVX2/AVX-512 SIMD acceleration |
| NFX_ENABLE_IO_URING | OFF | Linux io_uring transport |
| NFX_BUILD_BENCHMARKS | ON | Build benchmark suite |
| NFX_BUILD_TESTS | ON | Build unit tests |
| NFX_BUILD_EXAMPLES | ON | Build examples |
| NFX_ENABLE_COVERAGE | OFF | Coverage instrumentation for CI/local test analysis only; not for production or benchmarks |
    # Build with all optimizations
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DNFX_ENABLE_SIMD=ON
    cmake --build build -j

    # Run benchmarks
    ./start.sh bench 100000

    # Compare with QuickFIX
    ./start.sh compare 100000
Benchmarking
Verify performance claims by running benchmarks yourself.
Quick Start
    # Run parser and session benchmarks
    ./start.sh bench 100000

    # Example output:
    # [BENCHMARK] ExecutionReport Parse
    #   Iterations: 100000
    #   Mean: 246 ns
    #   P50: 245 ns
    #   P99: 258 ns
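When comparing your own measurements with the P50 and P99 figures, it helps to know how a percentile is derived from raw samples. The source does not state which definition the benchmark suite uses, so the sketch below assumes the nearest-rank method. `percentile_ns` is a hypothetical helper, not part of NexusFIX.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. The vector is taken by value
// because the function sorts its own copy.
inline std::uint64_t percentile_ns(std::vector<std::uint64_t> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(samples.size()) / 100.0));
    return samples[rank - 1];  // rank is 1-based
}
```

With five samples of 100, 200, 300, 400, and 500 ns, the P50 rank is ceil(2.5) = 3, which selects 300 ns, and the P99 rank is ceil(4.95) = 5, which selects 500 ns. Small sample counts therefore make P99 equal to the maximum, which is one reason to run many iterations.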
QuickFIX Comparison
Compare NexusFIX against QuickFIX (requires QuickFIX installed):
    # Install QuickFIX first
    # Ubuntu: sudo apt install libquickfix-dev
    # Or build from source: https://github.com/quickfix/quickfix

    # Run comparison
    ./start.sh compare 100000
Full Reproduction Guide
For detailed instructions on reproducing benchmark results, see BENCHMARK_REPRODUCTION.md, which covers:
- Environment setup (CPU governor, pinning, priority)
- Build configuration options
- Interpreting results
- Troubleshooting
Technical References
- API Reference - Complete API documentation
- Implementation Guide - Architecture overview
- Benchmark Report - Detailed performance analysis
- Modern C++ Techniques - Optimization techniques used
Project Structure
nexusfix/
├── include/nexusfix/
│ ├── parser/ # Zero-copy FIX parser (SIMD)
│ ├── session/ # Session state machine
│ ├── transport/ # TCP / io_uring / Winsock transport
│ ├── platform/ # Cross-platform abstraction
│ ├── types/ # Strong types (Price, Qty, Side)
│ ├── memory/ # PMR buffer pools
│ ├── store/ # Message store (PMR-optimized)
│ ├── serializer/ # Message serialization
│ ├── util/ # Utilities (diagnostics, formatting)
│ ├── messages/fix44/ # FIX 4.4 message builders
│ └── interfaces/ # Concepts and interfaces
├── benchmarks/ # Performance benchmarks
├── tests/ # Unit tests
├── examples/ # Example programs
└── docs/ # Documentation
FAQ
How does NexusFIX achieve zero-copy parsing?
NexusFIX uses std::span<const char> to create views into the original network buffer. Field values are never copied: the parser returns spans pointing to the exact byte range in the source buffer. Reading a field therefore involves no memcpy and no heap allocation, but the source buffer must outlive every span that refers to it.
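The following sketch shows the general shape of view-based parsing with a tag-indexed table. It is not the NexusFIX parser: `FieldIndex`, `index_fields`, and the `kMaxTag` bound are hypothetical, and `std::string_view` stands in for `std::span<const char>` so the example compiles as C++17. Each stored view points into the caller's buffer, and a lookup after parsing is a single array access.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical bound: tags at or above this value are ignored by the sketch.
constexpr std::size_t kMaxTag = 1024;

struct FieldIndex {
    // One slot per tag; an empty view means the tag is absent (or has an empty value).
    std::array<std::string_view, kMaxTag> values{};

    std::string_view get(std::size_t tag) const {
        return tag < kMaxTag ? values[tag] : std::string_view{};
    }
};

// Splits "tag=value<SOH>" fields. Every stored view points into `buf`;
// no field bytes are copied.
inline FieldIndex index_fields(std::string_view buf) {
    FieldIndex index;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::size_t end = buf.find('\x01', pos);
        if (end == std::string_view::npos) end = buf.size();
        const std::string_view field = buf.substr(pos, end - pos);
        const std::size_t eq = field.find('=');
        if (eq != std::string_view::npos) {
            std::size_t tag = 0;
            bool valid = eq > 0;  // reject fields with an empty tag
            for (char c : field.substr(0, eq)) {
                if (c < '0' || c > '9') { valid = false; break; }
                tag = tag * 10 + static_cast<std::size_t>(c - '0');
                if (tag >= kMaxTag) { valid = false; break; }
            }
            if (valid) index.values[tag] = field.substr(eq + 1);
        }
        pos = end + 1;
    }
    return index;
}
```

The trade-off is memory for speed: the table occupies one view per possible tag, in exchange for lookups that do not search.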
Is NexusFIX compatible with QuickFIX?
NexusFIX implements the same FIX 4.4/5.0 protocol standards but with a different API optimized for performance. It is wire-compatible with any FIX counterparty, including systems using QuickFIX.
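Wire compatibility rests on framing rules that every FIX engine must follow, such as the trailing CheckSum field (tag 10). As a sketch of that rule, and not of NexusFIX's serializer, the hypothetical helpers below compute the checksum as the sum of all message bytes up to and including the SOH before `10=`, modulo 256, and format it as three zero-padded digits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// `body` is every byte of the message up to and including the SOH that
// precedes the "10=" field.
constexpr std::uint8_t fix_checksum(std::string_view body) {
    unsigned sum = 0;
    for (char c : body) sum += static_cast<unsigned char>(c);
    return static_cast<std::uint8_t>(sum % 256);
}

// Formats the checksum as the three zero-padded digits required on the wire.
constexpr std::array<char, 3> checksum_digits(std::uint8_t v) {
    return {static_cast<char>('0' + v / 100),
            static_cast<char>('0' + (v / 10) % 10),
            static_cast<char>('0' + v % 10)};
}
```

For example, the bytes of `8=FIX.4.4` followed by SOH sum to 545, and 545 mod 256 is 33, so a message consisting of only that field would carry `10=033`.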
What latency can I expect in production?
In our benchmarks: ~250 nanoseconds for ExecutionReport parsing. Actual production latency depends on network, kernel configuration, and hardware. NexusFIX is designed to minimize the application-layer overhead.
Does NexusFIX support FIX Repeating Groups?
Yes. Repeating groups are parsed with the same zero-copy approach. Group iteration is O(1) per entry.
Use Cases
NexusFIX is designed for:
- High-Frequency Trading (HFT) - Sub-microsecond message processing
- Algorithmic Trading Systems - Low-latency order routing
- Market Making - High-throughput quote updates
- Smart Order Routing (SOR) - Multi-venue connectivity
- Trading Infrastructure - FIX gateways and bridges
Contact
For questions or collaboration: nonagonal.portal@gmail.com
Development
Built with Modern C++23. Optimized via hardware-aware high-performance patterns including cache-line alignment, SIMD vectorization, and zero-copy memory design. Verified through rigorous benchmarking and AI-assisted static analysis.
For technical deep-dives on our optimization journey, see Optimization Diary.
Contributing
This project is maintained by StratCraftsAI.
- Issues & Discussions: Welcome for bug reports, performance questions, and feature discussions
- Pull Requests: Bug fixes and performance optimizations welcome (see CONTRIBUTING.md)
- Feature PRs require prior discussion in Issues
- Performance PRs must include benchmark data (before/after)
All contributions must:
- Conform to the C++23 standard
- Keep hot paths free of heap allocation
- Include benchmarks for performance changes
License
MIT License - See LICENSE file.
Built with Modern C++23 for ultra-low latency quantitative trading