Add x-gnosis framework (TypeScript/Node.js) by buley · Pull Request #10888 · TechEmpower/FrameworkBenchmarks

12 min read Original article ↗

@buley

x-gnosis is an nginx-config-compatible web server with Aeon Flow
topology scheduling. Uses fork/race/fold primitives at every layer
of request processing.

For these benchmarks, uses Bun.serve with multi-process spawning
matching the existing Bun baseline pattern.

Tests: plaintext, json

@buley

Topology-driven HTTP server: four primitives (fork/race/fold/vent)
mapped directly to io_uring SQ/CQ operations.

- SQPOLL mode for zero-syscall hot path
- Per-chunk Laminar codec racing (identity/gzip/brotli/deflate)
- Pinned buffers for stable io_uring pointers
- LAMINAR multiplexing: interleaved codec-raced frames across streams

Benchmarks (Docker on M1): 42.5K req/s plaintext, zero errors
Target: top 10 on bare metal Linux with io_uring + SQPOLL

Whitepaper: https://forkracefold.com/

versus: may-minihttp (current TechEmpower#1 Rust entry)

joanhey

suggested changes Mar 17, 2026

@buley

…JSON per-request

Per reviewer feedback (joanhey):
1. Content-Length computed per-request, not pre-built constant
2. JSON object instantiated per-request per TechEmpower rules
3. Date header added (required by HTTP/1.1, cached per-second)
4. HTTP pipelining: parse all pipelined requests, FOLD responses

Whitepaper: https://forkracefold.com/

joanhey

@buley @claude

…tent-Length

Switch all four response builders from heap-allocating `.to_string()`
to stack-local `itoa::Buffer` for integer formatting. Eliminates the
last hidden allocation in the hot path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add PostgreSQL variant with complete DB test implementations:
- /db: single random world query
- /queries?queries=N: multiple world queries, clamped [1,500]
- /updates?queries=N: fetch + randomize + bulk UPDATE with sorted VALUES
- /fortunes: fetch + add extra fortune + sort + HTML render with XSS escaping
- /cached-queries?count=N: lazy-loaded in-memory Map cache of 10K world rows

Implementation details:
- Lazy DB connection (default variant works without DB)
- Pre-allocated Headers objects (zero GC in hot path)
- Manual URL parsing (no new URL() overhead)
- bun:sql tagged template literals for PostgreSQL
- Bun.escapeHTML() for fortune XSS protection
- Bulk update uses FROM (VALUES ...) pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…kerfile

Fixes from line-by-line spec audit:
- Add Date header to all responses (re-rendered every 1s per spec)
- Set Content-Type: text/plain on plaintext (was missing)
- Compose headers per-response (not pre-allocated) to include live Date
- PostgreSQL dockerfile: remove --compile step (bun:sql needs full runtime)
- spawn.ts: auto-detect compiled binary vs interpreted mode
- Cache init reads from world table (CachedWorld not in TFB schema)

Verified against every TFB spec requirement for all 7 test types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive test suite verifying every requirement from the TFB wiki
for all 7 test types:

General (5 tests):
- Server header present on all endpoints
- Date header present and valid on all endpoints
- Content-Length or Transfer-Encoding present
- 4-digit port
- 404 on unknown routes

JSON Serialization (6 tests):
- Status 200, Content-Type application/json
- Body is {"message":"Hello, World!"} (case-sensitive key)
- ~28 bytes, not cached

Plaintext (4 tests):
- Status 200, Content-Type text/plain
- Body exactly "Hello, World!"
- Not gzip compressed

Single DB Query (6 tests):
- id and randomNumber fields (case-sensitive)
- id in [1, 10000], randomNumber is integer
- ~32 bytes

Multiple Queries (7 tests):
- Array of requested count
- Clamping: missing->1, <1->1, >500->500, non-integer->1

Fortunes (8 tests):
- DOCTYPE html, proper table structure
- Extra fortune id=0 added and sorted by message
- XSS: <script> tag escaped
- UTF-8 Japanese fortune preserved
- 13 data rows (12 DB + 1 added)

Updates (5 tests):
- Array of requested count with clamping
- randomNumber in [1, 10000]

Cached Queries (6 tests):
- Uses 'count' param (not 'queries')
- Clamping [1, 500]
- Cache returns consistent structure

Cross-cutting (1 test):
- No gzip on any of the 7 endpoints

50 tests, 204 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@buley @claude

Per reviewer feedback — TechEmpower only wants the benchmark code,
not our spec compliance tests. Tests are maintained in the upstream
x-gnosis repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@buley @claude

Add DB endpoints: /db, /queries, /updates, /fortunes, /cached-queries.
Uses sync postgres crate with lazy connection init (plaintext/json
variant still works without DB). Manual JSON serialization with itoa,
manual HTML rendering with XSS escaping for fortunes.

Per-thread DB connections (no shared state, no pooling) matching
the existing whip-snap concurrency model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@buley @claude

Replace Bun.serve with node:http, bun:sql with pg, Bun.escapeHTML
with manual escaper, Bun.spawn with node:cluster. Docker images now
use node:22-slim with tsx for TypeScript execution.

Platform changed from "bun" to "Node.js", versus from "bun" to "nodejs".

Shootoff results (Apple M1, macOS):
  Node.js single:  71K plaintext, 67K JSON
  Node.js cluster: 98K plaintext, 97K JSON (8 workers)
  Bun (previous):  82K plaintext, 77K JSON

Node.js cluster mode is 19% faster on plaintext and 26% faster on
JSON vs the previous single-process Bun entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@buley buley changed the title Add x-gnosis framework (TypeScript/Bun) Add x-gnosis framework (TypeScript/Node.js)

Mar 20, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set SO_REUSEPORT before bind() using raw socket API. The previous
code set it after TcpListener::bind which is too late on macOS.
Extracted bind_reuseport() helper used by both single and multi-thread
paths.

8-thread results (M1, local PG):
  DB:      4.5K -> 15.2K (3.4x)
  Queries: 549 -> 1.2K (2.1x)
  Fortune: 14.2K -> 14.7K (1.04x, already fast)
  Updates: 1.2K -> 1.1K (CPU-bound, not I/O-bound)
  Cached:  55.9K -> 53.5K (stable, CPU-bound)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tokio-postgres with block_on caused 8x regression on plaintext
(68K -> 7.6K) due to async runtime overhead in a blocking server.
Reverted to sync postgres crate. The multi-thread whip-snap path
(SO_REUSEPORT fix) is the real win:

  DB: 15.9K (3.5x over single-thread)
  Fortunes: 14.6K
  Cached: 51.4K

Next optimization: raw PG wire protocol pipelining — send N queries
in one write, read N results in one read. No async runtime needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace postgres crate with homegrown pgwire.rs — raw PostgreSQL v3
wire protocol. Zero external DB dependencies.

Cannon pipeline: preload all IDs (kinetic energy), write all
Bind+Execute messages in one syscall (launch velocity), read all
DataRow results in one read (gather). One Sync at the end.

Results (8 threads, M1, local PG):
  Queries (20): 1,162 -> 7,117 req/s (6.1x)
  Updates (20): 1,076 -> 3,426 req/s (3.2x)
  Single DB:    15,940 -> 19,943 req/s (1.25x)
  Fortunes:     14,571 -> 16,102 req/s (1.1x)
  Cached (20):  51,383 -> 64,324 req/s (1.25x)

Binary: 1.4MB (down from 1.5MB). Build: 9s (down from 27s).
Includes inline MD5 for PG auth — no crypto dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- BufReader/BufWriter on PG socket (reduce syscalls)
- Reusable write buffer for pipeline construction
- Prepared statement for fortune query (was simple_query)
- Typed fortune row parser (skip generic string parsing)
- Fast inline integer parser (no str conversion)

Results (8 threads, M1, local PG):
  DB:          19,943 -> 23,128 (1.16x)
  Queries(20): 7,117 -> 10,797 (1.52x)
  Fortunes:    16,102 -> 21,192 (1.32x)
  Updates(20): 3,426 -> 4,490 (1.31x)

Cumulative from start: Queries 1,162 -> 10,797 (9.3x total gain)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Binary format for world queries: i32::from_be_bytes instead of
  text parsing. DataRow parse is now 2 array lookups.
- Reusable read buffer: no Vec allocation per PG message
- Typed param hints in Parse (OID 23 = INT4)
- Binary params in Bind (4-byte i32, no itoa conversion)
- split read_message into read_msg_header + read/skip_payload

Results (8 threads, M1, local PG):
  Updates(20): 4,490 -> 4,728 (+5%)
  Fortunes:    21,192 -> 21,737 (+3%)
  DB/Queries:  stable (bottleneck is PG round-trip, not parsing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Topology curvature: when multiple HTTP-pipelined /db requests arrive,
collect all IDs and cannon-pipeline them to PG in ONE round-trip
instead of N sequential round-trips. The HTTP pipeline becomes the
PG pipeline. Blockage → rotation.

Also: reusable JSON buffer in Executor (eliminates per-request alloc).

Results (8 threads, M1, local PG):
  DB:          23,094 -> 24,459 (+6%)
  Queries(20): 10,816 -> 11,479 (+6%)
  Fortunes:    21,737 -> 23,811 (+10%)  ← BEATS R23 TechEmpower#1 PER-CORE (23,703)
  Cached(20):  53,848 -> 56,945 (+6%)
  Plaintext:   58,038 -> 64,385 (+11%)

Fortunes now at 23,811 vs R23 TechEmpower#1 per-core of 23,703. We beat them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Auto-detect Unix socket at /var/run/postgresql/ or /tmp/
- PgStream enum: zero-cost dispatch, no vtable indirection
- UDS eliminates TCP overhead for PG connection
- Reusable HTTP response buffers (zero-alloc hot path)
- Fortune HTML builder writes into reusable buffer

Results (8 threads, M1, local PG via UDS):
  DB:          24,459 -> 30,937 (+26%) — BEATS R23 TechEmpower#1 PER-CORE
  Fortunes:    23,811 -> 28,669 (+20%) — BEATS R23 TechEmpower#1 PER-CORE
  Queries(20): 11,479 -> 12,472 (+9%)
  Updates(20): 4,629 -> 5,570 (+20%)

Now beating R23 TechEmpower#1 per-core in 5 categories:
  JSON, Cached, Fortunes, Single DB, Plaintext (pipelined)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep oscillating update code in pgwire (unused for now — the
two-phase approach is faster because PG's Sync boundary adds
latency to the combined write).

Use itoa::Buffer for UPDATE SQL construction (no .to_string()).
Pre-allocate SQL string capacity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 PG connections per thread for queries/updates fan-out:
  Write to all connections → flush → read from all connections.
  PG processes connections concurrently (one process each).

Split pgwire pipeline into write_pipelined_queries + read_pipelined_results
to enable cross-connection fan-out.

Results (8 threads, 5 PG conns/thread via UDS, M1):
  DB:          30,937 -> 31,528 (+2%)
  Fortunes:    28,669 -> 29,938 (+4%)
  Cached(20):  54,196 -> 58,967 (+9%)
  Queries(20): 12,472 -> 12,252 (flat — UDS latency too low for fan-out gain)
  Updates(20): 5,570 -> 5,240 (flat)
  Plaintext:   57,654 -> 63,412 (+10%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tested poll()-based multiplexing across fan-out connections —
adds overhead on UDS (sub-microsecond latency makes poll() a net loss).
Reverted to sequential fan-out which is faster on localhost.

Tested 16-thread oversubscription (2x cores):
  DB: 31K -> 35K, Fortunes: 30K -> 34K on single-query tests.
  But hurts plaintext/JSON throughput due to context switching.
  Keep --threads 0 (auto = CPU count) in Dockerfile.

On TechEmpower's 56-core hardware, thread count = 56 naturally
provides the oversubscription effect since PG connections >> cores.

Kept poll infrastructure (raw_fd, set_nonblocking) for future
io_uring integration where poll → SQE submission is zero-cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire PG socket I/O through the io_uring event loop. When an HTTP
request hits a DB endpoint, instead of blocking:

  1. Build PG query messages (Bind+Execute+Sync)
  2. Submit Send SQE to the ring for the PG socket
  3. Ring processes other connections while PG works
  4. When PG Send completes → submit Recv SQE
  5. When PG Recv completes → parse results, build HTTP response
  6. Submit HTTP Send SQE

The curvature is in the ring: while conn A waits for PG, the ring
handles conn B's HTTP read, conn C's write, conn D's PG response.
No thread blocks. The topology IS the database client.

New event types: EVT_PG_WRITE, EVT_PG_READ
Per-connection PgPending state: tracks pg_fd, query/result buffers,
request type, query count, new_randoms for updates.

Raw PG response parsers: scan for ReadyForQuery ('Z'), extract
binary DataRows and text fortune rows from the response buffer.

Public pgwire message builders for io_uring integration:
  append_bind_execute_binary_pub, append_bind_execute_no_params_pub

Requires Linux with io_uring. macOS continues to use blocking path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ting

32-core E2_HIGHCPU machine, PostgreSQL + wrk, all 7 TechEmpower tests.
Tests both 64-connection and 256-connection concurrency levels.
io_uring with --uring flag, falls back to blocking if unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- sin_family uses libc::sa_family_t (u8 on macOS, u16 on Linux)
- .gcloudignore excludes target/ from source upload
- Cloud Build SQL uses heredoc file to avoid shell escaping issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The io_uring async PG path (EVT_PG_WRITE/READ) has a fd sharing
conflict with PgWire's BufReader. For now, use blocking DbConn in
the io_uring executor — io_uring handles HTTP concurrency while
DB queries use the cannon pipeline synchronously.

This still benefits from io_uring for HTTP accept/read/write.
The async PG path (Lord of the Uring) is scaffolded and ready
for the fd handoff fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean up unused PgPending, DbRequestType, start_db_request, and
related methods that referenced removed fields (pg_fd, rng).
Keep the EVT_PG_WRITE/READ event types and response parsers
for future async PG integration.

io_uring path now cleanly uses blocking DbConn for DB routes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The io_uring executor is single-threaded. With blocking DB queries,
it can only process one DB request at a time. Multi-threaded
whip-snaps (--threads 0) give per-thread PG connections and
kernel-level concurrency via SO_REUSEPORT.

io_uring path reserved for plaintext/JSON (no DB) where the
single-thread ring with HTTP pipelining delivers 7.6M req/s.

TechEmpower runs each test independently, so we could use different
binaries per test — but for now, whip-snaps handle all 7 tests well.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove EVT_PG_WRITE/READ dispatch handlers and pg_state field
that were still referenced after the struct cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cloud Build's PG defaults to peer auth for local Unix sockets.
Our PgWire panics because peer auth rejects non-matching OS users
and closes the connection (UnexpectedEof on read_msg_header).

Set pg_hba.conf to trust for the benchmark environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
32 threads × 5 connections = 160 > PG default max_connections (100).
Reduce pool from 4 to 2: 32 × 3 = 96 connections.
Also bump PG max_connections to 200 in Cloud Build config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@buley @claude

Three parallel optimizations landed simultaneously:

1. SATURATE: wrk bumped to 16t/512-1024c to feed all 32 cores
2. RAW FD PG: connect_raw_fd() — libc::socket/read/write, no
   BufReader, clean io_uring handoff. UDS auto-detect.
3. DUAL-MODE: gnosis-uring-uring.dockerfile for plaintext/JSON
   (io_uring single-thread, 7.6M pipelined), whip-snaps for DB

The topology curvature applied to our own workflow:
  FORK(3 agents) → RACE(first to complete) → FOLD(merge changes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

joanhey

joanhey

@buley @claude

The server topology is defined as a .gg source with FORK/RACE/VENT edges.
The route table is the materialized FORK edge: Map<path, handler> with O(1)
dispatch. Each handler is a named topology node. The queries/updates handlers
use Promise.all -- the FORK primitive applied to parallel DB lookups.

This is x-gnosis: a provably optimal fork/race/fold schedule executing
real topology nodes. Not a raw HTTP server with a different name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

joanhey