The FIX Triage Agent: Building a Local Reasoning Engine for Trading Infrastructure

Code mentioned here is available to browse in full on GitHub.

I’ve been a production support engineer at multiple fintech firms for the better part of a decade now. It’s a bit of a niche to fill, but no matter what hats you’re wearing in this type of role, FIX protocol will present itself to you in all its idiosyncratic glory. In no time, you may find yourself in front of terminals, grepping FIX logs, correlating sequence numbers across two counterparties’ systems to figure out whose session manager decided to fall over this time. And you may ask yourself, “How did I get here??” (guitars)

There is a certain artisan satisfaction in that detective work, but in an era of exploding complexity, manual triage is a game of diminishing returns. To stay relevant, I’m pivoting from performing the investigation to encoding it. I’m going to chronicle my pivot into what I believe is being called “Agentic DevOps”: building local, sovereign AI tools that handle the high-friction, low-creativity labor of log analysis for me. This is a journal of the first two nights of that project.

[Stock image: “The Digital Transformation Of Accounting And Finance - Artificial Intelligence, Robots And Chatbots”]
This is you.

I started on the ThinkPad X13 I use for everything else. i7, 16GB RAM, Linux Mint. The first question was: can I actually run useful inference on this thing? Answer: nah.

Even though this was obvious to me after thinking about it for 200ms, I ran a few quick tests. Mainly because I was already on the couch with the ThinkPad and I wasn’t getting up yet. Ollama installs in one curl-to-shell command and runs quantized models on CPU, but latency will bring iteration time to a crawl. A 3B model replies in seconds; a 7B model replies in tens of seconds. Fine for a “hello world” but untenable for the kind of tight develop-test loop I want when building an agent that makes multiple LLM calls per investigation.

Then I remembered my old gaming rig with an RTX 3070 in it. Sitting there, lonely. Running Windows (mainly for DAW audio work). Not being used for much gaming these days.

I remote into the Windows box, install Ollama and point it at the GPU, and quickly realize the developer experience of SSH-ing into PowerShell and writing Python in a foreign environment is going to be an ugly mess. I had a spare SSD sitting in the machine. The path of least resistance became: dual boot Linux on the extra drive and turn the GPU box into a proper AI workstation. The only drawback was having to get up and walk into the next room and start fiddling with USB sticks. But fine.

I off-shored thinking about distro choice to the homeboy Claude. He initially recommended Pop!_OS 22.04 based on stale training data and an impulse to be trendy and fit in. A quick web search confirmed System76 had shipped 24.04 LTS earlier in the year, finally bundling their long-gestating COSMIC desktop environment. Claude said “good catch”, corrected the recommendation, and moved on. Normal AI stuff, but worth noting: even the best LLMs will confidently serve you a year-old answer unless you force them to verify. This will matter later.

I went with Pop 24.04 despite the COSMIC newness. The case for playing it safe with familiar Ubuntu was real, and the NVIDIA edition of the ISO would save me a few minutes of setup either way. COSMIC is a v1.0 desktop, new desktops have sharp edges (even though this one has rounded corners), and I’m trying to build an agent, not evaluate window managers. But I’m exactly this kind of dork. I’ll take the interesting option.

Install was painless. Pop 24.04 NVIDIA edition sees the 3070 out of the box, nvidia-smi works, Ollama installs the same way it did on the laptop and this time actually uses the GPU.

One small stumble: right after driver install, nvidia-smi threw a “Driver/library version mismatch” error. Kernel module from before my most recent apt upgrade, userspace libraries from after. Real Linux shit. Welcome to the hood. Rebooted, fixed. This is the minor friction you get when you run Linux, but yeah, I prefer it to Windows silently deciding to install a broken driver update at 2 AM. Speaking of 2 AM, it was now much later than that and I was losing steam. Bedtime. Good night.

With Ollama serving qwen2.5:7b-instruct-q4_K_M on the GPU machine, the real work started. First task: get the Pop box talking to my laptop so I can sit on the couch and develop remotely. SSH setup, key exchange, done. Then the Ollama API itself — by default it binds to localhost only, which makes sense for security but is inconvenient when you want to hit it from another machine on the LAN. This caught me.

systemctl edit ollama to add an environment variable binding it to 0.0.0.0 instead of 127.0.0.1. The edit appeared to save. It didn’t. Systemd’s edit command silently discarded my override. More accurately, it’s because I don’t remember how to save properly with nano. It’s embarrassingly trivial, but I’ll admit it here. This is a safe space. There is no sporto drill-sergeant CTO watching to conflate a boneheaded move like this with actual incompetence. I spent a few minutes confused about why my laptop couldn’t reach the API. Diagnosis was straightforward once I ran ss -tlnp | grep 11434 and saw it was still bound to localhost only. Frustrated with society and myself, I reviewed the config file and just installed neovim to edit it. The broader lesson here: trust the diagnostics, and try to be less of an idiot.
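For the record, the override that finally stuck is a standard systemd drop-in; OLLAMA_HOST is Ollama’s documented knob for the bind address:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (what `systemctl edit ollama` writes -- IF you manage to save it)
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Followed by systemctl daemon-reload and a restart of the service. After that, ss -tlnp | grep 11434 should show 0.0.0.0:11434 instead of 127.0.0.1:11434.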

With Ollama reachable, I stood up a Python venv, pip-installed LangChain’s Ollama integration, and wrote a hello-world script that sends a single AI-generated FIX message to the local model and asks for a triage:
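The script itself was a few lines of LangChain, but the same round trip works with nothing beyond the standard library against Ollama’s /api/generate endpoint. A minimal sketch of the idea; the LAN address and the FIX message here are made up:

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434/api/generate"  # GPU box's LAN address (yours will differ)
MODEL = "qwen2.5:7b-instruct-q4_K_M"

# A hand-rolled session-level Logout with a sequence-number complaint.
# SOH (\x01) is the real FIX delimiter; '|' is used here for readability.
FIX_MSG = "8=FIX.4.4|35=5|34=7|49=VENUE|56=TRADER|58=MsgSeqNum too low, expecting 10 but received 7|"

def build_prompt(fix_msg: str) -> str:
    """Wrap a raw FIX message in a triage instruction."""
    return (
        "You are a FIX protocol support engineer. Triage this message: "
        f"{fix_msg}\n"
        "Identify the message type, the problem, and the next thing to check."
    )

def triage(fix_msg: str) -> str:
    """POST to Ollama's /api/generate endpoint and return the completion."""
    payload = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(fix_msg),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(triage(FIX_MSG))
```

The LangChain version swaps the urllib plumbing for its ChatOllama wrapper; the prompt is the part that matters.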

First output was coherent. The robutt had correctly identified a session-level Logout message with a sequence number complaint, and suggested the right thing to check next. Nice job, dude.

Not revolutionary, but it proved the wiring works end-to-end. Infrastructure done. Sick.

From what I can tell, most of the agentic work being done in this space is being done for the traders. They need alpha. And cologne. I’ve seen plenty of agent-driven execution bots, but not much out there that touches the actual plumbing that keeps electronic trading running: FIX session management, order routing, execution reporting, and all the hairy log triage that comes with it. I’m in what feels like uncharted territory. Cool. Scary. Exciting.

It’s now around 11 PM. Thunderstorms out here in Arkansas are not like they are back in New York. They are visceral, loud, violent, and beautiful. The sky is bigger out here. I notice it’s raining heavily outside. I open the window directly in front of my desk, switch from the overhead lights to a dim desk bulb, and swivel my monitor to the side so the backlight isn’t killing my view out the window. I start this album, which is perfect against the sound of rain and thunder.

The shape of what I need becomes obvious fast: structured data. An LLM staring at raw FIX strings is going to hallucinate and equivocate. An LLM handed a structured record of what happened (parsed tags, sequence state, session lifecycle) can reason about it.
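To make that concrete, this is the kind of naive tag parser I mean (an illustrative sketch, not the real parser from the repo): a raw FIX string becomes a dict keyed by tag number.

```python
SOH = "\x01"  # the real FIX field delimiter; logs often render it as '|' or '^A'

def parse_fix(raw: str, delimiter: str = SOH) -> dict[str, str]:
    """Split a raw FIX message into a {tag: value} dict.

    Naive on purpose: ignores repeating groups and doesn't verify
    BodyLength (9) or CheckSum (10) -- fine for triage, not for an
    actual FIX engine.
    """
    fields = {}
    for pair in raw.strip().strip(delimiter).split(delimiter):
        tag, _, value = pair.partition("=")
        if tag:
            fields[tag] = value
    return fields

msg = parse_fix("8=FIX.4.4|35=A|34=1|49=TRADER|56=VENUE|98=0|108=30|", delimiter="|")
# msg["35"] is "A" (Logon); msg["34"] is "1" (first message of the session)
```

Once every message is a dict, "what was the sequence progression?" becomes a list comprehension instead of a regex prayer.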

I started by standing up QuickFIX/J in Docker to generate realistic test logs. For the record: I don’t write Java, don’t want to, and have no interest in starting now, thanks. Writing a test harness as Python scripts was tempting but also wrong; I’d be re-implementing the FIX session state machine badly for sure. QuickFIX/J already has decades of correctness baked in.

So I asked the homeboy Claude to write all the Java-side configuration and harness code, reviewed it for sanity, and got out of its way. This is one of the genuinely nice things about working with an AI collaborator: using a mature toolkit written in a language I don’t like to just fuckin’ get it done. I’ll focus on the Python parsing and agent logic, the parts I actually care about.

The harness is two containers: an acceptor that simulates a venue and an initiator that acts as a client. They exchange real FIX 4.4 messages over a TCP session — logon, new order, execution report, logout.
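For flavor, this is roughly what the acceptor side’s QuickFIX/J session config looks like (the CompIDs, port, and paths here are illustrative, not copied from my repo):

```ini
# acceptor.cfg -- the "venue" side
[DEFAULT]
ConnectionType=acceptor
BeginString=FIX.4.4
SocketAcceptPort=9876
StartTime=00:00:00
EndTime=00:00:00
FileStorePath=store
FileLogPath=log

[SESSION]
SenderCompID=VENUE
TargetCompID=TRADER
```

The initiator's config mirrors it with ConnectionType=initiator and the CompIDs swapped. The FileLogPath output is what the parser later chews on.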

With that scaffolding, I generated some shell scripts to reproduce five different scenarios, which I’ll outline below. These matter because each one produces a distinct signature in the logs, and the signatures reflect what’s actually happening at the network and session layers. A huge part of this job is pattern recognition; if I can familiarize an agent with these scenarios the same way I’ve internalized them over the last decade, I can build something that can more effectively replace myself in the workforce. Err.. I mean “codify my own role far more effectively”. Instead of occultist rituals to transfer my conscious mind into an artificial intelligence, let’s just do this with code.

Scenario one is the baseline. TRADER logs on (34=1), VENUE accepts (34=1), TRADER sends a NewOrderSingle (34=2), VENUE sends an ExecutionReport (34=2), both sides log off (34=3). Clean, symmetric sequence progression, no retransmissions, no gaps. This is what any old limit order spot session should look like if nothing is broken. Your eye learns to scan for deviations from this pattern.

Scenario two is sequence-gap recovery. The initiator disconnects, then reconnects with a sequence number that’s jumped ahead — for example, it was at 9, and it logs back on with seq 15. This simulates what happens when state on the client side drifts from what the server expects, usually because a restart loaded stale sequence numbers or the client’s store got corrupted somehow. The acceptor detects the jump, sends a ResendRequest (35=2) asking for the missing messages, and the initiator responds with a SequenceReset/GapFill (35=4) that says “treat those messages as filled, move on.” The session stabilizes. What makes this pattern recognizable: a clean sequence progression, then a jump, then a specific two-message recovery dance. If you see 35=2 followed by 35=4, you’re looking at gap recovery.
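The decision logic behind that dance reduces to a few lines. A simplified sketch of the session rules (a real engine also checks PossDupFlag, bounds the resend range, and so on):

```python
def on_incoming_seqnum(received: int, expected: int) -> str:
    """What a FIX engine does with an incoming MsgSeqNum (tag 34).

    Too high: messages were missed, ask for a resend.
    Too low (and not a possible duplicate): fatal protocol error.
    Exact match: business as usual.
    """
    if received > expected:
        # Gap detected -- request retransmission of the missing range.
        return f"send ResendRequest (35=2) for {expected}..{received - 1}"
    if received < expected:
        return "send Logout (35=5): MsgSeqNum too low"
    return "accept, increment expected seqnum"

on_incoming_seqnum(15, 10)  # the scenario above: asks for a resend of 10..14
```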

Scenario three is the crash-restart loop. The initiator gets killed mid-session and restarts, repeatedly. In the logs, you see multiple logon/logoff pairs stacked close in time, often with sequence number resets on each reconnection (because the config has ResetOnLogon=Y). This pattern is characteristic of an unstable client process — maybe it’s crashing, maybe a supervisor keeps restarting it, maybe there’s a deployment loop gone wrong. The giveaway is the temporal clustering of logons. A healthy session logs on once and stays up. Three logons in two minutes means something’s wrong on the other side.

Scenario four is network degradation, and this one is subtler and genuinely interesting. I use tc netem on the loopback interface to drop half the packets between the two containers. The session still kinda works — TCP retransmits, QuickFIX/J’s application-layer acknowledgments do their job, orders get delivered, executions come back. But timing is visibly degraded. Messages that normally exchange in under 100ms take several seconds. Heartbeats occasionally miss. In a real production log this would look like elevated latency without explicit error messages, which is exactly the pattern that makes network degradation hard to spot for an AI — nothing is broken, but messaging is slow. An agent trained to detect this has to reason about timing deltas between messages, not just flag explicit tag 58 error messages.
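This is the kind of timing heuristic I have in mind: a hypothetical helper, not yet in the repo, that flags inter-message gaps far above the session’s median.

```python
from statistics import median

def degraded_windows(timestamps: list[float], factor: float = 10.0) -> list[int]:
    """Flag message indices whose inter-arrival gap exceeds the
    session's median gap by `factor`. Crude, but it catches the
    'nothing errored, everything is slow' signature that grepping
    for tag 58 misses. Timestamps are epoch seconds, in log order.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return []
    baseline = median(gaps)
    return [i + 1 for i, g in enumerate(gaps) if g > baseline * factor]

# Healthy chatter at ~50ms, then one exchange stalls for ~3 seconds:
ts = [0.00, 0.05, 0.10, 0.15, 3.15, 3.20]
degraded_windows(ts)  # flags index 4, the message after the stall
```

Median rather than mean, so one already-slow exchange doesn't drag the baseline up and hide its siblings.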

Scenario five is the one-way partition. Here I use iptables to block traffic in one direction only — acceptor to initiator — for 15 seconds. The initiator stops receiving heartbeats, but its outbound messages still reach the acceptor. This scenario exists to reveal one-way network failures, which are maddening in production because each side thinks the other stopped responding.

[Meme image. Caption: They aren’t good at dank memes yet.]

In the log: heartbeats from TRADER continue on schedule, heartbeats from VENUE are missing. Eventually VENUE sends a TestRequest (35=1) asking “are you there?” — which TRADER never receives, so it never replies. When the block lifts, both sides detect the gap and go through a ResendRequest/SequenceReset recovery. The distinguishing feature is asymmetry: one direction of heartbeats stops while the other continues. In a real incident, that asymmetry is the tell that points you at routing or an intra-session firewall change rather than at either endpoint being unhealthy.

The fault injection uses tc netem and iptables on loopback — real network manipulation, scoped to lo so it can’t affect anything outside the box. Each script has a trap cleanup to remove the rules even if it crashes halfway through. You could argue this should have been done in isolated containers to protect the host from system-level changes, and you’d be right. But you’re likely reading this at a sensible hour with your full faculties intact.
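In Python terms, that trap pattern is a try/finally. If any of this harness migrates to the Python side, a context manager like this one would carry the same guarantee (the tc commands in the usage comment are standard netem syntax, shown for illustration):

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def injected_fault(apply_cmd: str, revert_cmd: str, run=subprocess.run):
    """Apply a network fault and guarantee its removal even if the
    scenario crashes midway -- the Python analogue of the shell
    scripts' `trap cleanup EXIT`. `run` is injectable for testing.
    """
    run(apply_cmd, shell=True, check=True)
    try:
        yield
    finally:
        # Always executes: normal exit, exception, or Ctrl-C.
        run(revert_cmd, shell=True, check=False)

# Usage (needs root; scoped to loopback, as in the scenarios):
# with injected_fault("tc qdisc add dev lo root netem loss 50%",
#                     "tc qdisc del dev lo root"):
#     run_scenario()
```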

A quick pythonic parser written by Claude now walks the log directories these scripts produce, normalizes QuickFIX/J’s FileLog format into JSONL, labels each message with its scenario, and dumps one JSONL dataset for the agent to learn from. Selected Ambient failure modes, reproducible on demand, labeled, and parseable by an agent.

The foundation is in place and I now have training data. My thoughts on where to go from here:

The state reconstructor probably comes first: deterministic Python that walks the parsed logs and builds a session-level view of what happened. Sequence progressions per party, heartbeat intervals, recovery events. No LLM in this layer.
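A sketch of the shape I’m imagining (the field names are mine; it assumes the parser’s {tag: value} dicts as input):

```python
from collections import defaultdict

def reconstruct(messages: list[dict]) -> dict:
    """Build a per-sender session view from parsed FIX messages.

    Input: list of {tag: value} dicts carrying at least tags
    34 (MsgSeqNum), 35 (MsgType), 49 (SenderCompID).
    Deterministic -- no LLM anywhere in this layer.
    """
    state: dict = {"seq": defaultdict(list), "recovery_events": 0, "logons": 0}
    for msg in messages:
        sender = msg.get("49", "?")
        state["seq"][sender].append(int(msg["34"]))
        if msg.get("35") == "A":           # Logon
            state["logons"] += 1
        if msg.get("35") in ("2", "4"):    # ResendRequest / SequenceReset
            state["recovery_events"] += 1
    # Gaps: any jump greater than 1 in a sender's sequence progression.
    state["gaps"] = {
        s: [(a, b) for a, b in zip(nums, nums[1:]) if b - a > 1]
        for s, nums in state["seq"].items()
    }
    return state
```

Fed the scenario-two logs, this would surface the 9-to-15 jump as a gap plus a pair of recovery events, which is exactly the summary the LLM should see instead of raw strings.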

Then the anomaly detector tailing logs. Rule-based, not AI either. “Seq gap without ResendRequest within 2 seconds.” “Heartbeat missed twice in a row.” These are the triggers that fire an investigation.
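The first of those rules might look like this (the event names are illustrative placeholders for whatever the state reconstructor ends up emitting):

```python
def gap_unrecovered(events: list[tuple[float, str]], window: float = 2.0) -> list[float]:
    """Rule: a 'seq_gap' event with no 'resend_request' within
    `window` seconds is an alert. Events are (epoch_seconds, kind)
    pairs; each returned timestamp should trigger an investigation.
    """
    alerts = []
    for t, kind in events:
        if kind != "seq_gap":
            continue
        recovered = any(
            k == "resend_request" and t <= t2 <= t + window
            for t2, k in events
        )
        if not recovered:
            alerts.append(t)
    return alerts

events = [(10.0, "seq_gap"), (10.4, "resend_request"), (50.0, "seq_gap")]
gap_unrecovered(events)  # -> [50.0]: the second gap never got a resend
```

Deterministic and cheap to run on a tail, which is the point: the LLM only wakes up when a rule fires.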

Then the agent itself: LangChain tool-calling loop with functions the model can invoke. get_session_state, get_messages_in_window, check_known_issues. The LLM orchestrates the investigation, the tools do the work.
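Before wiring in LangChain’s tool-calling machinery, the tool surface itself is just named functions plus a dispatcher. Stub bodies here, names from the plan above; with LangChain these become @tool-decorated functions and its executor plays the role of dispatch:

```python
# The tool surface the agent orchestrates. Bodies are stubs; the real
# versions query the JSONL dataset and the state reconstructor.

def get_session_state(session_id: str) -> dict:
    return {"session": session_id, "status": "gap_recovery", "expected_seq": 10}

def get_messages_in_window(session_id: str, start: float, end: float) -> list:
    return []  # stub: would slice the parsed JSONL by timestamp

def check_known_issues(pattern: str) -> list:
    return []  # stub: would search runbook notes / prior incidents

TOOLS = {
    "get_session_state": get_session_state,
    "get_messages_in_window": get_messages_in_window,
    "check_known_issues": check_known_issues,
}

def dispatch(call: dict):
    """Execute one model-requested tool call: {'name': ..., 'args': {...}}."""
    return TOOLS[call["name"]](**call["args"])

dispatch({"name": "get_session_state", "args": {"session_id": "TRADER-VENUE"}})
```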

Also, maybe eventually this wants knowledge retrieval — a collection of runbook notes, prior incidents, counterparty quirks — so the agent doesn’t just reason generically but reasons from the specific history of this environment. Your team probably has this stored in their brains but not in the docs.

The honest observation from the first two nights, and something I am kind of surprised by: maybe 20% of this project is AI. The other 80% is ordinary engineering. Log parsers, shell scripts, systemd overrides, Docker networking, state machines. I’ve been doing that work for a decade. The LLM is the last-mile reasoning layer on top of it, not a replacement for it.

I’m thinking about how old school car mechanics in the 70s must have felt being introduced to OBD diagnostic computers and check engine lights. It seems like magic at first, at least at this level, but it’s just another tool. The models are impressive. They’re also not magic. You still have to build the thing.
