GitHub - stefanwebb/named-pipes: Low-latency IPC library for building persistent agentic tool servers (LLM inference, TTS, vector search, browser automation) over named pipes on the same machine.


Named Pipes as Agentic Tools

Low-latency IPC for persistent AI tool servers — LLM inference, TTS, STT, vector search, and more — all on one machine, no network stack required.


✨ Highlights

  • Persistent servers — model weights and state stay loaded between calls; no per-request startup cost
  • Kernel-speed IPC — named pipes route through kernel memory, not a network stack; lower latency than local HTTP
  • Multi-client fanout — one server handles many concurrent clients; each gets its own downstream pipe
  • Decorator API — register command handlers with a single @ch.handler("CMD") line
  • cpipe CLI — send ad-hoc commands to any running server from the terminal, like curl for pipes
  • Claude Code skill — an included skill teaches the assistant to discover and query live servers without leaving the session
  • Ready-made servers — drop-in pipes for LLM chat, text-to-speech, and speech-to-text
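
The decorator pattern mentioned in the highlights can be illustrated with a small, self-contained sketch. The `HandlerRegistry` class, its `dispatch` method, and the `ECHO` command are hypothetical names invented for this example; only the `@ch.handler("CMD")` call shape comes from this README, and the library's real classes are documented in DOCS.md.

```python
from typing import Any, Callable, Dict

Handler = Callable[[dict], Any]


class HandlerRegistry:
    """Hypothetical stand-in for a pipe server's command table."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def handler(self, command: str) -> Callable[[Handler], Handler]:
        """Return a decorator that registers a function under `command`."""
        def register(func: Handler) -> Handler:
            self._handlers[command] = func
            return func  # leave the function usable on its own
        return register

    def dispatch(self, command: str, payload: dict) -> Any:
        """Look up the handler for `command` and call it with `payload`."""
        try:
            func = self._handlers[command]
        except KeyError:
            raise ValueError(f"unknown command: {command!r}") from None
        return func(payload)


ch = HandlerRegistry()


@ch.handler("ECHO")  # hypothetical command name
def echo(payload: dict) -> dict:
    return {"result": payload.get("text", ""), "done": True}
```

Calling `ch.dispatch("ECHO", {"text": "hi"})` routes the payload to `echo`. Keeping the command table in a plain dictionary means registration is a single line per handler, which is the convenience the decorator API is aiming for.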

Overview

This library uses named pipes as the transport layer for agentic tool servers — persistent background processes that expose capabilities such as LLM inference, text-to-speech, vector search, or browser automation to a Python orchestrator running on the same machine.

Because named pipes route data through kernel memory rather than a network stack, they offer lower latency than local HTTP and far less complexity than shared memory — a practical sweet spot for real-time applications like voice agents.
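
To make the transport concrete, here is a minimal sketch of a named pipe round trip using only the Python standard library. It does not use this library at all; the function name `fifo_round_trip` and the pipe name `demo-pipe` are made up for illustration, and `os.mkfifo` is available on POSIX systems only.

```python
import os
import tempfile
import threading


def fifo_round_trip(message: bytes) -> bytes:
    """Send `message` through a temporary FIFO and return what the reader got."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo-pipe")  # hypothetical pipe name
        os.mkfifo(path)  # POSIX only

        received = []

        def reader() -> None:
            # Opening for reading blocks until a writer opens the other end.
            with open(path, "rb") as pipe:
                received.append(pipe.read())  # read until the writer closes

        t = threading.Thread(target=reader)
        t.start()

        # Opening for writing blocks until the reader has opened the pipe.
        with open(path, "wb") as pipe:
            pipe.write(message)

        t.join(timeout=5)
        return b"".join(received)
```

The bytes travel through a kernel buffer and never touch a socket, which is the property the latency claim above rests on. The blocking `open` calls also hint at why both ends of a pipe must be present before data flows.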

The same servers can be driven directly from Claude Code. An included agent skill teaches the assistant how to discover running pipe servers with cpipe --list, inspect their capabilities, and send commands.

For a deeper look at the design decisions and API reference, see DOCS.md.

Installation

# Core library only
pip install -e .

# With LLM inference support
pip install -e ".[llm]"

# With TTS support (macOS: mlx-audio + sounddevice)
pip install -e ".[tts]"

# With STT support (sounddevice; Voxtral weights vendored)
pip install -e ".[stt]"

Requires Python 3.11+. See DOCS.md for platform-specific dependency details.

Quick start

1. Start a server (Terminal 1):

conda activate named-pipes
cpipe --serve chat   # LLM server on /tmp/tool-chat

2. Query it from the CLI (Terminal 2):

cpipe /tmp/tool-chat chat --data '{"messages": [{"role":"user","content":"Hello!"}]}'

3. Or write a client in Python:

from named_pipes.tool_client import ToolClient
import threading

done = threading.Event()

class _ChatClient(ToolClient):
    def on_message(self, msg):
        if msg.get("done") is True:
            done.set()  # final message: unblock the main thread
        else:
            # streamed chunk: print it without a trailing newline
            print(msg.get("result", ""), end="", flush=True)

with _ChatClient("chat") as ch:
    ch.send_command("chat", messages=[{"role": "user", "content": "Hello!"}])
    done.wait(timeout=30)  # return when the stream ends, or after 30 s
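
The client above treats each message as a dictionary with an optional `result` chunk and a `done` flag. As a rough sketch of that streaming convention, the helper below joins chunks until the final message arrives. The function name `collect_stream` is hypothetical, and the message shape is inferred only from the two keys used in the client; the full protocol is described in DOCS.md.

```python
from typing import Iterable


def collect_stream(messages: Iterable[dict]) -> str:
    """Join streamed "result" chunks until a message with done=True arrives."""
    chunks = []
    for msg in messages:
        if msg.get("done") is True:
            break  # final message: stop consuming
        chunks.append(msg.get("result", ""))
    return "".join(chunks)
```

A helper like this is handy when the caller wants the complete reply as one string, for example before handing it to a TTS server, rather than printing tokens as they arrive.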

Examples

Start order matters — server first, then client (server creates the FIFOs).

# LLM chat
cpipe --serve chat                      # Terminal 1
python src/examples/chat_client.py     # Terminal 2

# LLM → TTS pipeline (spoken output)
cpipe --serve chat                      # Terminal 1: LLM  (/tmp/tool-chat)
cpipe --serve tts                       # Terminal 2: TTS  (/tmp/tool-tts)
python src/examples/tts_client.py      # Terminal 3: pipeline client

# Speech-to-text
cpipe --serve stt                       # Terminal 1: STT  (/tmp/tool-stt)
python src/examples/stt_client.py      # Terminal 2: subscriber
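
Because the server creates the FIFOs, a client script launched too early will not find them. One way to make start order less fragile is to poll for the pipe before connecting. The sketch below is not part of the library; `wait_for_fifo` is a hypothetical helper built on the standard library, and the path in the usage note is the chat server's pipe from the Quick start.

```python
import os
import stat
import time


def wait_for_fifo(path: str, timeout: float = 10.0, poll: float = 0.1) -> bool:
    """Return True once `path` exists and is a FIFO, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISFIFO(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # the server has not created the pipe yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A client could call `wait_for_fifo("/tmp/tool-chat")` before opening its connection and exit with a clear error if it returns False.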

cpipe — CLI tool

cpipe /tmp/tool-chat chat --data '{"messages": [{"role":"user","content":"Hello"}]}'

cpipe --version  # show installed version
cpipe --list     # discover running ToolServer instances (tool-* pipes)
cpipe --pid      # same, plus PIDs that have each pipe open
cpipe --clear    # delete orphaned tool pipes

See DOCS.md for all options and the full protocol reference.
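
For intuition about what discovery of `tool-*` pipes involves, the following sketch scans a directory for FIFOs with that prefix. This is an assumption about how such a scan could work, not a description of how `cpipe --list` is implemented; the function name `list_tool_pipes` and the default `/tmp` directory (taken from the example paths in this README) are illustrative.

```python
import glob
import os
import stat


def list_tool_pipes(directory: str = "/tmp") -> list[str]:
    """Return sorted paths under `directory` named tool-* that are FIFOs."""
    found = []
    for path in sorted(glob.glob(os.path.join(directory, "tool-*"))):
        try:
            if stat.S_ISFIFO(os.stat(path).st_mode):
                found.append(path)
        except OSError:
            continue  # entry vanished or is unreadable; skip it
    return found
```

Checking the file type matters because a regular file could share the `tool-` prefix. A FIFO that exists on disk may also be orphaned, with no server behind it, which is the situation `cpipe --clear` is meant to clean up.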

Claude Code skill

An included skill at .claude/skills/cpipe/SKILL.md teaches Claude Code how to use cpipe to discover, inspect, and interact with live servers — so the LLM can query a local inference server or trigger TTS playback without leaving the coding session.
