Lost in Technopolis


I’ve been spending most of my coding time inside Claude Code, Anthropic’s agentic CLI tool. Out of the box, it’s already quite capable: it reads your codebase, edits files, runs commands, handles git workflows, and spawns sub-agents for parallel work. But the real payoff comes from the community tools that have grown up around it. Over the past several months I’ve assembled a set of plugins and companion utilities that, taken together, have changed how I work.

Claude Code and Agent Teams

The feature I rely on most heavily is Agent Teams. It’s still experimental, but it lets you coordinate multiple Claude Code instances working in parallel. One session acts as the team lead, creating a shared task list and spawning teammates that each get their own context window. Teammates can message each other directly and claim tasks independently. I use Agent Teams for code reviews (security, performance, and test coverage each handled by a separate reviewer), for debugging with competing hypotheses, and for building new features where frontend, backend, and tests can each be owned by a different agent.

Agent Teams differ from sub-agents in an important way: sub-agents report results back to the parent and never talk to each other, while teammates share a task list, claim work, and communicate directly. The coordination overhead is real, and token usage scales with the number of active teammates, so I reach for them only when the work genuinely benefits from parallel exploration.

claude-prompts

My claude-prompts repository is a collection of commands, agents, and skills that I’ve written or gathered for use with Claude Code. It currently contains about 30 commands, 14 agent definitions, and 12 skills. The three categories serve different purposes:

  • Commands are workflow instructions. Examples include commit (which teaches Claude how to produce atomic, logically-sequenced commits), code-review, push, fix-github-issue, and nix-rebuild.

  • Agents are expert personas. I have language-specific agents for Python, TypeScript, C++, Rust, Haskell, Emacs Lisp, SQL, and Nix, plus specialized roles like prompt-engineer and web-searcher.

  • Skills are modular instruction sets. The claude-code skill, for instance, primes every session with protocols for using the other tools described in this post: how to search memory before re-investigating, how to save significant findings, how to use the context guard, and how to consult external models.

Each skill follows a standardized format with a SKILL.md file containing YAML frontmatter (name, description, optional tool bindings, temperature) and Markdown instructions. This makes them easy to share. The repository is public, and I encourage others to contribute or fork it.
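As a concrete sketch, a minimal SKILL.md might look like the following. The skill name, description, and instructions here are invented for illustration; beyond the frontmatter fields named above (name, description, tool bindings, temperature), the exact keys are assumptions, not the format's full specification:

```markdown
---
name: example-skill
description: Illustrative skill showing the SKILL.md layout
temperature: 0.2
---

# Instructions

1. Search memory before re-investigating a problem.
2. Save significant findings when the work is done.
```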

claude-mem

Claude-mem solves what I consider the biggest limitation of working with LLMs: the loss of context between sessions. It’s an MCP plugin that captures everything Claude does during a session, compresses it using AI, and injects relevant context back into future sessions.

It uses a three-step search workflow that’s roughly ten times more token-efficient than fetching everything at once:

  1. search(query) returns a compact index with observation IDs at about 50–100 tokens per result.
  2. timeline(anchor=ID) shows what was happening before and after a specific observation, providing narrative context without full details.
  3. get_observations([IDs]) fetches the complete observation records only for the IDs you actually need.
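The three steps above can be sketched as plain functions over an in-memory store. The function names mirror the tools described; the dictionary stands in for the real SQLite/Chroma backend, and the records are invented:

```python
# Hypothetical sketch of claude-mem's three-step search protocol.
OBSERVATIONS = {
    1: {"title": "Fixed auth token refresh bug", "body": "Root cause was a stale refresh timer"},
    2: {"title": "Decided on SQLite FTS5 for search", "body": "Chose FTS5 over LIKE queries"},
    3: {"title": "Refactored session hooks", "body": "Moved hook wiring into settings"},
}

def search(query):
    """Step 1: return a compact index (IDs + titles), not full records."""
    return [{"id": oid, "title": o["title"]}
            for oid, o in OBSERVATIONS.items()
            if query.lower() in o["title"].lower()]

def timeline(anchor):
    """Step 2: show neighboring observations for narrative context."""
    return [oid for oid in sorted(OBSERVATIONS) if abs(oid - anchor) <= 1]

def get_observations(ids):
    """Step 3: fetch full records only for the IDs actually needed."""
    return [OBSERVATIONS[i] for i in ids]

hits = search("auth")                      # cheap index scan
context = timeline(anchor=hits[0]["id"])   # narrative neighborhood
full = get_observations([hits[0]["id"]])   # full fetch, one record only
```

The token savings come from deferring the expensive step: most queries stop after step 1 or 2.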

Under the hood, claude-mem runs a worker service on localhost:37777 backed by SQLite (with FTS5 full-text indexing) and Chroma for semantic vector search. It hooks into Claude Code’s lifecycle events: SessionStart initializes tracking, UserPromptSubmit injects past context, PostToolUse captures tool results, FileEdit tracks modifications, and SessionEnd generates a summary. There’s also a web-based viewer for browsing the memory timeline.
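The FTS5 side of that storage is easy to demonstrate with Python's stdlib sqlite3. The table and column names below are illustrative, not claude-mem's actual schema:

```python
import sqlite3

# Minimal sketch of full-text indexing with SQLite FTS5.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE observations USING fts5(title, body)")
db.execute("INSERT INTO observations VALUES (?, ?)",
           ("Guard daemon tuning", "Raised the soft threshold after overflow"))
db.execute("INSERT INTO observations VALUES (?, ?)",
           ("Commit workflow", "Atomic commits via the commit command"))

# MATCH searches all indexed columns; only the first row mentions "threshold".
rows = db.execute(
    "SELECT title FROM observations WHERE observations MATCH ?",
    ("threshold",)).fetchall()
```

The semantic side (Chroma vectors) complements this: FTS5 catches exact terms, embeddings catch paraphrases.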

The protocol I follow is: search memory before re-investigating anything, and save significant findings (discoveries, decisions, completed work, bug root causes, gotchas) after doing the work. Future sessions see the title and token cost of each observation in a context index and can decide whether to fetch the full record.

Cozempic

While claude-mem solves the cross-session memory problem, within a single session there’s a different threat. Cozempic is a context-weight management tool for a problem specific to long Claude Code sessions: context bloat. A typical session accumulates 8–46MB of dead weight, including hundreds of progress tick messages, repeated thinking blocks, stale file reads superseded by later edits, duplicate document blocks, and metadata noise. When context gets too large, Claude’s auto-compaction summarizes away critical state. For Agent Teams, this is disastrous: the lead agent’s context gets compacted, team messages are discarded, the lead forgets teammates exist, and sub-agents become orphaned.

It has 13 pruning strategies organized into three prescription tiers:

  • Gentle (under 5MB savings): progress collapse, file-history dedup, metadata strip.
  • Standard (5–20MB savings): adds thinking block removal, tool output trimming, stale read removal, system-reminder dedup.
  • Aggressive (over 20MB savings): adds error-retry collapse, document dedup, mega-block trim, envelope strip.
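The tier thresholds above reduce to a small decision function. This is a sketch of the prescription logic as described, not Cozempic's actual code:

```python
def prescribe_tier(estimated_savings_mb):
    """Map an estimated context savings figure to a pruning tier."""
    if estimated_savings_mb < 5:
        return "gentle"      # progress collapse, dedup, metadata strip
    if estimated_savings_mb <= 20:
        return "standard"    # adds thinking/tool-output/stale-read pruning
    return "aggressive"      # adds retry collapse, mega-block trim, etc.
```

For example, `prescribe_tier(12)` returns `"standard"`.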

The guard daemon starts automatically at session init and provides multi-layer protection for Agent Teams: continuous checkpoints, hook-driven checkpoints after every task spawn, tiered pruning at soft and hard thresholds, and reactive overflow recovery with a circuit breaker. It never prunes task-related tool calls (Task, TaskCreate, TaskUpdate, TeamCreate, SendMessage) and maintains the conversation DAG.
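The "never prune task coordination" rule can be sketched as a filter over conversation records. The record structure here is invented; only the protected tool names come from the description above:

```python
# Sketch: select prunable records while always preserving the tool calls
# that Agent Teams coordination depends on.
PROTECTED_TOOLS = {"Task", "TaskCreate", "TaskUpdate", "TeamCreate", "SendMessage"}

def select_prunable(records):
    """Return tool-call records that are safe to prune."""
    return [r for r in records
            if r.get("type") == "tool_call"
            and r["tool"] not in PROTECTED_TOOLS]

log = [
    {"type": "tool_call", "tool": "SendMessage"},  # protected, kept
    {"type": "tool_call", "tool": "Read"},         # prunable
    {"type": "thinking"},                          # handled by other strategies
]
```

Here `select_prunable(log)` yields only the `Read` call, so team messages survive every pruning pass.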

Cozempic has zero external dependencies: it’s pure Python 3.10+ stdlib. I run it with cozempic init once per machine, and it wires its hooks into .claude/settings.json automatically.

agnix

Agnix is a linter and language server for AI agent configurations. It validates the configuration files that Claude Code, Cursor, GitHub Copilot, Gemini CLI, and other tools rely on, catching misconfigurations before they silently fail.

It ships with 169 rules across eleven tools. For Claude Code specifically, it validates CLAUDE.md (project memory), SKILL.md (skill definitions), .claude/settings.json (hooks), agent definitions, and plugin manifests. It covers all 14 valid hook events and checks exit code behavior, frontmatter structure, required fields, and cross-file consistency.

Agnix supports three auto-fix confidence levels (HIGH, MEDIUM, LOW) and has a dry-run mode with inline diff preview. I run it as part of my CI pipeline via the GitHub Action:

```yaml
- name: Validate agent configs
  uses: avifenesh/agnix@v0
  with:
    target: 'claude-code'
```

The tool is written in Rust as a workspace of six crates (rules, core, CLI, LSP, MCP server, and WASM bindings) and is available through npm, Homebrew, and Cargo. Editor extensions exist for VS Code, JetBrains, Neovim, and Zed.

Beads

Beads is a distributed, git-backed issue tracker designed for AI agents. It replaces the messy markdown plans that tend to proliferate during long-running AI-assisted work with a proper dependency-aware graph database that persists across sessions and travels with your code.

A few design decisions make it work well for agents: hash-based IDs prevent merge collisions when multiple agents or branches operate concurrently; the bd ready command finds tasks with no open blockers; bd update <id> --claim atomically claims a task to prevent race conditions; and all commands support --json for programmatic use.
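The collision-resistance of hash-based IDs is easy to illustrate: deriving the ID from the issue's own content means two agents on different branches never race on a shared counter. This sketch uses SHA-256 over the issue fields; Beads' real ID scheme may differ:

```python
import hashlib
import json

def issue_id(title, created_by, prefix="bd-"):
    """Derive a stable, collision-resistant short ID from issue content."""
    payload = json.dumps({"title": title, "by": created_by}, sort_keys=True)
    return prefix + hashlib.sha256(payload.encode()).hexdigest()[:8]

a = issue_id("Fix login bug", "agent-frontend")
b = issue_id("Add tests", "agent-backend")
```

Two agents creating different issues get different IDs deterministically, and re-running the same creation yields the same ID, so merges are conflict-free.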

Beads uses a three-layer data model: SQLite for fast local queries, JSONL for git-tracked portability (one entity per line, merge-friendly diffs), and git itself for distribution. It also supports Dolt as an alternative backend with cell-level version control and native branching.
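The JSONL layer's merge-friendliness comes from the one-entity-per-line discipline: a change to one issue touches exactly one line of the git-tracked file. A sketch, with illustrative field names:

```python
import json

# One entity per line: diffs stay local to the issues that changed.
issues = [
    {"id": "bd-1a2b3c4d", "title": "Fix login bug", "status": "open"},
    {"id": "bd-5e6f7a8b", "title": "Add tests", "status": "ready"},
]
text = "\n".join(json.dumps(i, sort_keys=True) for i in issues) + "\n"

# Reading the file back reconstructs the entities losslessly.
parsed = [json.loads(line) for line in text.splitlines()]
```

Sorting keys on write keeps the serialization stable, so unchanged issues never show up as spurious diffs.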

The Claude Code integration works through bd prime, which generates roughly 1–2k tokens of workflow context (ready work, recent changes, blockers) and injects it via a SessionStart hook. The design intentionally avoids MCP or Claude Skills: bd prime is cheaper in tokens and works across all editors.

It’s built up quite a community – over 30 companion tools at last count, including terminal UIs, web dashboards, VS Code extensions, and a Jira sync.

git-ai

Git-AI is a Git extension that tracks which code was AI-generated and which was hand-written. It records which AI tool, model, and prompt generated each line of code and stores this metadata in Git Notes (refs/notes/ai), ensuring that attributions survive rebases, squashes, and cherry-picks.

Git-AI supports nearly 20 coding agents, including Claude Code, Cursor, GitHub Copilot, Google Gemini CLI, and OpenAI Codex. It uses pre-edit and post-edit hooks to mark AI contributions automatically. On commit, each authorship log is attached to the new commit as a Git Note addressed by commit SHA.
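The shape of an attribution record can be sketched as follows. The field names are assumptions for illustration, not the Git AI Standard's actual schema; the point is that each note maps line ranges in a commit to the tool, model, and prompt that produced them:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Attribution:
    """Illustrative per-range AI authorship record stored in a Git Note."""
    path: str
    line_start: int
    line_end: int
    tool: str
    model: str
    prompt_id: str

note = [Attribution("src/auth.py", 10, 24,
                    "claude-code", "claude-sonnet", "prompt-0001")]
payload = json.dumps([asdict(a) for a in note])  # serialized note body
```

Because the note is addressed by commit SHA rather than stored in the tree, it travels with the commit through rebases and cherry-picks without polluting diffs.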

The commands I use most:

  • git-ai blame to see AI authorship attribution alongside standard git blame.
  • git-ai show-prompt to retrieve the conversation context that produced specific code.
  • git-ai stats to analyze what percentage of the codebase was AI-generated.

The tool is written in Rust, requires no per-repository setup (you install it once per machine), and adds negligible overhead to Git operations. It follows the open Git AI Standard v3.0.0 specification for cross-agent compatibility.

For teams, git-ai shows how much AI-generated code is in the codebase and who is using which tools – useful for understanding productivity and for the regulatory requirements around AI disclosure that are starting to appear.

TaskMaster.ai

TaskMaster.ai is a task management system that integrates with Claude Code through MCP. You feed it a Product Requirements Document, and it parses the requirements into structured, sequenced tasks with dependency tracking and complexity scores. It then guides the AI agent through implementation one focused task at a time, which helps avoid the quality drop you get when an agent tries to do too much at once.

The workflow follows a cycle: parse requirements, break into tasks, implement incrementally, verify progress with human review, and proceed to the next task based on dependencies. TaskMaster can use web search to inform its task breakdown and will adjust upcoming tasks when the implementation takes a different direction than originally planned.
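The "proceed based on dependencies" step is essentially ready-set selection over a task graph: a task is eligible when every dependency is done. A simplified sketch with invented task names:

```python
# Sketch of dependency-aware task selection (a simplification of
# TaskMaster's model; task names and fields are illustrative).
tasks = {
    "parse-prd": {"deps": [], "done": True},
    "schema":    {"deps": ["parse-prd"], "done": False},
    "api":       {"deps": ["schema"], "done": False},
}

def ready(tasks):
    """Return tasks whose dependencies are all complete."""
    return [name for name, info in tasks.items()
            if not info["done"]
            and all(tasks[d]["done"] for d in info["deps"])]
```

Here only `"schema"` is ready; `"api"` becomes eligible once `"schema"` is marked done, which is what keeps the agent on one focused task at a time.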

The MCP server has 36 tools, with selective loading to keep context window usage down (you can go from the full 21k tokens to a much leaner subset). It works with multiple AI providers (Anthropic, OpenAI, Google Gemini, Perplexity) and supports three configurable model roles: main, research, and fallback. Since the MCP server runs through your Claude Code subscription, there’s no additional API cost.

You can see how I actually run TaskMaster by looking at my run-taskmaster script, which is meant to be paired with my claude-prompts repository, mentioned above.

Wispr Flow

Wispr Flow is a voice-to-text dictation tool. It’s the one tool in this list that isn’t Claude Code-specific, but it’s become essential to my workflow. Instead of typing prompts into Claude Code (or anything else on my Mac), I speak them.

It’s not ordinary dictation, though. It understands developer terminology, handles camelCase and snake_case correctly, removes filler words automatically, and adjusts formatting based on the application context. In a coding environment, “create a function called calculate total” becomes calculateTotal. It supports over 100 languages with automatic detection.


The speed gain matters – about 175–220 words per minute by voice versus my 45 or so by keyboard – but the bigger difference is cognitive. When typing a prompt, I tend to self-edit and compress. When speaking, I provide more detail and context, which produces better results from the model. I use Wispr Flow for dictating prompts, writing commit messages, drafting documentation, and communicating with teammates on Slack.

Wispr Flow works across Mac, Windows, and iPhone with synced personal dictionaries. Data is never used for model training unless you opt in.

MCP servers: PAL, Sequential Thinking, Context7, and Perplexity

Claude Code discovers MCP servers at startup and exposes their tools alongside its built-in capabilities. Four MCP servers have become part of my regular workflow.

PAL MCP

PAL is a multi-model collaboration server. It lets Claude consult other AI models – Gemini, GPT, Grok, or local Ollama instances – without leaving the session. I use it primarily through my heavy command in claude-prompts, which instructs Claude to confer with Gemini and ChatGPT before committing to a plan of action. The result is a form of multi-model consensus: Claude proposes an approach, PAL routes the question to the other models, and the responses come back into Claude’s context for synthesis.

PAL supports other patterns too: code review across models, structured debates where models argue opposing positions, and phased planning where different models handle different stages. It manages conversation context across providers so that a follow-up question to Gemini can reference what GPT said earlier in the same session. Configuration is through environment variables for each provider’s API key, and you can selectively enable or disable tools to control token usage.
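The consensus pattern reduces to a fan-out/collect loop. The model-calling functions below are stubs standing in for PAL's tool calls; PAL's real interface differs, and the replies are invented:

```python
# Sketch of multi-model consensus: fan a question out to several
# advisors and collect the replies for Claude to synthesize.
def ask_gemini(question):
    return f"gemini: {question} looks sound"   # stub

def ask_gpt(question):
    return f"gpt: consider edge cases in {question}"  # stub

def consensus(question, advisors):
    """Collect one reply per advisor, keyed by model name."""
    return {name: fn(question) for name, fn in advisors.items()}

replies = consensus("the migration plan",
                    {"gemini": ask_gemini, "gpt": ask_gpt})
```

The value is in the synthesis step: Claude sees all replies in its own context and can reconcile disagreements before committing to an approach.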

Sequential Thinking

The Sequential Thinking MCP server provides a structured way for Claude to think through problems step by step. When Claude encounters a problem that benefits from deliberate step-by-step analysis, it can externalize its reasoning through this tool rather than attempting to hold the entire chain in a single response.

Each thought is a structured record with a thought number, an estimate of total thoughts needed, and a flag indicating whether more thinking is required. The server tracks the sequence and allows Claude to revise earlier steps, branch into alternative hypotheses, or extend the chain when the problem turns out to be more complex than initially estimated. Unlike Claude’s internal extended thinking, the sequential thinking trace is visible in the conversation and persists across turns, making it useful for problems where I want to see and audit the reasoning process.
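The thought record described above can be sketched as a small dataclass. The field names mirror the prose; the server's actual schema may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Thought:
    """One step in an externalized reasoning chain."""
    thought: str
    thought_number: int
    total_thoughts: int        # current estimate, can grow
    next_thought_needed: bool
    revises: Optional[int] = None      # set when revising an earlier step
    branch_from: Optional[int] = None  # set when exploring an alternative

chain = [
    Thought("Reproduce the bug under load", 1, 3, True),
    Thought("Hypothesis: stale cache entry", 2, 3, True),
    # Step 3 revises step 2 and extends the estimate from 3 to 4 thoughts.
    Thought("Actually a race condition; revise step 2", 3, 4, True, revises=2),
]
```

Because each record carries its own number and the running total, the trace stays auditable even when the chain is revised or extended mid-problem.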

I reach for sequential thinking when debugging complex issues, planning multi-step refactors, or working through architectural decisions where the reasoning matters as much as the conclusion.

Context7

Context7 provides up-to-date library documentation. LLMs are trained on snapshots of the world; when an API changes after the training cutoff, the model confidently generates code against the old interface. Context7 solves this by fetching current documentation on demand.

The workflow has two steps: resolve-library-id takes a library name and returns a Context7-compatible identifier, and query-docs fetches documentation for that identifier, optionally scoped to a specific topic. The results are version-specific code examples and API references pulled from source repositories. I find it most useful when working with frameworks that release frequently (React, Next.js, FastAPI) or when I need to verify that a function signature hasn’t changed since the model’s training data was collected.
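The two-step flow can be sketched with stubbed lookups. The real tools are MCP calls, and the identifier and documentation strings below are made up for illustration:

```python
# Stubbed sketch of Context7's resolve-then-query flow.
LIBRARY_INDEX = {"next.js": "/vercel/next.js"}  # name -> Context7-style ID
DOCS = {("/vercel/next.js", "routing"):
        "App Router: routes are defined by the app/ directory layout"}

def resolve_library_id(name):
    """Step 1: map a human library name to a stable identifier."""
    return LIBRARY_INDEX[name.lower()]

def query_docs(library_id, topic=None):
    """Step 2: fetch current docs for that identifier, scoped by topic."""
    return DOCS[(library_id, topic)]

lib = resolve_library_id("Next.js")
docs = query_docs(lib, topic="routing")
```

Splitting resolution from retrieval matters because library names are ambiguous; the identifier pins down exactly which project's docs get fetched.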

Perplexity

The Perplexity MCP server gives Claude access to AI-powered web search. Under the hood, it uses Perplexity’s Sonar models, which synthesize information from multiple web sources into grounded responses with citations.

I use Perplexity when Claude needs current information that falls outside its training data: checking whether a bug has been reported upstream, finding the latest release notes for a dependency, or researching a technique I haven’t encountered before. The recency filter is particularly useful – I can restrict results to the past day or week when investigating a newly discovered issue.

How they fit together

These tools work as a system. Each one fills a specific gap in the overall workflow.

Here’s what a typical debugging session looks like: I describe the bug by voice (Wispr Flow), Claude searches memory (claude-mem) for prior context on that area of the code, creates a task in Beads, and spawns a debug team (Agent Teams) with competing hypotheses. The Cozempic guard keeps context healthy throughout. When the fix is ready, git-ai records the attribution and TaskMaster marks the task complete. If Claude needs to check a library’s current API, it queries Context7; if it needs to research an unfamiliar error, it searches with Perplexity; and if the fix is architecturally significant, PAL routes it to Gemini and GPT for a second opinion.

The claude-code skill in claude-prompts ties the loop together by documenting the protocols for all of these tools in a single session-priming document. When Claude Code starts, it reads the context index from claude-mem, checks that the Cozempic guard is running, and knows how to use PAL for multi-model consensus, git-ai for commit attribution, Beads for task tracking, and TaskMaster for structured work. Each session begins with the accumulated knowledge of all previous sessions and the structured discipline to maintain that knowledge going forward.

This setup isn’t free – between API costs for the various model providers and the time spent configuring everything, it’s an investment that pays off for heavy daily use but would be overkill for casual sessions. I don’t claim this is the only way to work with Claude Code, or even the best way. But it’s a setup that has made agentic coding feel more like working with a team that remembers what it did yesterday, and less like talking to a blank slate every morning. If you’re spending a lot of time in Claude Code, some of these might be worth trying.