Show HN: AgentDX – Open-source linter and LLM benchmark for MCP servers

1 points by yamarldfst 4 months ago · 0 comments · 1 min read


MCP servers are proliferating fast, but most have vague tool descriptions and incomplete input schemas, which lead LLMs to pick the wrong tool or fill in parameters incorrectly.

AgentDX is a CLI that measures this. It has two commands:

- `npx agentdx lint` — static analysis of tool descriptions, schemas, and naming. 18 rules, zero config, no API key. Produces a lint score.

- `npx agentdx bench` — sends your tool definitions to an LLM (Anthropic, OpenAI, or Ollama) and evaluates tool selection accuracy, parameter correctness, ambiguity handling, multi-tool orchestration, and error recovery. Produces an Agent DX Score (0-100).
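To make the lint side concrete, here is a minimal TypeScript sketch of a few static checks over one tool definition. The `ToolDef` shape follows MCP's tool format (`name`, optional `description`, and an `inputSchema` written as JSON Schema). The rule names and the "5 points per finding" score are hypothetical illustrations, not AgentDX's actual 18 rules or its scoring formula.

```typescript
// Assumed shape of an MCP tool definition: name, description, JSON Schema for inputs.
interface ToolDef {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, { type?: string; description?: string }>;
    required?: string[];
  };
}

interface Finding {
  tool: string;
  rule: string;
  message: string;
}

// Hypothetical rules, chosen only to illustrate static checks that need no LLM.
function lintTool(tool: ToolDef): Finding[] {
  const findings: Finding[] = [];
  const add = (rule: string, message: string) => findings.push({ tool: tool.name, rule, message });

  // A very short description gives the model little to choose on.
  const desc = (tool.description ?? "").trim();
  if (desc.length < 20) add("description-too-short", "Say what the tool does and when to use it.");

  // Consistent snake_case names are easier for models to match.
  if (!/^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(tool.name)) add("name-not-snake-case", `"${tool.name}" is not snake_case.`);

  // Every parameter should explain what value it expects.
  const props = tool.inputSchema.properties ?? {};
  for (const param of Object.keys(props)) {
    if (!props[param].description) add("param-missing-description", `Parameter "${param}" has no description.`);
  }

  // A required parameter that is never declared is a schema bug.
  for (const req of tool.inputSchema.required ?? []) {
    if (!Object.prototype.hasOwnProperty.call(props, req)) add("required-param-undeclared", `"${req}" is required but not declared.`);
  }
  return findings;
}

// Hypothetical score: start at 100, subtract 5 per finding, never below 0.
function lintScore(tools: ToolDef[]): { score: number; findings: Finding[] } {
  const findings: Finding[] = [];
  for (const t of tools) findings.push(...lintTool(t));
  return { score: Math.max(0, 100 - 5 * findings.length), findings };
}
```

Because checks like these only inspect the definitions, they run without a model or API key, which is what lets a lint pass be zero-config.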

It auto-detects the server's entry point, spawns the server, connects to it as an MCP client, and reads its tool definitions over the protocol. The bench command auto-generates test scenarios from those definitions.
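Reading tools "via the protocol" means the ordinary MCP JSON-RPC flow: an `initialize` request, a `notifications/initialized` notification, then a `tools/list` request. The sketch below builds those messages with stdio framing (one JSON message per line) and pulls the tool list out of a server's stdout. The function names, the client name, and the `2024-11-05` protocol revision are illustrative assumptions, and this is not AgentDX's code. A real client, for example one built on the official MCP TypeScript SDK, would also spawn the process, wait for each response before sending the next message, and follow `nextCursor` pagination.

```typescript
// Minimal JSON-RPC 2.0 shapes; MCP uses JSON-RPC 2.0 for all messages.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}
interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params?: Record<string, unknown>;
}
type ClientMessage = JsonRpcRequest | JsonRpcNotification;

// A tool entry as returned in the tools/list result.
interface Tool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Messages a client sends, in order, to read a server's tools.
// A real client sends the last two only after the initialize response arrives.
function handshakeMessages(clientName: string, clientVersion: string): ClientMessage[] {
  return [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // example revision; the server may negotiate another
        capabilities: {},
        clientInfo: { name: clientName, version: clientVersion },
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
  ];
}

// stdio framing: JSON.stringify emits no raw newlines, so each message is one line.
function encodeMessage(message: object): string {
  return JSON.stringify(message) + "\n";
}

// Scan stdout for the response whose id matches the tools/list request.
// Pagination (result.nextCursor) is ignored in this sketch.
function parseToolsListResponse(stdout: string, requestId: number): Tool[] {
  for (const line of stdout.split("\n")) {
    if (line.trim() === "") continue;
    const msg = JSON.parse(line);
    if (msg.id === requestId && msg.result && Array.isArray(msg.result.tools)) {
      return msg.result.tools as Tool[];
    }
  }
  return [];
}
```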

Built in TypeScript, MIT licensed. It is early alpha: the bench command works but is slow because LLM calls run sequentially (parallelizing them is next). Feedback welcome.
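Since sequential LLM calls are named as the bottleneck, one common remedy is a bounded worker pool: keep at most N requests in flight, which speeds things up without immediately tripping provider rate limits. Below is a minimal sketch, assuming a hypothetical `Scenario` shape and a `pickTool` callback that stands in for the real LLM request. The accuracy it reports is a plain percentage of correct tool picks, not AgentDX's Agent DX Score formula.

```typescript
// Hypothetical scenario: a user request plus the tool the model should choose.
interface Scenario {
  prompt: string;
  expectedTool: string;
}

// Stand-in for an LLM call that returns the chosen tool name (or null for none).
type PickTool = (prompt: string) => Promise<string | null>;

// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared index; safe because JS runs one worker's synchronous code at a time
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}

// Percentage of scenarios where the model picked the expected tool.
async function selectionAccuracy(scenarios: Scenario[], pickTool: PickTool, limit = 4): Promise<number> {
  if (scenarios.length === 0) return 0;
  const picks = await mapWithConcurrency(scenarios, limit, (s) => pickTool(s.prompt));
  const correct = picks.filter((p, i) => p === scenarios[i].expectedTool).length;
  return (100 * correct) / scenarios.length;
}
```

A worker pool rather than a bare `Promise.all` over every scenario keeps the number of concurrent requests fixed, so `limit` can be tuned per provider (a local Ollama model may want 1 or 2, a hosted API more).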
