Show HN: CKB – Code intelligence for AI assistants (impact, dead code, security)

codeknowledge.dev

2 points by SimplyLiz 3 months ago · 4 comments

I've been building AI-assisted coding tools for the past year and kept running into the same problem: AI assistants are blind to code structure.

When you ask "what breaks if I change this function?", the AI greps for text patterns, reads random files hoping to find context, and eventually gives up and asks you to provide more context.

CKB gives AI tools the knowledge they're missing. It indexes your codebase and exposes 80+ MCP tools for:

- Impact analysis: blast radius with risk scores before you touch anything
- Dead code detection: confidence-scored candidates based on call graphs + optional telemetry
- Security scanning: 26 patterns for exposed secrets (API keys, tokens, credentials)
- Ownership: CODEOWNERS + git-blame fusion with time decay
- Affected tests: run only what matters instead of the full suite
- Multi-repo federation: query across all your repositories

Works with Claude Code, Cursor, Windsurf, VS Code, and anything that speaks MCP. Also has CLI and HTTP API for CI/CD integration.

Technical details:

- Written in Go
- Uses SCIP indexes for precise symbol resolution
- Incremental indexing (updates in seconds)
- Presets system for token optimization (load 14-81 tools based on task)
- Three-tier caching with auto-invalidation

Install: `npm install -g @tastehub/ckb && ckb init`

Free for personal use and small teams. Source on GitHub.

Would love feedback, especially on the MCP tool design and what's missing for your workflows.

justinlords 3 months ago

This is exactly what I've been screaming about. AI coding assistants are basically playing guess-the-impact with our production code. The fact that you're exposing actual call graphs and blast radius through MCP tools instead of making Claude hallucinate dependencies is huge from my POV. Installing this now to test with our multi-repo setup. Does the telemetry integration for dead code detection require specific instrumentation, or does it hook into existing APM tools?

  • SimplyLizOP 3 months ago

    Thanks! For multi-repo, check out the federation features (`--preset federation`). It handles cross-repo symbol resolution and blast radius across service boundaries.

    See docs: https://codeknowledge.dev/docs/Federation

    On dead code detection: CKB has two modes:

    1. Static analysis (findDeadCode tool, v7.6+) - requires zero instrumentation. Uses the SCIP index to find symbols with no inbound references in the codebase. Good for finding obviously dead exports, unused internal functions, etc. No telemetry needed.
    2. Telemetry-enhanced (findDeadCodeCandidates, v6.4+) - ingests runtime call data to find code that exists but is never executed in production. This is where APM integration comes in.

    For the telemetry integration: It hooks into any OTEL-compatible collector. No custom instrumentation required, it parses standard OTLP metrics:

    - span.calls, http.server.request.count, rpc.server.duration_count, grpc.server.duration_count
    - Extracts function/namespace/file from span attributes (configurable via telemetry.attributes.functionKeys, etc.)

    You'd configure a pipeline from your APM (Datadog, Honeycomb, Jaeger, whatever) to forward aggregated call counts to CKB's ingest endpoint. The matcher then correlates runtime function names to SCIP symbol IDs with confidence scoring (exact: file+function+line, strong: file+function, weak: namespace+function only).

    Full setup: https://codeknowledge.dev/docs/Telemetry

    The static analysis mode is probably enough to start with. Telemetry integration is for when you want "this code hasn't been called in 90 days" confidence rather than "this code has no static references."

storystarling 3 months ago

Curious how the token optimization presets balance context window costs against the depth of call graph analysis. I've found that aggressively pruning context to save on input tokens often degrades reasoning quality pretty quickly when dealing with complex dependencies.

  • SimplyLizOP 3 months ago

    The architecture separates tool availability from result depth, which addresses exactly that concern.

    Presets control tool availability, not output truncation. The core preset exposes 19 tools (~12k tokens for definitions) vs full with 50+ tools. This affects what the AI can ask for, not what it gets back. The AI can dynamically call expandToolset mid-session to unlock additional tools when needed.

    Depth parameters control which analyses run, not result pruning. For compound tools like explore:

    - shallow: 5 key symbols, skips dependency/change/hotspot analysis entirely
    - standard: 10 key symbols, includes deps + recent changes, parallel execution
    - deep: 20 key symbols, full analysis including hotspots and coupling

    This is additive query selection. The call graph depth (1-4 levels) is passed through unchanged to the underlying traversal: if you ask for depth 3, you get full depth 3, not a truncated version.

    On token optimization specifically: CKB tracks token usage at the response level using WideResultMetrics (measures JSON size, estimates tokens at ~4 bytes/token for structured data). When truncation does occur (explicit limits like maxReferences), responses include transparent TruncationInfo metadata with reason, originalCount, returnedCount, and droppedCount. The AI sees exactly what was cut and why.

    The compound tools (explore, understand, prepareChange) reduce tool calls by 60-70% by aggregating what would be sequential queries into parallel internal execution. This preserves reasoning depth while cutting round-trip overhead. The AI can always fall back to granular tools (getCallGraph, findReferences) when it needs explicit control over traversal parameters.
