# @portkey-ai/mcp-tool-filter
Ultra-fast semantic tool filtering for MCP (Model Context Protocol) servers using embedding similarity. Reduce your tool context from 1000+ tools down to the most relevant 10-20 tools in under 10ms.
## Features
- ⚡ Lightning Fast: <10ms filtering latency for 1000+ tools with built-in optimizations
- 🚀 Performance Optimized: 6-8x faster dot product, smart top-K selection, true LRU cache
- 🎯 Semantic Understanding: Uses embeddings for intelligent tool matching
- 📦 Zero Runtime Dependencies: Only requires an embedding provider API
- 🔄 Flexible Input: Accepts chat completion messages or raw strings
- 💾 Smart Caching: Caches embeddings and context for optimal performance
- 🎛️ Configurable: Tune scoring thresholds, top-k, and always-include tools
- 📊 Performance Metrics: Built-in timing for optimization
## Installation
```bash
npm install @portkey-ai/mcp-tool-filter
```
## Quick Start
```typescript
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';

// 1. Initialize the filter (choose an embedding provider)

// Option A: Local embeddings (RECOMMENDED for low latency, <5ms)
const filter = new MCPToolFilter({
  embedding: {
    provider: 'local',
  }
});

// Option B: API embeddings (for highest accuracy)
// const filter = new MCPToolFilter({
//   embedding: {
//     provider: 'openai',
//     apiKey: process.env.OPENAI_API_KEY,
//   }
// });

// 2. Load your MCP servers (one-time setup)
await filter.initialize(mcpServers);

// 3. Filter tools based on context
const result = await filter.filter(
  "Search my emails for the Q4 budget discussion"
);

// 4. Use the filtered tools in your LLM request
console.log(result.tools);             // Top 20 most relevant tools
console.log(result.metrics.totalTime); // e.g. ~2ms for local, ~500ms for API
```
## Embedding Provider Options

### Local Embeddings (Recommended)
Pros:
- ⚡ Ultra-fast: 1-5ms latency
- 🔒 Private: No data sent to external APIs
- 💰 Free: No API costs
- 🌐 Offline: Works without internet
Cons:
- Slightly lower accuracy than API models
- First initialization downloads model (~25MB)
```typescript
const filter = new MCPToolFilter({
  embedding: {
    provider: 'local',
    model: 'Xenova/all-MiniLM-L6-v2', // Optional: default model
    quantized: true,                  // Optional: quantized model for speed (default: true)
  }
});
```
Available Models:
- `Xenova/all-MiniLM-L6-v2` (default) - 384 dimensions, very fast
- `Xenova/all-MiniLM-L12-v2` - 384 dimensions, more accurate
- `Xenova/bge-small-en-v1.5` - 384 dimensions, good balance
- `Xenova/bge-base-en-v1.5` - 768 dimensions, higher quality
Performance:
- Initialization: 100ms-4s (one-time, downloads model)
- Filter request: 1-5ms
- Cached request: <1ms
### API Embeddings
For highest accuracy, use OpenAI or another API provider:
```typescript
const filter = new MCPToolFilter({
  embedding: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'text-embedding-3-small', // Optional
    dimensions: 384,                 // Optional: match the local model for a fair comparison
  }
});
```
Pros:
- 🎯 Highest accuracy: 5-15% better than local
- 🔄 Easy to switch models
- 🌐 No local resources needed
Cons:
- 🐌 Slow: 400-800ms per request
- 💰 Costs money: ~$0.02 per 1M tokens
- 🔒 Data sent to external API
- 📶 Requires internet connection
Performance:
- Initialization: 200ms-60s (depends on tool count)
- Filter request: 400-800ms
- Cached request: 1-3ms
### Quick Comparison
| Aspect | Local | API | Winner |
|---|---|---|---|
| Speed | 1-5ms | 400-800ms | 🏆 Local (200x faster) |
| Accuracy | Good (85-90%) | Best (100%) | 🏆 API |
| Cost | Free | ~$0.02/1M tokens | 🏆 Local |
| Privacy | Fully local | Data sent to API | 🏆 Local |
| Offline | ✅ Works offline | ❌ Needs internet | 🏆 Local |
| Setup | Zero config | Needs API key | 🏆 Local |
📊 See TRADEOFFS.md for detailed analysis
## MCP Server JSON Format
The library expects an array of MCP servers with the following structure:
```json
[
{
"id": "gmail",
"name": "Gmail MCP Server",
"description": "Email management tools",
"categories": ["email", "communication"],
"tools": [
{
"name": "search_gmail_messages",
"description": "Search and find email messages in Gmail inbox. Use when user wants to find, search, look up emails...",
"keywords": ["email", "search", "inbox", "messages"],
"category": "email-search",
"inputSchema": {
"type": "object",
"properties": {
"q": { "type": "string" }
}
}
}
]
}
]
```

### Field Descriptions
Required Fields:
- `id`: Unique identifier for the server
- `name`: Human-readable server name
- `tools`: Array of tool definitions
  - `name`: Unique tool name
  - `description`: Rich description of what the tool does and when to use it
Optional but Recommended:
- `description`: Server-level description
- `categories`: Array of category tags for hierarchical filtering
- `keywords`: Array of synonym/related terms for better matching
- `category`: Tool-level category
- `inputSchema`: JSON schema for parameters (parameter names are used for matching)
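Expressed as TypeScript interfaces, the expected shape looks roughly like this (a sketch derived from the field descriptions above; the package's exported types may differ in detail):

```typescript
// Sketch of the server/tool shape described above, not the package's
// authoritative type definitions.
interface MCPTool {
  name: string;                          // required: unique tool name
  description: string;                   // required: what the tool does and when to use it
  keywords?: string[];                   // optional: synonyms for better matching
  category?: string;                     // optional: tool-level category
  inputSchema?: Record<string, unknown>; // optional: JSON schema; parameter names aid matching
}

interface MCPServer {
  id: string;            // required: unique server identifier
  name: string;          // required: human-readable server name
  description?: string;  // optional: server-level description
  categories?: string[]; // optional: category tags for hierarchical filtering
  tools: MCPTool[];      // required: tool definitions
}
```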
### Tips for Best Results
1. **Rich Descriptions**: Write detailed descriptions with use cases
   ```json
   "description": "Search emails in Gmail. Use when user wants to find, lookup, or retrieve messages, correspondence, or mail."
   ```
2. **Add Keywords**: Include synonyms and variations
   ```json
   "keywords": ["email", "mail", "inbox", "messages", "correspondence"]
   ```
3. **Mention Use Cases**: Explicitly state when to use the tool
   ```json
   "description": "... Use when user wants to draft, compose, write, or prepare an email to send later."
   ```
## API Reference

### MCPToolFilter
Main class for tool filtering.
#### Constructor
```typescript
new MCPToolFilter(config: MCPToolFilterConfig)
```
Config Options:
```typescript
{
  embedding: {
    // Local embeddings (recommended)
    provider: 'local',
    model?: string,       // Default: 'Xenova/all-MiniLM-L6-v2'
    quantized?: boolean,  // Default: true

    // OR API embeddings
    provider: 'openai' | 'voyage' | 'cohere',
    apiKey: string,
    model?: string,       // Default: 'text-embedding-3-small'
    dimensions?: number,  // Default: 1536 (or 384 for local)
    baseURL?: string,     // For custom endpoints
  },
  defaultOptions?: {
    topK?: number,             // Default: 20
    minScore?: number,         // Default: 0.3
    contextMessages?: number,  // Default: 3
    alwaysInclude?: string[],  // Always include these tools
    exclude?: string[],        // Never include these tools
    maxContextTokens?: number, // Default: 500
  },
  includeServerDescription?: boolean, // Default: false (see below)
  debug?: boolean                     // Enable debug logging
}
```
About `includeServerDescription`:
When enabled, this option includes the MCP server description in the tool embeddings, providing additional context about the domain/category of tools.
```typescript
// Enable server descriptions in embeddings
const filter = new MCPToolFilter({
  embedding: { provider: 'local' },
  includeServerDescription: true // Default: false
});
```
Tradeoffs:
- ✅ Helps: General intent queries like "manage my local files" (+25% improvement)
- ❌ Hurts: Specific tool queries like "Execute this SQL query" (-50% degradation)
- ≈ Neutral: Overall impact is neutral (0% change)
Recommendation: Keep this disabled (default: `false`) unless your use case primarily involves high-level intent queries. See `examples/benchmark-server-description.ts` for detailed benchmarks.
#### Methods

##### `initialize(servers: MCPServer[]): Promise<void>`
Initialize the filter with MCP servers. This precomputes and caches all tool embeddings.
Note: Call this once during startup. It's an async operation that may take a few seconds depending on the number of tools.
```typescript
await filter.initialize(servers);
```
##### `filter(input: FilterInput, options?: FilterOptions): Promise<FilterResult>`
Filter tools based on the input context.
Input Types:
```typescript
// String input
await filter.filter("Search my emails about the project");

// Chat messages
await filter.filter([
  { role: 'user', content: 'What meetings do I have today?' },
  { role: 'assistant', content: 'Let me check your calendar.' }
]);
```
Options (all optional, override defaults):
```typescript
{
  topK?: number,             // Max tools to return
  minScore?: number,         // Minimum similarity score (0-1)
  contextMessages?: number,  // How many recent messages to use
  alwaysInclude?: string[],  // Tool names to always include
  exclude?: string[],        // Tool names to exclude
  maxContextTokens?: number, // Max context size
}
```
Returns:
```typescript
{
  tools: ScoredTool[],      // Filtered and ranked tools
  metrics: {
    totalTime: number,      // Total time in ms
    embeddingTime: number,  // Time to embed context
    similarityTime: number, // Time to compute similarities
    toolsEvaluated: number, // Total tools evaluated
  }
}
```
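Based on how `ScoredTool` is used in the integration examples later in this README (`t.toolName`, `t.score`, `t.tool.description`, `t.tool.inputSchema`), its shape is approximately the following; consult the package's type definitions for the authoritative version:

```typescript
// Approximate ScoredTool shape, inferred from the usage examples in this
// README rather than copied from the package's source.
interface ScoredTool {
  toolName: string; // e.g. 'search_gmail_messages'
  score: number;    // similarity score in [0, 1]
  tool: {
    description: string;
    keywords?: string[];
    inputSchema?: Record<string, unknown>;
  };
}
```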
##### `getStats()`
Get statistics about the filter state.
```typescript
const stats = filter.getStats();
// {
//   initialized: true,
//   toolCount: 25,
//   cacheSize: 5,
//   embeddingDimensions: 1536
// }
```
##### `clearCache()`
Clear the context embedding cache.
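For example, a long-running service might pair `clearCache()` with `getStats()` to keep the cache bounded; a usage sketch (the interval and threshold are illustrative, not library defaults):

```typescript
// Periodically reset the context-embedding cache in a long-lived process.
// The hourly interval and the 1000-entry threshold are illustrative choices.
setInterval(() => {
  if (filter.getStats().cacheSize > 1000) {
    filter.clearCache();
  }
}, 60 * 60 * 1000);
```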
## Performance Optimization

### Built-in Optimizations
The library includes several performance optimizations out of the box:
- 🚀 Loop-Unrolled Dot Product - Vector similarity computation is 6-8x faster through CPU pipeline optimization (see the sketch after this list)
- 📊 Smart Top-K Selection - Hybrid algorithm uses fast built-in sort for typical workloads, switches to heap-based selection for 500+ tools
- 💾 True LRU Cache - Intelligent cache eviction based on access patterns, not just insertion order
- 🎯 In-Place Operations - Reduced memory allocations through in-place vector normalization
- ⚡ Set-Based Lookups - O(1) exclusion checking instead of O(n) array scanning
These optimizations are automatic and transparent - no configuration needed!
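To illustrate the idea behind the loop-unrolled dot product (a sketch of the technique, not the library's actual source): using four independent accumulators removes data dependencies between iterations, so the CPU can pipeline consecutive multiply-adds.

```typescript
// Illustrative loop-unrolled dot product. Four independent accumulators
// let consecutive multiply-adds execute in parallel in the CPU pipeline.
function dotProductUnrolled(a: Float32Array, b: Float32Array): number {
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  const n = a.length;
  const limit = n - (n % 4);
  for (let i = 0; i < limit; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  for (let i = limit; i < n; i++) {
    s0 += a[i] * b[i]; // tail when the length isn't a multiple of 4
  }
  return s0 + s1 + s2 + s3;
}
```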
### Latency Breakdown
Typical performance for 1000 tools (local embeddings):
```text
Building context:        <1ms
Embedding (local model): 3-5ms (cached: 0ms)
Similarity computation:  1-2ms (6-8x faster with optimizations)
Sorting/filtering:       <1ms (hybrid algorithm)
─────────────────────────────
Total:                   5-9ms
```
### User Configuration Tips
1. **Use Smaller Embeddings**: 512 or 1024 dimensions for faster computation
   ```typescript
   embedding: {
     provider: 'openai',
     model: 'text-embedding-3-small',
     dimensions: 512 // Faster than 1536
   }
   ```
2. **Reduce Context Size**: Fewer messages = faster embedding
   ```typescript
   defaultOptions: {
     contextMessages: 2, // Instead of 3-5
     maxContextTokens: 300
   }
   ```
3. **Leverage Caching**: Identical contexts reuse cached embeddings (0ms)
4. **Tune `topK`**: Request fewer tools if you don't need 20
   ```typescript
   await filter.filter(input, { topK: 10 });
   ```
### Performance Benchmarks
Micro-benchmarks showing optimization improvements:
```text
Dot Product (1536 dims):      0.001ms vs 0.006ms (6x faster)
Vector Normalization:         0.003ms vs 0.006ms (2x faster)
Top-K Selection (<500 tools): Uses optimized built-in sort
Top-K Selection (500+ tools): O(n log k) heap-based selection
LRU Cache Access:             True access-order tracking
```
See the existing benchmark examples for end-to-end performance testing:
```bash
npx ts-node examples/benchmark.ts
```
## Integration Examples

### With Portkey AI Gateway
```typescript
import Portkey from 'portkey-ai';
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';

const portkey = new Portkey({ apiKey: '...' });
const filter = new MCPToolFilter({ /* ... */ });
await filter.initialize(mcpServers);

// Filter tools based on the conversation
const { tools } = await filter.filter(messages);

// Convert to OpenAI tool format
const openaiTools = tools.map(t => ({
  type: 'function',
  function: {
    name: t.toolName,
    description: t.tool.description,
    parameters: t.tool.inputSchema,
  }
}));

// Make the LLM request with filtered tools
const completion = await portkey.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  tools: openaiTools,
});
```
### With LangChain
```typescript
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { MCPToolFilter } from '@portkey-ai/mcp-tool-filter';

const filter = new MCPToolFilter({ /* ... */ });
await filter.initialize(mcpServers);

// Create a custom tool selector
async function selectTools(messages) {
  const { tools } = await filter.filter(messages);
  return tools.map(t => convertToLangChainTool(t));
}

// Use in your agent
const model = new ChatOpenAI();
const tools = await selectTools(messages);
const response = await model.invoke(messages, { tools });
```
### Caching Strategy
```typescript
// Recommended: Initialize once at startup
let filterInstance: MCPToolFilter;

async function getFilter() {
  if (!filterInstance) {
    filterInstance = new MCPToolFilter({ /* ... */ });
    await filterInstance.initialize(mcpServers);
  }
  return filterInstance;
}

// Use in request handlers
app.post('/chat', async (req, res) => {
  const filter = await getFilter();
  const result = await filter.filter(req.body.messages);
  // ... use filtered tools
});
```
## Benchmarks
Performance on various tool counts (M1 Max):
Local Embeddings (`Xenova/all-MiniLM-L6-v2`):
| Tools | Initialization | Filter (Cold) | Filter (Cached) |
|---|---|---|---|
| 10 | ~100ms | 2ms | <1ms |
| 100 | ~500ms | 3ms | <1ms |
| 500 | ~2s | 4ms | 1ms |
| 1000 | ~4s | 5ms | 1ms |
| 5000 | ~20s | 8ms | 2ms |
API Embeddings (OpenAI `text-embedding-3-small`):
| Tools | Initialization | Filter (Cold) | Filter (Cached) |
|---|---|---|---|
| 10 | ~200ms | 500ms | 1ms |
| 100 | ~1.5s | 550ms | 2ms |
| 500 | ~6s | 600ms | 2ms |
| 1000 | ~12s | 650ms | 3ms |
| 5000 | ~60s | 800ms | 4ms |
Key Takeaways:
- 🚀 Local embeddings are 200-300x faster for filter requests
- ✅ Local embeddings meet the <50ms target easily
- 💰 Local embeddings have no API costs
- 📊 API embeddings may have slightly higher accuracy
- ⚡ Both benefit significantly from caching
Note: Initialization is a one-time cost. Choose local embeddings for low latency, API embeddings for maximum accuracy.
## When to Use Local vs API Embeddings
Use Local Embeddings when:
- ⚡ You need ultra-low latency (<10ms)
- 🔒 Privacy is important (no external API calls)
- 💰 You want zero API costs
- 🌐 You need offline operation
- 📊 "Good enough" accuracy is acceptable
Use API Embeddings when:
- 🎯 You need maximum accuracy
- 🌍 You have good internet connectivity
- 💵 API costs are not a concern
- 📈 You're dealing with complex/nuanced queries
Recommendation: Start with local embeddings. Only switch to API if accuracy is insufficient.
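One way to keep that switch cheap is to drive the provider from configuration; a sketch, where `EMBEDDING_PROVIDER` is our own convention and not a variable the library reads:

```typescript
// Hypothetical env-driven provider selection. EMBEDDING_PROVIDER is an
// application-level convention, not something the library reads itself.
const useApi = process.env.EMBEDDING_PROVIDER === 'openai';

const filter = new MCPToolFilter({
  embedding: useApi
    ? {
        provider: 'openai',
        apiKey: process.env.OPENAI_API_KEY!,
        model: 'text-embedding-3-small',
      }
    : { provider: 'local' },
});
```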
### Testing Local vs API
Compare performance for your use case:
```bash
npx ts-node examples/test-local-embeddings.ts
```
This will benchmark both providers and show you:
- Initialization time
- Average filter time
- Cached filter time
- Speed comparison
## Debugging & Performance Monitoring

### Enable Debug Logging
To see detailed timing logs for each request, enable debug mode:
```typescript
const filter = new MCPToolFilter({
  embedding: { /* ... */ },
  debug: true // Enable detailed timing logs
});
```
This will output detailed logs for each filter request:
```text
=== Starting filter request ===
[1/5] Options merged: 0.12ms
[2/5] Context built (156 chars): 0.34ms
[3/5] Cache MISS (lookup: 0.08ms)
      → Embedding generated: 1247.56ms
[4/5] Similarities computed: 1.23ms (25 tools, 0.049ms/tool)
[5/5] Tools selected & ranked: 0.15ms (5 tools returned)
=== Total filter time: 1249.48ms ===
Breakdown: merge=0.12ms, context=0.34ms, cache=0.08ms, embedding=1247.56ms, similarity=1.23ms, selection=0.15ms
```
### Timing Breakdown
Each filter request logs five steps:
1. **Options Merging** (`merge`): Merge the provided options with the defaults
2. **Context Building** (`context`): Build the context string from the input messages
3. **Cache Lookup & Embedding** (`cache` + `embedding`):
   - Cache HIT: 0ms embedding time (reuses the cached embedding)
   - Cache MISS: calls the embedding API (typically 200-2000ms depending on the provider)
4. **Similarity Computation** (`similarity`): Compute cosine similarity for all tools (also shows the per-tool average time)
5. **Tool Selection** (`selection`): Filter by score and select the top-K tools
### Example: Testing Timings
See `examples/test-timings.ts` for a complete example:
```bash
export OPENAI_API_KEY=your-key-here
npx ts-node examples/test-timings.ts
```
This will run multiple filter requests showing:
- Cache miss vs cache hit performance
- Different query types
- Chat message context handling
### Performance Metrics
Every filter request returns detailed metrics:
```typescript
const result = await filter.filter(input);
console.log(result.metrics);
// {
//   totalTime: 1249.48,     // Total request time in ms
//   embeddingTime: 1247.56, // Time spent on the embedding API
//   similarityTime: 1.23,   // Time computing similarities
//   toolsEvaluated: 25      // Number of tools evaluated
// }
```
### Monitoring in Production
```typescript
const result = await filter.filter(messages);

// Log metrics for monitoring
logger.info('Tool filter performance', {
  totalTime: result.metrics.totalTime,
  embeddingTime: result.metrics.embeddingTime,
  cached: result.metrics.embeddingTime === 0,
  toolsReturned: result.tools.length,
});

// Alert if too slow
if (result.metrics.totalTime > 5000) {
  logger.warn('Slow filter request', result.metrics);
}
```
## Advanced Usage

### Two-Stage Filtering
For very large tool sets, use hierarchical filtering:
```typescript
// Stage 1: Narrow down servers by category.
// `userIntent` is assumed to be a string derived from the user's request.
const relevantServers = mcpServers.filter(server =>
  server.categories?.some(cat => userIntent.includes(cat))
);

// Stage 2: Filter tools within the relevant servers
// (e.g. with a filter instance initialized on `relevantServers`).
const result = await filter.filter(messages);
```
### Custom Scoring
Combine embedding similarity with keyword matching:
```typescript
const { tools } = await filter.filter(input);

// Boost tools with exact keyword matches
// (assumes `input` is a plain string here)
const boostedTools = tools.map(tool => {
  const hasKeywordMatch = tool.tool.keywords?.some(kw =>
    input.toLowerCase().includes(kw.toLowerCase())
  );
  return {
    ...tool,
    score: hasKeywordMatch ? tool.score * 1.2 : tool.score
  };
}).sort((a, b) => b.score - a.score);
```
### Always-Include Power Tools
Always include certain essential tools:
```typescript
const filter = new MCPToolFilter({
  // ...
  defaultOptions: {
    alwaysInclude: [
      'web_search',          // Always useful
      'conversation_search', // Access to context
    ],
  }
});
```
## Troubleshooting

### Slow First Request
Problem: First filter call is slow.
Solution: With local embeddings, the first call includes a one-time model load; with API embeddings, every uncached call pays the full API round trip (400-800ms). Subsequent calls with the same context hit the cache and are much faster.
```typescript
// Warm up the cache
await filter.filter("hello"); // ~5ms
await filter.filter("hello"); // ~1ms (cached)
```
### Poor Tool Selection
Problem: Wrong tools are being selected.
Solutions:
- Improve tool descriptions with more keywords and use cases
- Lower the `minScore` threshold
- Increase `topK` to include more tools
- Add important tools to `alwaysInclude`
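These adjustments can be combined in a single call; the values below are illustrative starting points, not tuned recommendations:

```typescript
// Loosen selection: lower the score floor, return more candidates,
// and pin critical tools regardless of score. Values are illustrative.
const result = await filter.filter(messages, {
  minScore: 0.2, // below the 0.3 default
  topK: 30,      // above the 20 default
  alwaysInclude: ['search_gmail_messages'],
});
```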
### Memory Usage
Problem: High memory usage with many tools.
Solution: Use smaller embedding dimensions:
```typescript
embedding: {
  dimensions: 512 // Instead of 1536
}
```
This reduces memory by ~66% with minimal accuracy loss.
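To put rough numbers on it (assuming Float32 storage): 1000 tools × 1536 dimensions × 4 bytes ≈ 6.1 MB of embeddings, versus 1000 × 512 × 4 ≈ 2.0 MB at 512 dimensions, which is the ~66% reduction cited above.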
## License
MIT
## Contributing
Contributions welcome! Please open an issue or PR.
## Support
- GitHub Issues: github.com/portkey-ai/mcp-tool-filter
- Email: support@portkey.ai