Shebe
Fast Code Search via BM25
Shebe is a fast and simple local code-search tool powered by BM25. No embeddings, no GPU, no cloud.
Research shows 70-85% of developer code search value comes from keyword-based queries. Developers tend to search with exact terms they know: function names, API calls, error messages. BM25 excels at this.
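For readers new to BM25, the sketch below scores a few code snippets against a keyword query using the textbook Okapi BM25 formula, assuming common defaults (k1 = 1.2, b = 0.75) and the Lucene-style non-negative IDF. It illustrates the ranking idea only and is not Shebe's implementation; the tokenizer and parameter values are choices made for this example.

```rust
use std::collections::HashMap;

// Common textbook defaults for Okapi BM25; Shebe's actual parameters may differ.
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Lowercase alphanumeric tokens. Splitting on `_` means
/// `format_access_log` yields `format`, `access`, `log`.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Returns one BM25 score per document for the given query.
fn bm25_scores(docs: &[&str], query: &str) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n_docs = tokenized.len() as f64;
    let total_len: usize = tokenized.iter().map(|t| t.len()).sum();
    let avg_len = total_len as f64 / n_docs.max(1.0);

    // Document frequency: number of documents containing each term at least once.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for toks in &tokenized {
        let mut unique: Vec<&str> = toks.iter().map(|s| s.as_str()).collect();
        unique.sort_unstable();
        unique.dedup();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|toks| {
            let len = toks.len() as f64;
            query_terms
                .iter()
                .map(|q| {
                    let tf = toks.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0; // a term the document lacks contributes nothing
                    }
                    let n_q = df.get(q.as_str()).copied().unwrap_or(0) as f64;
                    // Non-negative IDF variant (the "+ 1.0" inside the log), as in Lucene.
                    let idf = ((n_docs - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    // Term-frequency saturation (K1) and document-length normalization (B).
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * len / avg_len))
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let docs = [
        "fn format_access_log(entry: &LogEntry) -> String",
        "fn parse_config(path: &Path) -> Config",
        "// access log rotation helper",
    ];
    let scores = bm25_scores(&docs, "access log format");
    for (doc, score) in docs.iter().zip(&scores) {
        println!("{score:.3}  {doc}");
    }
}
```

A document only scores for query terms it actually contains, which is why exact identifiers such as function names rank well, and also why "login" never matches "authenticate".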
Trade-offs:
- Repositories must be cloned locally before indexing (no remote URL support)
- No semantic similarity: "login" does not match "authenticate". However, BM25 supports multi-term queries without performance degradation, so agents quickly learn to include synonyms (e.g., login OR authenticate OR sign-in). For true semantic search, pair Shebe with vector tools. See detailed analysis.
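Synonym expansion of this kind can be done mechanically before a query is sent. A minimal sketch, assuming a hand-maintained synonym table and that the query syntax accepts OR between terms as in the example above; `expand_term` and the table are hypothetical and not part of Shebe.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of Shebe): expand one term with its
/// synonyms and join the alternatives with OR, producing a query string
/// like `login OR authenticate OR sign-in`.
fn expand_term(term: &str, synonyms: &HashMap<&str, Vec<&str>>) -> String {
    let mut alternatives: Vec<&str> = vec![term];
    if let Some(syns) = synonyms.get(term) {
        for s in syns {
            // Skip duplicates so the query stays short.
            if !alternatives.contains(s) {
                alternatives.push(*s);
            }
        }
    }
    alternatives.join(" OR ")
}

fn main() {
    // Illustrative synonym table; a real agent would build its own.
    let mut synonyms: HashMap<&str, Vec<&str>> = HashMap::new();
    synonyms.insert("login", vec!["authenticate", "sign-in"]);
    println!("{}", expand_term("login", &synonyms));
}
```

Terms without an entry in the table pass through unchanged, so the helper is safe to apply to every keyword.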
Capabilities:
- 2ms query latency
- 2k-12k files/sec indexing (6k files in 0.5s)
- 200-700 tokens/query
- Full UTF-8 support (emoji, CJK, special characters)
- 14 MCP tools for coding agents (Claude Code, Codex, etc.) (reference)
Size:
- ~10k lines of Rust source code, plus another ~10k lines of test code.
- Two binaries (shebe CLI and shebe-mcp server), each ~8MB.
Positioning: Complements structural tools (Serena MCP) with content search. Coding agents learn tool selection quickly:
- grep/ripgrep - Exact regex patterns, exhaustive matches, small codebases
- Shebe - Ranked results, large codebases (1k+ files), polyglot search, boolean queries
- Serena - Symbol refactoring, AST-aware edits, type-safe renaming
Alternatives: Cloud solutions like turbopuffer and nia come at a premium. Shebe is a free, local-only alternative. See WHY_SHEBE.md for benchmarks.
Table of Contents
- Quick Start
- Common Tasks
- Refactoring Workflow
- Configuration
- Documentation
- Performance
- Architecture
- Troubleshooting
- Project Status
- License
- Contributing
Quick Start
1. Install
Homebrew (macOS and Linux):
brew tap shebe-oss/tap
brew install shebe
See the homebrew-tap repository for supported platforms and troubleshooting.
Manual download (Linux x86_64):
export SHEBE_VERSION=v0.5.8
curl -LO "https://github.com/shebe-oss/shebe-releases/releases/download/${SHEBE_VERSION}/shebe-${SHEBE_VERSION}-linux-x86_64.tar.gz"
curl -LO "https://github.com/shebe-oss/shebe-releases/releases/download/${SHEBE_VERSION}/shebe-${SHEBE_VERSION}-linux-x86_64.tar.gz.sha256"
sha256sum -c shebe-${SHEBE_VERSION}-linux-x86_64.tar.gz.sha256
tar -xzf shebe-${SHEBE_VERSION}-linux-x86_64.tar.gz
sudo mv shebe shebe-mcp /usr/local/bin/
Verify:
2. Index a Repository
# Clone a test repository
git clone --depth 1 https://github.com/envoyproxy/envoy.git ~/envoy
# Index it (creates session "envoy-v1")
shebe index-repository ~/envoy envoy-v1
# Output: Indexed 8,234 files (12,847 chunks) in 2.1s
3. Search Code
# Search for access log formatting
shebe search-code envoy-v1 "accesslog format"
Results for "accesslog format" in envoy-v1 (top 10):
1. source/extensions/access_loggers/common/access_log_base.h [0.847]
class AccessLogBase : public AccessLog::Instance {
void formatAccessLog(...);
2. source/common/formatter/substitution_formatter.cc [0.823]
SubstitutionFormatter::format(const StreamInfo& info) {
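Results come back ranked by score and capped at k (10 by default, per the Configuration section). A minimal sketch of that final top-k step, assuming hits already carry a score; the `Hit` type and `top_k` function are hypothetical, and the third path in `main` is made up for illustration.

```rust
/// Hypothetical search hit: file path plus relevance score.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    path: String,
    score: f64,
}

/// Keep the `k` best hits in descending score order, never returning more
/// than `max_k` (the role the SHEBE_MAX_K setting plays).
fn top_k(mut hits: Vec<Hit>, k: usize, max_k: usize) -> Vec<Hit> {
    let k = k.min(max_k);
    // f64::total_cmp gives a total order, so sorting never panics on NaN.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(k);
    hits
}

fn main() {
    let hits = vec![
        Hit { path: "source/common/formatter/substitution_formatter.cc".into(), score: 0.823 },
        Hit { path: "source/extensions/access_loggers/common/access_log_base.h".into(), score: 0.847 },
        Hit { path: "docs/unrelated.md".into(), score: 0.112 }, // hypothetical path
    ];
    for (rank, hit) in top_k(hits, 10, 100).iter().enumerate() {
        println!("{}. {} [{:.3}]", rank + 1, hit.path, hit.score);
    }
}
```

A full sort is simple but O(n log n); for very large candidate sets, a bounded heap or `select_nth_unstable_by` avoids sorting results that will be discarded.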
4. Find References
# Find all references to SubstitutionFormatter
shebe find-references envoy-v1 SubstitutionFormatter --symbol-type type
References to "SubstitutionFormatter" (type) - 23 found:
HIGH CONFIDENCE (18):
source/common/formatter/substitution_formatter.h:45
class SubstitutionFormatter : public Formatter {
source/extensions/access_loggers/file/file_access_log.cc:28
std::unique_ptr<SubstitutionFormatter> formatter_;
...
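One building block for reference finding is matching the symbol as a whole identifier, so SubstitutionFormatter does not also match a longer name that merely starts with it. The sketch below shows that check plus a deliberately crude confidence rule; both `find_references` and the rule that `//` comment lines rank Low are hypothetical, since Shebe's actual confidence classification is not described here.

```rust
/// Hypothetical confidence levels; Shebe's real rules are not documented here.
#[derive(Debug, PartialEq)]
enum Confidence {
    High,
    Low,
}

fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// True if `symbol` occurs in `line` as a whole identifier, so that
/// `SubstitutionFormatter` does not match `SubstitutionFormatterImpl`.
fn contains_identifier(line: &str, symbol: &str) -> bool {
    line.match_indices(symbol).any(|(start, _)| {
        let before = line[..start].chars().next_back();
        let after = line[start + symbol.len()..].chars().next();
        !before.map_or(false, is_ident_char) && !after.map_or(false, is_ident_char)
    })
}

/// Returns (1-based line number, confidence) for each referencing line.
/// Assumed heuristic: matches on `//` comment lines are Low confidence.
fn find_references(source: &str, symbol: &str) -> Vec<(usize, Confidence)> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| contains_identifier(line, symbol))
        .map(|(i, line)| {
            let conf = if line.trim_start().starts_with("//") {
                Confidence::Low
            } else {
                Confidence::High
            };
            (i + 1, conf)
        })
        .collect()
}

fn main() {
    let source = "class SubstitutionFormatter : public Formatter {\n\
                  // SubstitutionFormatter is documented here\n\
                  SubstitutionFormatterImpl other;\n\
                  std::unique_ptr<SubstitutionFormatter> formatter_;";
    for (line, conf) in find_references(source, "SubstitutionFormatter") {
        println!("line {line}: {conf:?}");
    }
}
```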
For detailed setup, see INSTALLATION.md.
Common Tasks
Quick links to accomplish specific goals:
| Task | Tool | Guide |
|---|---|---|
| Rename a symbol safely | find_references | Reference |
| Search polyglot codebase | search_code | Reference |
| Explore unfamiliar repo | index_repository + search_code | Quick Start |
| Find files by pattern | find_file | Reference |
| View file with context | read_file or preview_chunk | Reference |
| Update stale index | reindex_session | Reference |
Refactoring Workflow
Shebe's find_references and search_code tools work together to enumerate all code locations affected by a refactoring task. In this example, Claude Code uses Shebe to analyze a pagination work plan and identify every file that needs to change, completing the full impact analysis in just over a minute (1m 14s).
Full workflow (6 screenshots):
Step 1: Index repository and run parallel find_references
Step 2: search_code locates CLI routing code
Step 3: Structured analysis -- source files needing changes
Step 4: New modules, wiring and documentation
Step 5: File creation plan and exclusion list
Step 6: Impact summary (~11 files, 1m 14s)
Configuration
Quick Reference
| Variable | Default | Description |
|---|---|---|
| SHEBE_INDEX_DIR | ~/.local/state/shebe | Session storage location |
| SHEBE_CHUNK_SIZE | 512 | Characters per chunk (100-2000) |
| SHEBE_OVERLAP | 64 | Overlap between consecutive chunks |
| SHEBE_DEFAULT_K | 10 | Default number of search results |
| SHEBE_MAX_K | 100 | Maximum number of search results allowed |
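The table gives names and defaults only. As a hedged sketch of how such settings might be read and validated, the Rust below uses a hypothetical `Settings` struct and `load_settings` function. The defaults and the 100-2000 chunk-size range come from the table; the overlap check, clamping the default k to the maximum, and expanding `~` via `$HOME` are assumptions, and precedence between environment variables and shebe.toml is not modeled.

```rust
use std::path::PathBuf;

/// Hypothetical settings struct; defaults mirror the table above.
#[derive(Debug, PartialEq)]
struct Settings {
    index_dir: PathBuf,
    chunk_size: usize,
    overlap: usize,
    default_k: usize,
    max_k: usize,
}

/// Parse a numeric variable, falling back to `default` when it is unset.
fn parse_or(get: &dyn Fn(&str) -> Option<String>, key: &str, default: usize) -> Result<usize, String> {
    match get(key) {
        Some(v) => v.trim().parse().map_err(|_| format!("{key} must be a number, got {v:?}")),
        None => Ok(default),
    }
}

/// Build settings from a variable lookup (e.g. the process environment).
fn load_settings(get: &dyn Fn(&str) -> Option<String>) -> Result<Settings, String> {
    let index_dir = match get("SHEBE_INDEX_DIR") {
        Some(dir) => PathBuf::from(dir),
        // Assumption: the documented default ~/.local/state/shebe is resolved via $HOME.
        None => PathBuf::from(get("HOME").unwrap_or_default()).join(".local/state/shebe"),
    };
    let chunk_size = parse_or(get, "SHEBE_CHUNK_SIZE", 512)?;
    if !(100..=2000).contains(&chunk_size) {
        return Err(format!("SHEBE_CHUNK_SIZE must be in 100-2000, got {chunk_size}"));
    }
    let overlap = parse_or(get, "SHEBE_OVERLAP", 64)?;
    // Assumption: overlap must be smaller than a chunk, or chunking cannot advance.
    if overlap >= chunk_size {
        return Err(format!("SHEBE_OVERLAP ({overlap}) must be smaller than the chunk size"));
    }
    let max_k = parse_or(get, "SHEBE_MAX_K", 100)?;
    // Assumption: a default k larger than the maximum is clamped down.
    let default_k = parse_or(get, "SHEBE_DEFAULT_K", 10)?.min(max_k);
    Ok(Settings { index_dir, chunk_size, overlap, default_k, max_k })
}

fn main() {
    // Read from the real process environment.
    let from_env = |key: &str| std::env::var(key).ok();
    match load_settings(&from_env) {
        Ok(settings) => println!("{settings:#?}"),
        Err(err) => eprintln!("invalid configuration: {err}"),
    }
}
```

Passing a lookup closure instead of calling `std::env::var` directly keeps the validation logic testable without mutating the process environment.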
Configuration File
Create shebe.toml in your working directory or ~/.config/shebe/shebe.toml:
[indexing]
chunk_size = 512
overlap = 64
max_file_size = 10485760  # 10MB

[search]
default_k = 10
max_k = 100
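Conceptually, chunk_size and overlap describe a sliding window over each file. A minimal sketch, assuming plain fixed-size character windows (Shebe's real chunker may behave differently, for example by respecting line boundaries; that is not documented here). `chunk_text` is a hypothetical name.

```rust
/// Split `text` into windows of at most `chunk_size` characters, each
/// starting `chunk_size - overlap` characters after the previous one.
/// Operates on chars rather than bytes, so multi-byte UTF-8 is never split.
fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0 && overlap < chunk_size, "need 0 <= overlap < chunk_size");
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // last window reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    let source = "x".repeat(1200);
    for (i, chunk) in chunk_text(&source, 512, 64).iter().enumerate() {
        println!("chunk {i}: {} chars", chunk.chars().count());
    }
}
```

With the defaults (512 and 64), each window starts 448 characters after the previous one, so a 1,200-character file yields chunks of 512, 512 and 304 characters, and any match near a boundary appears intact in at least one chunk as long as it is no longer than the overlap.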
See CONFIGURATION.md for complete reference.
Documentation
Getting Started
- INSTALLATION.md - Installation and setup guide
- Quick Start Guide - 5-minute setup for Claude Code
Reference
- MCP Tools Reference - Complete API for all 14 tools
- CONFIGURATION.md - All configuration options
- Performance Benchmarks - Detailed performance data
Development
- ARCHITECTURE.md - Developer guide (where/how to change code)
- CONTRIBUTING.md - How to contribute
- CODE_OF_CONDUCT.md - Community guidelines
- SECURITY.md - Security policy and reporting
Performance
Validated on Istio (5,605 files, Go-heavy) and OpenEMR (6,364 files, PHP polyglot):
| Metric | Result |
|---|---|
| Query latency | 2ms (consistent across all query types) |
| Indexing (Istio) | 11,210 files/sec (0.5s for 5,605 files) |
| Indexing (OpenEMR) | 1,928 files/sec (3.3s for 6,364 files) |
| Token usage | 210-650 tokens/query |
| Polyglot coverage | 11 file types in a single query |
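The indexing throughput figures are file count divided by wall-clock indexing time. A short check of the table's arithmetic:

```rust
/// Indexing throughput: files indexed per second of wall-clock time.
fn files_per_sec(files: u32, seconds: f64) -> f64 {
    files as f64 / seconds
}

fn main() {
    // File counts and timings taken from the table above.
    for (name, files, secs) in [("Istio", 5_605, 0.5), ("OpenEMR", 6_364, 3.3)] {
        println!("{name}: {:.0} files/sec", files_per_sec(files, secs));
    }
}
```

Istio works out to exactly 11,210 files/sec and OpenEMR to about 1,928 files/sec, matching the table.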
See docs/Performance.md for detailed benchmarks.
Architecture
See ARCHITECTURE.md for the developer guide.
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| "Session not found" | Session doesn't exist or typo | Run list_sessions to see available sessions |
| "Schema version mismatch" | Session from older Shebe version | Run upgrade_session to migrate |
| Slow indexing | Disk I/O or large files | Exclude node_modules/, target/, check disk |
| No search results | Empty session or wrong query | Verify with get_session_info, check query syntax |
| "File not found" in read_file | File deleted since indexing | Run reindex_session to update |
| High token usage | Too many results | Reduce k parameter (default: 10) |
For detailed troubleshooting, see docs/guides/mcp-setup-guide.md.
Project Status
Version: v0.5.X
Status: Release Candidate
Testing: 76% coverage
Next: Pagination for list_dir and read_file when more than 500 files match a search term
See CHANGELOG.md for version history.
License
See LICENSE.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.






