vibed-categorizer
vibed-categorizer is a local-first CLI utility for categorizing text using vector embeddings. It pairs a local embedding provider (such as LM Studio) with SQLite and sqlite-vec to provide a fast, private semantic categorization engine.
Built for developers who want to organize data (like bank transactions, logs, or notes) without sending it to the cloud.
Features
- 🔒 Local & Private: Runs entirely on your machine. No data leaves your system.
- 🚀 High Performance: Uses sqlite-vec for blazing fast vector similarity search in SQLite.
- 🧠 AI-Powered: Connects to any OpenAI-compatible embedding API (e.g., LM Studio, Ollama).
- 🛠 CLI-First: Simple, composable commands for Unix-like workflows.
- 📦 Portable: Data is stored in a single SQLite file (categories.db) that is easy to back up or move.
- 🔄 Import/Export: Full support for JSONL import/export for data migration or manual labeling.
Prerequisites
- Go 1.23+ (Required to compile)
- C Compiler (GCC/Clang): Required for the sqlite-vec CGO bindings.
- Embedding Provider: Any OpenAI-compatible embedding API (e.g., LM Studio, Ollama).
Installation
From Source
git clone https://github.com/tadasv/vibed-categorizer.git
cd vibed-categorizer
go build -o vibed-categorizer cmd/vibed-categorizer/main.go
Configuration
- Start your Embedding Provider:
  - LM Studio: Start the Local Server (default: http://localhost:1234/v1). Load a text embedding model (e.g., nomic-embed-text-v1.5).
  - Ollama: Run ollama pull nomic-embed-text and ensure the server is running.
- Initialize the Database: You must initialize the database with the vector dimension matching your model.
  - nomic-embed-text-v1.5: 768 dimensions.
  - text-embedding-3-small: 1536 dimensions.
./vibed-categorizer init --dim 768
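If you are unsure which dimension your model produces, one option is to ask the provider for a single embedding and count its elements. The following is a minimal sketch, not part of vibed-categorizer itself. It assumes the provider exposes the OpenAI-style POST /embeddings endpoint under the base URL, and the helper names (dimFromResponse, probeDim) and the model identifier are illustrative; your provider may list the model under a different name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// embeddingResponse holds only the fields of an OpenAI-style
// embeddings reply that this sketch needs.
type embeddingResponse struct {
	Data []struct {
		Embedding []float64 `json:"embedding"`
	} `json:"data"`
}

// dimFromResponse returns the length of the first embedding in a raw JSON reply.
func dimFromResponse(body []byte) (int, error) {
	var r embeddingResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if len(r.Data) == 0 {
		return 0, fmt.Errorf("response contains no embeddings")
	}
	return len(r.Data[0].Embedding), nil
}

// probeDim embeds a short string and reports the resulting vector length.
func probeDim(baseURL, model string) (int, error) {
	payload, err := json.Marshal(map[string]string{"model": model, "input": "dimension probe"})
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(baseURL+"/embeddings", "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return dimFromResponse(body)
}

func main() {
	// Assumed values: LM Studio's default address and an example model name.
	dim, err := probeDim("http://localhost:1234/v1", "nomic-embed-text-v1.5")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("use: ./vibed-categorizer init --dim %d\n", dim)
}
```

Run it while the embedding server is up; if the server is unreachable, the program prints the error instead of a dimension.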
Usage
1. Categorize New Data (infer)
Predict the category of a text string based on existing examples.
./vibed-categorizer infer --content "Starbucks Coffee" --show-score
# Output: dining (0.1245)
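Conceptually, inference of this kind is a nearest-neighbor lookup: embed the new text, find the closest stored example, and return its category. The sketch below illustrates the idea in plain Go with toy three-dimensional vectors. It is not the tool's implementation, which delegates the search to sqlite-vec. The use of cosine distance, the example type, and the function names are assumptions made for illustration, and the distance metric the tool actually reports may differ.

```go
package main

import (
	"fmt"
	"math"
)

// example is a labeled embedding (hypothetical type for this sketch).
type example struct {
	category  string
	embedding []float64
}

// cosineDistance returns 1 - cosine similarity; smaller means more similar.
// It assumes a and b have the same length.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1 // treat a zero vector as maximally dissimilar
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// nearestCategory returns the category of the closest example and its distance.
// Examples whose dimension differs from the query are skipped.
func nearestCategory(query []float64, examples []example) (string, float64) {
	best, bestDist := "", math.Inf(1)
	for _, ex := range examples {
		if len(ex.embedding) != len(query) {
			continue
		}
		if d := cosineDistance(query, ex.embedding); d < bestDist {
			best, bestDist = ex.category, d
		}
	}
	return best, bestDist
}

func main() {
	// Toy vectors standing in for real embeddings.
	examples := []example{
		{"dining", []float64{1, 0, 0}},
		{"transport", []float64{0, 1, 0}},
	}
	cat, dist := nearestCategory([]float64{0.9, 0.1, 0}, examples)
	fmt.Printf("%s (%.4f)\n", cat, dist)
}
```

Under this reading, a lower score means a closer match, which is consistent with the small value shown in the sample output.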
2. Add Training Data (add)
Teach the system by adding labeled examples.
Single Item:
./vibed-categorizer add --content "Starbucks Coffee" --category "dining"
Batch Import (JSONL/CSV):
Supports JSONL ({"content": "...", "category": "..."}) or CSV (content,category).
./vibed-categorizer add --file transactions.csv
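Before a large batch import, it can help to check that every JSONL line has the shape described above. The following is a minimal pre-flight sketch under stated assumptions: the record type and parseLabeled helper are hypothetical, the sample rows are made up, and the strictness shown here (both fields required, blank lines skipped) is a choice of this sketch rather than documented behavior of the add command.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// record matches the JSONL shape {"content": "...", "category": "..."}.
type record struct {
	Content  string `json:"content"`
	Category string `json:"category"`
}

// parseLabeled reads one JSON object per line and rejects incomplete records.
func parseLabeled(data string) ([]record, error) {
	var out []record
	sc := bufio.NewScanner(strings.NewReader(data))
	line := 0
	for sc.Scan() {
		line++
		text := strings.TrimSpace(sc.Text())
		if text == "" {
			continue // skip blank lines
		}
		var r record
		if err := json.Unmarshal([]byte(text), &r); err != nil {
			return nil, fmt.Errorf("line %d: %w", line, err)
		}
		if r.Content == "" || r.Category == "" {
			return nil, fmt.Errorf("line %d: content and category are both required", line)
		}
		out = append(out, r)
	}
	return out, sc.Err()
}

func main() {
	// Illustrative sample data, not taken from a real export.
	data := `{"content": "Starbucks Coffee", "category": "dining"}
{"content": "City Bus Pass", "category": "transport"}`
	recs, err := parseLabeled(data)
	if err != nil {
		fmt.Println("invalid file:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("%s -> %s\n", r.Content, r.Category)
	}
}
```

In practice you would read the file contents with os.ReadFile and pass them to the check, then run the add command only if it reports no errors.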
3. Search Data (find)
Inspect your database using standard text search.
./vibed-categorizer find --content "Starbucks"
# Lists all records matching "Starbucks"
4. Manage Data
Delete a Record:
./vibed-categorizer delete <UUID>
Export All Data: Dump your entire database (including embeddings) to a JSONL file.
./vibed-categorizer export --file backup.jsonl
Import Data: Restore from a backup or import pre-computed embeddings.
./vibed-categorizer import --file backup.jsonl
Architecture
- Language: Go
- Database: SQLite3
- Vector Search: sqlite-vec (the successor to sqlite-vss) via CGO bindings.
- Embeddings: OpenAI-compatible API client.
Credits
Designed and implemented by Gemini (Google's multimodal AI model) in collaboration with Tadas Vilkeliskis.