GitHub - tadasv/vibed-categorizer: 100% vibed. vibed-categorizer is a powerful, local-first CLI utility for categorizing text using vector embeddings. It pairs a local embedding provider (like LM Studio) with SQLite and sqlite-vec to create a fast, private, and efficient semantic categorization engine.


vibed-categorizer

vibed-categorizer is a powerful, local-first CLI utility for categorizing text using vector embeddings. It pairs a local embedding provider (like LM Studio) with SQLite and sqlite-vec to create a fast, private, and efficient semantic categorization engine.

Built for developers who want to organize data (like bank transactions, logs, or notes) without sending it to the cloud.

Features

  • 🔒 Local & Private: Runs entirely on your machine. No data leaves your system.
  • 🚀 High Performance: Uses sqlite-vec for fast vector similarity search directly inside SQLite.
  • 🧠 AI-Powered: Connects to any OpenAI-compatible embedding API (e.g., LM Studio, Ollama).
  • 🛠 CLI-First: Simple, composable commands for Unix-like workflows.
  • 📦 Portable: Data is stored in a single SQLite file (categories.db) that is easy to backup or move.
  • 🔄 Import/Export: Full support for JSONL import/export for data migration or manual labeling.

Prerequisites

  • Go 1.23+ (Required to compile)
  • C Compiler (GCC/Clang) - Required for sqlite-vec CGO bindings.
  • Embedding Provider: an OpenAI-compatible embedding API, such as LM Studio or Ollama.

Installation

From Source

git clone https://github.com/tadasv/vibed-categorizer.git
cd vibed-categorizer
go build -o vibed-categorizer cmd/vibed-categorizer/main.go

Configuration

  1. Start your Embedding Provider:

    • LM Studio: Start the Local Server (default: http://localhost:1234/v1). Load a text embedding model (e.g., nomic-embed-text-v1.5).
    • Ollama: Run ollama pull nomic-embed-text and ensure the server is running.
  2. Initialize the Database: You must initialize the database with the vector dimension matching your model.

    • nomic-embed-text-v1.5: 768 dimensions.
    • text-embedding-3-small: 1536 dimensions.
    ./vibed-categorizer init --dim 768
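The dimension passed to init must match every embedding the model later returns. Here is a minimal sketch. It assumes the vectors live in a sqlite-vec vec0 virtual table, so the dimension becomes part of the column type. The table name vec_items and column name embedding are hypothetical and do not come from this project. The sketch also shows how a mismatched vector could be rejected before insertion:

```go
package main

import (
	"fmt"
)

// vecTableDDL returns a sqlite-vec virtual-table definition for the given
// embedding dimension. The vec0 module and float[N] column type are
// sqlite-vec syntax; the table and column names are hypothetical.
func vecTableDDL(dim int) string {
	return fmt.Sprintf("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[%d])", dim)
}

// checkDim rejects an embedding whose length differs from the dimension
// the database was initialized with.
func checkDim(vec []float32, dim int) error {
	if len(vec) != dim {
		return fmt.Errorf("embedding has %d dimensions, database expects %d", len(vec), dim)
	}
	return nil
}

func main() {
	fmt.Println(vecTableDDL(768))
	// A text-embedding-3-small vector (1536 dims) against a 768-dim database.
	if err := checkDim(make([]float32, 1536), 768); err != nil {
		fmt.Println("error:", err)
	}
}
```

With a 768-dimension database, a 1536-dimension vector fails checkDim. For that reason, switching to a model with a different output size generally means re-initializing the database and re-embedding its contents.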

Usage

1. Categorize New Data (infer)

Predict the category of a text string based on existing examples.

./vibed-categorizer infer --content "Starbucks Coffee" --show-score
# Output: dining (0.1245)

2. Add Training Data (add)

Teach the system by adding labeled examples.

Single Item:

./vibed-categorizer add --content "Starbucks Coffee" --category "dining"

Batch Import (JSONL/CSV): Supports JSONL ({"content": "...", "category": "..."}) or CSV (content,category).

./vibed-categorizer add --file transactions.csv

3. Search Data (find)

Inspect your database using standard text search.

./vibed-categorizer find --content "Starbucks"
# Lists all records matching "Starbucks"
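Suppose "standard text search" means a SQL LIKE substring match. In that case, the search term needs its wildcard characters escaped so that, for example, a literal % in a transaction description is not treated as a pattern. Here is a minimal sketch; the items table and its columns are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// findQuery is a hypothetical statement; table and column names are assumptions.
// ESCAPE '\' tells SQLite that a backslash makes the next character literal.
const findQuery = `SELECT id, content, category FROM items WHERE content LIKE ? ESCAPE '\'`

// likePattern wraps a term for a substring match, escaping the LIKE
// wildcards % and _ (and the escape character itself) so they match literally.
func likePattern(term string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(term) + "%"
}

func main() {
	fmt.Println(findQuery)
	fmt.Println(likePattern("Starbucks"))
	fmt.Println(likePattern("50%_off"))
}
```

The pattern should be passed as a bound parameter rather than concatenated into the SQL, for example db.Query(findQuery, likePattern(term)) with database/sql. By default, SQLite's LIKE is case-insensitive for ASCII letters.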

4. Manage Data

Delete a Record:

./vibed-categorizer delete <UUID>
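Records are addressed by UUID, but the project does not document which UUID version it uses. As an illustration only, a random version 4 UUID could be generated with crypto/rand, and a delete argument could be format-checked before touching the database. Both newUUIDv4 and isUUID are hypothetical helpers:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"regexp"
)

// newUUIDv4 returns a random UUID with the version-4 and RFC 4122 variant bits set.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4 in the high nibble of byte 6
	b[8] = (b[8] & 0x3f) | 0x80 // variant bits 10xx in byte 8
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// uuidRe matches the canonical 8-4-4-4-12 hex layout, case-insensitively.
var uuidRe = regexp.MustCompile(`(?i)^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// isUUID performs a format check only; it does not verify the record exists.
func isUUID(s string) bool { return uuidRe.MatchString(s) }

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id, isUUID(id), isUUID("Starbucks"))
}
```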

Export All Data: Dump your entire database (including embeddings) to a JSONL file.

./vibed-categorizer export --file backup.jsonl

Import Data: Restore from a backup or import pre-computed embeddings.

./vibed-categorizer import --file backup.jsonl

Architecture

  • Language: Go
  • Database: SQLite3
  • Vector Search: sqlite-vec (the successor to sqlite-vss) via CGO bindings.
  • Embeddings: OpenAI-compatible API client.
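An OpenAI-compatible embeddings API accepts a POST to {base URL}/embeddings with a JSON body naming the model and the input. It returns a response whose data array holds the vectors. Below is a sketch of that exchange. The helper names are hypothetical, and the model identifier is a placeholder that depends on what your provider has loaded:

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

type embeddingRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
		Index     int       `json:"index"`
	} `json:"data"`
}

// newEmbeddingRequest builds POST {baseURL}/embeddings, e.g. baseURL "http://localhost:1234/v1".
func newEmbeddingRequest(baseURL, model, text string) (*http.Request, error) {
	body, err := json.Marshal(embeddingRequest{Model: model, Input: text})
	if err != nil {
		return nil, err
	}
	url := strings.TrimSuffix(baseURL, "/") + "/embeddings"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// parseEmbedding extracts the first vector from a response body.
func parseEmbedding(body []byte) ([]float32, error) {
	var resp embeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Data) == 0 {
		return nil, errors.New("response contains no embeddings")
	}
	return resp.Data[0].Embedding, nil
}

// embed sends the request to a running provider and decodes the vector.
func embed(client *http.Client, baseURL, model, text string) ([]float32, error) {
	req, err := newEmbeddingRequest(baseURL, model, text)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embedding API returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseEmbedding(body)
}

func main() {
	// Offline demo: build a request and parse a trimmed sample response.
	req, err := newEmbeddingRequest("http://localhost:1234/v1", "nomic-embed-text-v1.5", "Starbucks Coffee")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	vec, err := parseEmbedding([]byte(`{"object":"list","data":[{"object":"embedding","index":0,"embedding":[0.5,-0.25]}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vec), "dimensions")
}
```

Against a running provider, embed(http.DefaultClient, baseURL, model, text) would perform the same round trip over the network.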

Credits

Designed and implemented by Gemini (Google's multimodal AI model) in collaboration with Tadas Vilkeliskis.