ravila4/obsidian-semantic-search: Semantic search for Obsidian vaults using LanceDB and cloud or local embedding models

Semantic search for Obsidian vaults. Index your vault into vector embeddings, then search by meaning rather than keywords.

Using this with an AI agent (Claude Code, Cursor, etc.)? See SKILL.md for agent-facing guidance — score interpretation, workflows, and known gotchas.

Install

# As a standalone CLI (recommended)
uv tool install obsidian-semantic

# Or with pipx (also installs into an isolated environment)
pipx install obsidian-semantic

# With Gemini embedder support
uv tool install "obsidian-semantic[gemini]"

Then configure:

obsidian-semantic configure

Configuration is stored in ~/.config/obsidian-semantic/config.yaml. Ollama (local), LM Studio (local), and Gemini (cloud) embedders are supported.

From source

git clone https://github.com/ravila4/obsidian-semantic-search
cd obsidian-semantic-search
uv sync
uv run obsidian-semantic configure

Usage

Index your vault

obsidian-semantic index                # incremental (new/modified files only)
obsidian-semantic index --full         # reindex everything

Search

obsidian-semantic search "dependency injection patterns"
obsidian-semantic search "python testing" --limit 5
obsidian-semantic search "docker" --folder "Programming/"
obsidian-semantic search "habits" --tag "review"
obsidian-semantic search "fisher" --score-min 0.6     # drop low-relevance hits
obsidian-semantic search "fisher" --per-file 0        # show every matching chunk

By default, results are deduped to one chunk per file. Pass --per-file N to allow up to N chunks per file (or 0 for unlimited).
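The dedup rule can be pictured with a short Python sketch. It is not the tool's code: the Hit record, the file names, and the assumption that hits arrive sorted by descending score are all illustrative.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    path: str     # vault-relative note path
    chunk: str    # chunk label (illustrative)
    score: float  # similarity, higher is better


def dedupe_per_file(hits: list[Hit], per_file: int = 1) -> list[Hit]:
    """Keep at most `per_file` chunks per note; 0 means unlimited.

    Assumes `hits` is sorted by descending score, so each note keeps
    its best-scoring chunks.
    """
    if per_file == 0:
        return list(hits)
    counts: defaultdict[str, int] = defaultdict(int)
    kept = []
    for hit in hits:
        if counts[hit.path] < per_file:
            kept.append(hit)
            counts[hit.path] += 1
    return kept


hits = [
    Hit("Stats/Fisher.md", "intro", 0.78),
    Hit("Stats/Fisher.md", "example", 0.74),
    Hit("Stats/Fisher.md", "notes", 0.71),
    Hit("Stats/Chi-square.md", "overview", 0.66),
]
print([h.path for h in dedupe_per_file(hits)])
# ['Stats/Fisher.md', 'Stats/Chi-square.md']
```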

When tuning --score-min, account for dedup. Before dedup, the top results are often several chunks of the same note; after dedup, the next result comes from a different note, and its surviving chunk often scores about 0.05–0.10 lower than the duplicate chunks it replaced. A threshold tuned against raw chunk scores can therefore drop relevant notes, so calibrate it against the post-dedup output. On Ollama with nomic-embed-text, rough absolute bands are: ≥0.65 for a strong, title-level match, ≥0.5 for a topical match, and <0.4 for likely noise. Other embedders (qwen3, gemini) use different scales.
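To see why calibration matters, here is a hypothetical illustration with invented scores. relevance_band encodes the nomic bands quoted above (the 0.4 to 0.5 gap is labeled "weak", which is this sketch's own choice), and apply_score_min is a stand-in for the filter that assumes an inclusive cutoff; neither is the CLI's implementation.

```python
def relevance_band(score: float) -> str:
    """Rough bands for Ollama + nomic-embed-text only; other embedders differ."""
    if score >= 0.65:
        return "strong"  # title-level match
    if score >= 0.5:
        return "topical"
    if score >= 0.4:
        return "weak"    # 0.4-0.5 is not characterized; label chosen for this sketch
    return "noise"       # likely irrelevant


def apply_score_min(scores: list[float], score_min: float) -> list[float]:
    # Assumed inclusive: a score equal to the cutoff is kept.
    return [s for s in scores if s >= score_min]


# Invented scores for one query.
raw_top = [0.78, 0.74, 0.71, 0.70]  # top chunks, mostly duplicates of one note
post_dedup = [0.78, 0.64, 0.61]     # one chunk per note after dedup

# A 0.70 cutoff keeps every raw chunk but only one note after dedup.
print(len(apply_score_min(raw_top, 0.70)), len(apply_score_min(post_dedup, 0.70)))  # 4 1
print([relevance_band(s) for s in post_dedup])  # ['strong', 'topical', 'topical']
```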

Find related notes

Find notes similar to a given note. Useful for discovering connections, adding links, or spotting duplicates.

obsidian-semantic related "Programming/Python/Unit Testing.md"
obsidian-semantic related "Daily/2026-02-05.md" --limit 5

If the note isn't in the index, it's chunked and embedded on the fly.
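The README doesn't say how chunk similarities are combined into a note-level score. One plausible approach, sketched below under that caveat, ranks every other note by its best chunk-to-chunk cosine similarity with the source note. The related function and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related(
    source_chunks: list[list[float]],
    index: dict[str, list[list[float]]],
    source_path: str,
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Rank other notes by their best chunk-to-chunk similarity with the source."""
    scored = []
    for path, chunks in index.items():
        if path == source_path or not chunks:
            continue  # skip the note itself and notes with no chunks
        best = max((cosine(s, c) for s in source_chunks for c in chunks), default=0.0)
        scored.append((path, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


index = {
    "A.md": [[1.0, 0.0]],
    "B.md": [[0.0, 1.0]],
    "C.md": [[0.7, 0.7]],
}
source = [[1.0, 0.1]]  # chunk vectors of A.md, freshly embedded
print([p for p, _ in related(source, index, "A.md")])  # ['C.md', 'B.md']
```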

Show a note

Print the full contents of a note straight to stdout. Accepts a vault-relative path or a bare filename (with or without .md); if the basename is unique, it's resolved automatically. Reads from disk, so it works on un-indexed files too (unlike search).

obsidian-semantic show "Fisher's Exact in Empiroar.md"
obsidian-semantic show "Programming/Python/Unit Testing.md"
obsidian-semantic show "Unit Testing.md#Setup#Installation"   # specific section
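The bare-filename lookup can be sketched as follows, assuming exact (case-sensitive) matching against a list of vault-relative paths. resolve_note and the file list are illustrative, not the tool's API.

```python
from pathlib import PurePosixPath


def resolve_note(query: str, vault_files: list[str]) -> str:
    """Resolve a vault-relative path or a bare filename to exactly one note."""
    name = query if query.endswith(".md") else query + ".md"
    if name in vault_files:
        return name  # already a vault-relative path
    matches = [f for f in vault_files if PurePosixPath(f).name == name]
    if len(matches) == 1:
        return matches[0]  # unique basename
    if not matches:
        raise FileNotFoundError(f"no note named {query!r}")
    raise ValueError(f"{query!r} is ambiguous: {', '.join(sorted(matches))}")


files = [
    "Programming/Python/Unit Testing.md",
    "Daily/Notes.md",
    "Work/Notes.md",
]
print(resolve_note("Unit Testing", files))  # Programming/Python/Unit Testing.md
```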

Append #Heading (or #Parent#Child for nested sections) to print just that section. Heading paths are matched against the breadcrumb suffix and are case-insensitive; ambiguous headings are listed with line numbers.
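Suffix matching on breadcrumbs can be sketched like this. It is an illustration under assumptions, not the tool's parser: it only recognizes ATX (#-style) headings, skips backtick-fenced code, and returns every match with its 1-indexed line number so ambiguous headings can be listed.

```python
import re

HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")
FENCE = "`" * 3


def heading_breadcrumbs(markdown: str) -> list[tuple[int, list[str]]]:
    """Return (line_number, breadcrumb) for each ATX heading."""
    stack: list[tuple[int, str]] = []  # (level, title) of enclosing headings
    result = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code blocks aren't headings
            continue
        m = None if in_fence else HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        while stack and stack[-1][0] >= level:
            stack.pop()  # close sections at the same or deeper level
        stack.append((level, title))
        result.append((lineno, [t for _, t in stack]))
    return result


def find_section(markdown: str, heading_path: str) -> list[tuple[int, list[str]]]:
    """Match '#Parent#Child' against breadcrumb suffixes, case-insensitively."""
    wanted = [p.strip().lower() for p in heading_path.strip("#").split("#")]
    return [
        (lineno, crumbs)
        for lineno, crumbs in heading_breadcrumbs(markdown)
        if [c.lower() for c in crumbs[-len(wanted):]] == wanted
    ]


doc = "\n".join([
    "# Unit Testing",
    "## Setup",
    "### Installation",
    "pip install pytest",
    "## CI",
    "### Installation",
    "Use the runner image.",
])
for lineno, crumbs in find_section(doc, "#Installation"):
    print(lineno, " > ".join(crumbs))
# 3 Unit Testing > Setup > Installation
# 6 Unit Testing > CI > Installation
```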

Suggest missing links

Find pairs of semantically similar notes that aren't linked to each other, surfacing missing wikilinks and potential duplicates.

obsidian-semantic suggest-links
obsidian-semantic suggest-links --threshold 0.85 --limit 10
obsidian-semantic suggest-links --exclude-same-folder "Daily Log"

Folders to exclude can also be set in config so you don't have to type them every time:

suggest_links:
  exclude_same_folder:
    - "Daily Log"
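A rough sketch of the idea, assuming one vector per note, a set of already-linked pairs, and that exclude_same_folder skips pairs whose notes sit directly in the same listed folder. The function, its defaults, and the sample vectors are hypothetical.

```python
import itertools
import math
from pathlib import PurePosixPath


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_links(
    vectors: dict[str, list[float]],
    links: set[frozenset[str]],
    threshold: float = 0.85,
    exclude_same_folder: tuple[str, ...] = (),
    limit: int = 10,
) -> list[tuple[str, str, float]]:
    suggestions = []
    for a, b in itertools.combinations(sorted(vectors), 2):
        if frozenset((a, b)) in links:
            continue  # already linked, in either direction
        folder = str(PurePosixPath(a).parent)
        if folder == str(PurePosixPath(b).parent) and folder in exclude_same_folder:
            continue  # e.g. two Daily Log entries
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            suggestions.append((a, b, score))
    suggestions.sort(key=lambda s: s[2], reverse=True)
    return suggestions[:limit]


vectors = {
    "Daily Log/2026-02-04.md": [1.0, 0.0],
    "Daily Log/2026-02-05.md": [0.99, 0.1],
    "Programming/Pytest.md": [0.0, 1.0],
    "Programming/Unit Testing.md": [0.1, 0.99],
}
found = suggest_links(vectors, set(), exclude_same_folder=("Daily Log",))
print([(a, b) for a, b, _ in found])
# [('Programming/Pytest.md', 'Programming/Unit Testing.md')]
```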

Status

Options

All commands accept --vault <path> to specify the vault. Alternatively, set OBSIDIAN_VAULT or configure a default with obsidian-semantic configure --vault <path>.
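The three ways to choose a vault are listed above without an explicit precedence. A sketch assuming the flag wins, then the environment variable, then the configured default:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_vault(
    cli_vault: Optional[str],
    env: Mapping[str, str] = os.environ,
    config: Optional[Mapping[str, str]] = None,
) -> Path:
    """Assumed order: --vault flag, then $OBSIDIAN_VAULT, then config default."""
    for candidate in (cli_vault, env.get("OBSIDIAN_VAULT"), (config or {}).get("vault")):
        if candidate:
            return Path(candidate).expanduser()
    raise SystemExit("No vault: pass --vault, set OBSIDIAN_VAULT, or run configure --vault")


print(resolve_vault(None, {"OBSIDIAN_VAULT": "/vaults/work"}, {"vault": "/vaults/home"}))
# /vaults/work
```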

Embedding Backends

Configuration lives in ~/.config/obsidian-semantic/config.yaml. You can also place a .obsidian-semantic.yaml file in your vault root to override these settings for that vault.
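Whether the per-vault file replaces whole sections or merges key by key isn't stated. The sketch below assumes a recursive merge, so a vault file could override embedder.timeout without repeating the rest of the embedder block. In practice both files would first be parsed with a YAML loader such as PyYAML's yaml.safe_load; plain dicts keep the example self-contained, and merge_config is a hypothetical name.

```python
from copy import deepcopy
from typing import Any


def merge_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Overlay `override` onto `base`; nested dicts merge key by key."""
    merged = deepcopy(base)  # never mutate the global config
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


global_cfg = {
    "vault": "~/Documents/Obsidian-Notes",
    "embedder": {"type": "ollama", "model": "nomic-embed-text", "dimension": 768},
}
vault_cfg = {"embedder": {"timeout": 60.0}}  # contents of .obsidian-semantic.yaml
print(merge_config(global_cfg, vault_cfg)["embedder"])
# {'type': 'ollama', 'model': 'nomic-embed-text', 'dimension': 768, 'timeout': 60.0}
```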

After changing the embedder or model, reindex with obsidian-semantic index --full.
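Vectors from different models, or with different dimensions, can't be meaningfully compared, which is why a full reindex is needed. A hypothetical guard could compare the settings recorded at index time with the current config. The index_meta record and the choice of keys are assumptions; document_prefix is included because changing it changes every stored document vector.

```python
# Settings that shape the stored vectors (assumed list for this sketch).
VECTOR_KEYS = ("type", "model", "dimension", "document_prefix")


def needs_full_reindex(index_meta: dict, embedder_cfg: dict) -> bool:
    """True if the index was built with settings that differ from the config."""
    return any(index_meta.get(k) != embedder_cfg.get(k) for k in VECTOR_KEYS)


stored = {"type": "ollama", "model": "nomic-embed-text", "dimension": 768,
          "document_prefix": "search_document: "}
lmstudio = {"type": "lmstudio", "model": "text-embedding-nomic-embed-text-v1.5",
            "dimension": 768, "document_prefix": "search_document: "}
print(needs_full_reindex(stored, lmstudio))  # True: type and model strings differ
```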

Ollama with Nomic (default)

Local embeddings with nomic-embed-text (768 dimensions). Uses the search_query: and search_document: prefixes for asymmetric retrieval, where queries and notes are embedded with different prefixes.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: ollama
  model: nomic-embed-text
  dimension: 768
  query_prefix: "search_query: "
  document_prefix: "search_document: "

ollama pull nomic-embed-text

Ollama with Qwen3-embedding

Higher-quality embeddings with qwen3-embedding (4096 dimensions). Uses an instruction prefix for queries to improve retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: ollama
  model: qwen3-embedding:8b
  dimension: 4096
  query_prefix: "Instruct: Given a search query, retrieve relevant notes\nQuery: "

ollama pull qwen3-embedding:8b
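The prefixes in these configs are plain text prepended before embedding. A minimal sketch, assuming an empty default when a prefix is omitted (as document_prefix is in the Qwen3 config); the dataclass and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EmbedderConfig:
    model: str
    dimension: int
    query_prefix: str = ""     # prepended to search queries
    document_prefix: str = ""  # prepended to note chunks at index time


def text_for_query(cfg: EmbedderConfig, query: str) -> str:
    return cfg.query_prefix + query


def text_for_document(cfg: EmbedderConfig, chunk: str) -> str:
    return cfg.document_prefix + chunk


nomic = EmbedderConfig("nomic-embed-text", 768, "search_query: ", "search_document: ")
qwen = EmbedderConfig(
    "qwen3-embedding:8b",
    4096,
    # In a YAML double-quoted string, "\n" is a real newline, as it is here.
    query_prefix="Instruct: Given a search query, retrieve relevant notes\nQuery: ",
)
print(text_for_query(nomic, "python testing"))         # search_query: python testing
print(text_for_document(qwen, "Use pytest fixtures."))  # Use pytest fixtures.
```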

LM Studio

Local embeddings via LM Studio's OpenAI-compatible API (/v1/embeddings on port 1234). Start the server first:

lms server start

LM Studio with Nomic

vault: ~/Documents/Obsidian-Notes
embedder:
  type: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  dimension: 768
  query_prefix: "search_query: "
  document_prefix: "search_document: "

lms get -y nomic-ai/nomic-embed-text-v1.5

LM Studio with Qwen3-embedding

Higher-quality embeddings (4096 dimensions). Like the Ollama variant, uses an instruction prefix for queries to improve retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: lmstudio
  model: text-embedding-qwen3-embedding-8b
  dimension: 4096
  query_prefix: "Instruct: Given a search query, retrieve relevant notes\nQuery: "
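A request to the OpenAI-compatible endpoint looks roughly like this, using only the standard library. The request and response shapes follow the OpenAI embeddings format that LM Studio exposes; the function names are hypothetical, and the tool may batch or retry differently.

```python
import json
import urllib.request


def build_embeddings_request(
    texts: list[str],
    model: str,
    base_url: str = "http://localhost:1234",
) -> urllib.request.Request:
    """POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload: dict) -> list[list[float]]:
    """Return vectors in input order (each item carries its input index)."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: list[str], model: str, timeout: float = 30.0) -> list[list[float]]:
    # Needs a running LM Studio server with the model loaded.
    req = build_embeddings_request(texts, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))
```

With the server up, embed(["search_document: Fixtures share setup."], "text-embedding-nomic-embed-text-v1.5") would return one 768-dimensional vector.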

Gemini

Cloud embeddings via Google's gemini-embedding-001 (3072 dimensions). Query vs. document task types are handled automatically, so no prefix config is needed. Requires the GEMINI_API_KEY environment variable.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: gemini
  model: gemini-embedding-001
  dimension: 3072
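The query/document split maps onto Gemini's embedding task types. The sketch uses the google-genai SDK (pip install google-genai), which may not be the client the [gemini] extra installs; the helper names are hypothetical, and output_dimensionality=3072 simply mirrors the config above.

```python
import os

TASK_TYPES = {"query": "RETRIEVAL_QUERY", "document": "RETRIEVAL_DOCUMENT"}


def gemini_task_type(kind: str) -> str:
    """Map a query/document role to a Gemini embedding task type."""
    return TASK_TYPES[kind]


def embed_with_gemini(texts: list[str], kind: str, dimension: int = 3072) -> list[list[float]]:
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError("Set GEMINI_API_KEY before using the gemini embedder")
    # Imported lazily so the sketch loads without the optional dependency.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=texts,
        config=types.EmbedContentConfig(
            task_type=gemini_task_type(kind),
            output_dimensionality=dimension,
        ),
    )
    return [e.values for e in result.embeddings]
```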

Advanced Options

Timeout Configuration

The embedder request timeout (default: 30 seconds) can be increased for large files or slower models:

embedder:
  timeout: 60.0  # seconds

If you see timeout errors during indexing, try increasing this value. Very large notes with extensive JSON or code blocks may need 60-120 seconds.
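For reference, here is a sketch of an Ollama embedding call with an explicit timeout, using Ollama's /api/embed endpoint on its default port 11434. It is illustrative only. urllib's timeout applies to each blocking socket operation rather than to the request as a whole, and the README doesn't say which semantics the tool's timeout setting uses.

```python
import json
import urllib.request


def build_ollama_request(
    texts: list[str],
    model: str = "nomic-embed-text",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_embed(texts: list[str], model: str = "nomic-embed-text",
                 timeout: float = 60.0) -> list[list[float]]:
    req = build_ollama_request(texts, model)
    # Raises TimeoutError or urllib.error.URLError if a connect or read
    # stalls longer than `timeout` seconds.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]
```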

Automatic Indexing

Linux (systemd)

Create a service and timer in ~/.config/systemd/user/, replacing /home/youruser with your own home directory:

obsidian-semantic-index.service

[Unit]
Description=Index Obsidian vault for semantic search

[Service]
Type=oneshot
EnvironmentFile=%h/.config/obsidian-semantic/env
ExecStart=/home/youruser/.local/bin/obsidian-semantic index

obsidian-semantic-index.timer

[Unit]
Description=Run Obsidian semantic index hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

The EnvironmentFile is optional — use it to store secrets like GEMINI_API_KEY outside of the main config.

Enable and start:

systemctl --user enable --now obsidian-semantic-index.timer

Multiple vaults

To index additional vaults, add more ExecStart lines to the service (they run sequentially):

[Service]
Type=oneshot
EnvironmentFile=%h/.config/obsidian-semantic/env
ExecStart=/home/youruser/.local/bin/obsidian-semantic index
ExecStart=/home/youruser/.local/bin/obsidian-semantic index --vault /path/to/second-vault

macOS (launchd)

A ready-to-edit plist + wrapper script lives in scripts/launchd/. The wrapper opportunistically starts the LM Studio server (lms server start) before each run, so the launchd agent works whether or not you left the server running.

Install once:

# From the repository root, make obsidian-semantic available on PATH
uv tool install -e .

# Edit the absolute paths in the plist to match your home directory, then:
cp scripts/launchd/com.ravila.obsidian-semantic-index.plist ~/Library/LaunchAgents/
launchctl load -w ~/Library/LaunchAgents/com.ravila.obsidian-semantic-index.plist

Logs land at ~/Library/Logs/obsidian-semantic-index.log.

To check status or unload:

launchctl list | grep obsidian-semantic
launchctl unload ~/Library/LaunchAgents/com.ravila.obsidian-semantic-index.plist