GitHub - ravila4/obsidian-semantic-search: Semantic search for Obsidian vaults using LanceDB and cloud or local embedding models


PyPI Python License: MIT

Semantic search for Obsidian vaults. Index your vault into vector embeddings, then search by meaning rather than keywords.


Using this with an AI agent (Claude Code, Cursor, etc.)? See SKILL.md for agent-facing guidance: score interpretation, workflows, and known gotchas.

Install

# As a standalone CLI (recommended)
uv tool install obsidian-semantic

# Or with pipx (also installs into an isolated environment)
pipx install obsidian-semantic

# With Gemini embedder support
uv tool install "obsidian-semantic[gemini]"

Then configure:

obsidian-semantic configure

Configuration is stored in ~/.config/obsidian-semantic/config.yaml. Supported embedders are Ollama (local), LM Studio (local), and Gemini (cloud).

From source

git clone https://github.com/ravila4/obsidian-semantic-search
cd obsidian-semantic-search
uv sync
uv run obsidian-semantic configure

Usage

Index your vault

obsidian-semantic index                # incremental (new/modified files only)
obsidian-semantic index --full         # reindex everything
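One common way to implement "new/modified files only" is to compare each note's modification time with the value recorded at the previous run. The sketch below assumes a simple {path: mtime} manifest; the manifest format and the changed_files function are hypothetical and not necessarily how obsidian-semantic tracks changes.

```python
from pathlib import Path


def changed_files(vault: Path, manifest: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split the vault into notes to (re)index and notes to drop.

    Hypothetical sketch: `manifest` maps vault-relative paths to the
    modification time recorded at the previous index run.
    """
    current: dict[str, float] = {}
    for path in vault.rglob("*.md"):
        rel = path.relative_to(vault)
        # Skip hidden folders such as .obsidian/ and .trash/.
        if any(part.startswith(".") for part in rel.parts):
            continue
        current[rel.as_posix()] = path.stat().st_mtime

    # New notes are missing from the manifest; modified notes have a newer mtime.
    to_index = sorted(
        rel for rel, mtime in current.items()
        if rel not in manifest or mtime > manifest[rel]
    )
    # Notes recorded last time but no longer on disk should leave the index.
    deleted = sorted(rel for rel in manifest if rel not in current)
    return to_index, deleted
```

In this model, --full is simply the case where the manifest is treated as empty, so every note is re-embedded.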

Search

obsidian-semantic search "dependency injection patterns"
obsidian-semantic search "python testing" --limit 5
obsidian-semantic search "docker" --folder "Programming/"
obsidian-semantic search "habits" --tag "review"
obsidian-semantic search "fisher" --score-min 0.6     # drop low-relevance hits
obsidian-semantic search "fisher" --per-file 0        # show every matching chunk

By default, results are deduped to one chunk per file. Pass --per-file N to allow up to N chunks per file (or 0 for unlimited).
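The per-file dedup can be sketched as "rank chunks by score, then keep at most N per note". The Hit record and dedup_per_file function below are illustrative names, not the tool's internal API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    path: str     # vault-relative note path
    text: str     # chunk text
    score: float  # similarity score, higher is better


def dedup_per_file(hits: list[Hit], per_file: int = 1) -> list[Hit]:
    """Keep at most `per_file` best-scoring chunks per note; 0 means unlimited."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if per_file == 0:
        return ranked
    kept: list[Hit] = []
    seen: dict[str, int] = defaultdict(int)  # chunks kept so far, per note
    for hit in ranked:
        if seen[hit.path] < per_file:
            kept.append(hit)
            seen[hit.path] += 1
    return kept
```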

Calibrate --score-min against the post-dedup output, not against raw chunk scores. After dedup, the best chunk of the second-ranked file often scores about 0.05–0.10 below the extra chunks from the top file that it replaces in the ranking, so a threshold tuned on raw chunk scores can drop relevant notes. With Ollama and nomic-embed-text, rough absolute bands are: ≥0.65 strong title-level match, ≥0.5 topical, <0.4 likely noise. Other embedders (qwen3, gemini) use different scales.
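The bands for Ollama with nomic-embed-text can be expressed as a small helper that runs on already-deduplicated (path, score) results. This is a sketch only: the cutoffs come from the text, the function name is hypothetical, and the 0.4 to 0.5 range, which the README leaves unnamed, is labelled "weak" here as an assumption.

```python
def filter_and_label(results: list[tuple[str, float]], score_min: float) -> list[tuple[str, float, str]]:
    """Apply a --score-min style cutoff to deduplicated results and attach a rough label.

    Bands are the approximate ollama + nomic-embed-text ones; they do not
    transfer to other embedders such as qwen3 or gemini.
    """
    labeled = []
    for path, score in results:
        if score < score_min:
            continue
        if score >= 0.65:
            label = "strong"
        elif score >= 0.5:
            label = "topical"
        elif score >= 0.4:
            label = "weak"  # assumption: the README does not name this band
        else:
            label = "likely noise"
        labeled.append((path, score, label))
    return labeled
```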

Find related notes

Find notes similar to a given note, useful for discovering connections, linking, or deduplication.

obsidian-semantic related "Programming/Python/Unit Testing.md"
obsidian-semantic related "Daily/2026-02-05.md" --limit 5

If the note isn't in the index, it's chunked and embedded on the fly.
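Conceptually, related is a nearest-neighbour lookup seeded with the note's own embedding. The sketch below uses one vector per note and brute-force cosine similarity for clarity. The function names, the one-vector-per-note simplification, and the embed_note callback (standing in for "chunked and embedded on the fly") are assumptions; the real tool stores chunks in LanceDB and may aggregate them differently.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related(
    note: str,
    index: dict[str, Vector],
    embed_note: Callable[[str], Vector],
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Rank every other note by cosine similarity to `note`.

    If `note` is not in the index, `embed_note` (hypothetical) embeds it on the fly.
    """
    query = index[note] if note in index else embed_note(note)
    scored = [(path, cosine(query, vec)) for path, vec in index.items() if path != note]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```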

Show a note

Print the full contents of a note straight to stdout. Accepts a vault-relative path or a bare filename (with or without .md); if the basename is unique, it's resolved automatically. Reads from disk, so it works on un-indexed files too (unlike search).

obsidian-semantic show "Fisher's Exact in Empiroar.md"
obsidian-semantic show "Programming/Python/Unit Testing.md"
obsidian-semantic show "Unit Testing.md#Setup#Installation"   # specific section

Append #Heading (or #Parent#Child for nested sections) to print just that section. Heading paths are matched against the breadcrumb suffix and are case-insensitive; ambiguous headings are listed with line numbers.
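Matching #Parent#Child amounts to tracking a stack of open headings and comparing the tail of each heading's breadcrumb with the requested path. A minimal sketch, assuming ATX-style (#) headings only; find_sections is a hypothetical name, and unlike a real parser it does not skip lines inside fenced code blocks.

```python
import re

# ATX headings: 1-6 '#' characters, whitespace, the title, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def find_sections(markdown: str, heading_path: str) -> list[tuple[int, list[str]]]:
    """Return (1-based line number, breadcrumb) for each heading whose breadcrumb
    ends with the '#'-separated heading_path, compared case-insensitively."""
    wanted = [part.strip().lower() for part in heading_path.split("#") if part.strip()]
    stack: list[tuple[int, str]] = []  # (level, title) of the currently open headings
    matches: list[tuple[int, list[str]]] = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        # A heading closes every open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        breadcrumb = [t for _, t in stack]
        if wanted and [t.lower() for t in breadcrumb[-len(wanted):]] == wanted:
            matches.append((lineno, breadcrumb))
    return matches
```

More than one match corresponds to the ambiguous case, where the CLI lists the candidate headings with their line numbers instead of picking one.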

Suggest missing links

Find semantically similar notes that aren't linked to each other. This surfaces missing wikilinks and potential duplicates.

obsidian-semantic suggest-links
obsidian-semantic suggest-links --threshold 0.85 --limit 10
obsidian-semantic suggest-links --exclude-same-folder "Daily Log"

Folders to exclude can also be set in config so you don't have to type them every time:

suggest_links:
  exclude_same_folder:
    - "Daily Log"
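In outline, suggest-links is an all-pairs similarity scan that skips pairs already joined by a wikilink in either direction. This sketch assumes one embedding per note and a pre-resolved link map. The parameter names mirror the CLI flags, but the function itself is hypothetical, and it only compares immediate parent folders for the same-folder exclusion. The pairwise loop is quadratic, so on a large vault a real implementation would more likely query the vector index once per note.

```python
import math
from itertools import combinations
from pathlib import PurePosixPath
from typing import Iterable, Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_links(
    vectors: Mapping[str, Sequence[float]],
    links: Mapping[str, set[str]],
    *,
    threshold: float,
    limit: int,
    exclude_same_folder: Iterable[str] = (),
) -> list[tuple[str, str, float]]:
    """Return unlinked note pairs whose similarity is at least `threshold`."""
    excluded = set(exclude_same_folder)
    suggestions = []
    for a, b in combinations(sorted(vectors), 2):
        # Skip pairs already connected by a wikilink in either direction.
        if b in links.get(a, set()) or a in links.get(b, set()):
            continue
        # Skip pairs that share an excluded parent folder (e.g. two daily notes).
        folder = str(PurePosixPath(a).parent)
        if folder in excluded and folder == str(PurePosixPath(b).parent):
            continue
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            suggestions.append((a, b, score))
    suggestions.sort(key=lambda s: s[2], reverse=True)
    return suggestions[:limit]
```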

Status

Options

All commands accept --vault <path> to specify the vault. Alternatively, set OBSIDIAN_VAULT or configure a default with obsidian-semantic configure --vault <path>.
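A sketch of how the three vault sources might be combined. The precedence shown (flag, then environment variable, then configured default) is an assumption, since the README only lists the sources; resolve_vault is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_vault(
    cli_vault: Optional[str],
    config: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Pick the vault from --vault, then OBSIDIAN_VAULT, then the configured default.

    The precedence order is an assumption, not documented behaviour.
    """
    for candidate in (cli_vault, env.get("OBSIDIAN_VAULT"), config.get("vault")):
        if candidate:
            return Path(str(candidate)).expanduser()  # config may use ~/...
    raise SystemExit(
        "No vault set: pass --vault, set OBSIDIAN_VAULT, "
        "or run obsidian-semantic configure --vault <path>"
    )
```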

Embedding Backends

Configuration lives in ~/.config/obsidian-semantic/config.yaml. You can also place a .obsidian-semantic.yaml in your vault root to override settings for that vault.
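A per-vault override is typically a recursive merge, where nested sections such as embedder are merged key by key and scalar values from the vault file win. This is a sketch under that assumption; whether obsidian-semantic merges deeply or replaces whole sections is not stated here, and merge and load_config are hypothetical names. Reading YAML assumes PyYAML.

```python
from copy import deepcopy
from pathlib import Path
from typing import Any


def merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively overlay `override` onto `base` without mutating either."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)  # nested section: merge key by key
        else:
            result[key] = deepcopy(value)  # scalars and lists replace outright
    return result


def load_config(vault: Path) -> dict[str, Any]:
    import yaml  # PyYAML; imported here so merge() works without it installed

    def read(path: Path) -> dict[str, Any]:
        return (yaml.safe_load(path.read_text()) or {}) if path.exists() else {}

    global_cfg = read(Path.home() / ".config" / "obsidian-semantic" / "config.yaml")
    return merge(global_cfg, read(vault / ".obsidian-semantic.yaml"))
```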

After changing the embedder or model, reindex with obsidian-semantic index --full.
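The full reindex is needed because vectors from different models live in different spaces, often with different dimensions (768 for nomic-embed-text, 4096 for qwen3-embedding), so stored document vectors can't be compared with query vectors from the new model. A hypothetical check that compares the settings recorded at index time with the current embedder config could look like this; whether and where obsidian-semantic records such metadata is an assumption.

```python
# Settings that change the vector space of the stored document embeddings.
EMBEDDING_KEYS = ("type", "model", "dimension", "document_prefix")


def needs_full_reindex(indexed_with: dict, embedder: dict) -> list[str]:
    """Return the embedder settings that differ from the ones used to build the index."""
    return [key for key in EMBEDDING_KEYS if indexed_with.get(key) != embedder.get(key)]
```

query_prefix is deliberately left out: it only affects query vectors computed at search time, so changing it does not invalidate stored document vectors.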

Ollama with Nomic (default)

Local embeddings with nomic-embed-text (768 dimensions). Uses search_query:/search_document: prefixes for asymmetric retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: ollama
  model: nomic-embed-text
  dimension: 768
  query_prefix: "search_query: "
  document_prefix: "search_document: "

Then pull the model:

ollama pull nomic-embed-text
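Asymmetric retrieval means queries and documents get different prefixes before they are embedded. A minimal standard-library sketch against Ollama's /api/embed endpoint on its default port 11434; it is not obsidian-semantic's own client code, and the function names are illustrative. The 30-second timeout mirrors the default embedder timeout described under Advanced Options.

```python
import json
import urllib.request

OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"  # Ollama's default port

# Prefixes from the nomic config above.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def prefixed(texts: list[str], prefix: str) -> list[str]:
    return [prefix + text for text in texts]


def ollama_embed(texts: list[str], model: str = "nomic-embed-text", timeout: float = 30.0) -> list[list[float]]:
    body = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(OLLAMA_EMBED_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]


def embed_documents(chunks: list[str]) -> list[list[float]]:
    # Note chunks are embedded with the document prefix at index time.
    return ollama_embed(prefixed(chunks, DOCUMENT_PREFIX))


def embed_query(query: str) -> list[float]:
    # Search queries are embedded with the query prefix at search time.
    return ollama_embed(prefixed([query], QUERY_PREFIX))[0]
```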

Ollama with Qwen3-embedding

Higher-quality embeddings with qwen3-embedding (4096 dimensions). Uses an instruction prefix for queries to improve retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: ollama
  model: qwen3-embedding:8b
  dimension: 4096
  query_prefix: "Instruct: Given a search query, retrieve relevant notes\nQuery: "

Then pull the model:

ollama pull qwen3-embedding:8b

LM Studio

Local embeddings via LM Studio's OpenAI-compatible API (/v1/embeddings on port 1234). Start the server first:

lms server start

LM Studio with Nomic

vault: ~/Documents/Obsidian-Notes
embedder:
  type: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  dimension: 768
  query_prefix: "search_query: "
  document_prefix: "search_document: "

Then download the model:

lms get -y nomic-ai/nomic-embed-text-v1.5

LM Studio with Qwen3-embedding

Higher-quality embeddings (4096 dimensions). Like the Ollama variant, uses an instruction prefix for queries to improve retrieval.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: lmstudio
  model: text-embedding-qwen3-embedding-8b
  dimension: 4096
  query_prefix: "Instruct: Given a search query, retrieve relevant notes\nQuery: "

Gemini

Cloud embeddings via Google's gemini-embedding-001 (3072 dimensions). Query and document task types are handled automatically, so no prefix configuration is needed. Requires the gemini extra (uv tool install "obsidian-semantic[gemini]") and a GEMINI_API_KEY environment variable.

vault: ~/Documents/Obsidian-Notes
embedder:
  type: gemini
  model: gemini-embedding-001
  dimension: 3072
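Instead of text prefixes, Gemini takes the query/document distinction as a task_type parameter. A sketch using Google's google-genai Python SDK with the RETRIEVAL_QUERY and RETRIEVAL_DOCUMENT task types; whether obsidian-semantic uses this SDK or these exact task types is an assumption, and the helper names are hypothetical.

```python
def task_type_for(is_query: bool) -> str:
    # Gemini's retrieval task types play the role of nomic's text prefixes.
    return "RETRIEVAL_QUERY" if is_query else "RETRIEVAL_DOCUMENT"


def gemini_embed(texts: list[str], is_query: bool, model: str = "gemini-embedding-001") -> list[list[float]]:
    # Imported lazily so task_type_for works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    result = client.models.embed_content(
        model=model,
        contents=texts,
        config=types.EmbedContentConfig(task_type=task_type_for(is_query)),
    )
    return [embedding.values for embedding in result.embeddings]
```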

Advanced Options

Timeout Configuration

The embedder request timeout (default: 30 seconds) can be increased for large files or slower models:

embedder:
  timeout: 60.0  # seconds

If you see timeout errors during indexing, try increasing this value. Very large notes with extensive JSON or code blocks may need 60-120 seconds.

Automatic Indexing

Linux (systemd)

Create a service and timer in ~/.config/systemd/user/:

obsidian-semantic-index.service

[Unit]
Description=Index Obsidian vault for semantic search

[Service]
Type=oneshot
EnvironmentFile=%h/.config/obsidian-semantic/env
ExecStart=/home/youruser/.local/bin/obsidian-semantic index

obsidian-semantic-index.timer

[Unit]
Description=Run Obsidian semantic index hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

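The EnvironmentFile is optional; use it to keep secrets such as GEMINI_API_KEY out of the main config. If you don't create the file, remove that line or prefix the path with - (EnvironmentFile=-%h/.config/obsidian-semantic/env) so systemd doesn't fail on the missing file.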

Enable and start:

systemctl --user enable --now obsidian-semantic-index.timer

Multiple vaults

To index additional vaults, add more ExecStart lines to the service (they run sequentially):

[Service]
Type=oneshot
EnvironmentFile=%h/.config/obsidian-semantic/env
ExecStart=/home/youruser/.local/bin/obsidian-semantic index
ExecStart=/home/youruser/.local/bin/obsidian-semantic index --vault /path/to/second-vault

macOS (launchd)

A ready-to-edit plist and wrapper script live in scripts/launchd/. The wrapper opportunistically starts the LM Studio server (lms server start) before each run, so the launch agent works whether or not you left the server running.

Install once:

# Make obsidian-semantic available on PATH
uv tool install -e .

# Edit the absolute paths in the plist to match your home directory, then:
cp scripts/launchd/com.ravila.obsidian-semantic-index.plist ~/Library/LaunchAgents/
launchctl load -w ~/Library/LaunchAgents/com.ravila.obsidian-semantic-index.plist

Logs land at ~/Library/Logs/obsidian-semantic-index.log.

To unload or check status:

launchctl list | grep obsidian-semantic
launchctl unload ~/Library/LaunchAgents/com.ravila.obsidian-semantic-index.plist