GitHub - hemanth/xkcd-search: Reverse lookup XKCD comics using Gemini multimodal embeddings (gemini-embedding-2-preview)


Upload an XKCD comic image or describe it in text, and instantly find which comic it is.

Powered by gemini-embedding-2-preview multimodal embeddings, ChromaDB for vector storage, and the olivierdehaene/xkcd dataset.

Architecture

Video: reverse-lookup.mov

Setup

# Install dependencies
pip install -r requirements.txt

# Set your Gemini API key
cp .env.example .env
# edit .env with your key

# Fetch comics from the Hugging Face dataset (default: last 50)
python fetch_xkcd.py 50

# Build the embedding index
python index_comics.py

# Start the server
python app.py

Open http://localhost:8000 and search!

How It Works

  1. Fetch: Loads XKCD metadata from the Hugging Face dataset and downloads the comic images.
  2. Index: Embeds each comic (image and text) with gemini-embedding-2-preview and stores the vectors in two ChromaDB collections, xkcd_images and xkcd_text.
  3. Search: Uploaded images and text queries are embedded with the same model, and ChromaDB ranks the stored vectors by cosine similarity. Text queries are scored against both collections, and the higher of the two scores is kept.
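
The "score against both collections, take the max" step can be sketched roughly as follows. This is an illustration, not the code in app.py: the function name merge_scores, the comic ids, and the distance values are all hypothetical, and it assumes each collection reports a cosine distance equal to 1 minus the cosine similarity.

```python
from typing import Dict, List, Tuple


def merge_scores(
    image_hits: Dict[str, float], text_hits: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Combine per-collection cosine distances into one ranking.

    Each argument maps a comic id to a cosine distance (smaller is closer).
    Assumption: distance = 1 - cosine similarity.
    """
    merged: Dict[str, float] = {}
    for hits in (image_hits, text_hits):
        for comic_id, distance in hits.items():
            similarity = 1.0 - distance
            # Keep the better of the two scores for each comic.
            merged[comic_id] = max(similarity, merged.get(comic_id, float("-inf")))
    # Highest similarity first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Made-up distances standing in for results from xkcd_images and xkcd_text.
image_hits = {"353": 0.20, "927": 0.55}
text_hits = {"353": 0.35, "1205": 0.40}

ranking = merge_scores(image_hits, text_hits)
for comic_id, score in ranking:
    print(comic_id, round(score, 2))
```

Taking the max means a comic only needs to match well in one collection to rank highly, which suits queries that describe either the drawing or the caption.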

Because text and images share the same embedding space, you can describe a comic ("someone flying with a python script") and find it just as well as uploading a screenshot.
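
To illustrate why a shared space makes this possible, here is a minimal sketch of cosine-similarity lookup in plain Python. The three-dimensional vectors and the ids comic_a and comic_b are toy values invented for the example; real embeddings come from the model and have far more dimensions, and in this project ChromaDB performs the comparison.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], index: Dict[str, Sequence[float]]) -> str:
    """Return the id whose stored vector is most similar to the query."""
    return max(index, key=lambda comic_id: cosine_similarity(query, index[comic_id]))


# Toy "image" embeddings (hypothetical values).
image_vectors = {
    "comic_a": [0.9, 0.1, 0.0],
    "comic_b": [0.0, 0.2, 0.9],
}

# Toy "text" embedding of a description. Because it lives in the same space,
# it can be compared directly against the image vectors.
text_query = [0.8, 0.3, 0.1]

best = nearest(text_query, image_vectors)
print(best)
```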