Disclaimer: This project and documentation were mostly vibe coded with Claude. Proceed accordingly.
Interactive 3D visualization of OpenAI text embeddings. 1002 words across 15 categories are embedded with text-embedding-3-small, reduced with PCA + UMAP directly onto the surface of a sphere using UMAP's native spherical (haversine) output metric, and converted to 3D Cartesian coordinates for rendering.
Live at nofone.io/experiment/3dembed
Quick start
Option A — just run the frontend (data included)
data.json is committed to the repo, so you can visualize immediately without an API key or running any Python.
```sh
cd frontend
npm install
npm run dev   # → http://localhost:5173
```
Option B — generate your own embeddings
Use this if you want to change the word list, tweak UMAP parameters, or regenerate from scratch. Requires Python 3.11, uv, and an OpenAI API key.
```sh
cd backend

# Add your OpenAI API key
echo "OPENAI_API_KEY=sk-..." > .env

# Embed 1002 words (~30s, costs <$0.001)
uv run python embed.py

# Reduce to sphere coordinates (~90s) and overwrite data.json
uv run python reduce.py   # → writes ../frontend/public/data.json
```
embed.py is idempotent — it skips words already in the database, so you can safely re-run it after adding new words. Re-run reduce.py anytime to regenerate coordinates without re-embedding.
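The skip-what's-already-stored pattern can be sketched in a few lines of standard-library Python. This is not embed.py itself: the column layout (word, category, vector stored as a float32 BLOB), the batch size, and the `fake_embed` stand-in for the OpenAI request are all hypothetical, chosen so the example runs offline. Embedding calls run in a thread pool, while every SQLite write stays on the main thread, because a `sqlite3` connection is by default tied to the thread that created it.

```python
import sqlite3
import struct
from concurrent.futures import ThreadPoolExecutor

DIM = 1536  # output size of text-embedding-3-small


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Hypothetical offline stand-in for the OpenAI embeddings request."""
    return [[float(len(word))] * DIM for word in batch]


def to_blob(vector: list[float]) -> bytes:
    """Pack a vector as little-endian float32 (assumed storage format)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def embed_missing(conn, words, embed=fake_embed, batch_size=100, workers=4) -> int:
    """Embed only the (word, category) pairs not yet stored; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings "
        "(word TEXT PRIMARY KEY, category TEXT, vector BLOB)"
    )
    stored = {row[0] for row in conn.execute("SELECT word FROM embeddings")}
    todo = [(w, c) for w, c in words if w not in stored]
    batches = [todo[i:i + batch_size] for i in range(0, len(todo), batch_size)]

    # API calls run in worker threads; pool.map hands results back to this
    # thread in order, so all SQLite writes happen on the connection's thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: embed([w for w, _ in b]), batches)
        for batch, vectors in zip(batches, results):
            conn.executemany(
                "INSERT OR IGNORE INTO embeddings VALUES (?, ?, ?)",
                [(w, c, to_blob(v)) for (w, c), v in zip(batch, vectors)],
            )
    conn.commit()
    return len(todo)


conn = sqlite3.connect(":memory:")
words = [("cat", "Animals"), ("atom", "Chemistry")]
print(embed_missing(conn, words))                            # 2
print(embed_missing(conn, words + [("stock", "Business")]))  # 1 (only the new word)
print(conn.execute("SELECT count(*) FROM embeddings").fetchone()[0])  # 3
```

Re-running costs nothing for words already present: the second call only embeds "stock".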
Verify embeddings were stored:
```sh
sqlite3 embeddings.db "SELECT count(*) FROM embeddings"   # → 1002
```
How it works
```
1002 words × 15 categories
        ↓
OpenAI text-embedding-3-small → 1536-dim vectors (stored in SQLite)
        ↓
PCA → 50 dims
        ↓
UMAP (output_metric="haversine") → (lat, lon) on S²
        ↓
Spherical → Cartesian → (x, y, z) on unit sphere
        ↓
React Three Fiber → interactive 3D visualization
```
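The spherical → Cartesian step is plain trigonometry. A minimal sketch, assuming UMAP's two output columns are (latitude, longitude) in radians as the diagram labels them; reduce.py may use a different axis convention, and the function name here is hypothetical.

```python
import math


def lat_lon_to_xyz(lat: float, lon: float) -> tuple[float, float, float]:
    """Convert (latitude, longitude) in radians to a point on the unit sphere."""
    cos_lat = math.cos(lat)
    return (cos_lat * math.cos(lon), cos_lat * math.sin(lon), math.sin(lat))


print(lat_lon_to_xyz(0.0, 0.0))  # (1.0, 0.0, 0.0): equator, zero longitude
```

Because cos²(lat)·(cos²(lon) + sin²(lon)) + sin²(lat) = 1 for any angles, the result is a unit vector even if the optimizer's angles drift outside the usual [-π/2, π/2] × [-π, π] ranges.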
Why haversine output? UMAP with output_metric="haversine" treats the output space as a 2-sphere (S²), so it embeds points directly on the sphere's surface instead of in flat Euclidean space. This avoids the clustering artifacts that come from L2-normalizing flat UMAP output: when every output coordinate is positive, normalization confines all points to a single octant, and therefore to one hemisphere.
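Both halves of that argument can be checked with standard-library Python. The sketch below is illustrative, not code from reduce.py: it implements the standard haversine great-circle distance on (lat, lon) pairs in radians, showing that the sphere has no edges (longitudes just either side of ±π are close), and it reproduces the normalization artifact with made-up all-positive 3D points.

```python
import math
import random


def haversine(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance (radians) between two (lat, lon) points given in radians."""
    dlat, dlon = a[0] - b[0], a[1] - b[1]
    h = math.sin(dlat / 2) ** 2 + math.cos(a[0]) * math.cos(b[0]) * math.sin(dlon / 2) ** 2
    return 2 * math.asin(math.sqrt(min(1.0, h)))  # clamp guards against rounding above 1


# Flat-plane distance between these longitudes would be 6.2; on the sphere they are neighbors.
print(round(haversine((0.0, 3.1), (0.0, -3.1)), 3))  # 0.083


def l2_normalize(p: tuple[float, ...]) -> tuple[float, ...]:
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(c / norm for c in p)


# Made-up all-positive 3D points standing in for flat UMAP output.
random.seed(0)
flat = [tuple(random.uniform(1.0, 10.0) for _ in range(3)) for _ in range(1000)]
normalized = [l2_normalize(p) for p in flat]

# Normalizing never flips a sign, so every point lands in the positive octant
# (x, y, z > 0): one eighth of the sphere, inside a single hemisphere.
print(all(min(p) > 0 for p in normalized))  # True
```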
Project structure
```
sphere-embed/
├── backend/                  # Python pipeline (uv)
│   ├── words.py              # 1002 words × 15 categories
│   ├── embed.py              # OpenAI → SQLite (multithreaded, idempotent)
│   ├── reduce.py             # PCA + UMAP → data.json
│   └── embeddings.db         # generated, gitignored
└── frontend/                 # Vite + React + TypeScript
    ├── src/
    │   ├── App.tsx
    │   ├── components/
    │   │   ├── Scene.tsx          # R3F Canvas + OrbitControls
    │   │   ├── SpherePoints.tsx   # InstancedMesh per category
    │   │   ├── WireframeSphere.tsx
    │   │   ├── Controls.tsx       # category toggles + search
    │   │   └── Tooltip.tsx
    │   └── hooks/
    │       └── useEmbeddingData.ts
    └── public/
        └── data.json         # pre-computed, committed to repo
```
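data.json is the only interface between the Python backend and the frontend. Its real schema is not documented here, so the record layout below (one object per word with `word`, `category`, `x`, `y`, `z`), the 4-decimal rounding, and the function name `to_data_json` are hypothetical. The sketch serializes reduced points and refuses any point that is not on the unit sphere, a cheap sanity check before committing the file.

```python
import json
import math


def to_data_json(rows, places: int = 4) -> str:
    """Serialize (word, category, x, y, z) rows into a hypothetical data.json layout."""
    points = []
    for word, category, x, y, z in rows:
        # Every reduced point should lie on the unit sphere.
        if not math.isclose(x * x + y * y + z * z, 1.0, abs_tol=1e-6):
            raise ValueError(f"{word!r} is not on the unit sphere")
        points.append({
            "word": word,
            "category": category,
            # Rounding keeps the committed file small; the precision is arbitrary.
            "x": round(x, places), "y": round(y, places), "z": round(z, places),
        })
    return json.dumps(points, ensure_ascii=False)


rows = [("cat", "Animals", 1.0, 0.0, 0.0), ("atom", "Chemistry", 0.0, 0.6, 0.8)]
text = to_data_json(rows)
# e.g. pathlib.Path("../frontend/public/data.json").write_text(text, encoding="utf-8")
```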
Categories
| Category | Count |
|---|---|
| Animals | 67 |
| Biology | 67 |
| Chemistry | 67 |
| Physics | 67 |
| Mathematics | 67 |
| Philosophy | 67 |
| History | 67 |
| Politics | 67 |
| Business | 67 |
| Technology | 67 |
| Geography | 67 |
| Fashion | 67 |
| Food | 66 |
| Sports | 66 |
| Psychology | 66 |
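The counts above are the most even split that 1002 words allow across 15 categories, which a quick `divmod` confirms. This only checks the table's arithmetic; it does not claim words.py was built this way.

```python
counts = {
    "Animals": 67, "Biology": 67, "Chemistry": 67, "Physics": 67,
    "Mathematics": 67, "Philosophy": 67, "History": 67, "Politics": 67,
    "Business": 67, "Technology": 67, "Geography": 67, "Fashion": 67,
    "Food": 66, "Sports": 66, "Psychology": 66,
}

total = sum(counts.values())
base, extra = divmod(total, len(counts))
print(total, base, extra)  # 1002 66 12

# An even split gives `extra` categories base + 1 words and the rest base words.
expected = sorted([base + 1] * extra + [base] * (len(counts) - extra), reverse=True)
print(sorted(counts.values(), reverse=True) == expected)  # True
```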
Tech stack
| Layer | Stack |
|---|---|
| Embedding | OpenAI text-embedding-3-small |
| Dim reduction | scikit-learn PCA + umap-learn |
| Storage | SQLite |
| Visualization | React + Vite + TypeScript |
| 3D rendering | React Three Fiber + Three.js |
| Python tooling | uv |