GitHub - arun1729/text-to-kg: CogDB Demo: Text → Knowledge Graph → Semantic Search

2 min read Original article ↗

Build a knowledge graph from unstructured text using CogDB and OpenAI, then query it with graph traversal, semantic search, and hybrid queries.

Pipeline

Source Text
    ↓  (OpenAI gpt-5)
Entity Extraction        → typed entities (CelestialBody, Moon, Mission, …)
    ↓  (OpenAI gpt-5)
Relationship Extraction  → typed triples (Europa MOON_OF Jupiter, …)
    ↓
Entity Resolution        → normalize aliases, deduplicate
    ↓  (OpenAI text-embedding-3-small)
Embedding Generation     → vector per entity
    ↓
CogDB Graph + Vectors    → triples + embeddings loaded into CogDB
    ↓
Queries                  → traversal, semantic, and hybrid

Quick Start

# 1. Set your OpenAI API key
echo 'OPENAI_API_KEY=sk-...' > .env

# 2. Run (creates venv, installs deps, runs ETL + queries)
./run.sh

On the first run, the pipeline calls OpenAI to extract entities and relationships from planetary-habitability.txt and caches the results in kg_data.json. Subsequent runs skip extraction and go straight to graph construction and queries.

Use ./run.sh --clean to wipe the venv and database and start fresh.

Project Structure

File Purpose
etl.py Extract entities/relationships, resolve, embed, and load into CogDB
query.py Open the CogDB graph and run traversal, semantic, and hybrid queries
run.sh One-command runner: sets up venv, runs etl.py then query.py

What the Queries Demonstrate

Section Technique Example
A. Graph Traversal Structural hops Europa → features, missions targeting habitable bodies
B. Semantic Search Embedding k-nearest Entities closest to "signs of life" or "ocean world"
C. Hybrid Traversal + semantic BFS from Europa → filter by "water and ice" relevance

Requirements