OLTP to OLAP Showcase
Real-time CDC pipeline: PostgreSQL โ Redpanda โ Moose โ ClickHouse
Stream changes from your OLTP database to OLAP analytics in real-time using Change Data Capture. Five ORM implementations showing the same architecture across TypeScript and Python ecosystems!
๐ฏ New Developer? Start Here
This repository demonstrates Change Data Capture (CDC) - the practice of streaming database changes to analytics systems in real-time. Instead of polling or batch ETL jobs, CDC captures every INSERT, UPDATE, and DELETE from PostgreSQL's transaction log and pipes it to ClickHouse for lightning-fast analytics.
Choose your path:
- Learn the concepts โ Read How It Works below
- Try it immediately โ Follow Quick Start (5 minutes)
- Understand the design โ See CDC Transformation Architecture
- Test interactively โ Run the Test Client to see CDC in action
What This Demonstrates
A complete, working example of an OLTP-to-OLAP CDC pipeline with:
- Real-time Change Data Capture from PostgreSQL transaction logs
- Automatic denormalization for fast analytical queries without JOINs
- Five ORM examples showing the same architecture across different frameworks
- Interactive test client with dynamic backend switching
- Star schema design optimized for analytics workloads
- Interactive setup scripts that teach CDC concepts as you configure
Project Status
Production-Ready:
- โ TypeORM Example - TypeScript/Node.js with Express API (port 3000)
- โ SQLModel Example - Python/FastAPI implementation (port 3002)
- โ Test Client - React UI with in-app backend switching
Experimental (Setup Available):
โ ๏ธ Drizzle Example - Basic setup working (port 3003)โ ๏ธ Prisma Example - Basic setup working (port 3004)
Every example includes an interactive setup.sh script that guides you through CDC configuration step-by-step.
Architecture
โโโโโโโโโโโโโโโโโโโ WAL โโโโโโโโโโโโโโโโ Kafka โโโโโโโโโโโโโโโ
โ PostgreSQL โโโโโโโโโโโ>โ Redpanda โโโโโโโโโโโ>โ Moose โ
โ (TypeORM/SA) โ โ Connect โ โ Functions โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโฌโโโโโโโ
โ
Transform
โ
โโโโโโโโโโโโโโโโโโ
โ ClickHouse โ
โ (Analytics) โ
โโโโโโโโโโโโโโโโโโ
Quick Start
Prerequisites:
- Docker and Docker Compose
- Node.js 18+ (for TypeScript examples) or Python 3.10+ (for Python examples)
- Redpanda Enterprise License - Get a free 30-day trial
๐ Recommended: Interactive Setup Script
The easiest way to get started - each ORM example includes an interactive setup.sh script that teaches CDC concepts while configuring your environment:
# Choose any example (TypeORM recommended for beginners) cd apps/typeorm-example # Run interactive setup - it will guide you through: ./setup.sh # 1. Starting PostgreSQL with CDC enabled # 2. Creating database tables # 3. Configuring logical replication # 4. Verifying CDC prerequisites # Or use convenient shortcuts make start # Interactive setup make status # Check what's running make logs-connector # Debug CDC connector
Why use setup.sh:
- ๐ Educational - Learn what CDC is and how it works as you set it up
- โ Debuggable - Clear status checks and error messages at each step
- ๐ง Flexible - Run all steps together or individual steps for troubleshooting
- ๐ Transparent - See exactly what's happening under the hood
After running setup.sh, start the remaining services and test the pipeline (see example-specific instructions below).
TypeScript + TypeORM
cd apps/typeorm-example # Set Redpanda license export REDPANDA_LICENSE="your_license_key_here" pnpm install # Run interactive setup (accept the prompt to run pnpm setup-db) ./setup.sh # Terminal 1: Start Moose (includes CDC connector) moose dev # Terminal 2: Start the API server pnpm dev
Visit http://localhost:3000 for API and http://localhost:3000/reference for docs.
Python + SQLModel
cd apps/sqlmodel-example # Set Redpanda license export REDPANDA_LICENSE="your_license_key_here" # Setup (first time only) python -m venv venv source venv/bin/activate pip install -e . # Run interactive setup (accept the prompt to run python init_db.py) ./setup.sh # Terminal 1: Start Moose (includes CDC connector) moose dev # Terminal 2: Start API (run python init_db.py first if you skipped it) source venv/bin/activate # python init_db.py fastapi dev src/main.py --port 3002
Visit http://localhost:3002/docs for interactive API documentation.
About SQLModel: Created by FastAPI's author, SQLModel combines SQLAlchemy (database) and Pydantic (validation) into a single, type-safe model. One class definition serves as both database table and API validation schema.
Test the Pipeline
cd apps/test-client pnpm install && pnpm dev
Visit http://localhost:3001 to create data and watch CDC in action.
The test client features:
- ๐ Dynamic backend switching - Switch between TypeORM, SQLModel, or other backends using the in-app Settings button
- ๐ฏ Auto-detection - Automatically shows settings modal if it can't connect to the backend
- ๐ฒ Data generation - Create random customers, products, and orders with one click
- ๐ Real-time CDC - Watch changes flow from PostgreSQL to ClickHouse in under 1 second
- โ๏ธ Full CRUD - Create, update, and delete records to test all CDC operations
See the Test Client README for complete usage instructions.
Documentation
Getting Started
- Quick Start Guide - One-page onboarding for any ORM example
- Docker Setup Guide - Ports, containers, and commands
- TypeORM Example README - TypeScript/Node.js implementation (recommended for beginners)
- SQLModel Example README - Python/FastAPI implementation
- Drizzle Example README - Experimental Drizzle ORM implementation
- Prisma Example README - Experimental Prisma ORM implementation
- Test Client README - Interactive React UI
Architecture & Design
- CDC Transformation Architecture (SQLModel)
- OLTP โ OLAP Model Translation (SQLModel)
- Setup Script Walkthrough (SQLModel)
Project Structure
oltp-to-olap-showcase/
โโโ apps/
โ โโโ typeorm-example/ # โ
TypeScript/Node.js (port 3000)
โ โ โโโ src/ # TypeORM entities & Express API
โ โ โโโ moose/ # Moose OLAP tables & transformations
โ โ โโโ config/ # Moose configuration helpers
โ โ โโโ setup.sh # Interactive CDC setup script
โ โ
โ โโโ sqlmodel-example/ # โ
Python/FastAPI (port 3002)
โ โ โโโ src/ # SQLModel models & FastAPI
โ โ โโโ moose/ # Moose OLAP tables & transformations
โ โ โโโ docs/ # Architecture & setup guides
โ โ โโโ setup.sh # Interactive CDC setup script
โ โ
โ โโโ drizzle-example/ # โ ๏ธ Experimental (port 3003)
โ โ โโโ setup.sh # Interactive CDC setup script
โ โ
โ โโโ prisma-example/ # โ ๏ธ Experimental (port 3004)
โ โ โโโ setup.sh # Interactive CDC setup script
โ โ
โ โโโ test-client/ # โ
React UI (port 3001)
โ โโโ src/
โ โ โโโ components/ # Settings modal for backend switching
โ โ โโโ contexts/ # API context for dynamic URL management
โ โโโ docs/ # Settings UI & environment configuration notes
โ โโโ README.md # Test client documentation
โ
โโโ packages/
โ โโโ shared/ # Shared middleware, types, and utilities
โ
โโโ docs/ # Centralized guides & troubleshooting
All examples include interactive setup.sh scripts that guide you through CDC configuration.
How It Works
1. OLTP Models (Normalized)
Your application uses standard ORM models with relationships:
TypeORM:
@Entity() export class Order { @PrimaryGeneratedColumn() id: number; @ManyToOne(() => Customer) customer: Customer; @OneToMany(() => OrderItem, (item) => item.order) items: OrderItem[]; }
SQLModel (Python):
from sqlmodel import SQLModel, Field, Relationship class Order(SQLModel, table=True): id: int | None = Field(default=None, primary_key=True) customer_id: int = Field(foreign_key="customer.id") customer: Customer | None = Relationship(back_populates="orders") # Same class works for DB table AND API validation
2. CDC Capture
PostgreSQL's Write-Ahead Log (WAL) captures every change:
{
"table": "order",
"operation": "insert",
"after": { "id": 1, "customer_id": 123, "status": "pending", "total": 99.99 }
}Redpanda Connect streams these events to Kafka topics in real-time.
3. Transform & Load
Moose functions transform normalized data into denormalized analytics tables:
// Denormalize: join customer data into orders export interface OrderFact { order_id: UInt64; customer_name: string; // From customers table customer_email: string; // From customers table status: string; total: Float64; }
4. Fast Analytics
Query ClickHouse without expensive JOINs:
SELECT customer_name, SUM(total) as revenue FROM order_fact WHERE order_date >= today() - 30 GROUP BY customer_name ORDER BY revenue DESC;
Use Cases
When to use OLTP-to-OLAP CDC:
- Analytical queries slowing down your transactional database
- Building real-time dashboards and reports
- Need to separate operational and analytical workloads
- Want fast aggregations without complex JOINs
- Require historical data analysis alongside live transactions
Example scenarios:
- Revenue and sales analytics
- Customer behavior analysis
- Product performance metrics
- Real-time operational monitoring
- Audit trails and compliance reporting
Technology Stack
Shared Infrastructure:
- PostgreSQL 15 with logical replication
- Redpanda Connect (CDC connector for capturing WAL changes)
- Redpanda (Kafka-compatible streaming platform)
- Moose (stream processing and transformations)
- ClickHouse (columnar analytics database)
ORM Examples:
| Example | Language | ORM | API Framework | Port |
|---|---|---|---|---|
| TypeORM | TypeScript | TypeORM 0.3 | Express + Scalar | 3000 |
| SQLModel | Python | SQLModel (SQLAlchemy 2.0 + Pydantic) | FastAPI | 3002 |
| Drizzle | TypeScript | Drizzle | Express | 3003 |
| Prisma | TypeScript | Prisma | Express | 3004 |
Test Client:
- React 18 + TypeScript + Vite
- shadcn/ui components (Dialog, Button, etc.)
- Tailwind CSS for styling
- React Query for data fetching
- Dynamic backend switching via settings UI
Key Concepts
Change Data Capture (CDC) Captures database changes from PostgreSQL's Write-Ahead Log. No polling, no triggers - log-based replication with minimal overhead.
Denormalization Pre-joining related data for fast analytics. Instead of JOINing orders with customers at query time, customer data is embedded in the order fact table.
Star Schema Fact tables (orders, order items) and dimension tables (customers, products) optimized for analytical queries and aggregations.
Real-time Processing Changes appear in ClickHouse within milliseconds, enabling live dashboards and up-to-the-second reporting.
Contributing
Contributions are welcome! This is an educational project demonstrating CDC patterns across multiple ORMs.
Areas for contribution:
- ๐ Bug fixes - Improvements to existing TypeORM and SQLModel examples
- ๐ Documentation - Clarify setup instructions, add troubleshooting tips
- ๐ง Experimental ORMs - Help stabilize the Drizzle and Prisma examples
- โก Performance - Optimize transformations and data flow
- ๐งช Test scenarios - Add new features to test client (filters, charts, etc.)
- ๐จ UI improvements - Enhance test client user experience
Why contribute?
- Learn CDC patterns hands-on
- See how different ORMs handle the same architecture
- Build portfolio-worthy work with real-world data engineering
- Help others learn from your improvements
See CONTRIBUTING.md for detailed guidelines.
Learn More
Core Technologies:
ORMs & Frameworks:
- TypeORM - TypeScript ORM
- SQLModel - Python ORM combining SQLAlchemy + Pydantic
- SQLAlchemy 2.0 - Python SQL toolkit (SQLModel foundation)
- FastAPI - Modern Python API framework
- Express - Node.js web framework
License
MIT
Ready to Get Started?
Three ways to explore this project:
- Quick Test - Follow Quick Start above for a 5-minute setup
- Deep Dive - Review the SQLModel architecture docs under
apps/sqlmodel-example/docs - Interactive Learning - Run
./setup.shin any example directory for guided CDC configuration
Choose your ORM:
- TypeORM - TypeScript/Node.js (recommended for beginners)
- SQLModel - Python/FastAPI
- Drizzle and Prisma - Experimental setups ready for contributions
Then test with: Test Client - Interactive UI to see CDC in real-time