GitHub - 514-labs/oltp-to-olap-showcase

9 min read Original article โ†—

OLTP to OLAP Showcase

Real-time CDC pipeline: PostgreSQL โ†’ Redpanda โ†’ Moose โ†’ ClickHouse

Stream changes from your OLTP database to OLAP analytics in real-time using Change Data Capture. Five ORM implementations showing the same architecture across TypeScript and Python ecosystems!

๐ŸŽฏ New Developer? Start Here

This repository demonstrates Change Data Capture (CDC) - the practice of streaming database changes to analytics systems in real-time. Instead of polling or batch ETL jobs, CDC captures every INSERT, UPDATE, and DELETE from PostgreSQL's transaction log and pipes it to ClickHouse for lightning-fast analytics.

Choose your path:

  1. Learn the concepts โ†’ Read How It Works below
  2. Try it immediately โ†’ Follow Quick Start (5 minutes)
  3. Understand the design โ†’ See CDC Transformation Architecture
  4. Test interactively โ†’ Run the Test Client to see CDC in action

What This Demonstrates

A complete, working example of an OLTP-to-OLAP CDC pipeline with:

  • Real-time Change Data Capture from PostgreSQL transaction logs
  • Automatic denormalization for fast analytical queries without JOINs
  • Five ORM examples showing the same architecture across different frameworks
  • Interactive test client with dynamic backend switching
  • Star schema design optimized for analytics workloads
  • Interactive setup scripts that teach CDC concepts as you configure

Project Status

Production-Ready:

  • โœ… TypeORM Example - TypeScript/Node.js with Express API (port 3000)
  • โœ… SQLModel Example - Python/FastAPI implementation (port 3002)
  • โœ… Test Client - React UI with in-app backend switching

Experimental (Setup Available):

  • โš ๏ธ Drizzle Example - Basic setup working (port 3003)
  • โš ๏ธ Prisma Example - Basic setup working (port 3004)

Every example includes an interactive setup.sh script that guides you through CDC configuration step-by-step.

Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    WAL     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   Kafka    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   PostgreSQL    โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€>โ”‚   Redpanda   โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€>โ”‚    Moose    โ”‚
โ”‚  (TypeORM/SA)   โ”‚            โ”‚   Connect    โ”‚            โ”‚  Functions  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                                   โ”‚
                                                             Transform
                                                                   โ†“
                                                          โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                                          โ”‚   ClickHouse   โ”‚
                                                          โ”‚   (Analytics)  โ”‚
                                                          โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Quick Start

Prerequisites:

  • Docker and Docker Compose
  • Node.js 18+ (for TypeScript examples) or Python 3.10+ (for Python examples)
  • Redpanda Enterprise License - Get a free 30-day trial

๐Ÿš€ Recommended: Interactive Setup Script

The easiest way to get started - each ORM example includes an interactive setup.sh script that teaches CDC concepts while configuring your environment:

# Choose any example (TypeORM recommended for beginners)
cd apps/typeorm-example

# Run interactive setup - it will guide you through:
./setup.sh
# 1. Starting PostgreSQL with CDC enabled
# 2. Creating database tables
# 3. Configuring logical replication
# 4. Verifying CDC prerequisites

# Or use convenient shortcuts
make start              # Interactive setup
make status             # Check what's running
make logs-connector     # Debug CDC connector

Why use setup.sh:

  • ๐Ÿ“š Educational - Learn what CDC is and how it works as you set it up
  • โœ… Debuggable - Clear status checks and error messages at each step
  • ๐Ÿ”ง Flexible - Run all steps together or individual steps for troubleshooting
  • ๐Ÿ“Š Transparent - See exactly what's happening under the hood

After running setup.sh, start the remaining services and test the pipeline (see example-specific instructions below).

TypeScript + TypeORM

cd apps/typeorm-example

# Set Redpanda license
export REDPANDA_LICENSE="your_license_key_here"

pnpm install
# Run interactive setup (accept the prompt to run pnpm setup-db)
./setup.sh

# Terminal 1: Start Moose (includes CDC connector)
moose dev

# Terminal 2: Start the API server
pnpm dev

Visit http://localhost:3000 for API and http://localhost:3000/reference for docs.

Python + SQLModel

cd apps/sqlmodel-example

# Set Redpanda license
export REDPANDA_LICENSE="your_license_key_here"

# Setup (first time only)
python -m venv venv
source venv/bin/activate
pip install -e .

# Run interactive setup (accept the prompt to run python init_db.py)
./setup.sh

# Terminal 1: Start Moose (includes CDC connector)
moose dev

# Terminal 2: Start API (run python init_db.py first if you skipped it)
source venv/bin/activate
# python init_db.py
fastapi dev src/main.py --port 3002

Visit http://localhost:3002/docs for interactive API documentation.

About SQLModel: Created by FastAPI's author, SQLModel combines SQLAlchemy (database) and Pydantic (validation) into a single, type-safe model. One class definition serves as both database table and API validation schema.

Test the Pipeline

cd apps/test-client
pnpm install && pnpm dev

Visit http://localhost:3001 to create data and watch CDC in action.

The test client features:

  • ๐Ÿ”„ Dynamic backend switching - Switch between TypeORM, SQLModel, or other backends using the in-app Settings button
  • ๐ŸŽฏ Auto-detection - Automatically shows settings modal if it can't connect to the backend
  • ๐ŸŽฒ Data generation - Create random customers, products, and orders with one click
  • ๐Ÿ“Š Real-time CDC - Watch changes flow from PostgreSQL to ClickHouse in under 1 second
  • โœ๏ธ Full CRUD - Create, update, and delete records to test all CDC operations

See the Test Client README for complete usage instructions.

Documentation

Getting Started

Architecture & Design

Project Structure

oltp-to-olap-showcase/
โ”œโ”€โ”€ apps/
โ”‚   โ”œโ”€โ”€ typeorm-example/       # โœ… TypeScript/Node.js (port 3000)
โ”‚   โ”‚   โ”œโ”€โ”€ src/               # TypeORM entities & Express API
โ”‚   โ”‚   โ”œโ”€โ”€ moose/             # Moose OLAP tables & transformations
โ”‚   โ”‚   โ”œโ”€โ”€ config/            # Moose configuration helpers
โ”‚   โ”‚   โ””โ”€โ”€ setup.sh           # Interactive CDC setup script
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ sqlmodel-example/      # โœ… Python/FastAPI (port 3002)
โ”‚   โ”‚   โ”œโ”€โ”€ src/               # SQLModel models & FastAPI
โ”‚   โ”‚   โ”œโ”€โ”€ moose/             # Moose OLAP tables & transformations
โ”‚   โ”‚   โ”œโ”€โ”€ docs/              # Architecture & setup guides
โ”‚   โ”‚   โ””โ”€โ”€ setup.sh           # Interactive CDC setup script
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ drizzle-example/       # โš ๏ธ Experimental (port 3003)
โ”‚   โ”‚   โ””โ”€โ”€ setup.sh           # Interactive CDC setup script
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ prisma-example/        # โš ๏ธ Experimental (port 3004)
โ”‚   โ”‚   โ””โ”€โ”€ setup.sh           # Interactive CDC setup script
โ”‚   โ”‚
โ”‚   โ””โ”€โ”€ test-client/           # โœ… React UI (port 3001)
โ”‚       โ”œโ”€โ”€ src/
โ”‚       โ”‚   โ”œโ”€โ”€ components/    # Settings modal for backend switching
โ”‚       โ”‚   โ””โ”€โ”€ contexts/      # API context for dynamic URL management
โ”‚       โ”œโ”€โ”€ docs/              # Settings UI & environment configuration notes
โ”‚       โ””โ”€โ”€ README.md          # Test client documentation
โ”‚
โ”œโ”€โ”€ packages/
โ”‚   โ””โ”€โ”€ shared/                # Shared middleware, types, and utilities
โ”‚
โ””โ”€โ”€ docs/                      # Centralized guides & troubleshooting

All examples include interactive setup.sh scripts that guide you through CDC configuration.

How It Works

1. OLTP Models (Normalized)

Your application uses standard ORM models with relationships:

TypeORM:

@Entity()
export class Order {
  @PrimaryGeneratedColumn()
  id: number;

  @ManyToOne(() => Customer)
  customer: Customer;

  @OneToMany(() => OrderItem, (item) => item.order)
  items: OrderItem[];
}

SQLModel (Python):

from sqlmodel import SQLModel, Field, Relationship

class Order(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    customer_id: int = Field(foreign_key="customer.id")
    customer: Customer | None = Relationship(back_populates="orders")
    # Same class works for DB table AND API validation

2. CDC Capture

PostgreSQL's Write-Ahead Log (WAL) captures every change:

{
  "table": "order",
  "operation": "insert",
  "after": { "id": 1, "customer_id": 123, "status": "pending", "total": 99.99 }
}

Redpanda Connect streams these events to Kafka topics in real-time.

3. Transform & Load

Moose functions transform normalized data into denormalized analytics tables:

// Denormalize: join customer data into orders
export interface OrderFact {
  order_id: UInt64;
  customer_name: string; // From customers table
  customer_email: string; // From customers table
  status: string;
  total: Float64;
}

4. Fast Analytics

Query ClickHouse without expensive JOINs:

SELECT customer_name, SUM(total) as revenue
FROM order_fact
WHERE order_date >= today() - 30
GROUP BY customer_name
ORDER BY revenue DESC;

Use Cases

When to use OLTP-to-OLAP CDC:

  • Analytical queries slowing down your transactional database
  • Building real-time dashboards and reports
  • Need to separate operational and analytical workloads
  • Want fast aggregations without complex JOINs
  • Require historical data analysis alongside live transactions

Example scenarios:

  • Revenue and sales analytics
  • Customer behavior analysis
  • Product performance metrics
  • Real-time operational monitoring
  • Audit trails and compliance reporting

Technology Stack

Shared Infrastructure:

  • PostgreSQL 15 with logical replication
  • Redpanda Connect (CDC connector for capturing WAL changes)
  • Redpanda (Kafka-compatible streaming platform)
  • Moose (stream processing and transformations)
  • ClickHouse (columnar analytics database)

ORM Examples:

Example Language ORM API Framework Port
TypeORM TypeScript TypeORM 0.3 Express + Scalar 3000
SQLModel Python SQLModel (SQLAlchemy 2.0 + Pydantic) FastAPI 3002
Drizzle TypeScript Drizzle Express 3003
Prisma TypeScript Prisma Express 3004

Test Client:

  • React 18 + TypeScript + Vite
  • shadcn/ui components (Dialog, Button, etc.)
  • Tailwind CSS for styling
  • React Query for data fetching
  • Dynamic backend switching via settings UI

Key Concepts

Change Data Capture (CDC) Captures database changes from PostgreSQL's Write-Ahead Log. No polling, no triggers - log-based replication with minimal overhead.

Denormalization Pre-joining related data for fast analytics. Instead of JOINing orders with customers at query time, customer data is embedded in the order fact table.

Star Schema Fact tables (orders, order items) and dimension tables (customers, products) optimized for analytical queries and aggregations.

Real-time Processing Changes appear in ClickHouse within milliseconds, enabling live dashboards and up-to-the-second reporting.

Contributing

Contributions are welcome! This is an educational project demonstrating CDC patterns across multiple ORMs.

Areas for contribution:

  • ๐Ÿ› Bug fixes - Improvements to existing TypeORM and SQLModel examples
  • ๐Ÿ“š Documentation - Clarify setup instructions, add troubleshooting tips
  • ๐Ÿ”ง Experimental ORMs - Help stabilize the Drizzle and Prisma examples
  • โšก Performance - Optimize transformations and data flow
  • ๐Ÿงช Test scenarios - Add new features to test client (filters, charts, etc.)
  • ๐ŸŽจ UI improvements - Enhance test client user experience

Why contribute?

  • Learn CDC patterns hands-on
  • See how different ORMs handle the same architecture
  • Build portfolio-worthy work with real-world data engineering
  • Help others learn from your improvements

See CONTRIBUTING.md for detailed guidelines.

Learn More

Core Technologies:

ORMs & Frameworks:

License

MIT


Ready to Get Started?

Three ways to explore this project:

  1. Quick Test - Follow Quick Start above for a 5-minute setup
  2. Deep Dive - Review the SQLModel architecture docs under apps/sqlmodel-example/docs
  3. Interactive Learning - Run ./setup.sh in any example directory for guided CDC configuration

Choose your ORM:

  • TypeORM - TypeScript/Node.js (recommended for beginners)
  • SQLModel - Python/FastAPI
  • Drizzle and Prisma - Experimental setups ready for contributions

Then test with: Test Client - Interactive UI to see CDC in real-time