GitHub - ashinvinod/chernoffOT

3 min read Original article ↗

Real-time service monitoring with Chernoff faces. Spot outages at a glance — happy faces mean healthy services, angry faces mean something's wrong.

ChernoffOT queries any Prometheus-compatible endpoint and turns your services into expressive cartoon faces. Point it at your existing Prometheus, Amazon Managed Prometheus, Google Managed Prometheus, Grafana Cloud, or any other PromQL-compatible backend.

All services healthy — green happy faces
All services healthy

Some services degraded — angry and stressed faces
Some services degraded

Quick Start

Docker

docker build -t chernoffot .
docker run \
  -e PROMETHEUS_URL=https://your-prometheus-endpoint \
  -p 3000:3000 \
  chernoffot

Open http://localhost:3000 — you'll see faces for every discovered service.

Docker Compose

Copy the provided docker-compose.yml, set PROMETHEUS_URL in a .env file, then:

echo 'PROMETHEUS_URL=https://your-prometheus-endpoint' > .env
docker compose up

From Source

bun install
PROMETHEUS_URL=https://your-prometheus-endpoint bun run dev

Prerequisite: Bun runtime (curl -fsSL https://bun.sh/install | bash)


How It Works

Each service in your infrastructure becomes a face. By default, facial features encode these four outage signals:

Feature Default Metric Good → Bad
Mouth Error rate 😊 Smiling → 😟 Frowning
Eye Size Latency (p95) 😑 Calm → 😳 Wide open
Brow Angle Traffic anomaly 😌 Relaxed → 😠 Angry
Head Color Health score 🟢 Green → 🔴 Red

Sweat drops appear when a service is critically unhealthy (< 30% health). You can remap which normalized metric drives each facial feature from the in-app mapping modal (? button in the header).

Architecture

Your Services → Prometheus-compatible backend → ChernoffOT → 🎭 Faces

ChernoffOT auto-discovers your services, classifies them by type (app, database, cache, queue, worker), and fetches type-appropriate metrics via PromQL. No manual service registration required.

ChernoffOT works with any PromQL-compatible backend. Just point PROMETHEUS_URL at it:

Provider PROMETHEUS_URL
Amazon Managed Prometheus https://aps-workspaces.<region>.amazonaws.com/workspaces/<id>
Google Managed Prometheus https://monitoring.googleapis.com/v1/projects/<project>/location/global/prometheus
Azure Monitor Prometheus https://<workspace>.prometheus.monitor.azure.com
Grafana Cloud https://prometheus-prod-XX.grafana.net/api/prom

🚧 Coming soon: zero-config bundled Prometheus. Join waitlist to get notified→

ChernoffOT currently issues direct requests to PROMETHEUS_URL and does not add auth headers or cloud request signing. For secured endpoints, route through an authenticated proxy/gateway and point PROMETHEUS_URL to that proxy.

Configuration

All configuration is via environment variables with sensible defaults:

Core

Variable Default Description
PROMETHEUS_URL http://localhost:9090 PromQL-compatible endpoint
PORT 3000 Port the ChernoffOT server listens on
REFRESH_INTERVAL 15 Dashboard refresh interval in seconds

Service Discovery & Labels

Variable Default Description
LABEL_SERVICE service_name Primary label identifying services
SERVICE_LABEL_ALIASES service_name,service Comma-separated fallback label names
LABEL_STATUS http_response_status_code Label used for HTTP status codes
METRIC_DURATION http_server_request_duration_seconds Primary request duration metric name
SERVICE_TYPE_OVERRIDES Force service types: svc1=db,svc2=cache

Tuning

Variable Default Description
DISCOVERY_CACHE_MS 60000 How often to re-discover services (ms)
CLASSIFIER_CACHE_MS 600000 How often to re-classify service types (ms)
METRICS_CACHE_MS 15000 How often to re-fetch metrics (ms)
ACTIVE_WINDOW_SEC 300 Seconds before a service is considered missing
MISSING_RECENTLY_SEC 3600 Seconds before a missing service is considered stale
CATALOG_TTL_SEC 604800 Seconds before a stale service is pruned (7 days)

Service Classification

ChernoffOT automatically classifies services by inspecting which metrics they expose:

Type Detection Signals
Application http_server_*, http_requests_*, grpc_server_*
Database postgres_*, mysql_*, mongodb_*, lock/connection metrics
Cache redis_*, memcached_*, hit/miss, eviction metrics
Queue kafka_*, rabbitmq_*, nats_*, lag/depth metrics
Worker job_*, task_*, queue consumer metrics

Each type uses different PromQL queries optimized for that workload. Override with SERVICE_TYPE_OVERRIDES=my-redis=cache,my-pg=db.


Compatible Backends

Any backend that speaks PromQL:

  • ✅ Prometheus / VictoriaMetrics / Thanos / Cortex / Grafana Mimir
  • ✅ Amazon Managed Prometheus / Google Managed Prometheus / Azure Monitor
  • ✅ Grafana Cloud / Coralogix / New Relic (Prometheus endpoint)
  • ❌ Datadog (no PromQL support)

Tech Stack

  • Runtime: Bun — fast JavaScript/TypeScript runtime
  • Frontend: Vanilla JS + SVG (no framework, no build step)
  • Backend: Single TypeScript file, ~2200 lines
  • Dependencies: Zero runtime dependencies
  • Container: Alpine-based, < 100MB image

Development

bun install
PROMETHEUS_URL=http://localhost:9090 bun run dev

License

MIT