shared incident tracing
Your team.
One causal truth.
Before the war room starts guessing.
Monitoring tools help you investigate. Incidentary helps your team converge first.
When an alert fires, teams don't lack dashboards. They lack agreement. Incidentary captures the pre-alert causal chain and delivers it as a shared replayable artifact — so the room starts from one picture, not five.
pre-alert causal trace · assembled in < 2s
trace · onboarding-quickstart · 1.8s
service / operation · timeline (ms)
5 spans · 1 error · pre-alert
root cause:
payment-svc → pg_pool exhaustion
1 artifact
shared by the whole room
60 sec
pre-alert window captured
No lock-in
Open-source capture layer
before / after
Same incident. Different first minute.
Without Incidentary
- Alert fires
- Responders open different tools
- Each person sees a different symptom
- Cause and fallout get confused
- One engineer synthesizes the story for everyone else
- 10 to 20 minutes spent aligning before real debugging begins
With Incidentary
- Alert fires with a direct link to the shared trace
- The room opens one artifact
- Everyone sees what broke first, how it propagated, and where coverage is missing
- Responders align in minutes on shared evidence — not narration
- Datadog becomes the second step, not the first
how it works
Four steps. No black boxes.
01
instrument
Drop in the SDK. One middleware call wraps your HTTP handlers and propagates incident context automatically. No distributed config files. No sampling tuning. No OpenTelemetry collector to maintain.
import { incidentary } from '@incidentary/sdk-node';
app.use(incidentary.middleware());
02
ingest
Spans, errors, and structured logs flush over a persistent gRPC stream. No sampling. No dropped events at the boundary. No gaps caused by buffer timeouts.
// spans flushed automatically
// errors captured at boundary
// logs correlated by trace-id
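The no-drop guarantee above can be sketched as a queue that only clears on acknowledgment. This is an illustration under assumed names (`FlushQueue`, `Span`), not the SDK's actual gRPC transport:

```typescript
// Sketch of a no-drop flush queue: events stay pending until the
// stream acknowledges them, so nothing is sampled away or timed out.
type Span = { traceId: string; name: string };

class FlushQueue {
  private pending: Span[] = [];

  // `send` stands in for the persistent gRPC stream; returns true on ack
  constructor(private send: (batch: Span[]) => boolean) {}

  enqueue(span: Span): void {
    this.pending.push(span);
  }

  flush(): number {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    if (this.send(batch)) {
      this.pending = []; // acknowledged: safe to clear
      return batch.length;
    }
    return 0; // not acknowledged: keep everything, retry on next flush
  }
}

let streamUp = false;
const q = new FlushQueue(() => streamUp);
q.enqueue({ traceId: 't1', name: 'GET /checkout' });
q.flush(); // stream down: batch retained, nothing dropped
streamUp = true;
const sent = q.flush(); // stream up: batch delivered
```

A real transport would flush on an interval and backpressure the producer; the invariant shown is that a failed send never discards events.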
03
assemble
The correlator builds a causal graph in real time. When anomaly thresholds breach, the pre-arm ring buffer locks: the 60 seconds before the alert are already captured and attached to the incident artifact.
// pre-arm triggered at T-90s
// root cause isolated T-20s
// runbook linked T-8s
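The pre-arm ring buffer described above can be sketched as a rolling window that freezes on anomaly. The names (`PreArmBuffer`, `CapturedEvent`) are illustrative, not the correlator's internals:

```typescript
// Sketch of a pre-arm ring buffer: keeps the last `windowMs` of events,
// then lock() freezes the window when anomaly thresholds breach.
type CapturedEvent = { ts: number; name: string };

class PreArmBuffer {
  private events: CapturedEvent[] = [];
  private locked = false;

  constructor(private windowMs: number) {}

  push(e: CapturedEvent): void {
    if (this.locked) return; // after lock, the pre-alert window is frozen
    this.events.push(e);
    const cutoff = e.ts - this.windowMs;
    while (this.events.length > 0 && this.events[0].ts < cutoff) {
      this.events.shift(); // evict events older than the rolling window
    }
  }

  // called when thresholds breach: the 60s before the alert are already here
  lock(): CapturedEvent[] {
    this.locked = true;
    return this.events;
  }
}

const buf = new PreArmBuffer(60_000);
buf.push({ ts: 0, name: 'checkout-svc latency up' });
buf.push({ ts: 70_000, name: 'payment-svc 5xx spike' }); // first event evicted
const frozen = buf.lock();
buf.push({ ts: 71_000, name: 'late event' }); // ignored: buffer is locked
```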
04
respond
The alert fires with a direct link. Your team opens one artifact — the complete causal trace, shared across the room. Not five dashboards. Not one senior engineer narrating to four others. One picture.
// alert fires
// one shared trace lands in Slack
// cause before dashboards, not after
solo mode
One SDK install.
Every dependency revealed.
Install the SDK on a single service. Incidentary observes every outbound call and surfaces uninstrumented dependencies as ghost services — services you depend on but don't have data from. One install reveals your entire dependency topology.
No teammates required. No config. The anomaly feed catches latency spikes and error bursts before they become incidents. The coverage scorecard shows you where to instrument next.
You get value in five minutes, not after the next outage.
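Ghost-service detection can be sketched as a classification over observed call edges. This is a hypothetical sketch (`classifyTopology` is not a real SDK function): any node that appears in the call graph but never reports its own spans is a ghost.

```typescript
// Sketch: outbound calls reveal nodes; nodes that never report telemetry
// of their own are surfaced as ghost services.
type Edge = { from: string; to: string };
type Node = { name: string; kind: 'instrumented' | 'ghost' };

function classifyTopology(edges: Edge[], reporting: Set<string>): Node[] {
  const names = new Set(edges.flatMap((e) => [e.from, e.to]));
  return [...names].map((name) => ({
    name,
    kind: reporting.has(name) ? 'instrumented' : 'ghost',
  }));
}

// One install on checkout-svc reveals three uninstrumented dependencies
const edges: Edge[] = [
  { from: 'payments', to: 'checkout-svc' },
  { from: 'checkout-svc', to: 'inventory' },
  { from: 'checkout-svc', to: 'shipping' },
];
const topology = classifyTopology(edges, new Set(['checkout-svc']));
```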
payments ← calls · checkout-svc
checkout-svc · calls → inventory
checkout-svc · calls → shipping
service coverage: 1 / 4 instrumented
instrumented · ghost service
1 service: See what your service depends on
2–5 services: Map your topology as you instrument
6–15 services: Cross-service incidents, clearly
15+ services: Team convergence and shared traces
pre-alert window
The 60 seconds before the alert.
Usually reconstructed at 3am.
Now already waiting.
Signal correlators watch your telemetry streams continuously. The moment anomalies appear, pre-arm sequences begin — assembling the causal path, linking related events, and tagging the break before the alert fires.
By the time PagerDuty wakes your team, the causal prelude is already rendered. Not a guess. Not an AI summary. A deterministic trace built from what your services actually reported to each other.
T-90s
checkout-svc latency ↑
T-72s
payment-svc 5xx rate spike
T-55s
db-pool exhaustion detected
T-38s
trace assembly started
alert fires → context ready
82:00
avg MTTR for distributed incidents
↓ < 1:30 to shared ground truth
mttr improvement
The war room used to start by figuring out what happened.
Now it starts by acting on what happened.
The typical war room spends the first 15-20 minutes just figuring out what happened — five engineers, five tools, five incomplete pictures. Incidentary collapses that convergence phase to under 90 seconds, because the trace is already assembled when the alert fires.
who it's for
Built for the teams who feel the pain of
distributed incidents.
just split the monolith?
You went from one service to three. Now incidents involve services you didn't even know called each other. Incidentary shows the causal chain across every boundary — before the war room starts guessing.
running distributed services?
When five engineers are looking at five dashboards, agreement takes longer than the fix. Incidentary delivers one shared artifact so the room converges before anyone opens a terminal.
expansion
The incident is the product demo.
One engineer shares a trace link in Slack. Teammates see the causal chain without installing anything. They notice the ghost service gaps — services where Incidentary knows a call was made but can't see inside. The product sells itself through its own gaps.
01 · one engineer installs: SDK on one service, 3 minutes. Ghost services and the anomaly feed appear immediately.
02 · first incident shared: A trace link lands in Slack. Teammates see the causal chain — and the ghost service gaps.
03 · teammates instrument: Ghost services become real services. The coverage scorecard tracks progress toward full visibility.
04 · team converges: Every incident starts from one shared artifact. MTTR drops. The coverage scorecard turns green.
platform
Every library. Zero config. One causal chain.
auto-instrumentation
The SDK detects libraries in your dependency tree and patches them at startup. No manual span creation. No config files. If OpenTelemetry already patched a library, the SDK skips it.
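The skip-if-already-patched behavior can be sketched as a double-patch guard. This is an illustration, not the SDK's real mechanism; `patchOnce` and the `db` object are hypothetical stand-ins for a driver like `pg`:

```typescript
// Sketch of a double-patch guard: a WeakSet marks wrapped functions so a
// library already instrumented by another tracer (e.g. OpenTelemetry)
// is left alone.
const patched = new WeakSet<Function>();

function patchOnce(
  target: Record<string, any>,
  method: string,
  wrap: (original: Function) => Function,
): boolean {
  const original = target[method];
  if (patched.has(original)) return false; // already instrumented: skip
  const wrapped = wrap(original);
  patched.add(wrapped);
  target[method] = wrapped;
  return true;
}

// Hypothetical library object standing in for a real database driver
const db = { query: (sql: string) => `rows for ${sql}` };
const captured: string[] = [];

const first = patchOnce(db, 'query', (orig) => (sql: string) => {
  captured.push(sql); // record a span-like event around the real call
  return orig(sql);
});
const second = patchOnce(db, 'query', (orig) => orig); // skipped: already patched

db.query('SELECT 1');
```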
node: express · fastify · koa · pg · ioredis · bullmq · amqplib · kafkajs · grpc
python: fastapi · flask · django · psycopg2 · asyncpg · celery · kombu
go: gin · echo · chi
dotnet: aspnetcore · httpclient · efcore · grpc · masstransit · lambda
25 libraries · 4 ecosystems · zero config required
database query capture
Query timing and connection metadata captured automatically. No parameters. No full query text. No sensitive data.
pg · ioredis · psycopg2 · asyncpg
queue instrumentation
Publish-consume pairs linked causally. Async workflows traced end-to-end without manual context propagation.
bullmq · amqplib · kafkajs · celery · kombu
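Causal linkage across a queue boundary can be sketched as trace context riding in message headers. The header name and types here are illustrative assumptions, not the SDK's wire format:

```typescript
// Sketch: the publisher injects its trace id into message headers; the
// consumer extracts it, so the consumer span joins the publisher's trace
// with no manual context propagation.
type Message = { headers: Record<string, string>; body: string };

function publish(queue: Message[], body: string, traceId: string): void {
  queue.push({ headers: { 'x-trace-id': traceId }, body }); // context injected
}

function consume(queue: Message[]): { body: string; traceId: string } {
  const msg = queue.shift()!;
  return { body: msg.body, traceId: msg.headers['x-trace-id'] }; // context extracted
}

const queue: Message[] = [];
publish(queue, 'charge order', 'trace-abc');
const received = consume(queue);
// the consumer's span now links back causally to trace-abc
```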
grpc: full causal linkage · all sdks
opentelemetry: zero-code ingest from collector
custom events: webhooks · jobs · custom ops
rest api: 10K req/min · cursor pagination
integrations
Plugs into the tools your team already uses.
slack
notifications + slash commands
Incident URL posted automatically. /incidentary slash command to open traces inline.
pagerduty
incident url in timeline
Webhook fires on alert. Causal trace URL injected into PagerDuty incident timeline.
opsgenie
webhook triggers
Webhook integration triggers artifact assembly. Link back into OpsGenie alert.
kubernetes
cluster events + topology
Helm install in one command. Watches 14 resource types — OOM kills, crash loops, evictions, node pressure, HPA scaling, deploy rollouts. Populates service topology from workload annotations. No SDK required on the cluster.
opentelemetry
zero-code ingest from existing collector
Send existing OTel spans to Incidentary via OTLP. No SDK install needed. Coexists with Incidentary SDKs in the same trace.
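Routing existing spans from a collector might look like the fragment below. The endpoint URL and header name are placeholders, not documented values; check your workspace settings for the real ones.

```yaml
# Hypothetical OpenTelemetry Collector fragment: forward existing traces
# to Incidentary over OTLP/HTTP alongside any exporters you already run.
receivers:
  otlp:
    protocols:
      http:

exporters:
  otlphttp/incidentary:
    endpoint: https://ingest.incidentary.example/otlp   # placeholder
    headers:
      x-api-key: ${INCIDENTARY_API_KEY}                 # placeholder header

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/incidentary]
```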
shared links
no login · token-based · read-only
Paste in Slack, email, or Jira. Anyone with the link sees the trace. No account needed.
trust posture
privacy:
  data_boundary: metadata-only
  request_bodies: never captured
  query_parameters: never captured
  headers: never captured
completeness:
  labels: full | partial | low
  topology_aware: true
retention:
  windows: 14d | 30d | 90d
  deletion: hard delete at expiry
pre_arm:
  signals: 5xx rate · slow success · in-flight pileup · retry onset
  thresholds: configurable per service
quickstart
One middleware call.
No distributed config files. No sampling tuning. No OpenTelemetry collector to maintain. The SDK is a single middleware — it handles context propagation, event capture, and span flushing.
Your services keep running. Incidentary keeps watching.
checkout-svc/index.ts
import { incidentary } from '@incidentary/sdk-node';
import express from 'express';
const app = express();
// Wrap once — all routes instrumented
app.use(incidentary.middleware({
apiKey: process.env.INCIDENTARY_API_KEY,
serviceName: 'checkout-svc',
}));
app.post('/checkout', async (req, res) => {
// spans, errors, and slow queries captured automatically
const order = await processOrder(req.body);
res.json(order);
});
get started
Start in minutes.
The SDKs are yours. The infrastructure is ours.
The capture SDKs are Apache 2.0 licensed. Read every line of source. Fork freely. No proprietary agent. No lock-in at the instrumentation layer.
Incidentary runs as a managed cloud service. No infrastructure to provision, no database cluster to operate, no retention policies to tune. Install the SDK, point it at your workspace, and the shared causal trace is there when the next alert fires.
First 20 teams get a direct Slack channel with the founder for feature requests and priority support.