Show HN: StepKit, an open and cross-platform durable execution standard
Hi HN! I’m Tony, one of the co-founders of Inngest (https://inngest.com/). Wanted to show you something we’re working on: StepKit.
StepKit is an open source SDK and framework for building and iterating on durable workflows that run on any platform (self-hosted, Inngest, Cloudflare, Netlify, etc.) without requiring any provider or bundler-specific code.
Here’s the repo: https://github.com/inngest/stepkit.
StepKit extracts the core execution loop that we built in Inngest and makes it fully open (Apache 2.0 licensed), hackable, and pluggable into different backends. We don’t want people to have to go through multiple major SDK versions to learn the lessons we’ve already learned in production:
It needs to work anywhere: This SDK is push-based and receives tasks via API endpoints, so it works anywhere (servers, serverless, k8s, etc). That's also customizable. It doesn’t need any specific runtime or bundler support.
It needs to be complete: StepKit includes the entire execution engine: step discovery, memoization, a core event loop to turn async steps into generators without relying on `try/catch` based control, and middleware for extensibility (eg, end-to-end encryption, Sentry integrations, and so on).
It needs a simple API: StepKit APIs are explicit `step.*()` functions, which we designed in our original Inngest SDK back in 2022. They’re easy to read, understand, implement, and use. They contain all of the primitives for durable execution: steps, suspend/resume, human-in-the-loop, and observability. These same APIs have also been adopted by Cloudflare, Netlify, Convex, and others, and support billions of runs every month.
It needs to be resilient: Steps tolerate changes when refactoring, and provide the building blocks of durable execution without complex abstractions.
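To make the memoization and refactor-tolerance points concrete, here is a minimal sketch of the general idea. It is not the StepKit API: `StepRunner`, `signupWorkflow`, and the step IDs are hypothetical names made up for illustration, and the steps are synchronous for brevity (real steps would normally be async). Each step result is stored under an explicit ID, so a retry replays the workflow from the top but skips any step that already completed.

```typescript
// Hypothetical sketch, not the StepKit API: a memoizing step runner keyed by explicit step IDs.
type StepState = Map<string, unknown>;

class StepRunner {
  // IDs of the steps whose bodies actually ran during this attempt.
  executed: string[] = [];
  private readonly state: StepState;

  constructor(state: StepState) {
    this.state = state;
  }

  run<T>(id: string, fn: () => T): T {
    // A step that already completed is never re-run; its stored result is returned.
    if (this.state.has(id)) {
      return this.state.get(id) as T;
    }
    // If fn throws, nothing is stored, so the step runs again on the next attempt.
    const result = fn();
    this.state.set(id, result);
    this.executed.push(id);
    return result;
  }
}

// A workflow is a plain function that is re-invoked from the top on every attempt.
function signupWorkflow(step: StepRunner, failAtEmail: boolean): string {
  const userId = step.run("create-user", () => "user_123");
  return step.run("send-welcome-email", () => {
    if (failAtEmail) throw new Error("email provider unavailable");
    return `sent:${userId}`;
  });
}

// Stands in for durable storage; a real backend would persist this between attempts.
const persisted: StepState = new Map();

// First attempt: "create-user" succeeds and is stored, "send-welcome-email" fails.
const attempt1 = new StepRunner(persisted);
let firstAttemptFailed = false;
try {
  signupWorkflow(attempt1, true);
} catch {
  firstAttemptFailed = true;
}

// Retry: "create-user" is read from stored state instead of running again.
const attempt2 = new StepRunner(persisted);
const result = signupWorkflow(attempt2, false);
```

Because results are looked up by ID rather than by position in the function, reordering steps or inserting a new one during a refactor does not invalidate state that has already been stored.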
We’re starting with in-memory, filesystem, Inngest, and Cloudflare (WIP) drivers, with more coming soon.
The roadmap includes middleware, compatible SDKs in different languages, as well as extensions like concurrency controls and idempotency: https://github.com/inngest/stepkit/discussions/52.
Excited to see what you think: https://github.com/inngest/stepkit.
Inngest engineer here! For a little extra context, the `@stepkit/core` package is basically just an API for defining a workflow. There isn't much to it because we don't want to be overly opinionated on backend implementations! The `@stepkit/sdk-tools` package is a set of tools for building your own StepKit SDK. The vast majority of stuff in there is optional, but highly valuable if you want to avoid reinventing the wheel when building your own SDK.
The API gods have struck gold again! I love this new approach. Really looking forward to seeing how extensions extend the core SDK functionality, especially those created by the OSS community. I'm also looking forward to seeing how invoking Inngest workflows directly in your business logic and getting the result synchronously will materialize. This will be a huge unlock in handling error-prone workflows.
I've had early access to StepKit and this kind of sane, explicit API with pluggable backends feels like the right direction. Kudos to the team here!
This is really neat!! Looks easy to set up. Super convenient to have an SDK to deploy AI workflows. Love the automatic retries and suspend/resume.
Durable execution feels underrated. It lines up almost exactly with a Process Manager: track state, pick the next step, orchestrate calls, no hand-rolled state machines or persistence glue. You can hack a demo in a few hours, but getting the guarantees right in production is a totally different game, so seeing an OSS implementation from people who have actually done this before is interesting.
Long time Inngest user, this looks very interesting.
Wow, this is great. I have been using Inngest for over a year now and really like the APIs you guys have created for defining step functions / event handlers. I'm very glad to see that this API is now open-source so that it can be adopted more broadly!
With all the hype around durable execution... what makes it different from job queuing solutions like BullMQ or Agenda.js that rely on DLQ on top of Redis or Mongo? Is it just a DX thing?
You can see durable execution as a combination of persistent state and queues (simplified example). With regular queues, the state is spread across many places (messages, the runtime, and external storage), and the primary value is reliable message processing and simple error management. Durable execution brings more advanced error management and end-to-end reliability with persistent state.
Damn that looks cool, congrats guys!
Agreed, it looks pretty nice!
Looks cool. How does this compare to Vercel Workflow Kit or Cloudflare Workflows?
Vercel Workflow Kit takes a very different approach: no step IDs (which makes it worse at handling code changes), a compilation step, and more opinions about backends ("worlds", as they call them). Vercel Workflow Kit has magic that admittedly makes it a little easier to get started, but that magic causes problems when you want a mature product. Cloudflare Workflows are actually complementary to StepKit! We'll soon release an adapter that lets you define StepKit workflows that run as Cloudflare Workflows. We have a POC in `packages/cloudflare` in our repo.
Looks neat, how do you guys compare to Upstash Workflows?
We'll release an Upstash Workflows adapter soon! StepKit is ultimately just an in-code API that lets you define workflows in a backend-agnostic way. We want you to define workflows that can run in Upstash, Inngest, Cloudflare... really anywhere!
Interesting, will dig this!