This’ll be a much more technical post, a bit of a break from my usual, more business-leaning pieces.
I’ve been building production AI software for some years now, between my GTM work above and Casely. The recurring problem when working with my preferred framework, Next.js, is long-running compute: deferring work to be run later.
For the longest time my answer had been the tried-and-tested combination of BullMQ worker jobs + Redis. But the papercuts kept piling up and I kept an eye out for a new way of working: too many small network delays to enqueue work, no clean way to terminate jobs immediately (very important for expensive AI calls if the user hits “stop”), and so on.
I’ve since found my answer, and that is leveraging Node.js child processes.
I’ll focus the conversation around self-hosted Next.js instances primarily. If you are both a) building in Next.js and b) hosting on Vercel, I highly recommend just digging into “use workflow” — I’ve used this also, and will double click into this later.
Let’s jump straight into how I’m using child processes now for all my long running tasks. Think AI responses, processing files, extracting text from and generating PDFs.
What I have is a Next.js server action that triggers spawning of a child process, and a child process script that actually does the work I need.
I pass in a discriminated union type, ChildProcessData, which carries the data needed for whatever job I’m looking to run.
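For illustration, the union looks something like this. The variant names here are hypothetical; the real one covers my actual jobs:

```typescript
// Sketch of a discriminated union for job payloads. The `name` field is
// the discriminant; each variant carries only the data its job needs.
export type ChildProcessData =
  | { name: 'counter'; count: number }
  | { name: 'generate-pdf'; documentId: string }
  | { name: 'ai-response'; conversationId: string; prompt: string }

// Narrowing on `name` gives type-safe access to each variant's fields.
export function describeJob(data: ChildProcessData): string {
  switch (data.name) {
    case 'counter':
      return `counting to ${data.count}`
    case 'generate-pdf':
      return `rendering PDF for ${data.documentId}`
    case 'ai-response':
      return `answering prompt in ${data.conversationId}`
  }
}
```

The nice part is that the worker’s switch statement gets exhaustiveness checking for free: add a new variant and TypeScript flags every handler you forgot.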
"use server"

import { spawn } from 'child_process'
import path from 'path'

export const childProcessStart = async (
  data: ChildProcessData,
  options?: {
    detached?: boolean
    /**
     * When true, spawns an independent process that survives
     * parent termination. Useful for long-running background tasks.
     */
    independent?: boolean
  },
): Promise<{ pid: number }> => {
  const workerPath = path.join(__dirname, 'child-process-worker.js')
  const isIndependent = options?.independent ?? false
  const isDetached = options?.detached ?? true

  // Spawn the worker process
  const worker = spawn('node', [workerPath, JSON.stringify(data)], {
    stdio: isIndependent ? 'inherit' : ['ignore', 'pipe', 'pipe'],
    detached: isDetached,
  })

  // Forward logs for non-independent processes
  if (!isIndependent) {
    worker.stdout?.pipe(process.stdout)
    worker.stderr?.pipe(process.stderr)
  }

  // Basic error handling
  worker.on('error', (error) => {
    console.error(`Worker process error:`, error)
  })

  worker.on('exit', (code) => {
    if (code !== 0) {
      console.error(`Worker exited with code ${code}`)
    }
  })

  // Allow parent to exit independently
  worker.unref()

  if (!worker.pid) {
    throw new Error('Failed to start worker')
  }

  return { pid: worker.pid }
}

The server action is great developer experience: no need to call an API endpoint from my client code, and it’s a regular function call in backend code.
I pipe all the logs from the child process over to the main Next.js process so we can continue to see logging in our main Next.js container.
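Because the action returns the pid, cancelling a job immediately (the “user hits stop” case from earlier) is one signal away. A companion action might look like this; it’s a sketch, childProcessStop is a name I made up, and in practice you’d look the pid up from wherever you persisted it when the job began:

```typescript
"use server"

// Hypothetical companion action to childProcessStart. Assumes the pid
// returned when the job started was stored somewhere retrievable.
export const childProcessStop = async (pid: number): Promise<boolean> => {
  try {
    // SIGTERM lets the worker clean up; swap in 'SIGKILL' to force it.
    process.kill(pid, 'SIGTERM')
    return true
  } catch {
    // process.kill throws (ESRCH) if the process no longer exists.
    return false
  }
}
```

This is the piece BullMQ never made easy for me: killing an in-flight AI call is just an OS signal instead of a job-state dance through Redis.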
You’ll notice the childProcessStart function mainly just spawns another script. That script is a separate TypeScript file that I have a custom build script for.
#!/usr/bin/env node
/**
 * Background worker for handling child processes
 * This script is spawned as a separate process to handle long-running tasks
 */

async function handleCounter(data: { count: number }) {
  console.log(`[worker] Counter task started with count: ${data.count}`)

  // Simple counter implementation
  for (let i = 0; i < data.count; i++) {
    console.log(`[worker] Counter: ${i + 1}/${data.count}`)
    await new Promise((resolve) => setTimeout(resolve, 1000))
  }

  console.log(`[worker] Counter task completed`)
}

async function main() {
  console.log(`[worker] Worker process started`)

  // Get data from command line arguments
  const args = process.argv.slice(2)
  if (args.length === 0) {
    console.error(`[worker] No data provided, terminating...`)
    process.exit(1)
  }

  let data: { name: string; count: number }
  try {
    data = JSON.parse(args[0]!) as { name: string; count: number }
    console.log(`[worker] Parsed data, type: ${data.name}`)
  } catch (error) {
    console.error(`[worker] Failed to parse data:`, error)
    process.exit(1)
  }

  try {
    // Handle different task types
    switch (data.name) {
      case 'counter':
        await handleCounter(data)
        break
      default:
        console.error(`[worker] Unknown process type: ${data.name}`)
        process.exit(1)
    }

    console.log(`[worker] Process completed successfully`)
    process.exit(0)
  } catch (error) {
    console.error(`[worker] Process failed:`, error)
    process.exit(1)
  }
}

// Run the worker
main().catch((error) => {
  console.error(`[worker] Unhandled error:`, error)
  process.exit(1)
})

And here’s the build script I run to compile this into JavaScript for the child process.
#!/bin/bash
# Development worker build script
# Watches for changes and rebuilds the worker automatically

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Get the nextjs-web app directory (parent of scripts)
APP_DIR="$(dirname "$SCRIPT_DIR")"

# Get the repo root directory (parent of apps)
REPO_ROOT="$(dirname "$(dirname "$APP_DIR")")"

SOURCE_FILE="$REPO_ROOT/packages/shared/src/services/child-process-worker.ts"
OUTPUT_FILE="$APP_DIR/dist/services/child-process-worker.js"

# Ensure output directory exists
mkdir -p "$(dirname "$OUTPUT_FILE")"

esbuild "$SOURCE_FILE" \
  --bundle \
  --platform=node \
  --target=node20 \
  --outfile="$OUTPUT_FILE" \
  --external:pg-native \
  --watch # remove this line for production builds

This workflow is probably not for everyone.
I, however, strongly prefer this use of Node.js primitives, and since I’m self-hosting my Next.js instance, there are no surprises or confusion about where my code is running.
Setting up child processes let me drop the Redis instance I had been running primarily to manage BullMQ worker jobs, and I no longer need a separate container for a worker server. I get fault tolerance with retries and try/catch error handling in the background script itself (not shown above for simplicity, but just ask your work husband Claude Code to add it in for you).
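For the curious, the shape of that retry handling is roughly this. It’s a sketch; withRetries and the backoff values are illustrative, not what ships in my worker:

```typescript
// Sketch of a retry wrapper for worker tasks, with exponential backoff.
// Wrap each task handler call in the worker's switch statement with this.
export async function withRetries<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task()
    } catch (error) {
      lastError = error
      console.error(`[worker] Attempt ${attempt}/${maxAttempts} failed:`, error)
      if (attempt < maxAttempts) {
        // Backoff doubles each attempt: 500ms, 1s, 2s, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)))
      }
    }
  }
  // All attempts exhausted; surface the last failure to the caller.
  throw lastError
}
```

Since each job runs in its own process, a retry loop like this is all the fault tolerance machinery there is; there’s no queue state to reconcile afterwards.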
Another massive improvement from the smaller infrastructure footprint: things just run faster. There are no network delays when jobs kick off, e.g. AI text generations.
And as a bootstrapped SaaS builder, running less infrastructure on AWS meaningfully cuts costs.
There are two other patterns I want to call out, the first being “use workflow” by Vercel.
If I were not self-hosting, this would’ve been my preferred option. The use workflow team has open-sourced the code, and its implementation is essentially a more “done for you” version of the same DX I have with child processes, but in the Vercel-hosted environment.
You can try to self-host use workflow; I was not quite able to get it working between my code living in a monorepo, building with turbopack, self-hosting, etc. A lot of paper cuts for me personally going this route. But on projects hosted on Vercel, workflows are extremely powerful and provide many of the same benefits I’ve found with child processes.
Stepping outside JavaScript, devs in .NET and Java have known of the “Actor pattern” for some time.
There’s a massive open-source implementation of this pattern built by the folks over at rivet.dev for us in TypeScript land.
The actor pattern is a way to run long-lived compute while letting data flow both into and out of the actor in real time.
A client kicks off work for an actor, and while the actor is completing that work, it can send WebSocket broadcasts back to the client to reflect updated state.
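To make the shape concrete, here’s a toy in-process actor. Nothing below is Rivet’s API; real implementations add persistence, addressing, and the WebSocket transport, and the names here are illustrative:

```typescript
// A toy actor: private state, a mailbox, single-threaded message handling.
type CounterMessage =
  | { type: 'increment' }
  | { type: 'get'; reply: (count: number) => void }

export class CounterActor {
  private mailbox: CounterMessage[] = []
  private count = 0
  private draining = false

  // Clients never touch state directly; they only send messages.
  send(msg: CounterMessage) {
    this.mailbox.push(msg)
    this.drain()
  }

  // Messages are handled strictly one at a time, so actor state is never
  // mutated by two handlers at once -- the core guarantee of the pattern.
  // The `reply` callback stands in for the broadcast a real actor would
  // push to the client over a WebSocket.
  private drain() {
    if (this.draining) return
    this.draining = true
    while (this.mailbox.length > 0) {
      const msg = this.mailbox.shift()!
      if (msg.type === 'increment') this.count++
      else msg.reply(this.count)
    }
    this.draining = false
  }
}
```

The mailbox is what distinguishes this from an ordinary class: callers enqueue work rather than invoke it, which is exactly what makes the real-time outbound updates possible.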
Another interesting tool, but again I ran into implementation issues. One blocker: the reliance on WebSockets meant I had to update the middleware auth covering all my endpoints so the Rivet actors could communicate with my client.
Not ideal.
The Next.js + AI SDK + Node.js primitives stack is extremely powerful, low-level enough that I feel I understand most of the magic, and so far an extremely fun developer experience.
