How Server-Sent Events Actually Work on Deno Deploy


I spent a week convinced my SSE implementation was wrong, even though it did actually work in practice.

The dashboard for pls — a release automation tool I've been building — shows running jobs, failed runs, and recent releases. I wanted the "in progress" section to update in real-time. No page refreshes. Just watch the status go from pending → running → success.

Simple enough, right? Server-Sent Events. Textbook stuff.

But here's the thing: Deno Deploy is serverless.

There are no long-running processes. Isolates spin up, handle requests, and die. How does a persistent SSE connection even work in that world?

Let's go way too deep into various rabbitholes.

The Setup

My SSE endpoint looks like this:

// routes/api/events.ts
export const handler = define.handlers({
  async GET(ctx) {
    const user = await getSession(ctx.req); // session lookup for the authenticated user
    const db = getDb();

    const stream = new ReadableStream({
      start(controller) {
        const encoder = new TextEncoder();
        // Cursor: only fetch rows updated since the previous poll.
        let lastRunUpdate = new Date();

        const pollInterval = setInterval(async () => {
          const runUpdates = await db.run.findMany({
            where: { updated_at: { gt: lastRunUpdate } },
            orderBy: { updated_at: "asc" },
            take: 50,
          });

          for (const run of runUpdates) {
            const event = `event: run\ndata: ${JSON.stringify(run)}\n\n`;
            controller.enqueue(encoder.encode(event));
            lastRunUpdate = run.updated_at;
          }
        }, 1000);

        // Stop polling when the client disconnects.
        ctx.req.signal.addEventListener("abort", () => {
          clearInterval(pollInterval);
        });
      },
    });

    return new Response(stream, {
      headers: {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
      },
    });
  },
});

Polling Postgres. Every second. From a serverless function.

My immediate reaction: "This can't possibly work."

It does.

Why It Works: Isolate Lifecycle

The key insight is in Deno Deploy's runtime documentation:

"The application remains alive until no new incoming requests are received or responses (including response body bytes) are sent for a period of time."

That phrase — "response body bytes" — is everything.

When you return a ReadableStream, the response isn't complete. The isolate stays alive as long as bytes keep flowing. My controller.enqueue() calls are keeping the isolate from being garbage collected.

Timeline:
────────────────────────────────────────────────────────────────►
│
├─ SSE connection opens, isolate starts
│
├─ poll DB → enqueue bytes ───────┐
│                                 │
├─ [1s] poll DB → enqueue bytes ──┼── Isolate stays alive
│                                 │   because bytes are flowing
├─ [2s] poll DB → enqueue bytes ──┤
│                                 │
├─ [30s] : ping comment ──────────┘
│
├─ Client disconnects (abort signal fires)
│  └─ clearInterval(), response completes
│
├─ [5s-10min idle] → Isolate shutdown

What's the maximum lifetime? The docs say the idle timeout is "between 5 seconds and 10 minutes", but as long as you're sending bytes, there's no hard cap mentioned. In practice, isolates can be terminated for other reasons (deployments, resource limits, infrastructure updates), but the browser's EventSource auto-reconnects transparently.
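
The timeline above shows a ": ping" comment at the 30-second mark, which the handler earlier doesn't actually send. Here's a minimal sketch of what adding one might look like; the 30-second interval and the comment text are my choices, not anything Deno Deploy requires. Lines starting with a colon are comments in the SSE format, so EventSource ignores them, but they still count as response body bytes and keep the idle timer from firing.

// Inside start(controller), next to the poll interval (sketch):
const pingInterval = setInterval(() => {
  // SSE comment line: ignored by the browser, but bytes keep flowing.
  controller.enqueue(encoder.encode(`: ping\n\n`));
}, 30_000);

ctx.req.signal.addEventListener("abort", () => {
  clearInterval(pingInterval);
});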

The Multi-Isolate Problem

Here's where it gets interesting.

Deno Deploy doesn't run one isolate. It runs many, potentially across different edge regions. When a webhook comes in from Trigger.dev saying "job complete", it might hit Isolate C. But the SSE connections for users watching that job are in Isolates A and B.

┌─────────────────────────────────────────────────────────────┐
│                     Deno Deploy Region                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Isolate A         Isolate B         Isolate C              │
│  (SSE for User 1) (SSE for User 2) (Webhook handler)        │
│      │                 │                 │                  │
│      │ poll            │ poll            │ INSERT           │
│      ▼                 ▼                 ▼                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                     Postgres                        │    │
│  │     (the only shared state across isolates)         │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

There's no shared memory. No in-process pub/sub. Each isolate is a completely separate V8 instance.

The database is the only truth they all agree on.

Polling the database is the simplest approach that works across isolates without extra infrastructure. For a dashboard where 1-second latency is fine, it's the right call.

Going Deeper: What Actually Happens

Let me trace a complete cycle, from Trigger.dev callback to browser update.

Step 1: Webhook Arrives

Trigger.dev finishes a job and POSTs to /webhooks/trigger-dev:

POST /webhooks/trigger-dev
{ "runId": "run_123", "status": "success", "releaseTag": "v1.2.0" }

This hits some isolate — let's call it Isolate C. It updates Postgres:

await db.run.update({
  where: { id: "run_123" },
  data: { status: "success", release_tag: "v1.2.0", updated_at: new Date() },
});

Isolate C returns 200 and might immediately shut down. It has no idea that Users 1 and 2 are watching this run.

Step 2: SSE Isolate Polls

Meanwhile, in Isolate A, my setInterval fires. It queries:

SELECT * FROM run WHERE updated_at > '2024-01-15T10:00:00Z' LIMIT 50

The row for run_123 comes back with its new status.

Step 3: The Bytes Flow

const event = `event: run\ndata: {"id":"run_123","status":"success"}\n\n`;
controller.enqueue(encoder.encode(event));

That enqueue() call triggers a chain:

  1. Bytes go into the ReadableStream's internal queue
  2. Deno's HTTP layer reads from the stream
  3. Tokio (Deno's async runtime) writes to the TCP socket
  4. Linux kernel buffers the data
  5. Network carries it to the browser
  6. Browser's EventSource fires onmessage
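
On the wire, each of those messages is just newline-delimited "field: value" lines terminated by a blank line. A simplified sketch of the split the browser does (the real parser also handles id:, retry:, multi-line data, comments, and chunks that arrive mid-message):

// Simplified sketch of SSE framing: one message ends at a blank line.
function parseSSEMessage(raw: string): { event: string; data: string } {
  let event = "message"; // default event type when no "event:" field is present
  const data: string[] = [];
  for (const line of raw.split("\n")) {
    if (line.startsWith("event:")) event = line.slice("event:".length).trim();
    else if (line.startsWith("data:")) data.push(line.slice("data:".length).trim());
  }
  return { event, data: data.join("\n") };
}

parseSSEMessage('event: run\ndata: {"id":"run_123","status":"success"}');
// → { event: "run", data: '{"id":"run_123","status":"success"}' }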

Step 4: Browser Updates

const es = new EventSource("/api/events");
es.addEventListener("run", (e) => {
  const data = JSON.parse(e.data);
  // ...update the dashboard UI with the new run status
});

Total latency: up to ~1 second of poll-interval wait + ~50ms (DB query) + ~50ms (network). Roughly 1.1 seconds worst case from database write to UI update, about half that on average.

The V8 Stuff I Didn't Know I Needed to Know

Here's where I went deeper than I probably needed to. But it's fascinating.

Tagged Values

Every JavaScript value in V8 is a "tagged value" — a machine word where the least significant bit (LSB) indicates whether it's a pointer or a small integer.

Here's the trick: heap objects are always aligned to even addresses (at least 2-byte aligned, usually 8-byte). An address like 0x1000 in binary is ...0001_0000_0000_0000 — the LSB is always 0. That's a wasted bit. V8 exploits this free bit for tagging:

Small integer (Smi) 42:
  42 << 1 = 0x54  (binary: ...0101_0100)
                                      └─ 0 = this is a Smi

Heap pointer to object at 0x1000:
  raw:    0x1000              (binary: ...0001_0000)
                                                  └─ 0 = free bit (alignment)
  tagged: 0x1000 | 1 = 0x1001 (binary: ...0001_0001)
                                                  └─ 1 = now marks "this is a pointer"

The runtime check is a single bitwise AND:

if (tagged & 1) {
    Object* ptr = (Object*)(tagged & ~1);  // mask off tag, follow pointer
} else {
    int32_t smi = (int32_t)(tagged >> 1);  // shift right to recover value
}

One bit. That's all it takes to distinguish "the number 42" from "a pointer to a string object." No type headers, no indirection for small integers.

Pointer Compression

On 64-bit systems, these tagged values are 8 bytes. That's wasteful — most JavaScript heaps are well under 4GB. So V8 does something clever: it allocates all heap objects within a 4GB region and stores 32-bit offsets instead of full pointers.

Full pointer:     0x0000_1000_0042_1337 (8 bytes)
Compressed:       0x0042_1337           (4 bytes, offset from heap base)

Decompression is just adding the base address. From V8's blog:

"Upon inspection of the heap, tagged values occupy around 70% of the V8 heap on real-world websites."

Compressing those from 8 bytes to 4 bytes is a significant win.

Inline Caches and Hidden Classes

This one actually matters for my SSE handler.

JavaScript objects are dynamic — you can add or remove properties at any time. Naively, that means every property access like obj.foo requires a dictionary lookup: hash the string "foo", probe the hash table, follow the pointer. Slow.

V8 avoids this with hidden classes (called "maps" internally, "shapes" in other engines). These are internal structures that describe an object's layout: which properties exist, in what order, at what memory offsets.

const user = { name: "alice", age: 30 };

// V8 internally creates:
// HiddenClass {
//   "name" → offset 0
//   "age"  → offset 8
// }
//
// The object stores:
// [pointer to HiddenClass][alice][30]

When V8 compiles user.age, it can check: "if this object has HiddenClass X, then age is at offset 8" — a single memory read instead of a hash lookup.

The trick is inline caches. At each property access site, V8 remembers which hidden class it saw last:

  • Monomorphic: Always the same hidden class. V8 generates machine code that does a direct offset read. Fastest.
  • Polymorphic: 2-4 different hidden classes. V8 checks each one. Still fast.
  • Megamorphic: More than ~4 hidden classes. V8 falls back to dictionary lookup. 10-100x slower.

From V8 function optimization:

"V8 gives up on polymorphism after it has seen more than 4 different object shapes, and enters megamorphic state."

For my SSE handler, every object from Prisma should have the same shape — same properties, same order. That keeps the inline cache monomorphic. But if Prisma sometimes returns objects with different property orders (say, optional fields present vs absent), the cache could go polymorphic or worse. I haven't profiled this, but it's on my list.
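
If that ever shows up in a profile, one low-tech mitigation is to copy each row into an object literal with a fixed property order before serializing it. A sketch of the idea, not something I've measured; toRunEvent is a hypothetical helper:

// Hypothetical helper: funnel every run through one object shape before JSON.stringify.
function toRunEvent(run: {
  id: string;
  status: string;
  release_tag?: string | null;
  updated_at: Date;
}) {
  // Same properties, same order, every time → one hidden class at this call site.
  return {
    id: run.id,
    status: run.status,
    release_tag: run.release_tag ?? null,
    updated_at: run.updated_at,
  };
}

// In the poll loop:
// const event = `event: run\ndata: ${JSON.stringify(toRunEvent(run))}\n\n`;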

Concurrent Garbage Collection

My handler creates short-lived objects every second:

const runUpdates = await db.run.findMany({...}); // Array allocation
const event = `event: run\ndata: ${JSON.stringify(run)}\n\n`; // String allocation
controller.enqueue(encoder.encode(event)); // Uint8Array allocation

These go into V8's "young generation" — a small, fast region that gets scavenged frequently. V8's Orinoco garbage collector runs the scavenger in parallel across multiple threads. From V8's Orinoco blog:

"The parallel Scavenger has reduced the main thread young generation garbage collection total time by about 20%–50%."

The encoder and controller objects are long-lived — they survive scavenges and get promoted to the "old generation", which is collected less frequently with concurrent marking.

The upshot: GC pauses are generally imperceptible for my use case. V8 is really good at handling this pattern.
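
If I ever want to check that instead of taking V8's word for it, Deno exposes Deno.memoryUsage(), and the --trace-gc V8 flag prints every scavenge and mark-compact as it happens. A rough sketch of watching heap churn from inside the handler (the 10-tick logging interval is arbitrary):

// Sketch: log heap usage every 10 poll ticks to eyeball young-generation churn.
let tick = 0;
setInterval(() => {
  if (++tick % 10 === 0) {
    const { heapUsed, heapTotal } = Deno.memoryUsage();
    console.log(`heap: ${(heapUsed / 1e6).toFixed(1)}MB / ${(heapTotal / 1e6).toFixed(1)}MB`);
  }
}, 1000);

// Or watch the collector directly:
//   deno run --v8-flags=--trace-gc server.ts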

The Kernel Stuff

epoll: Red-Black Trees All the Way Down

When Tokio (Deno's async runtime) waits for my Postgres socket to be readable or my HTTP socket to be writable, it uses epoll on Linux.

Inside the kernel, an epoll instance maintains two data structures:

  1. Interest list: A red-black tree of all file descriptors being monitored. O(log n) insert/delete/lookup.
  2. Ready list: A linked list of file descriptors that have pending events. O(1) insert.

When my Postgres query completes, the kernel moves that socket's entry to the ready list. When Tokio calls epoll_wait(), it gets back only the sockets with events — not the hundreds of idle ones.

Tokio uses edge-triggered mode, which means epoll_wait() only returns when there's a state change, not continuously while data is available. This means fewer syscalls, but you have to drain the buffer completely each time epoll_wait() returns or you'll miss data.

sk_buff: Zero-Copy Networking (Sort Of)

When my SSE bytes hit the kernel, they become an sk_buff — the fundamental data structure of Linux networking.

struct sk_buff {
    unsigned char *head;   /* start of allocated buffer */
    unsigned char *data;   /* start of actual data */
    unsigned char *tail;   /* end of actual data */
    unsigned char *end;    /* end of allocated buffer */
    /* ... lots more */
};

The clever part: when the buffer is allocated, the kernel reserves headroom at the front. As the packet moves down the network stack, each layer prepends its header by moving the data pointer backwards — the payload bytes never move:

Initial (application data "Hello"):
head                                 tail         end
 │                                    │            │
 ▼                                    ▼            ▼
 [--------- headroom ---------][Hello][--tailroom--]
                               ▲
                               │
                              data

After TCP adds header (20 bytes):
head                                 tail         end
 │                                    │            │
 ▼                                    ▼            ▼
 [------- headroom ------][TCP][Hello][--tailroom--]
                          ▲
                          │
                         data

After IP adds header (20 bytes):
head                                 tail         end
 │                                    │            │
 ▼                                    ▼            ▼
 [----- headroom ----][IP][TCP][Hello][--tailroom--]
                      ▲
                      │
                     data

After Ethernet adds header (14 bytes):
head                                 tail         end
 │                                    │            │
 ▼                                    ▼            ▼
 [-- headroom --][Eth][IP][TCP][Hello][--tailroom--]
                 ▲
                 │
                data

The buffer size is fixed. head and end never move. As headers are prepended, data shifts left into the headroom. tail stays put (unless data is appended).

The payload "Hello" stays at the same memory address throughout. Each layer just moves data backwards and writes its header into the reserved space. No memcpy() of the payload, no reallocation.

For my small SSE messages (~100-500 bytes), there's still a copy from userspace to kernelspace when the application calls write(). True zero-copy is possible via MSG_ZEROCOPY, a socket flag for send():

send(fd, buf, len, MSG_ZEROCOPY);

Instead of copying buf into kernel memory, the kernel pins those userspace pages and uses them directly. But this has overhead:

  • Page pinning: The kernel must lock the pages in memory so they can't be swapped out or moved while the network card reads from them.
  • Completion notifications: The kernel notifies your process (via the socket error queue) when transmission is complete and the buffer is safe to reuse. Your code must poll for these notifications.

This bookkeeping replaces per-byte copy cost with per-page accounting cost. The kernel docs say MSG_ZEROCOPY "is generally only effective at writes over around 10 KB".

This is a Linux syscall flag that Deno (via Tokio) could in principle use when calling send(), assuming Tokio even exposes it, which I'm not sure it does.

HTTP/2: Why Multiple Tabs Don't Kill My Server

On HTTP/1.1, each SSE connection consumes one TCP connection. Browsers limit connections to 6 per domain. Open 7 tabs? One blocks.

Deno Deploy serves HTTP/2, where connections are multiplexed. Multiple streams share one TCP connection — and yes, this means multiple browser tabs to the same origin typically share a single connection:

┌─────────────────────────────────────────────────────────────┐
│                    Single TCP Connection                    │
├─────────────────────────────────────────────────────────────┤
│  Stream 1: SSE /api/events (tab 1)                          │
│  Stream 3: SSE /api/events (tab 2)                          │
│  Stream 5: fetch /api/runs                                  │
│  Stream 7: SSE /api/events (tab 3)                          │
│                                                             │
│  All interleaved as HTTP/2 DATA frames                      │
└─────────────────────────────────────────────────────────────┘

Default limit: 100+ concurrent streams per connection (the HTTP/2 RFC recommends allowing at least 100; NGINX defaults to 128). Way more than enough.
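
You can check which protocol a page actually negotiated from the browser itself via the Resource Timing API: nextHopProtocol reports "h2" for HTTP/2. A quick sketch to run in the dashboard's console (I haven't checked whether the EventSource connection itself shows up here, but ordinary fetches do):

// Sketch: list the negotiated protocol for requests made by this page.
for (const entry of performance.getEntriesByType("resource")) {
  const r = entry as PerformanceResourceTiming;
  console.log(r.nextHopProtocol, r.name); // e.g. "h2 https://usepls.dev/api/runs"
}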

Isolation happens at the stream level: each stream has a unique ID, frames are tagged with that ID, and the browser tracks which stream belongs to which tab.

Worth noting: this is a different isolation model than Chrome's process-per-site architecture. Process isolation is OS-enforced — one renderer can't read another's memory because the kernel prevents it. HTTP/2 stream isolation is software-enforced in the browser's network service. A bug in the HTTP/2 implementation could theoretically leak data between streams.

And bugs do happen. Connection coalescing (deciding when to reuse connections) has had real vulnerabilities:

  • Firefox Bug 1420777: HTTP/2 connection reused for wrong server when DNS overlaps. Requests to Site B sent to Site A because Firefox reused an existing connection.
  • Firefox Bug 1604286: Connection reuse mix-up after DNS changes — old IP used as "coalescing key" for 12+ hours.
  • PortSwigger: "HTTP/2: The Sequel is Always Worse": HTTP/2 desync and request smuggling via connection reuse, enabling cross-user attacks on vulnerable servers.

These are mostly request misrouting rather than direct cross-tab data leakage, but they validate the point that sharing connections across contexts is a trade-off: a weaker isolation boundary in exchange for significant performance benefits.

What About Database Connections?

HTTP/2 solves the browser-side connection limit. But there's a server-side concern: each SSE connection keeps an isolate alive, and each isolate has a database connection. Postgres defaults to 100 max_connections. With 100 concurrent users watching the dashboard, wouldn't I exhaust the pool?

This is where connection pooling comes in. I'm using Prisma Postgres (because of the free tier and one-click setup in Deno Deploy), which includes Prisma Accelerate — a connection pooler that sits between your application and the database:

┌─────────────────────────────────────────────────────────────┐
│                      Deno Deploy                            │
├─────────────────────────────────────────────────────────────┤
│  Isolate A    Isolate B    Isolate C    ...    Isolate N    │
│      │            │            │                   │        │
│      └────────────┴────────────┴───────────────────┘        │
│                           │                                 │
│                           ▼                                 │
│              ┌─────────────────────────┐                    │
│              │    Prisma Accelerate    │                    │
│              │   (connection pooler)   │                    │
│              └────────────┬────────────┘                    │
│                           │                                 │
└───────────────────────────┼─────────────────────────────────┘
                            │ ~10 actual connections
                            ▼
                 ┌─────────────────────┐
                 │      Postgres       │
                 │  (max_connections)  │
                 └─────────────────────┘

The pooler multiplexes many application "connections" over a small number of real database connections. Isolates talk to Accelerate over HTTP (not raw TCP), so there's no socket-per-isolate. The pooler manages the actual Postgres connections.

Without a pooler, serverless + Postgres is a recipe for connection exhaustion. With one, the database sees a stable, small connection count regardless of how many isolates are running.
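
Wiring this up is mostly configuration. A minimal sketch of what the client setup might look like, assuming the standard Prisma Accelerate extension and a prisma:// connection string in DATABASE_URL (check Prisma's docs for the current API):

// lib/db.ts (sketch)
import { PrismaClient } from "@prisma/client";
import { withAccelerate } from "@prisma/extension-accelerate";

// DATABASE_URL is a prisma:// URL pointing at Accelerate, not at raw Postgres.
// Each isolate talks to the pooler over HTTP; the pooler owns the real connections.
const prisma = new PrismaClient().$extends(withAccelerate());

export function getDb() {
  return prisma;
}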

HPACK Header Compression

HTTP/2 also compresses headers using HPACK. My SSE response headers:

content-type: text/event-stream
cache-control: no-cache

First response: sent in full (~50 bytes). Later responses on the same connection: referenced from HPACK's dynamic table by index (~2 bytes each).

From Cloudflare's HPACK blog:

"On average, Cloudflare sees a 76% compression for ingress headers."

Not huge for SSE (headers are only sent once per connection), but it adds up across all the other requests the dashboard makes.

PostgreSQL: How WHERE updated_at > $1 Actually Works

This is the query that runs every second:

SELECT * FROM run WHERE updated_at > $1 ORDER BY updated_at LIMIT 50

MVCC Visibility

Postgres doesn't lock rows for reads. Instead, every row has hidden columns:

  • xmin: Transaction ID that created this row version
  • xmax: Transaction ID that deleted/updated it (0 if current)
Row "run" (id=123) with two versions:

Version 1 (old):
┌────────┬────────┬──────────────────┬─────┐
│xmin=100│xmax=105│ status='pending' │ ... │
└────────┴────────┴──────────────────┴─────┘
         │
         └─ "deleted" by TX 105 (which created Version 2)

Version 2 (current):
┌────────┬────────┬──────────────────┬─────┐
│xmin=105│xmax=0  │ status='success' │ ... │
└────────┴────────┴──────────────────┴─────┘
         │
         └─ xmax=0 means "not deleted"

Which version does a transaction see?
  TX 102 (started before TX 105 committed) → sees Version 1
  TX 110 (started after TX 105 committed)  → sees Version 2

When my query runs, Postgres checks each candidate row: "Was xmin committed before my transaction started? Is xmax either 0 or not-yet-committed?"

This is MVCC — Multi-Version Concurrency Control. Multiple transactions can read different versions of the same row without blocking each other.
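
xmin and xmax aren't hypothetical, by the way: they're real system columns on every table, just excluded from SELECT *. A sketch of peeking at them through Prisma's raw-query escape hatch (the ::text casts are there because the xid type doesn't map cleanly to a JavaScript number):

// Sketch: inspect the MVCC system columns for one run row.
const rows = await db.$queryRaw`
  SELECT xmin::text, xmax::text, id, status
  FROM run
  WHERE id = ${"run_123"}
`;
console.log(rows);
// Something like: [{ xmin: "105", xmax: "0", id: "run_123", status: "success" }]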

Hint Bits

Checking if a transaction is committed requires looking up the CLOG (commit log). That's expensive. So Postgres caches the result in "hint bits" in the row header itself.

First read of a row (hint bits not set):

  Row header                                   CLOG (on disk)
  ┌─────────────────────┐                      ┌──────────────────┐
  │ xmin=100            │──"TX 100 committed?" │ TX 99:  committed│
  │ xmax=0              │                      │ TX 100: committed│◄── look up
  │ hint_bits=0000      │─────────────────────►│ TX 101: aborted  │
  └─────────────────────┘                      └──────────────────┘
           │
           ▼ set hint bits
  ┌─────────────────────┐
  │ xmin=100            │
  │ xmax=0              │
  │ hint_bits=COMMITTED │◄── cached in row header
  └─────────────────────┘

Subsequent reads: check hint_bits directly, skip CLOG lookup.

First read: check CLOG, set hint bits in the page. This happens in shared_buffers — Postgres's shared page cache that all backend processes can access (each connection is a separate process, but they share this buffer pool). Setting hint bits marks the page "dirty". The background writer eventually flushes dirty pages to disk, persisting the hint bits.

On restart, Postgres's shared_buffers (the page cache) is empty. Pages must be re-read from disk. If hint bits were flushed before shutdown, they're still there. If not (e.g., crash, or page wasn't flushed yet), the first read must check CLOG again and re-set them.

This is why queries can be slightly slower right after a restart — not because hint bits are lost, but because the buffer cache is cold and pages that weren't recently flushed need their hint bits recalculated.

But wait, that's another shared resource! And indeed, shared memory in Postgres has had security implications. CVE-2021-32028 and CVE-2021-32029 allowed authenticated users to read "arbitrary bytes of server memory" through crafted INSERT ... ON CONFLICT or UPDATE ... RETURNING queries. That server memory includes shared_buffers — where other connections' data might be cached.

Not a direct "read another user's query results" attack, but memory disclosure from the shared buffer pool.

Index-Only Scans and the Visibility Map

If I have an index on updated_at, Postgres can potentially avoid reading the table entirely. But only if it knows all tuples on a page are visible to all transactions.

That's what the visibility map tracks: 2 bits per 8KB page indicating "all-visible" and "all-frozen".

Index Scan (must visit heap):

  Query: SELECT updated_at, status FROM run WHERE updated_at > $1

  Index (updated_at)            Heap (table)
  ┌─────────────────┐           ┌─────────────────────────────┐
  │ 2024-01-15 → p1 │──────────►│ Page 1: status='pending'    │
  │ 2024-01-16 → p1 │──────────►│         status='success'    │
  │ 2024-01-17 → p2 │──────────►│ Page 2: status='failed'     │
  └─────────────────┘           └─────────────────────────────┘
         │                               │
         └── has updated_at              └── status only in heap


Index-Only Scan (skip heap):

  Query: SELECT updated_at FROM run WHERE updated_at > $1

  Index (updated_at)        Visibility Map
  ┌─────────────────┐       ┌────────────────┐
  │ 2024-01-15 → p1 │──────►│ Page 1: visible│─► all visible, skip heap
  │ 2024-01-16 → p1 │       │ Page 2: visible│
  │ 2024-01-17 → p2 │──────►└────────────────┘
  └─────────────────┘
         │
         └── updated_at in index + all-visible = no heap access needed

Index-only scan works when (1) all columns you need are in the index, and (2) the visibility map confirms the page is all-visible (no MVCC check needed).

This is the difference between "index scan" and "index-only scan" in EXPLAIN ANALYZE. The latter is much faster — no random I/O to heap pages.
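
The way to confirm which one you're actually getting is EXPLAIN ANALYZE. A sketch of running it through the same Prisma client, assuming an index on updated_at exists (Postgres returns the plan as rows with a single "QUERY PLAN" column):

// Sketch: check whether the polling query gets an index-only scan.
const plan = await db.$queryRawUnsafe<{ "QUERY PLAN": string }[]>(`
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT updated_at FROM run
  WHERE updated_at > now() - interval '1 minute'
  ORDER BY updated_at LIMIT 50
`);
console.log(plan.map((row) => row["QUERY PLAN"]).join("\n"));
// Look for "Index Only Scan" (and "Heap Fetches: 0") vs a plain "Index Scan".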

TLS 1.3: 0-RTT Resumption

When a browser reconnects to my SSE endpoint (after an isolate dies and the EventSource auto-reconnects), it might use 0-RTT resumption.

Normal TLS: 1 round trip to exchange keys before sending data. TLS 1.3 0-RTT: Send encrypted data with the first packet using a pre-shared key from the last connection.

Normal Resumption:
Client → Server: ClientHello + SessionTicket
Server → Client: ServerHello + Finished
Client → Server: Finished
Client → Server: GET /api/events      ← Data starts here

0-RTT Resumption:
Client → Server: ClientHello + EarlyData(GET /api/events)  ← Data in first packet!
Server → Client: ServerHello + Finished + Response

The catch: 0-RTT data can be replayed by an attacker. An adversary capturing your traffic can resend the encrypted 0-RTT data, and the server will process it again.

How does the browser decide when to use 0-RTT? It doesn't analyze your endpoint for idempotency — it uses HTTP method semantics. GET and HEAD are treated as replay-safe because the HTTP spec defines them as safe, idempotent methods; POST/PUT are not sent in 0-RTT. My SSE endpoint is a GET, so the browser will use 0-RTT if available.

This is a real trade-off. From Trail of Bits:

"A user previously logged into a system walks into a coffee shop, gets on WiFi, and initiates a buy or sell transaction. An adversary capturing traffic off the WiFi captures that request and sends the same request again to the server. Unlike TLS 1.2, this request is not rejected by the TLS layer."

Even with a short replay window, attackers can send millions of replays. Some mitigations:

  • Servers can reject 0-RTT entirely, or return 425 (Too Early) for specific requests (RFC 8470)
  • CDNs like Cloudflare only allow GET requests without query strings in 0-RTT
  • Go's TLS 1.3 implementation doesn't support 0-RTT at all, citing safety concerns

For my SSE endpoint, replay is harmless — you'd just get the same event stream twice. But if your "idempotent" GET actually triggers side effects (logging, rate limit counters, analytics), those would fire multiple times.
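
RFC 8470 gives servers a hook for this: a proxy that accepted early data is supposed to add an Early-Data: 1 header, and the origin can answer 425 (Too Early) to force the client to retry after the full handshake. A sketch of what that could look like in a handler that isn't safe to replay, assuming the fronting infrastructure actually forwards the header (I haven't verified that Deno Deploy does):

// Sketch: refuse to do side-effecting work on replayable 0-RTT requests.
function rejectEarlyData(req: Request): Response | null {
  if (req.headers.get("early-data") === "1") {
    // 425 Too Early: the client should retry once the handshake has completed.
    return new Response(null, { status: 425 });
  }
  return null;
}

// In a non-idempotent handler:
// const early = rejectEarlyData(ctx.req);
// if (early) return early;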

The V8 Sandbox

One more thing, because it's cool.

As mentioned before V8's heap is a 4GB "cage" — a contiguous memory region where all pointers are 32-bit offsets. But some objects (like ArrayBuffer backing stores) live outside the cage.

If V8 stored raw pointers to external memory, a heap corruption bug could let an attacker read arbitrary process memory. That's bad.

So V8 uses an External Pointer Table:

┌───────────────────────────────────────────────────────────┐
│  ArrayBuffer object (in V8 heap):                         │
│    backing_store_index: 42   ← NOT a raw pointer          │
└───────────────────────────────────────────────────────────┘
                    │
                    │ index lookup
                    ▼
┌───────────────────────────────────────────────────────────┐
│  External Pointer Table:                                  │
│    [42] = 0x7fff12340000 | ARRAY_BUFFER_TAG               │
└───────────────────────────────────────────────────────────┘

Even if an attacker corrupts the heap and changes index 42 to 9999, they can only access other entries in the External Pointer Table — not arbitrary memory.

From V8's sandbox blog:

"An attacker would need an additional vulnerability: a V8 Sandbox bypass."

Deno Deploy runs my code in this sandbox. The security isn't just "isolates are separate processes" — it's defense in depth all the way down to pointer representation.

Going Even Deeper: The Stuff That Made Me Say "Huh"

Everything above I at least conceptually knew, or at least some of it. The next threads, however, required way more digging to make sense of.

V8 Ignition: Before TurboFan Kicks In

My poll callback runs thousands of times over the life of a connection. But V8 doesn't immediately compile it to optimized machine code. First, it runs through Ignition, V8's bytecode interpreter.

Ignition is a register machine with an accumulator. When V8 parses my JavaScript, it generates bytecode like this:

[generating bytecode for function: formatSSEEvent]
Parameter count 2
Frame size 8
   0x2ddf8802cf6e @    StackCheck
   0x2ddf8802cf6f @    LdaSmi [1]
   0x2ddf8802cf71 @    Star r0
   0x2ddf8802cf73 @    LdaNamedProperty a0, [0], [4]
   0x2ddf8802cf77 @    Add r0, [6]
   0x2ddf8802cf7a @    Return

You can see this yourself with node --print-bytecode yourfile.js.

The instructions are terse:

  • LdaSmi [42] — Load Small Integer 42 into the accumulator
  • Star r0 — Store accumulator to register r0
  • LdaNamedProperty a0, [0], [4] — Load a property from argument 0
  • Return — Return what's in the accumulator

From V8's Ignition blog:

"Ignition compiles JavaScript functions to a concise bytecode, which is between 50% to 25% the size of the equivalent baseline machine code."

The bytecode handlers themselves are generated by TurboFan — V8's optimizing compiler generates the interpreter. It's compilers all the way down.

The accumulator is the key optimization. Most bytecodes implicitly use it, so you don't need to specify source/destination registers for every operation. This keeps bytecode compact.

Once a function gets "hot" (called enough times with consistent types), TurboFan compiles it to optimized machine code. But for my SSE handler, which runs once per connection, Ignition is probably where most execution happens.

How can I actually know for sure? V8 has tracing flags:

$ deno run --v8-flags=--trace-opt server.ts

I created a test with two functions — one called 5 times, one called 10,000 times:

function handleSSE(req) {
  return new Response("data: hello\n\n");
}

function hotLoop(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;
  return sum;
}

for (let i = 0; i < 5; i++) handleSSE({});
for (let i = 0; i < 10000; i++) hotLoop(100);

Output:

[marking 0x34e8b43d2311 <JSFunction hotLoop ...> for optimization to MAGLEV,
    ConcurrencyMode::kConcurrent, reason: hot and stable]
[compiling method 0x34e8b43d2311 <JSFunction hotLoop ...> (target MAGLEV), ...]
[completed compiling ... (target MAGLEV) - took 0.000, 20.921, 0.101 ms]
[compiling method ... (target TURBOFAN_JS) OSR, ...]

hotLoop gets optimized. handleSSE never appears — only called 5 times, not hot enough. Confirmed: my SSE handler stays in Ignition.

But wait — MAGLEV? I thought it was Ignition -> TurboFan. What's Maglev?

Maglev: The Middle Tier

Turns out V8 added a mid-tier compiler in 2023. The pipeline is now:

Ignition (interpreter) -> Maglev (fast compiler) -> TurboFan (optimizing compiler)

TurboFan produces excellent code but takes time to compile. How much time? The trace output includes timing:

[completed compiling ... <JSFunction compute ...> (target MAGLEV) - took 0.000, 2.711, 0.015 ms]
[completed compiling ... <JSFunction compute ...> (target TURBOFAN_JS) - took 0.008, 14.991, 0.011 ms]

Maglev: ~2.7ms. TurboFan: ~15ms. About 5-6x difference for the same function.

For functions that are warm but not "hot hot", Maglev hits a sweet spot — faster startup than waiting for TurboFan, better performance than staying in Ignition.

You can see this in the trace: hotLoop first gets compiled to MAGLEV, then later to TURBOFAN_JS (the "OSR" means on-stack replacement — upgrading a running function mid-execution).

Deoptimization: When Assumptions Break

You can also trace deoptimization — when V8 bails out of optimized code:

$ deno run --v8-flags=--trace-opt,--trace-deopt server.ts

function process(obj) {
  return obj.x + obj.y;
}

// Train with consistent shape
for (let i = 0; i < 10000; i++) process({ x: 1, y: 2 });

// Break the assumption
process({ y: 2, x: 1 }); // same props, different order = different hidden class!

Output:

[bailout (kind: deopt-eager, reason: wrong map): begin.
    deoptimizing ... <JSFunction process ...>]

"wrong map" = the object's hidden class didn't match what the optimized code expected. Same properties in different order means different hidden class, which means deoptimization. This is why consistent object shapes matter for performance.

TCP: What's Actually On the Wire

When my SSE event travels to the browser, here's what the TCP segment looks like:

TCP Header (20 bytes minimum):
┌─────────────────────────────────────────────────────────────────┐
│ Bytes 0-1:   Source Port        (e.g., 0x01BB = 443)            │
│ Bytes 2-3:   Destination Port   (e.g., 0xC5A0 = 50592)          │
│ Bytes 4-7:   Sequence Number    (e.g., 0x4A3B2C1D)              │
│ Bytes 8-11:  Acknowledgment     (e.g., 0x1D2C3B4A)              │
│ Byte 12:     Data Offset (4 bits) + Reserved (4 bits)           │
│ Byte 13:     Flags: [CWR ECE URG ACK PSH RST SYN FIN]           │
│ Bytes 14-15: Window Size        (e.g., 0xFFFF = 65535)          │
│ Bytes 16-17: Checksum                                           │
│ Bytes 18-19: Urgent Pointer                                     │
└─────────────────────────────────────────────────────────────────┘

The sequence number is a 32-bit counter that identifies where this segment's data fits in the overall byte stream. It wraps around at 2^32 (about 4 billion).

Here's what a capture looks like:

Packet 1 (SYN):           seq=0x4A3B2C1D, ack=0, flags=[SYN]
Packet 2 (SYN-ACK):       seq=0x1D2C3B4A, ack=0x4A3B2C1E, flags=[SYN,ACK]
Packet 3 (ACK):           seq=0x4A3B2C1E, ack=0x1D2C3B4B, flags=[ACK]
...
Packet N (SSE data):      seq=0x4A3B2D00, ack=0x1D2C3B4B, flags=[PSH,ACK]
                          data="event: run\ndata: {...}\n\n"
Packet N+1 (ACK):         seq=0x1D2C3B4B, ack=0x4A3B2D42, flags=[ACK]

The acknowledgment number says "I've received everything up to this byte, send me the next one." If the sender doesn't get an ACK, it retransmits.

The window size tells the sender "I have this many bytes of buffer space available." When my browser gets busy, the window shrinks. The server backs off. This is flow control, built into the protocol.

Wireshark shows relative sequence numbers by default (starting from 0), which is easier to read. The actual initial sequence numbers (ISNs) are not truly random — per RFC 6528, they're computed as:

ISN = M + F(localip, localport, remoteip, remoteport, secretkey)

Where M is a timer and F() is a cryptographic hash. This makes them unpredictable to off-path attackers (preventing TCP session hijacking) while being deterministic for the system generating them.

io_uring: The Future (That Deno Doesn't Use Yet)

Tokio uses epoll on Linux. But there's a newer, faster API: io_uring.

The difference is fundamental:

epoll (readiness-based):

  1. You tell the kernel: "Notify me when fd 42 is readable"
  2. Kernel says: "fd 42 is readable now"
  3. You call read(fd, buf, len) (syscall)
  4. Kernel copies data to your buffer
  5. Repeat

io_uring (completion-based):

  1. You put a read request in a shared ring buffer: "Read from fd 42 into this buffer"
  2. Kernel picks it up, does the read, puts result in completion ring
  3. You check completion ring — no syscall needed
  4. Data is already in your buffer

From Lord of the io_uring:

"Since bulk communication occurs through shared buffers, the huge performance overhead of copying data between kernel and user space is eliminated."

The architecture looks like this:

┌─────────────────────────────────────────────────────────────────┐
│                    User Space                                   │
│                                                                 │
│  ┌──────────────────┐              ┌──────────────────┐         │
│  │ Submission Queue │   mmap       │ Completion Queue │         │
│  │ (SQ)             │   shared     │ (CQ)             │         │
│  │                  │   memory     │                  │         │
│  │ [SQE][SQE][SQE]  │◄────────────►│ [CQE][CQE]       │         │
│  └────────┬─────────┘              └────────▲─────────┘         │
│           │ (write SQEs)                    │ (read CQEs)       │
├───────────┼─────────────────────────────────┼───────────────────┤
│           │           Kernel                │                   │
│           ▼                                 │                   │
│  ┌──────────────────────────────────────────┴─────────────────┐ │
│  │              io_uring Kernel Thread                        │ │
│  │    (polls SQ, executes I/O, fills CQ)                      │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

SQE = Submission Queue Entry. CQE = Completion Queue Entry. Each entry is one I/O operation.

With submission queue polling enabled, you can build a server that does zero syscalls per request after startup. The kernel thread watches the submission queue and processes requests without being asked.

Benchmarks show io_uring achieving 1.7M IOPS in polling mode vs ~608K for the older aio interface.

Tokio has a separate crate, tokio-uring, but it's not the default. From Tokio's announcement:

"With epoll, a tuned TCP proxy will spend 70% to 80% of CPU cycles outside of userspace, including cycles spent performing syscalls and copying data between the kernel and userspace."

Deno Deploy presumably still uses the standard Tokio runtime with epoll. For my SSE use case, it doesn't matter — I'm I/O-bound on Postgres latency, not syscall overhead. But for high-throughput proxies and databases, io_uring is a big deal.

PostgreSQL's Buffer Pool: Clock Sweep

When my query runs, Postgres doesn't read directly from disk. It goes through the buffer pool — a cache of 8KB pages in shared memory.

The size is controlled by shared_buffers (the shipped default is a modest 128MB; the common recommendation is around 25% of RAM).

But here's the interesting part: how does Postgres decide which pages to evict when the buffer is full?

It uses clock sweep, an approximation of LRU that's much cheaper to implement:

Buffer Pool (circular):
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│    ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐          │
│    │ 3 │ │ 0 │ │ 2 │ │ 1 │ │ 5 │ │ 0 │ │ 1 │ │ 0 │  ...     │
│    └───┘ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘          │
│      ▲                                                      │
│      │                                                      │
│    Clock hand (nextVictimBuffer)                            │
│                                                             │
│    Numbers = usage count (max 5)                            │
│    Clock hand sweeps, decrementing counts                   │
│    Evicts when it finds a 0                                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The algorithm:

  1. Each buffer has a usage count (0-5)
  2. When a page is accessed, increment its count
  3. When we need a victim, sweep the clock hand around
  4. For each buffer: if count > 0, decrement and move on; if count = 0, evict it

From PostgreSQL buffer management:

"The eviction technique is based on the fact that for each access to a buffer, processes increment the usage count. Buffers that are used less often have a smaller count and are good candidates for eviction."

Why not true LRU? Because maintaining a strict LRU list requires updating the list on every access, which means lock contention. Clock sweep only needs to update a single counter, and the sweep itself is protected by a single spinlock.

The usage count cap of 5 prevents "hot" pages from accumulating infinite counts that would make them impossible to evict.

Note that Postgres also uses the OS page cache, so there's double caching. But the OS uses simple LRU, not clock sweep. Postgres's cache is optimized for database access patterns.
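
You can get a rough sense of how well the buffer pool is serving the run table from pg_statio_user_tables, which counts heap blocks hit in shared_buffers vs read from outside it. A sketch via the same raw-query escape hatch:

// Sketch: buffer cache hit ratio for the run table.
const stats = await db.$queryRaw`
  SELECT heap_blks_read, heap_blks_hit,
         round(heap_blks_hit::numeric / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
  FROM pg_statio_user_tables
  WHERE relname = 'run'
`;
console.log(stats);
// Caveat: a "read" here may still be served by the OS page cache, so this only
// measures Postgres's half of the double caching mentioned above.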

HTTP/3 and QUIC: The Future (Maybe)

I couldn't find definitive evidence that Deno Deploy supports HTTP/3 yet.

HTTP/3 runs over QUIC instead of TCP. QUIC is UDP-based with its own reliability layer, which means:

  • No head-of-line blocking: Lost packets don't stall unrelated streams
  • Faster connection setup: 0-RTT is built into the protocol, not bolted on
  • Connection migration: Change IP addresses without reconnecting

For SSE, HTTP/3 would mean my EventSource reconnections could be even faster. But since SSE is a long-lived connection anyway, the benefits are marginal.

The main blocker for HTTP/3 adoption: UDP on port 443 is blocked by some corporate firewalls. That's why browsers always have TCP fallback.

If Deno Deploy does support HTTP/3, they'd advertise it via the Alt-Svc header:

Alt-Svc: h3=":443"; ma=86400

This tells the browser "hey, you can also reach me via HTTP/3 on port 443, remember this for 24 hours."

Checking my own site:

$ curl -sI https://usepls.dev | grep -i "alt-svc\|server\|http"
HTTP/2 200
server: deployd
via: HTTP/2 ams.vultr.prod.deno-cluster.net

No Alt-Svc header. Deno Deploy is HTTP/2 only for now.

Down the Rabbit Hole

I started with a simple question: how does SSE work on serverless? The answer turned out to be "it depends on what you mean by work."

At the JavaScript level, it's a ReadableStream and setInterval. But that stream becomes V8 bytecode, which runs on Ignition, whose handlers are generated by TurboFan. The timer goes through Tokio's event loop, which calls epoll, which wakes on edge-triggered notifications from the kernel's network stack. Each layer I peeled back revealed another layer of engineering.

Some moments surprised me. Running --trace-opt and discovering Maglev — a compiler tier I didn't know existed. Asking "could shared_buffers leak data?" and finding CVE-2021-32028. Learning that { x: 1, y: 2 } and { y: 2, x: 1 } are different hidden classes. The 0-RTT replay attack surface. TCP sequence numbers being cryptographic hashes, not random values.

The recursive nature of it is humbling. V8's optimizing compiler compiles the interpreter. The kernel's page cache caches Postgres's buffer pool. HTTP/2 multiplexes streams that carry chunked transfers. It's turtles all the way down, and I've only seen a few of them.

The dashboard works. Runs update in real-time. And my brain can finally rest — the "okay, but why?" loop has run out of stack frames. For now.