History of the Internet: From ARPANET to the Modern Web

12 min read · Capstone · Network


The internet, fifty years of one packet at a time.

Type a URL, press enter. By the time the page is on screen, half a dozen protocols and a dozen pieces of infrastructure have done their part. This page walks every phase, and points back at the guide that picks it apart.


How the Internet works: nine phases, in order

The internet request lifecycle is the sequence of nine distinct phases that turn a keypress into a rendered page: input handling, URL parsing and pre-connect checks, DNS resolution, the TCP handshake, the TLS handshake, the HTTP request, server-side processing (edge → load balancer → app → database), response transit, and browser rendering. Each phase has its own latency budget, and the slowest one dominates the perceived load time.

A page load is not one event. It is nine distinct phases, each with its own physics and its own bottleneck. Knowing them by name is the difference between "the site is slow" and "DNS is taking 200 ms because the resolver is in Frankfurt and the user is in Sydney."

Two of these phases happen before any packet leaves the device — the keypress walks the kernel, the URL turns into a record of intent. Five happen on the wire and are the ones every network engineer has spent a career picking apart. The last two happen after the bytes come home, and they are where most of the wall-clock time the user actually feels lives.

The numbers below are realistic warm-path values for a fast residential connection to a major site. The cold path adds a DNS hop, a TCP handshake, and a TLS handshake: typically 100–200 ms of preamble before the request even reaches the server.

  1. 00 · Keyboard interrupt → browser process: ~4 ms warm · 200–800 ms cold-launch
  2. 01 · URL parse + pre-connect cache: ~1 ms
  3. 02 · DNS resolution: ~1 ms warm · 20–60 ms cold
  4. 03 · TCP three-way handshake: 0 warm · 1 RTT cold (~24 ms fiber)
  5. 04 · TLS handshake: 0 warm · 1 RTT TLS 1.3 (~26 ms fiber)
  6. 05 · HTTP request to edge: ~1 ms over an open connection
  7. 06 · Server hop (edge → LB → app → DB): ~90 ms (30–500 ms in the wild)
  8. 07 · Response transit (TTFB → last byte): ~18 ms
  9. 08 · Browser render (DOM, CSSOM, layout, paint): ~240 ms

Render is the largest single bucket on most pages. The first eight phases (00–07) combined often clock in under 200 ms; the ninth, phase 08, which is what the browser has to do with the bytes, runs to seconds when the page is heavy. The cheapest performance optimisation is to send less. The second cheapest is to defer the non-critical bytes you do send.
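The list above can be totalled directly. The sketch below is illustrative only: the warm figures come from the list, the cold DNS figure (40 ms) is an assumed midpoint of the 20–60 ms range, and the browser is assumed to be already running on both paths. With a 24 ms RTT the cold preamble lands a little under the typical range quoted above; slower links push it higher.

```python
# (phase, warm_ms, cold_ms) taken from the nine-phase list above.
# Assumptions: cold DNS = 40 ms (midpoint of 20-60 ms); the browser is
# already launched on both paths, so phase 00 costs ~4 ms either way.
PHASES = [
    ("00 keypress -> browser", 4, 4),
    ("01 URL parse", 1, 1),
    ("02 DNS", 1, 40),
    ("03 TCP handshake", 0, 24),
    ("04 TLS 1.3 handshake", 0, 26),
    ("05 HTTP request", 1, 1),
    ("06 server hop", 90, 90),
    ("07 response transit", 18, 18),
    ("08 render", 240, 240),
]


def totals(phases):
    """Return (warm_total_ms, cold_total_ms) for a list of phases."""
    warm = sum(w for _, w, _ in phases)
    cold = sum(c for _, _, c in phases)
    return warm, cold


warm_total, cold_total = totals(PHASES)
warm_before_render, _ = totals(PHASES[:-1])

print(f"warm: {warm_total} ms, cold: {cold_total} ms")
print(f"cold preamble: {cold_total - warm_total} ms")
print(f"warm, everything before render: {warm_before_render} ms")
```

Under these assumptions render accounts for roughly two thirds of the warm total, which is the point of the paragraph above.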

The cold-start vs warm-cache waterfall

The waterfall, cold and warm.

The same request can be followed end to end on a cold path and on a warm one; the difference between the two is what connection reuse, the DNS cache, and TLS resumption actually save you. The walk starts at phase 00: keyboard → browser.

Before the network ever sees this request, the input has to walk a small operating-system stack. The keypress fires a hardware interrupt, the kernel routes it to the focused window, the browser's render process decodes it, an IPC message wakes the network process. On a warm browser this is a few milliseconds; on a cold-launch it is hundreds. Most timelines start at "URL parse" and skip this entirely — but it is the place latency budgets are most often spent without anyone noticing.

# Phase 0 — what happens before the URL is "parsed"
1. Keyboard scan-matrix encoder fires (<1 ms)
2. USB/Bluetooth HID interrupt → kernel
3. OS kernel routes to focused process (the browser)
4. Browser event loop wakes, decodes Enter
5. Render-process IPC → network process
6. The network process now begins URL parsing

# why it matters
# this is "free" only when the user already
# has a browser tab open. cold-start a browser
# from a desktop click and it costs 200–800 ms
# before a single packet has even been considered.

DNS: cached or walked

The browser knows the hostname; it needs an IP. Three lookups stand between the browser and the authoritative servers: the browser's in-process cache, the OS resolver cache, and finally the recursive resolver (1.1.1.1, 8.8.8.8, or the ISP's). On a warm cache, this phase takes about half a millisecond. On a miss, the recursive resolver walks root → TLD → authoritative, and the latency depends on how close it is to those servers.

The full mechanics, including anycast, glue records, and the message wire format, live in the DNS guide. The point here is that this is one of the easiest phases to optimise: pre-resolve hostnames you will need, use a DNS provider with low p99 latency, and pay attention to TTL.
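The cache layering described above can be sketched in a few lines. This is a toy model, not how any particular browser or OS implements it: `TtlCache`, `resolve`, and the `recursive_lookup` callable are hypothetical names, and the clock is injectable only so the expiry behaviour is easy to exercise.

```python
import time


class TtlCache:
    """A tiny TTL cache: entries expire ttl seconds after they are stored."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # hostname -> (address, expires_at)

    def get(self, host):
        entry = self._entries.get(host)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[host]  # expired: treat as a miss
            return None
        return address

    def put(self, host, address, ttl):
        self._entries[host] = (address, self._clock() + ttl)


def resolve(host, layers, recursive_lookup):
    """Check each cache layer in order, then fall back to the recursive resolver.

    layers is a list of (name, TtlCache); recursive_lookup is a hypothetical
    callable returning (address, ttl_seconds).
    Returns (address, name_of_the_layer_that_answered).
    """
    for name, cache in layers:
        address = cache.get(host)
        if address is not None:
            return address, name
    address, ttl = recursive_lookup(host)
    for _, cache in layers:
        cache.put(host, address, ttl)  # populate every layer on the way back
    return address, "recursive"
```

The TTL is the only knob the origin controls here: a short TTL means more trips to the recursive resolver, a long one means slower failover.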

TCP: three packets, one round trip

A new TCP connection costs one round trip — SYN, SYN-ACK, ACK, the protocol defined in RFC 9293. The full breakdown, including initial sequence numbers and the window options that ride along, lives in the TCP guide. What matters here is connection reuse.

Browsers maintain a per-origin connection pool. HTTP/1.1 keep-alive, HTTP/2 multiplexing, and HTTP/3 connection migration (over QUIC rather than TCP) all exist to avoid paying for connection setup more than once. If the page loads ten resources from the same origin over one multiplexed connection, only the first pays for setup. The other nine ride the open connection.
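A minimal way to see the saving is to count handshakes. The sketch below assumes one multiplexed connection per origin (HTTP/2 style); `ConnectionPool` is a hypothetical bookkeeping class, not a real networking API, and it opens no sockets.

```python
class ConnectionPool:
    """Counts how many connection setups a sequence of requests would pay for."""

    def __init__(self):
        self.open_origins = set()
        self.handshakes = 0

    def request(self, origin):
        """Return True if this request reused an already-open connection."""
        if origin in self.open_origins:
            return True
        self.open_origins.add(origin)  # first request to this origin pays setup
        self.handshakes += 1
        return False


pool = ConnectionPool()
reused = [pool.request("https://example.com") for _ in range(10)]
print(f"handshakes: {pool.handshakes}, reused: {reused.count(True)}")
```

Splitting the same ten resources across several hostnames would raise the handshake count, which is one reason domain sharding fell out of favour once multiplexing arrived.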

TLS: one round trip, sometimes zero

TLS 1.3 brings up encryption in one round trip. With session resumption, the browser can encrypt the GET with a key derived from a previous session and send it alongside the ClientHello. This is zero-RTT: the handshake and the request fly together.

The mechanics, including PKI, certificate transparency, and what changed from 1.2 to 1.3, live in the HTTPS guide. What the timeline cares about: this used to cost two round trips, and now it can cost zero.
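The round-trip counts above turn into milliseconds with simple multiplication. The sketch below is arithmetic only; the table of round trips reflects the standard full handshakes, and it assumes TLS runs over TCP, so a brand-new connection still pays the TCP round trip even when TLS itself is 0-RTT.

```python
# Round trips spent before the first request byte can be sent.
HANDSHAKE_RTTS = {
    "tcp": 1,          # SYN, SYN-ACK, ACK
    "tls1.2": 2,       # full TLS 1.2 handshake
    "tls1.3": 1,       # full TLS 1.3 handshake
    "tls1.3-0rtt": 0,  # resumed TLS 1.3 session with early data
}


def setup_ms(rtt_ms, tls, new_tcp=True):
    """Connection setup cost in ms for a given RTT and TLS mode."""
    rtts = HANDSHAKE_RTTS[tls]
    if new_tcp:
        rtts += HANDSHAKE_RTTS["tcp"]
    return rtts * rtt_ms


for mode in ("tls1.2", "tls1.3", "tls1.3-0rtt"):
    print(mode, setup_ms(10, mode), "ms on a 10 ms RTT link")
```

On a fast link the absolute numbers are small; on a 140 ms trans-Pacific path the same round trips are where the cold-path preamble comes from.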

The server side: edge, load balancer, app, database

The request reaches a CDN edge first. If the response is in the edge cache, the answer comes back from there and the rest of the trip is short. See the CDN guide. If not, the edge forwards to the origin.

At the origin, a load balancer picks an app instance, often after a reverse proxy has already terminated TLS. The load-balancing guide covers algorithm choice: round robin vs P2C, L4 vs L7, what to measure. The chosen instance opens database connections, runs queries (often a fan-out of several), maybe calls a downstream service or two, then renders a response.
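The two algorithms named above differ in one line of logic. The sketch below is a minimal illustration, assuming the balancer can see an in-flight request count per instance; the function names and the `in_flight` mapping are hypothetical, and a production balancer would also handle health checks and weights.

```python
import random


def pick_round_robin(instances, counter):
    """Round robin: rotate through instances regardless of their load."""
    return instances[counter % len(instances)]


def pick_p2c(in_flight, rng=random):
    """Power of two choices: sample two instances, keep the less loaded one.

    in_flight maps instance name -> number of requests currently in flight.
    """
    first, second = rng.sample(list(in_flight), 2)
    return first if in_flight[first] <= in_flight[second] else second


load = {"app-1": 3, "app-2": 5, "app-3": 100}  # app-3 is a slow, backed-up node
picks = [pick_p2c(load, random.Random(seed)) for seed in range(50)]
print(sorted(set(picks)))
```

Because P2C always discards the busier of the two candidates, the most overloaded instance is never chosen, whereas round robin keeps feeding it a full share of traffic.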

This is the phase where most production p99 lives. Database queries on cold caches. Network calls between services. Hot keys hashing to one shard. Single slow nodes that poison the pool. If a page is mysteriously slow, this is where the answer almost always is. The caching and Redis guides cover the fixes.
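One line of arithmetic shows why fan-out pushes the tail into this phase. The sketch assumes each backend call is independently fast 99% of the time; real calls are correlated, so treat it as intuition rather than a model of any particular system.

```python
def p_slow_request(fan_out: int, p_fast: float = 0.99) -> float:
    """Probability that at least one of fan_out independent calls is slow."""
    return 1 - p_fast ** fan_out


for n in (1, 10, 100):
    print(f"fan-out {n:>3}: {p_slow_request(n):.1%} of requests hit a slow call")
```

A request that touches many backends is slow far more often than any single backend is, which is why the per-service p99 understates what users see.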

Browser rendering — turning bytes into a page

The browser turns bytes into a page.

Bytes home. Now the browser parses HTML into a DOM, parses CSS into a CSSOM, combines them into a render tree, lays it out in real geometry, paints each layer, and composites on the GPU. Critical milestones along the way:

FCP

First Contentful Paint

The first text or image appears. Usually arrives shortly after the HTML response because the browser can render before all stylesheets and scripts are loaded — unless render-blocking resources stop it.

LCP

Largest Contentful Paint

The hero element — main image or headline — appears. Google's Web Vitals scores LCP because it correlates strongly with perceived load. Target ≤ 2.5 s on 75% of visits.

TBT / CLS

Total Blocking Time · Cumulative Layout Shift

TBT measures long JavaScript tasks blocking the main thread. CLS measures whether the page jumps around as resources load. Both are about feel — they describe what the user notices.
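TBT has a simple definition that is easy to compute by hand: for every main-thread task longer than 50 ms, count only the portion beyond 50 ms. The sketch below assumes the task durations have already been collected for the measurement window (between FCP and Time to Interactive); the sample durations are made up.

```python
LONG_TASK_THRESHOLD_MS = 50  # tasks longer than this count as long tasks


def total_blocking_time(task_durations_ms):
    """Sum the portion of each main-thread task that exceeds 50 ms."""
    return sum(
        duration - LONG_TASK_THRESHOLD_MS
        for duration in task_durations_ms
        if duration > LONG_TASK_THRESHOLD_MS
    )


tasks = [30, 120, 250, 45, 70]  # hypothetical task durations in ms
print(total_blocking_time(tasks), "ms blocked")
```

Splitting the 250 ms task into five 50 ms tasks would remove 200 ms of blocking time without doing any less work, which is the argument for yielding to the main thread.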

What you can actually optimise in this chain

What you can actually optimise.

Most pages spend the majority of their wall-clock time in two phases: phase 06 (the server hop) and phase 08 (the render). Optimisation budget should follow the time: you cannot squeeze a noticeable win out of TLS 1.3 because there is barely any time there to find.

Server side

Cache, then squeeze the DB.

Edge caching for everything that can be public. App-tier cache for everything that can't. Indexed, narrow database queries (the indexing guide is the right starting place). N+1 queries, hot keys, slow downstream calls: find the actual offender via tracing, not guesses.

Client side

Send less, defer the rest.

Compression (gzip → Brotli → Zstd), code-splitting, lazy-loading images and scripts below the fold, deferring non-critical JS. The cheapest win is shipping fewer bytes. The next is making sure the bytes you do ship aren't blocking the first paint.
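The effect of compression on markup is easy to measure with the standard library. The sketch below uses gzip only, since Brotli and Zstd are not in every Python standard library; the sample HTML is hypothetical and deliberately repetitive, so real pages will compress less well than this.

```python
import gzip


def gzip_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed_size / original_size for a gzip-compressed payload."""
    return len(gzip.compress(payload, compresslevel=level)) / len(payload)


# Hypothetical, highly repetitive HTML; real pages compress less well.
html = ("<li class='item'><a href='/p'>Product</a></li>\n" * 2000).encode()
print(f"{len(html)} bytes, gzip ratio {gzip_ratio(html):.3f}")
```

Measuring the ratio on your own HTML, CSS, and JS is a quick way to confirm compression is actually switched on for each content type.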

Read the waterfall first

Chrome DevTools Performance and the Network panel both show the phases above for any real load. Read the waterfall before you change anything. The biggest bar is the right thing to attack; everything else is wishful thinking.

Before any packet leaves — DNS, ARP, routing

Before any packet leaves the device.

Every timeline diagram starts at "DNS lookup". The honest one starts earlier, at the keypress. The Enter key fires a hardware interrupt; the kernel routes it to the focused window; the browser's render process decodes the keystroke; an inter-process message wakes the network process. On a warm browser tab this is a few milliseconds. On a cold-launched browser, it can be hundreds.

Once the network process has the URL, it parses it into a record of intent: scheme, host, path, query, fragment. Then comes a sequence of cheap, in-memory checks: is the origin in the HSTS preload list (refusing plaintext)? Is there a service worker registered for this scope (which can answer the request locally without ever hitting the network)? Is there an open connection in the pre-connect cache? On many requests, the answer ends here: the service worker hands back a cached body, or the pre-connect cache delivers a TCP socket that's already warm.
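The "record of intent" is just the URL split into its parts. The sketch below uses Python's `urllib.parse.urlsplit` as a stand-in; browsers follow the WHATWG URL standard, which differs in edge cases, and the `record_of_intent` helper and its default-port table are illustrative assumptions.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def record_of_intent(url: str) -> dict:
    """Split a URL into the fields the network stack acts on."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
        "query": parts.query,
        "fragment": parts.fragment,  # never sent to the server
    }


print(record_of_intent("https://example.com/docs/page?lang=en#intro"))
```

The host and port together identify the connection to look up in the pool; the fragment stays in the browser and plays no part in the request.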

# What the browser does between keypress and first packet

1. HID interrupt          ~0.3 ms     keyboard scan-matrix → USB / Bluetooth
2. Kernel routes event    ~0.5 ms     focused-window dispatch
3. Browser IPC wake       ~1–4 ms     render process → network process
4. URL parse              ~0.1 ms     scheme / host / path / query
5. HSTS check             ~0.05 ms    in-memory hash lookup
6. Service-worker probe   ~1 ms       SW activation if registered
7. Pre-connect cache      ~0.05 ms    do we already have an open conn?
8. (now we are ready to send)

The cold-launch tax is the surprising one. Open a brand-new browser, click a saved bookmark — the OS has to load shared libraries, restore tabs, allocate the network process, set up the GPU process. Modern browsers do most of this in the background while you're typing the URL, but on a memory-constrained device it can add hundreds of milliseconds before the network ever sees the request. This is a real budget item the rest of the timeline almost never accounts for.

The optimisation handles here are limited but real: pre-resolving DNS for likely-next origins (<link rel="dns-prefetch">), pre-opening connections (<link rel="preconnect">), and registering a service worker that can serve shell HTML offline. All of these collapse the early phases into roughly nothing on the second visit.

The HTTP request: not just headers

By the time the request leaves the browser, the network and TLS layers are doing their job invisibly. The only thing left to compose is the HTTP message itself. On HTTP/1.1 it's a plaintext envelope; on HTTP/2 it's a HEADERS frame on a fresh stream multiplexed onto the existing TCP connection; on HTTP/3 the same HEADERS frame rides on a QUIC stream over UDP. Same data, three transports.

The request line and the headers are where every cross-cutting concern of the web ends up: the Host header (:authority in HTTP/2 and HTTP/3) tells the server which virtual host the request is for (necessary for any multi-tenant edge); the Cookie header is how stateless HTTP gets a session; Accept-Encoding lets the server pick a compression scheme; User-Agent lets the server pick a build target; Authorization or a session cookie carries identity. Each of these is a potential failure mode.

# Anatomy of a single HTTP/2 GET to a CDN-fronted origin

# pseudo-headers: HTTP/2 has no textual request line
:method         GET
:authority      example.com                     # replaces the HTTP/1.1 Host header
:scheme         https
:path           /

# what the browser computed and decided to send
user-agent      Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) ...
accept          text/html,application/xhtml+xml,*/*;q=0.8
accept-encoding gzip, br, zstd                  # compression we can decode
accept-language en-GB,en;q=0.9
cache-control   max-age=0                       # forced revalidation
sec-fetch-dest  document                        # browser policy hint
sec-fetch-site  none                            # this is a top-level nav
upgrade-insecure-requests 1

# state the browser stored on earlier visits
cookie          session=abc123; theme=dark; csrf=...
if-none-match   "a1b2c3"                        # we have a cached copy
if-modified-since Thu, 30 Apr 2026 09:12:44 GMT

# 800 bytes typical: fits in one TCP segment, arrives at the edge in <5 ms
# up to 4 KB if cookies are large, which spans several segments

Two practical truths hide in this envelope. First, cookies are bytes: a session cookie plus an analytics cookie plus a feature-flag cookie can easily hit 4 KB, and those bytes go on every single request to that hostname, including requests for images. Origins that put cookies on the same hostname as their static assets pay this tax forever. Second, HTTP/2's HPACK and HTTP/3's QPACK compress repeated headers across requests on the same connection, meaning the second request to an origin sends a fraction of what the first request sent.
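The cookie tax is multiplication, and a quick estimate makes the case for a cookieless asset hostname. The sketch below is a worst-case upper bound that ignores HPACK/QPACK header compression; the request counts are made-up example values and `cookie_overhead_bytes` is a hypothetical helper.

```python
def cookie_overhead_bytes(cookie_bytes, requests_per_page, cookieless_requests=0):
    """Upper bound on Cookie header bytes uploaded during one page load.

    cookieless_requests are requests sent to a separate hostname that has
    no cookies set (for example, a dedicated static-asset domain).
    """
    return cookie_bytes * (requests_per_page - cookieless_requests)


same_host = cookie_overhead_bytes(4096, requests_per_page=60)
split_host = cookie_overhead_bytes(4096, requests_per_page=60, cookieless_requests=55)
print(f"{same_host} bytes vs {split_host} bytes of cookies per page load")
```

Upload bandwidth is usually the scarcer direction on residential and mobile links, so these bytes cost more than the same number of downloaded bytes would.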

The Semicolony's HTTP guide picks apart the headers worth knowing in detail, plus the differences between HTTP/1.1, HTTP/2, and HTTP/3. The point here is that everything between the user and the server (auth, caching, content negotiation, browser policy) has to fit in this one envelope. The envelope is small, yet it ends up being where surprising amounts of per-request budget get spent.

The HTTP response: streamed, not delivered

HTTP responses don't arrive at the browser in one piece. The server begins sending bytes the moment the first chunk is ready — TCP's send window lets the response stream while the next chunks are still being generated. The first byte the browser sees is the first one the server emitted; the headers and body flow as a stream gated by congestion control, the application's own buffering, and the bandwidth of the path.

Two metrics break out of this stream and matter. TTFB (Time To First Byte) is the gap between request-sent and first-byte-received. It captures everything that happened between the edge and the application. Last byte is when the body has fully arrived; for a heavy page that depends on bandwidth, not latency. Compression matters here: a 200 KB HTML document compresses to roughly 40 KB with gzip, 32 KB with Brotli, and 28 KB with Zstd. That is roughly a five- to sevenfold reduction, and a very different transfer time on a slow connection.
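The bandwidth-bound part of the transfer can be estimated from size and link speed alone. The sketch below ignores latency, TCP slow start, and protocol overhead, and the 2 Mbps link is an assumed example of a slow connection; the sizes are the ones quoted above.

```python
def transfer_ms(size_kb: float, bandwidth_mbps: float) -> float:
    """Time to move size_kb kilobytes at bandwidth_mbps, ignoring latency."""
    bits = size_kb * 1000 * 8
    return bits / (bandwidth_mbps * 1_000_000) * 1000


SIZES_KB = {"uncompressed": 200, "gzip": 40, "brotli": 32, "zstd": 28}

for name, size in SIZES_KB.items():
    print(f"{name:>12}: {transfer_ms(size, 2.0):.0f} ms at 2 Mbps")
```

The gap between uncompressed and any compressed form is large; the gap between the three codecs is small by comparison, so turning compression on matters more than which codec is chosen.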

# What the response looks like as it arrives

:status                      200                      # HTTP/2 has no reason phrase
server                       nginx/1.25
content-type                 text/html; charset=utf-8
content-encoding             br                       # Brotli compression
strict-transport-security    max-age=63072000; includeSubDomains; preload
cache-control                public, max-age=300, stale-while-revalidate=86400
etag                         "a1b2c3"                 # for next If-None-Match
content-security-policy      default-src 'self'; ...
link                         </app.css>; rel=preload; as=style,
                             </app.js>;  rel=preload; as=script
server-timing                edge;dur=2, app;dur=68, db;dur=24

# the body streams as HTTP/2 DATA frames (up to 16 KB each by default)
[ chunk 1: HTML <head> + above-the-fold body ]   ← FCP can fire here
[ chunk 2 ]
[ chunk 3 ]
...
[ last chunk: closing </body></html> ]            ← document complete

The link rel="preload" headers above are the response doing real work. They tell the browser, before the HTML body even arrives, which CSS and JS the page will need, so those resources can begin downloading in parallel. 103 Early Hints takes this further: the server can send a 103 status with link headers before the 200, so the browser starts pre-loading while the application is still computing the response. It is one of the most under-used wins in modern HTTP.

The server-timing header is the unsung hero of debugging slow pages. The server can attach phase timings (edge, app, database) directly to the response, and Chrome DevTools surfaces them in the Network panel. With server-timing on, "the page is slow" becomes "the database query took 240 ms" without leaving the browser. The Semicolony's caching and CDN guides cover the full menu of response-side optimisations.
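The server-timing value is simple enough to parse by hand when you want it in logs or tests rather than DevTools. The sketch below handles the common `name;dur=value` form shown in the response above; it is a simplification that does not handle quoted descriptions containing commas or semicolons, and `parse_server_timing` is a hypothetical helper.

```python
def parse_server_timing(header: str) -> dict:
    """Map each metric name to its dur value in ms (None if dur is absent)."""
    timings = {}
    for entry in header.split(","):
        name, *params = [part.strip() for part in entry.split(";")]
        if not name:
            continue  # skip empty entries such as a trailing comma
        duration = None
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                duration = float(value.strip().strip('"'))
        timings[name] = duration
    return timings


timings = parse_server_timing("edge;dur=2, app;dur=68, db;dur=24")
slowest = max(timings, key=lambda name: timings[name])
print(timings, "slowest:", slowest)
```

Whether `app` includes or excludes the `db` time is up to the server that emits the header, so it is worth documenting that convention alongside the metric names.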

By the numbers,
what each phase actually costs.

Performance work is mostly the discipline of refusing to optimise the wrong thing. The numbers below are realistic in 2026 for a fast residential link, a well-tuned origin, and a warm path unless noted. Read them before you change anything: if a phase here is already at the floor, it is not the place to look.

Network round-trip times (RTT) — single hop

Network                        Median RTT    Notes
Same-PoP fibre / data centre   0.5–2 ms      intra-AZ traffic
Residential fibre              8–15 ms       last-mile + ISP backbone
Cable / DSL                    15–35 ms      last-mile dominant
4G LTE                         35–60 ms      radio scheduler + jitter
5G mmWave / sub-6              15–30 ms      still highly variable
Starlink (LEO sat)             25–50 ms      orbital hop, weather-sensitive
Geostationary satellite        600+ ms       physics: 36 000 km up and down
Trans-Atlantic fibre           ~70 ms        NYC ↔ London, speed-of-light bound
Trans-Pacific fibre            ~140 ms       SFO ↔ Singapore
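The "physics" rows in the table can be checked against the speed of light. The sketch below computes a lower bound only; the distances (about 5,570 km great-circle from New York to London, 35,786 km geostationary altitude) and the two-thirds factor for light in glass are approximations supplied here, not figures from the table.

```python
C_VACUUM_KM_S = 299_792  # speed of light in vacuum, km/s
FIBRE_FACTOR = 2 / 3     # assumed: light in glass travels at roughly 2/3 c


def min_rtt_ms(path_km: float, medium_factor: float = 1.0) -> float:
    """Lower bound on round-trip time over a one-way path of path_km."""
    return 2 * path_km / (C_VACUUM_KM_S * medium_factor) * 1000


# Assumed distances. For the satellite, the one-way path is two legs:
# user -> satellite -> ground station.
nyc_london = min_rtt_ms(5_570, FIBRE_FACTOR)
geostationary = min_rtt_ms(2 * 35_786)

print(f"NYC-London fibre floor: {nyc_london:.1f} ms")
print(f"Geostationary floor:    {geostationary:.1f} ms")
```

Both measured figures in the table sit above these floors, as they must: real cables do not follow great circles, and real links add queuing and processing on top of propagation.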

Per-phase cost on a warm fibre connection (10 ms RTT)

Phase                          Warm                      Cold
Keypress → browser process     ~4 ms                     200–800 ms cold-launch
URL parse + pre-checks         ~1 ms                     ~1 ms
DNS resolution                 <1 ms (cached)            20–60 ms (recursive)
TCP handshake                  0 (reuse)                 ~10 ms (1 RTT)
TLS 1.3 handshake              0 (resumption / 0-RTT)    ~10 ms (1 RTT)
Request to edge                ~5 ms                     ~5 ms
Edge cache hit                 ~2 ms                     ~2 ms
Origin TTFB (well-tuned)       30–100 ms                 100–500 ms (cold cache)
Body transfer (14 KB Brotli)   ~15 ms                    ~15 ms
Browser render to FCP          100–300 ms                300–1500 ms

Core Web Vitals: Google's "good" thresholds (75th percentile)

Metric                              Good        Needs work    Poor
LCP: Largest Contentful Paint       ≤ 2.5 s     ≤ 4.0 s       > 4.0 s
INP: Interaction to Next Paint      ≤ 200 ms    ≤ 500 ms      > 500 ms
CLS: Cumulative Layout Shift        ≤ 0.10      ≤ 0.25        > 0.25
TTFB: Time To First Byte            ≤ 0.8 s     ≤ 1.8 s       > 1.8 s

The Web Vitals targets are not arbitrary. Google's research correlates each threshold with measurable drop-off in user engagement. Search Console flags origins that fail the 75th-percentile bar over rolling 28-day windows, and Google Search uses Core Web Vitals as a ranking signal. The numbers are the floor, not the ceiling. Sites with strong engineering culture (Vercel, Linear, Stripe) typically run LCP under 1.0 s and INP under 100 ms.
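Applying the thresholds to your own field data is a few lines. The sketch below uses the values from the table and a nearest-rank 75th percentile, which is an assumption made here for simplicity; Google's tooling aggregates field data its own way, so results near a boundary may differ. The sample values are made up.

```python
import math

# (good_max, needs_work_max) per metric, from the table above.
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.10, 0.25),  # unitless
    "TTFB": (0.8, 1.8),   # seconds
}


def p75(samples):
    """75th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric, samples):
    """Classify a metric's 75th-percentile value as good / needs work / poor."""
    good_max, needs_work_max = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good_max:
        return "good"
    if value <= needs_work_max:
        return "needs work"
    return "poor"


lcp_samples = [1.2, 1.8, 2.0, 2.4, 3.9, 1.1, 2.2, 5.0]  # hypothetical, seconds
print(p75(lcp_samples), rate("LCP", lcp_samples))
```

Judging at the 75th percentile means a quarter of visits can be worse than the threshold and the page still passes, which is why the median alone is not the number to watch.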

Famous outages,
mapped to the timeline.

Every phase on this page has, in the last decade, taken down a meaningful chunk of the public internet for hours. Reading the post-mortems with the timeline in mind is the fastest way to develop intuition about where things actually go wrong. The list below is short and selective; the lessons under each are general.

  1. Phase 02 · DNS

    Facebook, October 2021. A routine BGP update accidentally withdrew the routes to Facebook's authoritative DNS servers. Resolvers worldwide could no longer answer for facebook.com; a six-hour outage took Facebook, Instagram, and WhatsApp off the internet, and locked engineers out of physical data-centre access tools that depended on the same DNS. Lesson: out-of-band recovery paths must not depend on the system you're trying to recover.

  2. Phase 03 · TCP / routing

    Cloudflare, June 2022. A change to a network-configuration template caused a subset of PoPs to stop announcing their IP prefixes. Traffic was either black-holed or mis-routed for ~75 minutes, taking down a long tail of customers. Lesson: configuration changes that touch the routing plane should ship through the same canary stages as code changes, and probably stricter ones.

  3. Phase 04 · TLS

    Microsoft Teams, February 2020. A TLS certificate used by the authentication path expired and was not renewed, so logins began failing globally. Recovery required deploying a new cert through a process that itself depended on Teams. Lesson: cert expiry is the most predictable preventable outage in the business. Automate renewal, monitor days-to-expiry as an SLO, never rely on a calendar reminder.

  4. Phase 06 · Origin

    AWS S3 us-east-1, February 2017. An engineer running a debugging command typed the wrong subset of servers; a much larger fraction of S3's index and placement subsystems were taken offline than intended. The recovery required restarting subsystems that had not been restarted in years; us-east-1 S3 was substantially degraded for four hours, dragging large parts of the web with it. Lesson: restart paths atrophy without exercise; the longer a system runs, the more dangerous "just bounce it" becomes.

  5. Phase 06 · Edge / CDN

    Fastly, June 2021. A customer pushed a configuration that exposed a latent bug in Fastly's edge software; the bug caused about 85% of the network to return errors, taking down the New York Times, Reddit, Twitch, the UK government, and more for about an hour. Lesson: customer-supplied configuration is hostile input; the edge has to sandbox it the same way a browser sandboxes JavaScript.

  6. Phase 08 · Render

    The infinite-spinner outage: every site that has ever shipped a third-party script that hangs the main thread. The page is delivered, the bytes are home, but a render-blocking script never resolves and the user sees a spinner forever. Lesson: render-blocking third-party JavaScript is a tail-latency weapon pointed at your own users; defer or sandbox everything you do not own.
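The certificate-expiry lesson in the list above (monitor days-to-expiry rather than rely on a reminder) is small enough to sketch. The code below assumes the expiry date arrives as a string in the `notAfter` format that Python's `ssl` module reports for a peer certificate; the 30-day alert threshold and both function names are illustrative choices, and fetching the certificate itself is left out.

```python
from datetime import datetime, timezone

# notAfter format as reported by Python's ssl module,
# for example "Jun  1 12:00:00 2026 GMT".
NOT_AFTER_FORMAT = "%b %d %H:%M:%S %Y GMT"


def days_to_expiry(not_after: str, now: datetime) -> float:
    """Days remaining until the certificate's notAfter time (negative if past)."""
    expires = datetime.strptime(not_after, NOT_AFTER_FORMAT)
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - now).total_seconds() / 86_400


def should_alert(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when fewer than threshold_days remain (assumed policy: 30 days)."""
    return days_to_expiry(not_after, now) < threshold_days


now = datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(days_to_expiry("Jun  1 12:00:00 2026 GMT", now))
```

Exporting the number as a metric, rather than only alerting on it, lets the renewal automation itself be monitored: a value that stops resetting is the early warning.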

A pattern emerges from these. Almost none of them are the protocol failing. They are configuration changes, expired credentials, hostile input, and recovery paths that depended on the system being recovered. Reading the timeline well means knowing which phase you are looking at when something goes wrong, and which guide on this site explains the underlying mechanics.

Modern fast-loading patterns — what 2024 production sites actually do

Eight techniques that compound.

The current state of the art for shipping the fastest possible page:

HTTP/3 + 0-RTT
QUIC over UDP saves 1-2 RTT vs TCP+TLS for fresh connections. 0-RTT data on resumed connections eliminates the round-trip entirely.
Brotli compression
~20% smaller than gzip on text. Universally supported since 2017. Always on.
Image formats
AVIF (~50% smaller than JPEG) for photos, WebP fallback, SVG for icons. The <picture> element negotiates per-browser.
Resource hints
preload, preconnect, dns-prefetch tell the browser what's coming. Cuts critical-path latency by hundreds of ms when used right.
Server-side rendering + streaming
SSR with progressive HTML streaming (React 18, Astro, Solid Start) lets the browser start parsing the page before the server has finished generating it.
Edge functions
Run dynamic logic at the CDN edge (Cloudflare Workers, Vercel Edge, Netlify Edge). Eliminates the round-trip to origin for personalised pages.
Critical CSS inlined
Above-the-fold CSS in the HTML head; defer the rest. Cuts time-to-first-paint by ~50% on slow connections.
Long-cache static assets
Hash-named JS/CSS files cached for a year (Cache-Control: public, max-age=31536000, immutable). The browser never re-downloads them.

None of these alone is dramatic. Stacked, they take a 5-second cold load on a 4G connection down to ~1 second. Vercel, Cloudflare Pages, and Netlify make all eight the default; rolling them yourself takes deliberate work.
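Of the eight, long-cache static assets is the one most easily rolled by hand. The sketch below shows the idea behind hash-named files, assuming a SHA-256 content hash truncated to eight hex characters; build tools pick their own hash and length, and `hashed_name` is a hypothetical helper.

```python
import hashlib

# Header value quoted in the list above for hash-named assets.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def hashed_name(filename: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return f"{filename}.{digest}"  # no extension: append the hash
    return f"{stem}.{digest}.{ext}"


v1 = hashed_name("app.js", b"console.log('v1');")
v2 = hashed_name("app.js", b"console.log('v2');")
print(v1, v2, IMMUTABLE_CACHE_CONTROL)
```

Because the name changes whenever the content does, the year-long cache lifetime is safe: a new deploy references a new URL, and the old one simply stops being requested. The HTML that references these files must itself stay on a short cache lifetime.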

Three quick checks before you close the tab.

There is no scoring. These are here so you can confirm the mental model travels with you when you go back to the editor.

Q1. A user has the page open and clicks an internal link. Which phases are skipped?

Q2. On a fiber link with 10 ms RTT, the cold-path TCP+TLS hit roughly equals…

Q3. Which phase typically dominates real-world page-load latency?

A closing note

This page is a capstone. Every phase here is one of the bespoke guides (TCP, HTTPS, DNS, HTTP, Load Balancing, CDN, Caching), all visible in their context. The goal of the Semicolony's how-it-works shelf is to make the entire stack legible. If you read this waterfall and know which guide each phase points at, you have a complete picture of the modern web request. Everything else is an alias for one of these phases.
