Production decisions need orders of magnitude, not vibes. This notebook collects published, citable measurements for the systems we describe elsewhere — memory hierarchy, B-tree depth, Bloom filter accuracy, compression speed, queueing latency, fan-out tails, write amplification.
I
Storage & data structures.
How fast is each layer, and where the structure pays off.
EXP 01
The memory hierarchy
Each level is roughly an order of magnitude slower.
Question
How long does it take to read one byte from each level of the memory hierarchy on a modern server?
Setup
Numbers are typical for an x86-64 cloud VM circa 2024 (Skylake-class CPU, 3 GHz, NVMe SSD, 25 Gbit network). Latency varies with hardware; ratios are stable.
Readings (ns)
L1 cache
1
~0.5 ns; predictable
L3 cache
12
Shared across cores
Main memory (RAM)
100
DRAM access
NVMe SSD (random read)
100,000
~100 µs
Same-DC network (25 GbE)
500,000
~500 µs round trip
HDD (random seek)
10,000,000
~10 ms; rotational
Cross-region (US-EU)
100,000,000
~100 ms; speed of light
Takeaway
Cache locality dominates algorithmic complexity for small n. A well-laid-out array beats a smart tree of pointers up to 10,000 elements because the array never misses cache; the tree pays L2 / L3 / RAM penalties on every pointer chase.
EXP 02
B-tree depth vs row count
Why your index never gets deep.
Question
How deep does a B+tree get for a billion-row index?
Setup
Each internal node holds 200 keys (typical for 4 KiB pages with 8-byte keys + 8-byte child pointers). Each leaf holds rows. Depth = ⌈log_b(n)⌉ + 1.
Readings (levels)
1 thousand rows
2
2 page reads
100 thousand rows
3
3 page reads
10 million rows
4
4 page reads
1 billion rows
5
5 page reads
100 billion rows
6
6 page reads
10 trillion rows
7
7 page reads
Takeaway
B-trees stay shallow because of the wide fan-out. A binary tree at 10 million rows is 24 levels deep; the B+tree is 4. That's the difference between 24 disk seeks and 4 — the order-of-magnitude that makes databases feasible.
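The comparison in the takeaway can be checked with a few lines. This is a minimal sketch, assuming a level count of ⌈log_b(n)⌉ (the smallest d with b^d ≥ n); counting the root or the leaf level separately shifts the result by one, so it will not match every reading above exactly. The function name `levels` is illustrative only.

```python
def levels(rows: int, fanout: int) -> int:
    """Smallest d such that fanout**d >= rows (integer arithmetic, no float log)."""
    depth, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth


# Binary tree (fan-out 2) versus a B+tree with the assumed 200 keys per node.
for rows in (10**3, 10**7, 10**9):
    print(f"{rows:>13,} rows  binary: {levels(rows, 2):>2}  fan-out 200: {levels(rows, 200)}")
```

At 10 million rows this gives 24 levels for the binary tree and 4 for the wide tree, the same pair quoted in the takeaway.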
EXP 03
Bloom filter false-positive rate
10 bits per element ≈ 1% false positive.
Question
Given m/n bits per element (m bits for n elements) and k hash functions, what false-positive rate do you get?
Setup
Standard Bloom filter formula: FPR ≈ (1 − e^(−kn/m))^k, with optimal k = (m/n) ln 2. Numbers below assume optimal k.
Readings (% false positive)
4 bits/elt
14.6
optimal k=3
8 bits/elt
2.16
optimal k=6
10 bits/elt
0.82
optimal k=7 — the popular default
16 bits/elt
0.046
optimal k=11
24 bits/elt
0.001
optimal k=17
32 bits/elt
0.00002
optimal k=22
Takeaway
Doubling the bits-per-element roughly squares the FPR. Cassandra and RocksDB use 10 bits/elt (~1% FPR) by default. Higher precision is rarely worth it: the cost of the rare false positive is one extra disk read.
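The formula from the setup is easy to evaluate directly. A minimal sketch, assuming k is the optimal value rounded to the nearest integer; the function names are illustrative, not from any library.

```python
import math


def optimal_k(bits_per_element: float) -> int:
    """Optimal hash count k = (m/n) ln 2, rounded to an integer (at least 1)."""
    return max(1, round(bits_per_element * math.log(2)))


def false_positive_rate(bits_per_element: float, k: int) -> float:
    """FPR ~= (1 - e^(-k n / m))^k, written in terms of bits per element m/n."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


for bits in (4, 8, 10, 16, 24, 32):
    k = optimal_k(bits)
    print(f"{bits:>2} bits/elt  k={k:<2}  FPR={false_positive_rate(bits, k):.5%}")
```

Doubling the bits per element doubles the exponent, which is where the "roughly squares the FPR" rule comes from.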
EXP 03b
Hash table — collisions vs load factor
Past 0.85, every insert is a fight.
Question
How many probes does an open-addressing hash table take per lookup as load factor α rises?
Setup
Linear probing with random hashing. Expected probes per successful lookup ≈ ½(1 + 1/(1−α)). Numbers below are mean probes for a successful lookup.
Readings (probes)
α = 0.75
2.5
Java HashMap's default load factor (HashMap itself uses chaining)
α = 0.85
3.8
starting to bite
α = 0.90
5.5
visible slowdown
α = 0.95
10.5
do not run here
Takeaway
Resize before α crosses 0.75. Open-addressing tables go unstable in their tail, not their mean — at α = 0.9 the p99 of a probe sequence is well into the dozens. Robin-hood hashing softens the tail; it does not move the wall.
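A small sketch of the probe formula from the setup. The unsuccessful-lookup (insert) formula, ½(1 + 1/(1−α)²), is Knuth's classic result for linear probing; it is not part of the readings above and is added here only to show why inserts hurt before lookups do.

```python
def probes_successful(load_factor: float) -> float:
    """Mean probes for a successful lookup under linear probing."""
    if not 0.0 <= load_factor < 1.0:
        raise ValueError("load factor must be in [0, 1)")
    return 0.5 * (1.0 + 1.0 / (1.0 - load_factor))


def probes_unsuccessful(load_factor: float) -> float:
    """Mean probes for a miss or an insert (Knuth's result; an addition here)."""
    if not 0.0 <= load_factor < 1.0:
        raise ValueError("load factor must be in [0, 1)")
    return 0.5 * (1.0 + 1.0 / (1.0 - load_factor) ** 2)


for alpha in (0.75, 0.85, 0.90, 0.95):
    print(f"alpha={alpha:.2f}  hit={probes_successful(alpha):5.1f}  "
          f"miss/insert={probes_unsuccessful(alpha):6.1f}")
```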
EXP 03c
LSM read amplification by levels
Bloom filters are why LSMs are usable.
Question
How many SSTables does a point lookup touch in an LSM tree, with and without bloom filters?
Setup
Worst case: N levels means N SSTable lookups. With bloom filters at FPR p, expected disk reads ≈ p × (N − 1) + 1.
Readings (avg disk reads)
4 levels · no bloom
4
always touches all
4 levels · bloom 1%
1.03
almost always one read
7 levels · no bloom
7
classic RocksDB shape
7 levels · bloom 1%
1.06
still ~one read
10 levels · bloom 1%
1.09
why bloom matters
10 levels · bloom 0.1%
1.009
diminishing returns
Takeaway
Bloom filters are not optional for LSM point reads. They convert what would be 7-10× read amplification into ~1 disk read on the average path, at the cost of ~10 bits/key in RAM (the 1% setting from EXP 03). Most RocksDB and Cassandra deployments keep them on for exactly this reason.
II
Compression.
Speed-vs-size, with real bytes.
EXP 04
gzip vs brotli vs zstd
Compression ratio vs encode/decode speed.
Question
How much does each modern compressor reduce a 1 MB JSON document, and at what speed?
Setup
Squash benchmark v0.7, x86-64, default level for each. JSON corpus from the Squash dataset. Numbers are MB/s for compress and decompress.
Readings (mixed)
gzip · decompress
360
MB/s
brotli · ratio
6.5
× — best for static assets
brotli · compress
0.9
MB/s — slow at default
brotli · decompress
380
MB/s
zstd · ratio
5.7
× — at default
zstd · compress
530
MB/s — fastest mainstream encode
zstd · decompress
1,500
MB/s — fastest decode
Takeaway
For real-time encode (logs, RPC bodies), zstd is the clear winner — same ratio as gzip, 5x encode speed, 4x decode. Brotli only wins for static web assets where you encode once and decode billions of times.
EXP 05
JSON vs Protobuf wire size
A 25–45% reduction, give or take.
Question
How much smaller is a Protobuf message vs the equivalent JSON for typical record shapes?
Setup
Compared on five common shapes: a payments record, an analytics event, a user profile, a search hit, a config blob. Sizes are uncompressed.
Readings (% of JSON)
Payments record
38
% of JSON size
Takeaway
25–45% reduction is typical. Small numeric values benefit most (varint encoding); deeply nested optional fields benefit less. After gzip, the gap narrows considerably — gzipped JSON is often within 10% of gzipped Protobuf. Protobuf still wins on parse cost and schema discipline.
EXP 05b
Columnar compression on time-series
A row-store loses to Parquet by 10×.
Question
How much smaller is the same data in a columnar format with appropriate encodings?
Setup
Same 1 GB time-series table (timestamp, sensor_id, value, tag). Stored in CSV, Parquet+Snappy (default), Parquet+ZSTD, ORC. ClickBench-style measurements.
Readings (MB on disk)
CSV · uncompressed
1,024
MB · baseline
CSV · gzip
280
MB · 3.7× smaller
Parquet · Snappy
110
MB · 9.3× smaller
Parquet · ZSTD
78
MB · 13× smaller
ORC · ZSTD
71
MB · best columnar
Takeaway
Columnar formats win by storing each column contiguously and applying type-aware encodings (RLE, delta, dictionary) before the entropy coder runs. The roughly 10× edge over raw CSV (2.5-4× over gzipped CSV) is why every analytics warehouse, including BigQuery, Snowflake, ClickHouse and Redshift, is column-oriented.
EXP 05c
JSON encoding overhead per row
Field names cost more than values.
Question
How much of a JSON payload is structural overhead (keys, braces, quotes, commas) for typical small rows?
Setup
Five representative rows of varying field count and value type. "Value bytes" is the sum of value lengths; "structural" is everything else.
Readings (% structural)
3 fields · short str
64
% structural
10 fields · short str
58
%
Takeaway
A "JSON event" is 50-70% braces and key names. This is why MessagePack, CBOR, Avro and Protobuf can shrink small payloads dramatically — the field tag is a single byte, not a quoted string. Compression flattens this advantage; uncompressed wires do not.
III
Concurrency & tail latency.
When parallelism stops paying.
EXP 06
M/M/1 queue: latency at high utilisation
You cannot run a server at 99% utilisation.
Question
How does mean wait time grow as utilisation ρ approaches 1?
Setup
Standard M/M/1 result: mean wait W = (1/μ) × ρ/(1−ρ). Below: ρ vs the multiplier on baseline service time.
Readings (× service time)
ρ = 0.50
1
1× service time wait
ρ = 0.99
99
99× — and growing
Takeaway
Aim for 60–70% utilisation for latency-sensitive services. The (1−ρ) denominator makes the curve climb steeply past 0.9: at ρ = 0.95 the mean wait is already 19× the service time. A "well-utilised" 95% server is in the latency death zone.
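The M/M/1 multiplier from the setup, tabulated. A minimal sketch; `wait_multiplier` is an illustrative name and the utilisation values are chosen only to show the shape of the curve.

```python
def wait_multiplier(rho: float) -> float:
    """Mean M/M/1 wait as a multiple of the mean service time: rho / (1 - rho)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return rho / (1.0 - rho)


for rho in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"rho={rho:.2f}  wait = {wait_multiplier(rho):6.1f} x service time")
```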
EXP 07
Tail latency for fan-out queries
p99 of one becomes p50 of 100.
Question
If you fan out a query to N parallel workers and wait for all, how does the system's p99 latency change?
Setup
Each worker has independent latency with a heavy-tailed distribution (Weibull-like, p99 ≈ 5× p50). System latency is the max of N samples.
Readings (× single-worker p50)
N = 10
9
9× · the system's p99 is now roughly one worker's p99.9
Takeaway
In fan-out, the tail dominates. Mitigations from Dean & Barroso 2013: hedged requests (send a second copy if the first is slow, take whichever returns first), bounded latency (timeout and partial result), tied requests (enqueue on two servers; the one that starts first cancels the other). Without them, p99 of the system explodes.
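The arithmetic behind "wait for all N" needs only the independence assumption from the setup, not the shape of the latency distribution. A minimal sketch with illustrative function names:

```python
def p_any_slow(n: int, worker_quantile: float = 0.99) -> float:
    """Probability that at least one of n independent workers exceeds its own p99."""
    return 1.0 - worker_quantile ** n


def worker_quantile_for(system_quantile: float, n: int) -> float:
    """Per-worker quantile that the system-level quantile maps to when waiting for all n."""
    return system_quantile ** (1.0 / n)


for n in (1, 10, 100):
    print(f"N={n:>3}  P(some worker above p99) = {p_any_slow(n):.1%}  "
          f"system p50 = worker p{100 * worker_quantile_for(0.50, n):.2f}  "
          f"system p99 = worker p{100 * worker_quantile_for(0.99, n):.3f}")
```

With N = 100, about 63% of requests wait on at least one worker that is past its p99, so the system median sits just above the single-worker p99.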
EXP 07b
Amdahl's law
5% serial means a hard ceiling at 20×.
Question
If a fraction s of work is inherently serial, how much speedup can N parallel cores give you?
Setup
Amdahl 1967: speedup = 1 / (s + (1−s)/N). Below: speedup at N = 16, 64, 256, ∞ for typical serial fractions.
Readings (× speedup)
s = 1% · N=16
13.9
~87% efficiency
s = 1% · N=∞
100
hard ceiling
s = 5% · N=∞
20
ceiling at 20×
s = 10% · N=∞
10
no point past 32 cores
s = 25% · N=∞
4
parallel barely helps
Takeaway
Adding cores past Amdahl's ceiling only burns money. The bigger the system gets, the more important the serial 1-5% becomes — locks, cross-shard coordination, leader-only operations. The optimisation that pays at scale is removing the serial fraction, not adding cores.
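Amdahl's formula from the setup, as a sketch. Passing `math.inf` for the core count gives the ceiling 1/s; the function name is illustrative.

```python
import math


def amdahl_speedup(serial_fraction: float, cores: float) -> float:
    """speedup = 1 / (s + (1 - s) / N); cores may be math.inf for the ceiling."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


for s in (0.01, 0.05, 0.10, 0.25):
    row = "  ".join(f"N={n}: {amdahl_speedup(s, n):6.1f}x" for n in (16, 64, 256, math.inf))
    print(f"s={s:.0%}  {row}")
```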
EXP 07c
False sharing — cache line contention
Two cores writing 4 bytes apart, 25× slowdown.
Question
How much does false sharing cost when two cores update independent variables that happen to share a 64-byte cache line?
Setup
Microbenchmark: each core does 10⁸ atomic increments on its own counter. "Same cache line" places counters within 64 bytes; "padded" pads each counter to its own line.
Readings (normalised throughput)
1 core
1
baseline · ~10⁸ ops/s
2 cores · padded
1
no sharing — perfect scale
2 cores · same line
0.04
25× slowdown · ping-pong
4 cores · same line
0.012
~80× slowdown
8 cores · same line
0.005
~200× slowdown
Takeaway
Pad hot per-thread state to a full cache line (64 B on x86, 128 B on Apple silicon). Java has @Contended; C++ has alignas(64) or, since C++17, alignas(std::hardware_destructive_interference_size). The cost of getting it wrong is invisible on a single core and catastrophic at scale.
IV
Storage amplification.
What you write isn't what hits disk.
EXP 08
LSM-tree write amplification
Flushed once, compacted many times.
Question
How many bytes hit disk per byte logically written, by compaction strategy?
Setup
RocksDB with default level-style compaction, ~10 levels, 10× growth factor. Numbers from Facebook's 2016 RocksDB paper and follow-ups.
Readings (× write amp)
Level (default)
22
each byte rewritten ~22× across compactions
Universal (size-tiered)
8
fewer rewrites, more space amp
FIFO
1
no compaction, oldest segment dropped
B-tree (Postgres)
2
one page write + one WAL write
Takeaway
Choosing between B-tree and LSM is choosing what to amplify. B-tree: low write amp, higher random I/O. LSM: high write amp, sequential I/O, which suits SSD throughput but spends SSD endurance. Size-tiered compaction (RocksDB calls it universal; ScyllaDB uses it too) trades space for write amp.
EXP 08b
Replication factor — bytes vs durability
Eleven nines costs less than you think.
Question
How does storage cost change as you add replicas or switch to erasure coding?
Setup
Per-byte storage overhead for each scheme. Durability target: probability the data is lost in a year given typical disk AFR ~1%.
Readings (% of raw bytes)
Single copy
100
% · ~1% loss/yr · don't
2× replication
200
% · ~10⁻⁴ loss/yr
3× replication
300
% · ~10⁻⁶ loss/yr · classic HDFS
EC 6+3 (Reed-Solomon)
150
% · ~10⁻¹¹ · S3 / Hadoop EC
EC 10+4
140
% · ~10⁻¹³ · cold tier
Takeaway
Erasure coding gives more durability than 3× replication at half the storage cost. The trade is CPU on writes (encoding) and slower repair (decode N shards to rebuild one). For warm/cold tiers EC dominates; for hot reads 3× replication still wins on tail latency.
EXP 08c
SSD endurance — DWPD vs lifetime
Write amp eats your warranty.
Question
Given a drive's DWPD (drive writes per day) rating, how does database write amplification translate into lifetime years?
Setup
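Modern enterprise SSD: 1.92 TB capacity, 1 DWPD over 5-year warranty. "Effective DWPD" = host writes ÷ capacity, multiplied by the database's write-amp factor. The readings below are consistent with logical writes of 0.2 drive capacities per day (about 384 GB/day), so lifetime ≈ 25 ÷ WA years.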
Readings (years to wear out)
Postgres B-tree · 2× WA
12.5
years · comfortable
MyRocks LSM · 8× WA
3.1
years · borderline
RocksDB level · 22× WA
1.1
years · DWPD limited
Cassandra LCS · 30× WA
0.83
years · need 3 DWPD drive
Idle replica · 0.1× WA
250
years · NAND cell decays first
Takeaway
On write-heavy LSM workloads, drive endurance — not capacity — sizes the cluster. Buy 3 DWPD drives or pre-shrink write amp with universal compaction. The hidden cost of "we're replacing drives every year" is one of the strongest arguments for tuning compaction.
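The lifetime readings can be reproduced from the drive rating and the write-amp factor. A minimal sketch: the logical write rate of 0.2 drive capacities per day is an assumption inferred from the readings (it is the value that makes 2× WA come out at 12.5 years), and the function name is illustrative.

```python
def lifetime_years(write_amp: float,
                   logical_dwpd: float = 0.2,   # assumed: logical writes per day / capacity
                   rated_dwpd: float = 1.0,     # drive rating from the setup
                   warranty_years: float = 5.0) -> float:
    """Years until the rated endurance (rated_dwpd x warranty_years) is consumed."""
    physical_dwpd = logical_dwpd * write_amp
    return warranty_years * rated_dwpd / physical_dwpd


for name, wa in (("Postgres B-tree", 2), ("MyRocks LSM", 8),
                 ("RocksDB level", 22), ("Cassandra LCS", 30)):
    print(f"{name:<16} WA={wa:>2}x  ~{lifetime_years(wa):.2f} years")
```

Passing `rated_dwpd=3.0` shows what a 3 DWPD drive buys for the same workload.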
V
Network.
Where the speed of light is the actual limit.
EXP 09
TCP throughput vs RTT (BDP)
A 100ms link with a 64KB window can do 5 Mbps. That's it.
Question
For a TCP connection with window W and round-trip time R, what is the maximum throughput?
Setup
Throughput ≤ W / R. Below: max throughput at common RTTs for typical windows. The default Linux TCP receive window is auto-tuned up to 6 MB.
Readings (Mbps)
64 KB window · 1 ms
524
Mbps · same rack
64 KB window · 10 ms
52
Mbps · same region
64 KB window · 100 ms
5.2
Mbps · cross-region — fixed window kills you
6 MB window · 100 ms
480
Mbps · auto-tuned
6 MB window · 200 ms
240
Mbps · transpacific
BBR · 200 ms
950
Mbps · congestion-based
Takeaway
Throughput on long-fat pipes is window-bound, not bandwidth-bound. Confirm receive-buffer auto-tuning is on (it is by default on Linux ≥ 2.6.17) and consider BBR for high-RTT high-bandwidth paths. The classic "my 1 Gbps link only does 50 Mbps" story is almost always a window problem.
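The window bound from the setup, plus its inverse (the window a target rate needs). A minimal sketch with illustrative function names; windows are in bytes, RTT in milliseconds, and 1 Mbps is taken as 10⁶ bits per second.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound W / R: one full window delivered per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6


def window_needed_bytes(target_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain target_mbps."""
    return target_mbps * 1e6 / 8 * (rtt_ms / 1000.0)


for rtt in (1, 10, 100):
    print(f"64 KB window, {rtt:>3} ms RTT: {max_throughput_mbps(64 * 1024, rtt):7.1f} Mbps")

print(f"1 Gbps at 100 ms needs ~{window_needed_bytes(1000, 100) / 1e6:.1f} MB in flight")
```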
EXP 10
TLS handshake cost
TLS 1.3 cut the cold handshake in half.
Question
How much does each TLS variant cost in round trips and milliseconds on a 50 ms RTT link?
Setup
Round trips before first application byte. The TLS rows exclude the TCP handshake (one more RTT); the QUIC rows include transport setup. ms = RTTs × 50 ms (the dominant cost). CPU cost not included.
Readings (ms before first byte)
TLS 1.2 · cold
100
ms · 2 RTTs
TLS 1.2 · session resume
50
ms · 1 RTT
TLS 1.3 · cold
50
ms · 1 RTT
TLS 1.3 · 0-RTT resume
0
ms · piggybacked
QUIC · cold (HTTP/3)
50
ms · 1 RTT incl. transport
QUIC · 0-RTT resume
0
ms · single packet
Takeaway
Default to TLS 1.3 everywhere; the cold handshake savings alone are worth the migration. QUIC (HTTP/3) folds the transport handshake into the TLS handshake — one round trip instead of two — which is why CDN providers default to it on lossy networks.
EXP 10b
HTTP versions — concurrent streams
HTTP/2 turned 6 connections into one.
Question
How many concurrent in-flight requests can a single client maintain?
Setup
Browser-default behaviour against a single origin. Numbers reflect Chrome / Firefox 2024 settings.
Readings (concurrent streams)
HTTP/1.1 · pipelined
0
effectively zero — disabled in browsers
HTTP/1.1 · parallel
6
connections per origin
HTTP/2 · single conn
100
streams default · MAX_CONCURRENT_STREAMS
HTTP/2 · server-push
100
optional, deprecated by Chrome
HTTP/3 · QUIC streams
100
no head-of-line blocking on loss
Takeaway
HTTP/2 multiplexing eliminated the "six connections per origin" hack browsers used for a decade. HTTP/3 fixes the remaining issue: a packet loss in one HTTP/2 stream blocked all the others because TCP delivered bytes in order. QUIC streams are independent.
EXP 10c
Cross-region round trips
Speed of light in fibre is ~200,000 km/s.
Question
What round-trip latency floor does the speed of light impose between regions?
Setup
Great-circle distance × 2 ÷ 200,000 km/s gives the theoretical floor. Real RTT is typically 1.3-1.5× this due to non-direct fibre paths.
Readings (ms RTT)
us-east ↔ us-east-2 (Ohio)
12
ms · floor 8 ms
us-east ↔ us-west
60
ms · floor 40 ms
us-east ↔ eu-west (Ireland)
75
ms · transatlantic
us-east ↔ ap-northeast (Tokyo)
150
ms · transpacific
eu-west ↔ ap-south (Mumbai)
120
ms · backbone limited
us-east ↔ ap-southeast (Sydney)
200
ms · half the globe
Takeaway
No software optimisation can beat physics. A user in Sydney hitting a US-east API will see about 200 ms of round trip before any code runs, and by the ratio above even a straight fibre would only bring that down to roughly 135-155 ms. Multi-region strategies (read replicas, edge compute, anycast) exist because of this floor, not despite it.
VI
Caching.
Hit ratio is the bill payer.
EXP 11
Zipfian hit ratio vs cache size
A small cache catches a huge fraction of traffic.
Question
Under a Zipf-distributed access pattern, what hit ratio does a cache of size k achieve out of N keys?
Setup
Zipf with α = 1.0 (typical for web traffic). Hit ratio = sum of probability mass for the top-k keys, ≈ H(k)/H(N) where H is the harmonic number.
Readings (% hit)
0.1% of keyspace
27
% hit · the long tail dominates here
1% of keyspace
51
% hit · half of traffic
Takeaway
A cache holding 1% of the keyspace catches half the traffic for typical Zipf-distributed access. This is why CDNs work — most users want a small popular subset. The long tail is real, but it is not the bill.
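The H(k)/H(N) estimate from the setup can be evaluated directly. A minimal sketch: the keyspace size of 10,000 is a hypothetical choice, and the ratio depends on it (larger keyspaces give higher hit ratios for the same cache percentage), so the output is indicative rather than a reproduction of the readings.

```python
def harmonic(n: int) -> float:
    """n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def zipf_hit_ratio(cache_keys: int, total_keys: int) -> float:
    """Probability mass of the top cache_keys keys under Zipf with alpha = 1."""
    return harmonic(cache_keys) / harmonic(total_keys)


TOTAL_KEYS = 10_000  # hypothetical keyspace size, not from the source
for fraction in (0.001, 0.01, 0.10):
    k = int(TOTAL_KEYS * fraction)
    print(f"cache = {fraction:.1%} of keys ({k:>5})  hit ratio ~ {zipf_hit_ratio(k, TOTAL_KEYS):.0%}")
```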
EXP 12
LRU vs LFU vs W-TinyLFU on a real trace
Modern admission policies beat both classics.
Question
On a real workload, what hit ratio does each eviction policy achieve at the same cache size?
Setup
ARC and W-TinyLFU benchmarks from the Caffeine project, search trace, cache size = 1% of unique keys.
Readings (% hit)
LRU
47.1
% hit · the baseline
LFU
50.4
% hit · pure frequency
ARC
53.6
% hit · adaptive replacement
W-TinyLFU
57.8
% hit · admission + LRU
Belady (oracle)
62.2
% hit · upper bound
Takeaway
Modern admission policies (W-TinyLFU, used by Caffeine) close ~70% of the gap between LRU and the optimal Belady oracle. Free wins for any cache that's big enough to matter. The implementation tradeoff is some metadata per entry — usually 2-4 bits.
EXP 12b
CDN tiered cache — origin offload
Two tiers cut origin requests to 1%.
Question
How does adding a regional cache between edge and origin change the origin request rate?
Setup
Each layer's miss ratio multiplies: the origin sees the product of (1 − hit ratio) across tiers. Cloudflare Tiered Cache and similar architectures from Fastly, AWS CloudFront.
Readings (% origin load)
No CDN
100
% of requests reach origin
Edge only · 90% hit
10
% · classic CDN
Edge + regional · 95% × 80%
1
% · two-tier hit
Three-tier · 95% × 85% × 80%
0.15
% · super-PoP shielded
Takeaway
Tiered caching is multiplicative. The edge catches the easy 95%; the regional tier catches 80% of what the edge missed; the result is a 100× origin offload. CDN providers price tiered cache as a premium feature for exactly this reason: it is the largest single cost lever for high-traffic sites.
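The origin load is the product of each tier's miss ratio. A minimal sketch using the hit ratios from the readings; the function name is illustrative, and it assumes each tier's hit ratio is measured on the traffic that actually reaches it.

```python
import math


def origin_fraction(hit_ratios: list[float]) -> float:
    """Fraction of requests that miss every tier and reach the origin."""
    return math.prod(1.0 - h for h in hit_ratios)


for name, tiers in (("edge only", [0.90]),
                    ("edge + regional", [0.95, 0.80]),
                    ("three-tier", [0.95, 0.85, 0.80])):
    frac = origin_fraction(tiers)
    print(f"{name:<16} origin load = {frac:.2%}  ({1 / frac:,.0f}x offload)")
```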
VII
Distributed systems.
Coordination is the cost.
EXP 13
Consensus round-trip cost
A Paxos write is at least one cross-AZ RTT.
Question
How much latency does a quorum write add over a single-node write?
Setup
Paxos / Raft both require a round trip from leader to a majority of followers. Numbers below assume 2 ms intra-AZ, 4 ms cross-AZ in a 3-node, 3-AZ cluster.
Readings (ms write latency)
Single-node write
2
ms · fsync
Quorum write · 3 AZs
6
ms · leader + 1 follower
Quorum write · cross-region
80
ms · multi-region Spanner-style
Two-region · 1 follower far
80
ms · still 80, no win
Spanner global · TrueTime
10
ms · uncertainty bound
Takeaway
Strong consistency costs at least one cross-zone RTT per write. Multi-region strong consistency costs at least one cross-region RTT — typically 80-200 ms. Designs like Spanner buy back some of this with TrueTime; most systems accept the cost or relax to eventual consistency.
EXP 13b
Replication lag at saturation
Lag grows hyperbolically as primary fills up.
Question
How does async replication lag behave as the primary approaches its write capacity?
Setup
Same M/M/1 dynamics as EXP 06, applied to the WAL streaming pipeline. Lag scales as ρ/(1−ρ) once the replica is the bottleneck.
Readings (s lag)
ρ = 0.30
0.05
s · imperceptible
ρ = 0.80
0.6
s · still acceptable
ρ = 0.90
1.5
s · alerting territory
ρ = 0.95
4.5
s · users notice stale reads
ρ = 0.99
30
s · pager goes off
Takeaway
Replication lag is a leading indicator of primary saturation. Alert at 5 seconds; page at 30. The fix is almost never "make replication faster" — it is "shed load from the primary" (cache, batch, async writes).
EXP 13c
Clock skew — NTP vs PTP vs TrueTime
NTP is good to milliseconds. TrueTime needs atomic clocks.
Question
How well-synchronised are clocks across a fleet?
Setup
Typical observed skew between independent servers, by sync method.
Readings (ms skew)
No sync
60,000
ms · drift over a day
NTP · LAN
1
ms · with chronyd
PTP (IEEE 1588)
0.001
ms · sub-microsecond on hardware
Spanner TrueTime
4
ms uncertainty bound (ε, about 1-7 ms in the Spanner paper) · GPS+atomic
Takeaway
Trusting wall-clock time across servers is a bug. Use logical clocks (Lamport, vector) or hybrid logical clocks (Cockroach) for ordering. If you must use wall-clock, allow at least 30 ms of skew, mark events as "happened within a window," and never rely on millisecond-precise causality.
EXP 13d
SLO error budget — minutes per quarter
Three nines is 2.2 hours. Five nines is 78 seconds.
Question
What downtime budget does each common SLO actually allow?
Setup
Allowed downtime per quarter (90 days) for each availability target.
Readings (min/quarter)
99% (two nines)
1,296
min / quarter · 21.6 hrs
99.9% (three nines)
130
min / quarter — 2.2 hrs
99.99% (four nines)
13
min / quarter — one bad deploy
99.999% (five nines)
1.3
min / quarter — 78 seconds
99.9999% (six nines)
0.13
min / quarter — needs fault-tolerance, not luck
Takeaway
Each extra nine costs roughly 10× the engineering. Pick SLOs by user impact, not aspiration. Most consumer products live happily at 99.9%; payments and healthcare push for 99.99%; only telcos and stock exchanges genuinely need five-nines architectures.
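The budget is just the unavailable fraction of the window. A minimal sketch over the 90-day quarter from the setup; the function name is illustrative.

```python
def downtime_minutes(availability_pct: float, days: int = 90) -> float:
    """Allowed downtime in minutes over the window for a given availability target."""
    return days * 24 * 60 * (1.0 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99, 99.999, 99.9999):
    minutes = downtime_minutes(target)
    print(f"{target:>8}%  {minutes:10.2f} min / quarter  ({minutes * 60:,.0f} s)")
```

Changing `days` to 30 or 365 gives the monthly and yearly budgets that are often quoted instead.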
VIII
Observability.
What you can see vs what you missed.
EXP 14
Cardinality blowup in metrics
A label per user_id, a Prometheus on fire.
Question
How many unique time-series do common label-set patterns create?
Setup
Each combination of label values creates a distinct time-series. Prometheus / Mimir scrape memory grows roughly linearly in series count.
Readings (distinct time-series)
method × status (3×6)
18
tiny · sane
+ endpoint (3×6×30)
540
still fine
+ service (3×6×30×100)
54,000
~50k · OK on a single node
+ pod_id (×500)
27,000,000
27M · Prometheus dies
+ user_id (×1M)
27,000,000,000,000
27 trillion · don't
Takeaway
Label values must come from a closed, small set. user_id, request_id, session_id are forbidden as labels — those belong in logs or traces, never metrics. Prometheus reference: target ≤ 10M series total per instance.
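Series count is the product of label cardinalities, so each new label multiplies everything before it. A minimal sketch with the cardinalities from the readings; the label names are only dictionary keys here, not a Prometheus API.

```python
import math

# Label cardinalities taken from the readings above.
LABELS = {
    "method": 3,
    "status": 6,
    "endpoint": 30,
    "service": 100,
    "pod_id": 500,
    "user_id": 1_000_000,
}


def series_count(cardinalities: list[int]) -> int:
    """Worst-case distinct series: every combination of label values occurs."""
    return math.prod(cardinalities)


seen: list[int] = []
for label, cardinality in LABELS.items():
    seen.append(cardinality)
    print(f"+ {label:<9} (x{cardinality:>9,})  ->  {series_count(seen):>18,} series")
```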
EXP 15
Log volume vs cost
A debug log left on costs more than the engineer who left it.
Question
What does logging cost in commercial SaaS observability platforms?
Setup
Datadog Logs at standard pricing (mid-2024). Average log line ~250 bytes. 30-day retention. Volumes per service per day.
Readings ($ / month)
1 GB/day · 30 days
18
$/month — small service
10 GB/day · 30 days
180
$/month
100 GB/day · 30 days
1,800
$/month — typical mid-size
1 TB/day · 30 days
18,000
$/month — annual ~$220k
10 TB/day · 30 days
180,000
$/month — find a cheaper plan
Takeaway
Sample. Tier hot vs cold logs. Move debug-level off the hot path entirely. The "log everything, forever" default at scale is a six-figure mistake. Self-hosted alternatives (Loki, Vector + S3) cost 20-50× less but require operational effort.
EXP 16
Trace sampling — what 1% catches
Head-sampling ~1% misses the worst traces.
Question
For a service handling N requests, what fraction of "interesting" traces does each sampling strategy keep?
Setup
Imagine a 1M-request hour, with 1% being "slow" (>p99) and 0.1% being "errored". Sampling fractions kept by each strategy.
Readings (% of total traces)
Head sample 1%
1
% kept · independent of trace outcome
Head sample 0.1%
0.1
% kept · loses most errors
Tail sample errors-only
0.1
% kept · 100% of errors retained
Tail sample slow + errors
1.1
% kept · 100% of interesting traces
Adaptive (Honeycomb-style)
0.5
% kept · keeps representative + interesting
Takeaway
Head-based sampling is cheap but loses tail and error events. Tail-based sampling sees the whole trace before deciding to keep it, which is much more useful but requires a buffering layer. Production tracing setups typically use adaptive sampling: keep all errors, all slow traces, plus a fixed percentage of the fast happy path.
IX
Security parameters.
How long until brute force wins.
EXP 17
Password entropy vs cracking time
A 10-char password without a passphrase is sand.
Question
For a password of given alphabet and length, how long does brute force take at 1 trillion guesses/second?
Setup
Modern hashcat on a single 8×RTX-4090 rig achieves ~10¹² SHA-256 guesses/second (~10⁹ for bcrypt at cost 12). Time to exhaust the keyspace, in years.
Readings (years to exhaust)
8-char alphanumeric (62^8)
0
years · ~3.5 minutes — broken
8-char with symbols (94^8)
0.0002
years · ~1.7 hours
12-char alphanumeric
110
years
12-char with symbols
14,000
years
4-word diceware (~52 bits)
0.00014
years · ~75 minutes
6-word diceware (~78 bits)
9,600
years
Takeaway
Length beats character variety: a four-word passphrase beats a human-chosen "Tr0ub@dor"-type 10-char password by orders of magnitude, because a predictable pattern carries far less entropy than its alphabet suggests. Use a password manager, generate 16+ chars or 5+ diceware words. Use bcrypt (cost ≥ 12) or Argon2 to slow the attacker by ~1000×.
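Exhaustion time is keyspace divided by guess rate. A minimal sketch at the 10¹² guesses/second from the setup, assuming uniformly random passwords (human-chosen ones are weaker) and a 7,776-word diceware list; function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random string: length x log2(alphabet size)."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(alphabet_size: int, length: int,
                     guesses_per_second: float = 1e12) -> float:
    """Time to try every candidate; the expected time to a hit is half of this."""
    return alphabet_size ** length / guesses_per_second / SECONDS_PER_YEAR


CASES = (("8-char alphanumeric", 62, 8), ("8-char with symbols", 94, 8),
         ("12-char alphanumeric", 62, 12), ("12-char with symbols", 94, 12),
         ("4-word diceware", 7776, 4), ("6-word diceware", 7776, 6))
for name, alphabet, length in CASES:
    print(f"{name:<22} {entropy_bits(alphabet, length):5.1f} bits  "
          f"{years_to_exhaust(alphabet, length):.3g} years")
```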
EXP 18
TLS handshake — cert chain size
A long chain costs an extra round trip.
Question
How many bytes does each TLS cert-chain configuration add to the first round trip?
Setup
Server certificate plus its chain, sent in the server's first handshake flight (the Certificate message). RSA-2048 cert ≈ 1.5 KB; ECDSA P-256 cert ≈ 0.5 KB. Initial congestion window is typically 10 segments × 1460 bytes ≈ 14.6 KB.
Readings (KB cert + chain)
ECDSA leaf only
0.5
KB · single cert
ECDSA leaf + 1 intermediate
1
KB · clean
RSA leaf only
1.5
KB · still 1 round-trip
RSA leaf + 2 intermediates
4.5
KB · 1 RT
RSA leaf + 4 intermediates
7.5
KB · still 1 RT
RSA + many intermediates (>14 KB)
16
KB · 2 round-trips · slow start kicks in
Takeaway
A bloated cert chain crosses the initcwnd boundary and triggers an extra round trip on the very first connection. Use ECDSA where possible (about 3× smaller per certificate). Strip unnecessary intermediates from your chain. Test with ssllabs.com.
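A rough check of whether a chain fits the initial congestion window. This is a simplified sketch: it uses the per-certificate sizes from the setup, treats 1 KB as 1,000 bytes, and ignores the other handshake bytes that share the first flight, so real chains cross the boundary slightly earlier. Function names are illustrative.

```python
RSA_CERT_BYTES = 1500    # approximate size from the setup
ECDSA_CERT_BYTES = 500   # approximate size from the setup


def initcwnd_bytes(segments: int = 10, mss: int = 1460) -> int:
    """Bytes the server may send before waiting for the first ACK."""
    return segments * mss


def fits_in_first_flight(chain_bytes: int) -> bool:
    """True if the chain alone fits in the initial congestion window."""
    return chain_bytes <= initcwnd_bytes()


for certs in (1, 3, 5, 10):
    rsa, ecdsa = certs * RSA_CERT_BYTES, certs * ECDSA_CERT_BYTES
    print(f"{certs:>2} certs  RSA {rsa / 1000:4.1f} KB fits={fits_in_first_flight(rsa)}  "
          f"ECDSA {ecdsa / 1000:4.1f} KB fits={fits_in_first_flight(ecdsa)}")
```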
Adjacent
From the numbers, to the systems.
Every measurement here corresponds to a guide, a simulator, or a foundation entry that explains the underlying mechanism. Read alongside.