Writergate by andrewrk · Pull Request #24329 · ziglang/zig

Previous Scandal

Summary

Deprecates all existing std.io readers and writers in favor of the newly provided std.Io.Reader and std.Io.Writer which are non-generic and have the buffer above the vtable - in other words the buffer is in the interface, not the implementation. This means that although Reader and Writer are no longer generic, they are still transparent to optimization; all of the interface functions have a concrete hot path operating on the buffer, and only make vtable calls when the buffer is full.
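As a rough illustration of what "the buffer is in the interface" means, here is a toy sketch. It is not the actual std.Io.Writer: ToyWriter, CountingSink, and the single-function vtable are hypothetical names invented for this example. The point is only that writeAll is a concrete function whose hot path is a memcpy into the buffer, and the vtable is consulted only when the data does not fit.

```zig
const std = @import("std");

/// Toy interface (hypothetical): the buffer sits next to the vtable pointer,
/// so the hot path below never leaves this concrete type.
const ToyWriter = struct {
    vtable: *const VTable,
    buffer: []u8,
    end: usize = 0,

    const VTable = struct {
        /// Called only when `buffer` cannot hold the next write, or on flush.
        drain: *const fn (w: *ToyWriter, bytes: []const u8) anyerror!void,
    };

    fn writeAll(w: *ToyWriter, bytes: []const u8) !void {
        if (bytes.len <= w.buffer.len - w.end) {
            // Hot path: plain copy into the buffer, no indirect call.
            @memcpy(w.buffer[w.end..][0..bytes.len], bytes);
            w.end += bytes.len;
            return;
        }
        // Cold path: hand the buffered bytes plus the new bytes to the implementation.
        try w.vtable.drain(w, bytes);
    }

    fn flush(w: *ToyWriter) !void {
        try w.vtable.drain(w, "");
    }
};

/// Hypothetical implementation that collects output and counts vtable calls.
const CountingSink = struct {
    interface: ToyWriter,
    out: [256]u8 = undefined,
    out_len: usize = 0,
    drain_calls: usize = 0,

    fn init(buffer: []u8) CountingSink {
        return .{ .interface = .{ .vtable = &.{ .drain = drain }, .buffer = buffer } };
    }

    fn drain(w: *ToyWriter, bytes: []const u8) anyerror!void {
        const self: *CountingSink = @alignCast(@fieldParentPtr("interface", w));
        self.drain_calls += 1;
        // Emit what was buffered first, then the bytes that did not fit.
        for ([_][]const u8{ w.buffer[0..w.end], bytes }) |chunk| {
            if (chunk.len > self.out.len - self.out_len) return error.NoSpaceLeft;
            @memcpy(self.out[self.out_len..][0..chunk.len], chunk);
            self.out_len += chunk.len;
        }
        w.end = 0;
    }
};

pub fn main() !void {
    var buf: [8]u8 = undefined;
    var sink = CountingSink.init(&buf);
    const w = &sink.interface;
    try w.writeAll("abc"); // fits in the buffer: no vtable call
    try w.writeAll("def"); // still fits: no vtable call
    try w.writeAll("ghijkl"); // does not fit: one drain call
    try w.flush();
    std.debug.print("{s} (drain calls: {d})\n", .{ sink.out[0..sink.out_len], sink.drain_calls });
}
```

Because the implementation recovers itself from a pointer to its interface field, the sketch takes the interface pointer from the sink in place rather than copying the interface out.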

I have a lot more changes to upstream but it was taking too long to finish them so I decided to do it more piecemeal. Therefore, I opened this tiny baby PR to get things started.

These changes are extremely breaking. I am sorry for that, but I have carefully examined the situation and acquired confidence that this is the direction that Zig needs to go. I hope you will strap in your seatbelt and come along for the ride; it will be worth it.

The breakage in this first PR mainly has to do with formatted printing.

More Detailed Motivation

I wrote this up for ziggit but I think it would be good to include in release notes:

  • The old interface was generic, poisoning structs that contain them and forcing all functions to be generic as well with anytype. The new interface is concrete.
    • Bonus: the concreteness removes temptation to make APIs operate directly on networking streams, file handles, or memory buffers, giving us a more reusable body of code. For example, http.Server after the change no longer depends on std.net - it operates only on streams now.
  • The old interface passed errors through rather than defining its own set of error codes. This made errors in streams about as useful as anyerror. The new interface carefully defines precise error sets for each function with actionable meaning.
  • The new interface has the buffer in the interface, rather than as a separate "BufferedReader" / "BufferedWriter" abstraction. This is more optimizer friendly, particularly for debug mode.
  • The new interface supports high level concepts such as vectors, splatting, and direct file-to-file transfer, which can propagate through an entire graph of readers and writers, reducing syscall overhead, memory bandwidth, and CPU usage.
  • The new interface has "peek" functionality - a buffer awareness that offers API convenience for the user as well as simplicity for the implementation.

Performance Data

Building Self-Hosted Compiler with Itself

Benchmark 1 (3 runs): master/fast/bin/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.4s  ± 49.3ms    12.3s  … 12.4s           0 ( 0%)        0%
  peak_rss           1.03GB ± 4.67MB    1.02GB … 1.03GB          0 ( 0%)        0%
  cpu_cycles          105G  ±  323M      105G  …  105G           0 ( 0%)        0%
  instructions        207G  ± 4.41M      207G  …  207G           0 ( 0%)        0%
  cache_references   6.62G  ± 23.9M     6.60G  … 6.64G           0 ( 0%)        0%
  cache_misses        449M  ± 3.17M      447M  …  453M           0 ( 0%)        0%
  branch_misses       411M  ± 1.62M      409M  …  412M           0 ( 0%)        0%
Benchmark 2 (3 runs): writergate/fast/bin/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.6s  ± 28.6ms    10.5s  … 10.6s           0 ( 0%)        ⚡- 14.6% ±  0.7%
  peak_rss           1.14GB ± 5.26MB    1.14GB … 1.15GB          0 ( 0%)        💩+ 10.8% ±  1.1%
  cpu_cycles         95.0G  ± 19.8M     95.0G  … 95.1G           0 ( 0%)        ⚡-  9.6% ±  0.5%
  instructions        191G  ± 2.22M      191G  …  191G           0 ( 0%)        ⚡-  7.8% ±  0.0%
  cache_references   5.68G  ± 13.9M     5.66G  … 5.69G           0 ( 0%)        ⚡- 14.2% ±  0.7%
  cache_misses        386M  ± 2.47M      384M  …  388M           0 ( 0%)        ⚡- 14.2% ±  1.4%
  branch_misses       400M  ±  516K      400M  …  401M           0 ( 0%)        ⚡-  2.6% ±  0.7%

Building My Music Player Project

source

Benchmark 1 (3 runs): master/stage3/bin/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.86s  ± 36.3ms    1.84s  … 1.90s           0 ( 0%)        0%
  peak_rss            798MB ± 3.84MB     796MB …  803MB          0 ( 0%)        0%
  cpu_cycles         11.0G  ± 24.0M     11.0G  … 11.1G           0 ( 0%)        0%
  instructions       28.5G  ±  796K     28.5G  … 28.5G           0 ( 0%)        0%
  cache_references    610M  ± 1.41M      609M  …  611M           0 ( 0%)        0%
  cache_misses       52.8M  ±  559K     52.2M  … 53.2M           0 ( 0%)        0%
  branch_misses      49.7M  ±  366K     49.3M  … 50.1M           0 ( 0%)        0%
Benchmark 2 (3 runs): writergate/bin/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.09s  ± 2.02ms    2.09s  … 2.09s           0 ( 0%)        💩+ 12.4% ±  3.1%
  peak_rss            800MB ±  693KB     800MB …  801MB          0 ( 0%)          +  0.3% ±  0.8%
  cpu_cycles         12.4G  ± 41.7M     12.4G  … 12.4G           0 ( 0%)        💩+ 12.5% ±  0.7%
  instructions       30.4G  ±  472K     30.4G  … 30.4G           0 ( 0%)        💩+  6.8% ±  0.0%
  cache_references    615M  ± 2.47M      612M  …  617M           0 ( 0%)          +  0.9% ±  0.7%
  cache_misses       53.9M  ± 1.11M     52.9M  … 55.1M           0 ( 0%)          +  2.1% ±  3.8%
  branch_misses      46.5M  ±  179K     46.4M  … 46.7M           0 ( 0%)        ⚡-  6.3% ±  1.3%

Compiler Binary Size (ReleaseSmall)

  • x86_64: 13.6 -> 13.3 MiB (-2%)
  • zig1.wasm: 2.8 -> 2.7 MiB (-4%)

C Backend Building the Zig Compiler

Benchmark 1 (3 runs): master/bin/zig build-exe -ofmt=c ...writergate source tree...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.67s  ± 40.5ms    2.65s  … 2.72s           0 ( 0%)        0%
  peak_rss            572MB ± 1.13MB     571MB …  573MB          0 ( 0%)        0%
  cpu_cycles         37.3G  ±  190M     37.2G  … 37.5G           0 ( 0%)        0%
  instructions       72.8G  ± 3.21M     72.8G  … 72.8G           0 ( 0%)        0%
  cache_references   1.90G  ± 9.27M     1.89G  … 1.91G           0 ( 0%)        0%
  cache_misses        131M  ± 1.29M      130M  …  132M           0 ( 0%)        0%
  branch_misses       146M  ±  161K      146M  …  146M           0 ( 0%)        0%
Benchmark 2 (3 runs): writergate/bin/zig build-exe -ofmt=c ...writergate source tree...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.70s  ± 19.5ms    2.68s  … 2.72s           0 ( 0%)          +  0.8% ±  2.7%
  peak_rss            572MB ±  208KB     572MB …  572MB          0 ( 0%)          -  0.0% ±  0.3%
  cpu_cycles         36.4G  ±  253M     36.2G  … 36.6G           0 ( 0%)          -  2.3% ±  1.4%
  instructions       69.8G  ± 4.18M     69.8G  … 69.8G           0 ( 0%)        ⚡-  4.1% ±  0.0%
  cache_references   2.08G  ± 38.2M     2.04G  … 2.12G           0 ( 0%)        💩+  9.5% ±  3.3%
  cache_misses        134M  ± 2.81M      131M  …  136M           0 ( 0%)          +  2.1% ±  3.8%
  branch_misses       143M  ±  486K      142M  …  143M           0 ( 0%)        ⚡-  2.5% ±  0.6%

C Backend Building Hello World

ReleaseFast zig

Benchmark 1 (27 runs): master/stage3/bin/zig build-exe master/zig/test/standalone/simple/hello_world/hello.zig -ofmt=c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           187ms ± 4.08ms     180ms …  194ms          0 ( 0%)        0%
  peak_rss            129MB ±  553KB     129MB …  131MB          0 ( 0%)        0%
  cpu_cycles         1.49G  ± 11.3M     1.47G  … 1.53G           2 ( 7%)        0%
  instructions       2.70G  ±  252K     2.70G  … 2.70G           0 ( 0%)        0%
  cache_references    101M  ±  498K      101M  …  103M           1 ( 4%)        0%
  cache_misses       8.70M  ±  194K     8.14M  … 9.18M           2 ( 7%)        0%
  branch_misses      8.32M  ± 91.4K     8.07M  … 8.46M           1 ( 4%)        0%
Benchmark 2 (35 runs): writergate/bin/zig build-exe writergate/zig/test/standalone/simple/hello_world/hello.zig -ofmt=c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           145ms ± 5.17ms     137ms …  160ms          0 ( 0%)        ⚡- 22.4% ±  1.3%
  peak_rss            128MB ±  518KB     127MB …  129MB          0 ( 0%)          -  0.8% ±  0.2%
  cpu_cycles         1.17G  ± 11.4M     1.16G  … 1.20G           0 ( 0%)        ⚡- 21.5% ±  0.4%
  instructions       2.07G  ±  210K     2.07G  … 2.07G           0 ( 0%)        ⚡- 23.3% ±  0.0%
  cache_references   81.4M  ±  478K     80.5M  … 82.6M           0 ( 0%)        ⚡- 19.7% ±  0.2%
  cache_misses       7.21M  ±  152K     6.91M  … 7.47M           0 ( 0%)        ⚡- 17.1% ±  1.0%
  branch_misses      7.40M  ± 64.0K     7.27M  … 7.54M           0 ( 0%)        ⚡- 11.1% ±  0.5%

Debug zig

Benchmark 1 (3 runs): master/bin/zig build-exe master/zig/test/standalone/simple/hello_world/hello.zig -ofmt=c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          17.2s  ± 16.7ms    17.1s  … 17.2s           0 ( 0%)        0%
  peak_rss            245MB ± 1.04MB     244MB …  246MB          0 ( 0%)        0%
  cpu_cycles         54.4G  ±  150M     54.2G  … 54.5G           0 ( 0%)        0%
  instructions       40.8G  ± 3.62M     40.8G  … 40.8G           0 ( 0%)        0%
  cache_references   2.65G  ± 5.59M     2.65G  … 2.66G           0 ( 0%)        0%
  cache_misses        408M  ± 1.97M      406M  …  410M           0 ( 0%)        0%
  branch_misses       190M  ±  103K      190M  …  190M           0 ( 0%)        0%
Benchmark 2 (3 runs): writergate/bin/zig build-exe writergate/zig/test/standalone/simple/hello_world/hello.zig -ofmt=c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.1s  ± 43.4ms    16.0s  … 16.1s           0 ( 0%)        ⚡-  6.3% ±  0.4%
  peak_rss            244MB ±  695KB     244MB …  245MB          0 ( 0%)          -  0.1% ±  0.8%
  cpu_cycles         44.3G  ±  139M     44.2G  … 44.4G           0 ( 0%)        ⚡- 18.5% ±  0.6%
  instructions       33.4G  ±  437K     33.4G  … 33.4G           0 ( 0%)        ⚡- 18.2% ±  0.0%
  cache_references   2.19G  ± 11.5M     2.18G  … 2.20G           0 ( 0%)        ⚡- 17.2% ±  0.8%
  cache_misses        339M  ± 6.31M      332M  …  345M           0 ( 0%)        ⚡- 16.9% ±  2.6%
  branch_misses       185M  ±  395K      184M  …  185M           0 ( 0%)        ⚡-  2.7% ±  0.3%

Upgrade Guide

Turn on -freference-trace to help you find all the format string breakage.

"{f}" Required to Call format Methods

Example:

std.debug.print("{}", .{std.zig.fmtId("example")});

This will now cause a compile error:

error: ambiguous format string; specify {f} to call format method, or {any} to skip it

Fixed by:

std.debug.print("{f}", .{std.zig.fmtId("example")});

Motivation: eliminate these two footguns:

Introducing a format method to a struct caused a bug if formatting code somewhere printed that struct with {}: the output silently started rendering differently.

Removing a format method from a struct caused a bug if formatting code somewhere printed that struct with {}: the output changed without notice.

Now, introducing a format method will cause compile errors at all {} sites. In the future, it will have no effect.

Similarly, eliminating a format method will not change any sites that use {}.

Using {f} always tries to call a format method, causing a compile error if none exists.
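A minimal sketch of the rule in practice, assuming a Zig toolchain that already contains these changes. Point is a hypothetical type made up for the example, and the writer type is spelled std.Io.Writer as in the Summary (the guide text here spells the same namespace std.io).

```zig
const std = @import("std");

// Hypothetical example type with a new-style format method.
const Point = struct {
    x: i32,
    y: i32,

    pub fn format(p: Point, writer: *std.Io.Writer) std.Io.Writer.Error!void {
        try writer.print("({d}, {d})", .{ p.x, p.y });
    }
};

pub fn main() !void {
    const p: Point = .{ .x = 1, .y = 2 };
    var buf: [64]u8 = undefined;

    // "{f}" explicitly requests Point.format.
    const via_format = try std.fmt.bufPrint(&buf, "{f}", .{p});
    std.debug.print("{s}\n", .{via_format});

    // "{any}" explicitly skips Point.format and uses the default struct rendering.
    std.debug.print("{any}\n", .{p});

    // "{}" is rejected at compile time as ambiguous for this type:
    // std.debug.print("{}\n", .{p});
}
```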

Format Methods No Longer Have Format Strings or Options

pub fn format(
    this: @This(),
    comptime format_string: []const u8,
    options: std.fmt.FormatOptions,
    writer: anytype,
) !void { ... }

⬇️

pub fn format(this: @This(), writer: *std.io.Writer) std.io.Writer.Error!void { ... }

The deleted FormatOptions are now for numbers only.

For any state that you previously got from the format string, there are three suggested alternatives:

  1. different format methods
pub fn formatB(foo: Foo, writer: *std.io.Writer) std.io.Writer.Error!void { ... }

This can be called with "{f}", .{std.fmt.alt(foo, .formatB)}, where foo is a value of type Foo.

  2. std.fmt.Alt
pub fn bar(foo: Foo, context: i32) std.fmt.Alt(F, F.baz) {
    return .{ .data = .{ .context = context } };
}
const F = struct {
    context: i32,
    pub fn baz(f: F, writer: *std.io.Writer) std.io.Writer.Error!void { ... }
};

This can be called with "{f}", .{foo.bar(1234)}.

  3. return a struct instance that has a format method, combined with {f}.
pub fn bar(foo: Foo, context: i32) F {
    return .{ .context = context };
}
const F = struct {
    context: i32,
    pub fn format(f: F, writer: *std.io.Writer) std.io.Writer.Error!void { ... }
};

This can be called with "{f}", .{foo.bar(1234)}.
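To make the last alternative concrete, here is a small self-contained sketch under the same toolchain assumption. Point, scaled, and Scaled are hypothetical names; the context that used to travel in the format string (here a scale factor) travels in a small wrapper struct that has its own format method.

```zig
const std = @import("std");

// Hypothetical example type; `scaled` returns a wrapper carrying the context.
const Point = struct {
    x: i32,
    y: i32,

    pub fn scaled(p: Point, factor: i32) Scaled {
        return .{ .point = p, .factor = factor };
    }

    const Scaled = struct {
        point: Point,
        factor: i32,

        pub fn format(s: Scaled, writer: *std.Io.Writer) std.Io.Writer.Error!void {
            try writer.print("({d}, {d})", .{ s.point.x * s.factor, s.point.y * s.factor });
        }
    };
};

pub fn main() !void {
    const p: Point = .{ .x = 1, .y = 2 };
    var buf: [64]u8 = undefined;
    // The wrapper is formatted with "{f}" like any other type with a format method.
    const text = try std.fmt.bufPrint(&buf, "{f}", .{p.scaled(3)});
    std.debug.print("{s}\n", .{text});
}
```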

Formatted Printing No Longer Deals with Unicode

If you were relying on alignment combined with Unicode codepoints, it is now ASCII/bytes only. The previous implementation was not fully Unicode-aware. If you want to align Unicode strings you need full Unicode support which the standard library does not provide.

std.io.getStdOut().writer().print()

Please use buffering! And don't forget to flush!

var stdout_buffer: [1024]u8 = undefined;
var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
const stdout = &stdout_writer.interface;

// ...

try stdout.print("...", .{});

// ...

try stdout.flush();

Miscellaneous

  • std.fs.File.reader -> std.fs.File.deprecatedReader
  • std.fs.File.writer -> std.fs.File.deprecatedWriter
  • std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
  • std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
  • std.zig.fmtEscapes -> std.zig.fmtString
  • std.fmt.fmtSliceHexLower -> {x}
  • std.fmt.fmtSliceHexUpper -> {X}
  • std.fmt.fmtIntSizeDec -> {B}
  • std.fmt.fmtIntSizeBin -> {Bi}
  • std.fmt.fmtDuration -> {D}
  • std.fmt.fmtDurationSigned -> {D}
  • std.fmt.Formatter -> std.fmt.Alt
    • now takes context type explicitly
    • no fmt string

These are deprecated but not deleted yet:

  • std.fmt.format -> std.io.Writer.print
  • std.io.GenericReader -> std.io.Reader
  • std.io.GenericWriter -> std.io.Writer
  • std.io.AnyReader -> std.io.Reader
  • std.io.AnyWriter -> std.io.Writer

If you have an old stream and you need a new one, you can use adaptToNewApi() like this:

fn foo(old_writer: anytype) !void {
    var adapter = old_writer.adaptToNewApi();
    const w: *std.io.Writer = &adapter.new_interface;
    try w.print("{s}", .{"example"});
    // ...
}

New API

Formatted Printing

  • {t} is shorthand for @tagName() and @errorName()
  • {d} and the other integer specifiers work with custom types by calling the type's formatNumber method.
  • {b64}: output string as standard base64
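A short sketch of {t}, assuming a toolchain with these changes. Color is a hypothetical enum, and the other two specifiers are left out here.

```zig
const std = @import("std");

// Hypothetical enum for the example.
const Color = enum { red, green };

pub fn main() !void {
    var tag_buf: [32]u8 = undefined;
    var err_buf: [32]u8 = undefined;

    // {t} on an enum value prints its tag name, like @tagName.
    const tag = try std.fmt.bufPrint(&tag_buf, "{t}", .{Color.green});

    // {t} on an error value prints its name, like @errorName.
    const err = try std.fmt.bufPrint(&err_buf, "{t}", .{error.FileNotFound});

    std.debug.print("{s} {s}\n", .{ tag, err });
}
```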

std.io.Writer and std.io.Reader

These have a bunch of handy new APIs that are more convenient, perform better, and are not generic. For instance, look at how reading until a delimiter works now.

These streams also feature some unique concepts compared with other languages' stream implementations:

  • The concept of discarding when reading: allows efficiently ignoring data. For instance a decompression stream, when asked to discard a large amount of data, can skip decompression of entire frames.
  • The concept of splatting when writing: this allows a logical "memset" operation to pass through I/O pipelines without actually doing any memory copying, turning an O(M*N) operation into an O(M) operation, where M is the number of streams in the pipeline and N is the number of repeated bytes. In some cases it can be even more efficient, such as when splatting a zero value that ends up being written to a file; this can be lowered as a seek forward.
  • Sending a file when writing: this allows an I/O pipeline to do direct fd-to-fd copying when the operating system supports it.
  • The stream user provides the buffer, but the stream implementation decides the minimum buffer size. This effectively moves state from the stream implementation into the user's buffer.
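The splatting idea can be pictured with a toy model. This is not how std.io.Writer implements it; Splat, CountingStage, and sinkWrite are hypothetical names for this sketch. Each of the M stages forwards a (byte, count) description in constant time, and only the final sink materializes the N bytes.

```zig
const std = @import("std");

/// A logical run of `count` copies of `byte`, passed along by description.
const Splat = struct { byte: u8, count: usize };

/// Hypothetical pass-through stage: does O(1) bookkeeping and forwards the description.
const CountingStage = struct {
    bytes_seen: usize = 0,

    fn forward(stage: *CountingStage, s: Splat) Splat {
        stage.bytes_seen += s.count; // no loop over the run, no copy
        return s;
    }
};

/// Final sink: the only place where the run becomes actual bytes.
fn sinkWrite(out: []u8, s: Splat) usize {
    const n = @min(out.len, s.count);
    @memset(out[0..n], s.byte);
    return n;
}

pub fn main() void {
    var stages = [_]CountingStage{ .{}, .{}, .{} }; // M = 3 stages
    var out: [1000]u8 = undefined; // N = 1000 repeated bytes
    var run: Splat = .{ .byte = 0, .count = out.len };
    // Each stage handles the run in constant time: O(M) work in total.
    for (&stages) |*stage| run = stage.forward(run);
    const written = sinkWrite(&out, run);
    std.debug.print("stages={d} written={d}\n", .{ stages.len, written });
}
```

A sink backed by a file could, as the text notes for zero bytes, skip even the final memset and seek forward instead.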

std.fs.File.Reader

Memoizes key information about a file handle such as:

  • The size from calling stat, or the error that occurred therein.
  • The current seek position.
  • The error that occurred when trying to seek.
  • Whether reading should be done positionally or streaming.
  • Whether reading should be done via fd-to-fd syscalls (e.g. sendfile) versus plain variants (e.g. read).

Fulfills the std.io.Reader interface.

This API turned out to be super handy in practice. Having a concrete type to pass around that memoizes file size is really nice.

std.fs.File.Writer

Same idea but for writing.

What's NOT Included in this Branch

This is part of a series of changes leading up to "I/O as an Interface" and Async/Await Resurrection. However, this branch does not do any of that. It also does not do any of these things:

  • Rework tls
  • Rework http
  • Rework json
  • Rework zon
  • Rework zstd
  • Rework flate
  • Rework zip
  • Rework package fetching
  • Delete fifo.LinearFifo
  • Delete the deprecated APIs mentioned above

I have done all the above in a separate branch and plan to upstream them one at a time in follow-up PRs, eliminating dependencies on the old streaming APIs like a game of pick-up-sticks.

Merge Checklist:

  • bootstrapped compiler is crashing
  • Windows TODOs
  • fix bootstrapped stage3 compiler crashing
  • fix failing behavior tests
  • fix failing std lib tests
  • solve the TODOs in std.io.Writer
  • finish implementing std.fs.File.Writer which doesn't handle positional mode. this is probably breaking caching which now relies on the manifest being written positionally to save from having to seek.
  • eliminate Writer.count
  • update-zig1 is non viable
  • error for using alignment options when they're not observed
  • something about packed structs in Reader
  • formatNumber rather than formatInteger