Waypipe is a proxy for Wayland applications, which makes it possible to run an application on a different computer but interact with it locally, as if it were actually running on the local computer. (Wayland is the slowly-improving window system protocol for Linux and the successor to X11, which most applications now support. The protocol sends plain data over a Unix socket, along with file descriptors to share less serializable things like window surface image data.)
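For readers unfamiliar with the wire format: each Wayland message starts with an 8-byte header made of two 32-bit words in host byte order. The first word is the sender’s object ID; the second packs the total message size in bytes (upper 16 bits, header included) with the opcode (lower 16 bits). The following is a minimal sketch of parsing one such message while rejecting malformed input instead of trusting it. It is not Waypipe’s actual parser: Header, ParseError and parse_message are hypothetical names, and file descriptors passed alongside the data are ignored.

```rust
/// Header of one Wayland wire-protocol message.
#[derive(Debug, PartialEq)]
struct Header {
    object_id: u32,
    opcode: u16,
    size: u16,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// Fewer bytes are available than the header or the declared size requires.
    Truncated,
    /// The declared size is smaller than the header or not a multiple of 4.
    BadSize(u16),
}

/// Parses the message at the start of `buf` (host byte order), returning its
/// header and argument bytes. Malformed input yields an error, never a panic.
fn parse_message(buf: &[u8]) -> Result<(Header, &[u8]), ParseError> {
    if buf.len() < 8 {
        return Err(ParseError::Truncated);
    }
    let object_id = u32::from_ne_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let word = u32::from_ne_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let size = (word >> 16) as u16; // total length in bytes, header included
    let opcode = (word & 0xffff) as u16;
    // Arguments are padded to 32-bit words, so valid sizes are multiples of 4.
    if size < 8 || size % 4 != 0 {
        return Err(ParseError::BadSize(size));
    }
    if buf.len() < usize::from(size) {
        return Err(ParseError::Truncated);
    }
    let body = &buf[8..usize::from(size)];
    Ok((Header { object_id, opcode, size }, body))
}
```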
I wrote it during the summer of 2019, and implemented it in C because libwayland used C, because most libraries provide a C interface, because other programming languages often aren’t available or are hard to install as a user on old, shared systems, and because it used no complicated data structures or libraries for which C++ would be necessary. The core operations (basic protocol parsing and shared memory buffer replication) did not take long to implement, and were done in a week. Most of Waypipe’s code goes toward making this practical: making the buffer replication for displayed windows run fast and only when necessary; handling other Wayland “protocols” (read: Wayland object types and associated methods); supporting replication of DMABUFs (GPU-side memory buffers used to transfer image data between applications, typically used by OpenGL and Vulkan in place of CPU-side shared memory file descriptors); and optionally video-encoding DMABUFs.
Making Waypipe reliable, secure, and efficient has been challenging. Waypipe receives messages from and sends messages to Wayland applications and compositors, which it should not trust to use the various Wayland protocols properly. In addition to the (currently rather theoretical) risk of malicious applications, regular mistakes and complicated stacks of libraries can use the Wayland protocols in unexpected ways. There are several libraries implementing the base wire protocol, a number of compositors and toolkits that use it, libraries that extend or try to “share” a single Wayland connection with an existing program, and clients that people have written which directly use a Wayland library instead of going through a toolkit, similarly to how many people directly used Xlib.
My approach was to try to write reliable code that handles all errors, in some form or another. (Ideally, by cleanly shutting down the connection and sending an error message to the application; this is what libwayland-server also does.) Of course, to make reliable code, I needed to test it. My main strategies were: trying many Wayland clients and testing subcomponents of Waypipe (this worked, but tests take a while to write and still miss things); injecting errors (to check how broken memory allocation failure paths were); using AddressSanitizer and static analysis tools to detect issues; and fuzzing (to see what crashes when a fuzzer controls the Wayland message inputs and the internal protocol used to connect the local and remote Waypipe instances; like testing, this requires some framework code to let the fuzzer provide and manipulate file descriptors, which still doesn’t cover all cases).
Altogether, these testing approaches appear to have worked, but they
require a measure of active maintenance over time as the code is
updated. New Wayland protocols and protocol revisions continue to be
made and Waypipe has needed and will often need to adapt to them; the
wl_drm protocol once used to share DMABUFs has now been
entirely replaced by zwp_linux_dmabuf_v1, and new protocols
for explicit synchronization, presentation timing, screen capturing, and
color management are now done or being designed. There have also been
new feature requests and ideas for performance improvements.
Implementing all of these required or will require new code, which is
not as well tested as the older code and would require a lot of work to
bring to the same standard.
Rewriting Waypipe in Rust was expected to have multiple benefits.
First, to reduce the cost of making changes and adding new features
at the same level of security; Rust provides a framework with
which to encapsulate memory-unsafe code, and a safe and comprehensive
standard library, which together should significantly reduce the number
of places where memory-unsafe bugs could appear in Waypipe. Second, I
wanted to change Waypipe’s DMABUF handling backend library from
libgbm to Vulkan to improve performance,
handle explicit synchronization, and more efficiently do RGB to YCbCr
conversion for the optional video encoding feature; in total I expected
that this would require changing or adding about half of Waypipe’s lines
of non-test code. Third: for me to better learn Rust; and fourth:
because I had been hearing about other C or C++ to Rust rewrite
projects, and was curious whether a rewrite would be worth it. The best
way to determine that was to try it.
In practice
The rewrite went roughly as expected.
Instead of doing an incremental port of Waypipe, converting its
various logical parts piece by piece, I redeveloped the Rust version in
parallel, roughly following the same development path as the original
Waypipe. (Except this time I knew the end goal.) That is, I started with
a simplified form of the command line interface, and then developed a
basic main proxy loop, Wayland protocol parsing logic, and shared memory
buffer replication. The initial step was easier because I already had
written a different (local) Wayland proxy program in Rust
(windowtolayer). Once that was ready, I iteratively added
back the various features of Waypipe, starting with damage tracking,
compression support, and multithreaded buffer diff calculation and
application, often testing the code by connecting it to the original
Waypipe implementation.
Much of my time in the middle of the port was spent implementing
DMABUF support, this time using Vulkan instead of libgbm. I started with
a simple, single-threaded implementation and once that worked,
progressively introduced multi-threading, buffer update calculations,
zwp_linux_dmabuf_v1 protocol handling, and stride
adjustments to match the weird way the original C implementation
adjusted nominal buffer strides when using libgbm. To implement
Waypipe’s optional video encoding feature, I started with the possibly
tricky case of hardware video encoding and decoding. As Vulkan hardware
video extensions had been released in the last few years, I just used
ffmpeg’s encoder/decoder based on them, which was recently added but
worked with few issues. Software video encoding and decoding were easy
to add afterwards.
The second 90% of the work has been spent on all the miscellaneous
tasks: bringing the Rust rewrite up to feature parity with the original
version, getting it to integrate with Waypipe’s existing build system
(which uses meson), and resolving the issues found after I
deemed the Rust port good enough and brought it into the main git
repository.
The rewritten code is slightly larger:
tokei reports that the C implementation had about 12000 lines of code without comments and tests, and 19000 with them, while the Rust implementation has about 16000 lines of code without comments and tests, and 23000 when comments and tests are included. (I am ignoring about 5000 lines of auto-generated Wayland protocol handling data and code, which are tracked in git for Rust but auto-generated in C.) The largest chunk of the difference comes from the DMABUF copying and video encoding implementation using Vulkan and libavcodec, which together use about 4000 more lines than the C implementation (which had about 400 lines for libgbm, and 1200 for libswscale, libavcodec, and vaapi interaction); most of these lines would still have been needed had the library change been done in C.
Test code was generally more efficient to write for the Rust implementation because higher-level constructs are available: for example, two vectors of bytes can be compared with ==, and closures make it possible to reuse the same generated parse_<msg> and write_<msg> functions in the main protocol replication test framework as in the main proxy logic. The C protocol replication tests skipped many checks because they would have been awkward and repetitive to write, or would have needed more code generation. Note: these benefits would also have been available had I used C++ or some other language for Waypipe instead. With the rewrite, I also had the advantage of focusing on “end-to-end” tests that run Waypipe’s proxy logic (as exposed through two Unix sockets) against various Wayland protocol transcripts. I expect this approach will require less maintenance over time than the more integrated tests used for the C implementation.
Lifetimes and exclusive references were annoying to work with in some early code, but Waypipe generally either does nothing complicated, or has multiple independent references to objects and needs to use Rc<_> or Arc<_>. They have prevented a few incorrect designs. One suboptimal thing remains: the main proxy loop (loop_inner() in src/mainloop.rs) uses nix::poll::poll(), which takes nix::poll::PollFd values containing references to OwnedFd objects. Those OwnedFds are owned by the structures used for Wayland protocol file descriptor replication, called ShadowFds for historical reasons, and the ShadowFd objects are stored under Rc<RefCell<_>>. Making their OwnedFds available to poll() currently requires acquiring and storing a Ref<_> for each one in a separate vector, and also extracting the returned events from each PollFd into another vector, because the PollFds need to be dropped immediately to drop the Ref<_>s so that the following code can access the ShadowFds. There are ways to avoid these extra vectors, and building them wasn’t expensive to begin with; ultimately, the problem is that I’ve spent too much time thinking about how to refactor this thing.
When rewriting code, I sometimes noticed details that I’d missed in the original, like incomplete ssh argument parsing, or a rare edge case when clients construct a wp_presentation_feedback object immediately after binding wp_presentation. I probably would have missed these if I had translated the code instead of writing it from scratch and comparing it with the C implementation afterwards. Other minor improvements (like not precisely replicating DMABUF modifiers) were discovered through the use of Vulkan instead of libgbm.
Much of the work in this rewrite was rather tedious, with little fundamentally new code. (I have used Vulkan before.) The only interesting bugs I have had to track down so far were memory safety and threading issues in libraries I was using, and an unfortunate typo when almost-but-not-quite copy-pasting code. The technically interesting parts (making more efficient buffer difference calculations) have been postponed until remaining regressions have been discovered, the next release is done, and I start changing Waypipe’s internal protocol.
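The Ref<_>-vector workaround for poll() described above can be reduced to a standard-library-only sketch (Unix only). This is a hypothetical simplification, not Waypipe’s code: ShadowFd here just wraps an OwnedFd and a counter, and fake_poll stands in for nix::poll::poll() by reporting every descriptor as ready.

```rust
use std::cell::{Ref, RefCell};
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};
use std::rc::Rc;

/// Hypothetical stand-in for Waypipe's ShadowFd: owns one replicated fd.
struct ShadowFd {
    fd: OwnedFd,
    events_seen: u32,
}

/// Hypothetical stand-in for nix::poll::poll(): takes borrowed fds and returns
/// one event mask per fd. Here every fd is simply reported as readable (1).
fn fake_poll(fds: &[BorrowedFd<'_>]) -> Vec<u32> {
    fds.iter().map(|_| 1).collect()
}

/// One main-loop iteration over ShadowFds stored as Rc<RefCell<_>>.
fn poll_round(shadows: &[Rc<RefCell<ShadowFd>>]) {
    // Hold a Ref<_> per ShadowFd so that its OwnedFd can be borrowed for polling.
    let guards: Vec<Ref<'_, ShadowFd>> = shadows.iter().map(|s| s.borrow()).collect();
    let fds: Vec<BorrowedFd<'_>> = guards.iter().map(|g| g.fd.as_fd()).collect();
    // Copy the results into a separate vector that outlives the borrows.
    let revents = fake_poll(&fds);
    // Release the borrowed fds, then the Ref<_> guards; while the guards are
    // alive, the borrow_mut() calls below would panic.
    drop(fds);
    drop(guards);
    for (shadow, ev) in shadows.iter().zip(revents) {
        shadow.borrow_mut().events_seen += ev;
    }
}
```

The two intermediate vectors (guards and revents) are exactly the extra allocations the text mentions; a refactor would aim to avoid them.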
Rust’s error and string types are much better than C’s: Result, Option, and first-class tuples make detecting and unpacking errors much less work than in C. One no longer needs to check which magic values identify failure, whether errno is set or where else error messages are stored, and which arguments are returned by pointer in what circumstances. (As the Wayland wire protocol is binary, I did not need to use much C string handling in the original implementation.)
How carefully Waypipe’s unsafe code is checked varies. I’ve tried to document core operations on file descriptors and memory maps in detail; on the other hand, most of the DMABUF and video code is unsafe and FFI-heavy, and may leak memory when failures occur. (Fortunately, most failures are fatal, so these leaks are not critical.) For external libraries, I’ve been using direct bindings via bindgen or unsafe crates like ash, because the current safe bindings generally are missing required features, require statically linking in libraries, or bring in too many other dependencies.
One of the original implementation’s design mistakes, perhaps, was trying to cleanly handle memory allocation failures (and report an error to the application) instead of just exiting when malloc() returns NULL; this made the code more complicated and added many failure paths that are hard to test. While it may be possible to write a Wayland client that can make Waypipe’s calls to malloc() return NULL, normal clients will not do this. Because Waypipe uses one process per Wayland connection, it is safe for it to abort when malloc() fails. On the other hand, Rust has good enough memory and error handling to reliably and safely do a clean shutdown when malloc() fails, but the standard library changes to enable this are still unstable or in progress.
Error handling in longer stretches of unsafe code (ensuring everything is freed on failure) can be more awkward than in C, because the standard goto cleanup; trick is not available. Wrapping things with a type that destroys them on Drop generally works instead. (Properly unwinding on panic for FFI wrappers generally is not needed, because C libraries generally do not panic and the FFI wrappers are usually straightforward leaf functions.)
Enums are useful to make the possible states of a structure clear, but doing this often requires that I define more structs for each possible state, and name them all. Picking good names for them all is a major unsolved problem in theory, so in practice I just pick bad names.
Waypipe’s build system is now somewhat of a mess: meson runs cargo through an intermediate script to control the location of the output executable, and I still haven’t fully connected meson’s various build types to cargo’s. (I am continuing to use meson because it is used by Waypipe’s original C implementation, which I’ve moved into a subfolder of the repository, and because Waypipe has a man page that needs to be installed to the right place.) This continues to evolve.
Rust uses much more build space (about 250 MB) than C (14 MB) when building with debuginfo; this is mostly caused by a few big dependencies and 4 MB compiled build scripts.
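The Drop-based replacement for goto cleanup; mentioned above can be sketched as follows. This is a hypothetical example rather than Waypipe’s FFI code: Handle stands in for a C library object, and the shared log only exists to show what gets created and destroyed on each path.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log so the example can show what was created and destroyed.
type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical wrapper for a C library handle; destroying it on Drop plays
/// the role of one label in C's `goto cleanup;` chain.
struct Handle {
    name: &'static str,
    log: Log,
}

impl Handle {
    fn create(name: &'static str, log: &Log) -> Handle {
        log.borrow_mut().push(format!("create {}", name));
        Handle { name, log: Rc::clone(log) }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("destroy {}", self.name));
    }
}

/// Hypothetical result of a multi-step setup; on success it owns both handles.
struct Pipeline {
    _first: Handle,
    _second: Handle,
}

fn build(fail_at: Option<u32>, log: &Log) -> Result<Pipeline, String> {
    let first = Handle::create("A", log);
    if fail_at == Some(1) {
        return Err("step 1 failed".to_string()); // `first` is destroyed here
    }
    let second = Handle::create("B", log);
    if fail_at == Some(2) {
        // Locals drop in reverse order: `second`, then `first`.
        return Err("step 2 failed".to_string());
    }
    Ok(Pipeline { _first: first, _second: second })
}
```

Each early return frees exactly what was created so far, in reverse order, without any explicit cleanup labels.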
bindgen is nice to have, but it translates C’s char into i8 or u8 depending on the platform, instead of translating it to std::ffi::c_char. As a result, I used *const i8 a few times in my own code, until discovering the build failures on platforms where c_char = u8. After that, I switched to c_char and started checking the C headers whenever I wanted to know whether a function’s argument actually was char *, int8_t *, or uint8_t *.
cargo test is OK, but could be better. There is no convenient way to set per-test timeouts. (Some of Waypipe’s tests should never take more than a millisecond of CPU time; others take a fraction of a second if things go well.) Maybe I should switch to nextest, although I’d prefer configuring test properties in the code instead of in a separate config file. Even with nextest, though, there is the limitation that tests appear to be pass/fail, with no way to communicate that they are inconclusive. As Waypipe needs to maintain copies of DMABUFs, some of Waypipe’s tests are run for each render device available on the system; when no such device is available, these tests would ideally be reported as SKIPPED. I have also observed the video encoding implementation producing odd results (a constant color on a non-constant image); ideally, tests observing this could report UNCLEAR, as this is not clearly Waypipe’s fault. I’m certainly not the first person to want either of these behaviors.
Rust’s integer support is much better than C’s, where implicit conversions are common and can hide mistakes, and warnings for them are hard to enable precisely because the conversions are so common. Rust also provides useful features like ilog2, isqrt, leading_zeros, next_power_of_two, and saturating_add, which to do well in C require intrinsics, carefully written bit manipulation, or writing the function yourself (and hoping the compiler identifies it and replaces it with the ideal implementation).
Because it was easy to do with bindgen and libloading, the rewrite now dynamically loads libavcodec and libavutil at runtime, when necessary. This reduces the time to start the waypipe executable (as measured by timing waypipe --help) from 45 to 5 milliseconds.
I did not use any async/await code, under the assumption that it would be too complicated and not worth the benefit. Many of Waypipe’s off-main-thread tasks are compute heavy, and these tasks often wait for a specific region of a shared resource (a mirror of a buffer) to become available, or for the GPU to finish an operation.
There currently does not appear to be a stabilized and universally efficient way for Waypipe to safely interact with shared memory regions, other than by using architecture-specific assembly. Under the C++-like memory model that Rust uses, arbitrary shared memory found through mmap should be considered volatile, since any arbitrary process or DMA device could modify or react to the memory in “ways unknown to the implementation”. However, Waypipe in particular can assume that it is connected to a well-behaving Wayland process, that there are no side effects to memory access, and that the ordering of its writes does not matter, as long as they all happen before the application reads the contents of Waypipe’s next sendmsg(). Similarly, Waypipe only needs to see memory writes that happen before its last recvmsg() returns, and only needs to be “safe” when reading from the shared memory region: the compiler should never assume that two repeated or overlapping read operations will return the same result.
Using &[u8] would not provide this guarantee, so Waypipe currently treats memory buffers shared with other processes as essentially &[AtomicU8], using Relaxed memory access ordering. This is probably fine in practice on current architectures, as the relaxed atomic operations would be implemented either with plain loads and stores, or with something stronger. There is still the theoretical problem that, as far as I am aware, atomic types are only guaranteed to work when the memory is updated “within the memory model”. (For example, one could imagine an architecture where the compiler’s preferred atomic operations crash the program if they overlap with DMA operations, but its volatile operations are OK.) As an alternative, Waypipe might be able to use std::ptr::read_volatile and std::ptr::write_volatile on entire 64-byte cache lines, thereby giving the compiler more freedom to optimize than if Waypipe were to do volatile operations on a single u8 or u64 at a time.
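A minimal sketch of the &[AtomicU8] view described above, assuming only that the caller has a valid pointer and length for the shared region. The names shared_bytes, copy_out and copy_in are hypothetical; in Waypipe the pointer would come from mmap, while the usage below substitutes an ordinary heap buffer.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Views `len` bytes at `ptr` as atomics, so the compiler cannot assume that
/// another process leaves them unchanged between reads.
///
/// # Safety
/// `ptr` must be valid for reads and writes of `len` bytes for lifetime 'a.
unsafe fn shared_bytes<'a>(ptr: *mut u8, len: usize) -> &'a [AtomicU8] {
    // AtomicU8 has the same size and alignment as u8.
    unsafe { std::slice::from_raw_parts(ptr as *const AtomicU8, len) }
}

/// Copies shared memory into a private buffer with Relaxed loads.
fn copy_out(src: &[AtomicU8], dst: &mut [u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s.load(Ordering::Relaxed);
    }
}

/// Copies a private buffer into shared memory with Relaxed stores; ordering
/// between these stores is not needed, only that they precede the next sendmsg().
fn copy_in(src: &[u8], dst: &[AtomicU8]) {
    for (s, d) in src.iter().zip(dst) {
        d.store(*s, Ordering::Relaxed);
    }
}
```

Byte-at-a-time loops like these are the cost the text alludes to; wider volatile accesses on whole cache lines would be the alternative.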
Things that I’d like to have
A cross-GPU-platform library for general data compression and decompression on GPU with Vulkan; ideally for lz4 or zstd, but some other CPU-friendly format would be OK.
For bindgen to accept a list of functions and return a failing exit code if it does not generate bindings for all of them. Currently, bindgen can only filter which of the functions (or variables, constants, etc.) it makes bindings for.
A variation on the format!() macro that produces an iterator instead of a String; this would make it possible (without restructuring the code very much) to eliminate many intermediate allocations from dynamically chosen trees of format! operations, like the following:
format!("{} is {}", if a { format!("{:x}", b) } else { "C".to_string() }, if z { format!("{:x}", y) } else { "Z".to_string() })
I would not be surprised if this already exists.
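Pending such a macro, one workaround that does exist today is to have the branches produce a value implementing Display rather than a String, so the inner formatting is written directly into the outer buffer. This is only a sketch with hypothetical names (HexOr, describe); unlike the requested macro, it still produces a String at the end and needs a small type for each shape of branch.

```rust
use std::fmt;

/// Hypothetical lazily formatted value: either a number in hex or a fixed label.
/// Formatting it writes straight into the outer formatter, so no intermediate
/// String is allocated for the inner `{:x}`.
enum HexOr {
    Hex(u32),
    Label(&'static str),
}

impl fmt::Display for HexOr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HexOr::Hex(v) => write!(f, "{:x}", v),
            HexOr::Label(s) => f.write_str(s),
        }
    }
}

fn describe(a: bool, b: u32, z: bool, y: u32) -> String {
    let left = if a { HexOr::Hex(b) } else { HexOr::Label("C") };
    let right = if z { HexOr::Hex(y) } else { HexOr::Label("Z") };
    // Only the final String is allocated.
    format!("{} is {}", left, right)
}
```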
Possible improvements for Rust
Having learned more Rust recently, it is my irresponsibility to suggest things wiser programmers probably can explain are bad ideas.
I sometimes use a key k1 to look up an &mut value x from a BTreeMap, read data from x to determine a key k2 distinct from k1, which I use to look up an &mut value y, and then modify both x and y in some fashion. Doing this requires dropping x and then looking it up again in the map. Sometimes there is a third key whose value I’d like to modify, but the total number is always small. The extra lookups could be avoided with RefCell, but that has significant space overhead and is awkward to use when programming. I think this problem could be solved with a sort of split_at_mut() analogue: a method on BTreeMap that looks something like
get_mut_and_remainder(&mut self, key: &K) -> Option<(&mut V, RemainingMap<K,V,1>)>
where RemainingMap<K,V,N> is a type referring to the BTreeMap which keeps a list of N references &K and allows mutable lookups (but not insertions or deletions) of keys with a
get_mut_and_remainder<N>(&mut self, key: &K) -> Option<(&mut V, RemainingMap<K,V,N + 1>)>
signature, failing when key matches any of the N references stored so far. This would be an adaptive version of the currently-unstable HashMap::get_many_mut. One can emulate something like this idea for slices using split_at_mut(), but I don’t see how to soundly and efficiently build it on top of BTreeMap’s current API. Maybe there is a crate that already does this.
As far as I understand it, Rust has a notion of “uninitialized memory”, where, quoting MaybeUninit’s documentation, it is “undefined behavior to have uninitialized data in a variable even if that variable has an integer type”. I don’t think this is necessary, and believe that Rust’s existing rules and mechanisms for making unconditional promises to the compiler are sufficient to enable all practical optimizations.
Currently, the memory provided to Rust by alloc may be uninitialized, and the memory region needs to be manipulated by pointer or through MaybeUninit instead of by a &mut [u8] slice, because std::slice::from_raw_parts_mut requires that the data region it operates on be properly initialized for the slice type (in this case, u8). As a result, one often needs two variants of any FFI function that fills a region of memory: a straightforwardly usable one that takes a &mut [u8], and one that uses raw pointers (which is unsafe to use) or MaybeUninit<u8> (safer but complicated). In practice, these two variants would produce the same code, but if a crate provides the &mut [u8] version, one cannot obtain the raw pointer or MaybeUninit version from it (without laundering the pointer through FFI). I ran into this issue when trying to use nix::sys::uio::readv on a fresh allocation, and when making wrapper functions for lz4 and zstd compression and decompression.
Making alloc provide an initialized [u8] (albeit with arbitrary contents) would avoid the above code duplication and reduce the number of uses of unsafe required when making data structures or using external libraries. I do not think it would inhibit necessary compiler optimizations, because Rust has good mechanisms for introducing undefined behavior (read: unconditional promises to the compiler). If one wants to optimize bounds checks around a partially initialized region of memory, then std::hint::assert_unchecked can be used to tell the compiler which addresses are actually being read from, or one can access memory through an intermediate slice (with associated undefined behavior if an unchecked access is out of bounds for that slice). Similarly, when allocating memory for a non-plain type T, one does not need an “uninitialized memory” concept to make accessing T undefined behavior; the compiler should already assume that blindly transmuting raw memory ([u8; _]) into T is invalid, because it is not guaranteed that the memory has valid contents for T. Finally, the use of uninitialized variables (e.g. let x: u8; x += 1;) is already a language error in Rust.
Also: I read a document explaining the addition of undef to LLVM; it gives mostly C-specific or internal justifications, like discarding implicit function return values, optimizing global variable initialization, or improving compilation when a variable in an outer scope is not used when a given condition holds. None of these should affect the Rust abstract model.
Also: I read a relevant post from 2019 mentioning an old set data structure which can work when its memory region is arbitrarily initialized; the sparse set reads from “uninitialized” but exclusively owned memory. I should note two other examples which do not need initialization. First, there are catalytic algorithms, which use an (arbitrarily initialized) region of memory in their calculation and later return it with the content reset to its initial values; these only require exclusive access to the memory region. Second, there is my favorite binary tree inversion algorithm, which uses only O(log(tree depth)) words of space (in exchange for awesome and superlative runtime): it uses the algorithm of Savitch’s theorem to identify which addresses in memory correspond to tree nodes, and then swaps the children of each tree node. This algorithm will read from memory that it does not own (and which may constantly be changing), but only requires exclusive (&mut) access to the set of tree nodes; if one just wanted to count tree nodes, read-only non-exclusive (&) access would suffice.
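For reference, the repeated BTreeMap lookup that the proposed get_mut_and_remainder would avoid looks roughly like this with the current API. This is a sketch with a hypothetical Node type and visit_pair function; the first &mut borrow has to end before the second lookup, so k1 costs one extra O(log n) lookup.

```rust
use std::collections::BTreeMap;

/// Hypothetical map value: `next` names another key in the same map.
struct Node {
    next: Option<u32>,
    visits: u32,
    partner_visits: u32,
}

/// Modifies the value at `k1`, follows its `next` key to a distinct `k2`,
/// modifies that value, then returns to finish modifying the first one.
fn visit_pair(map: &mut BTreeMap<u32, Node>, k1: u32) -> Option<()> {
    let x = map.get_mut(&k1)?;
    x.visits += 1;
    let k2 = x.next.filter(|&k| k != k1)?;
    // `x` cannot be used past this point: looking up `k2` needs a new
    // exclusive borrow of `map`.
    let y = map.get_mut(&k2)?;
    y.visits += 1;
    let y_visits = y.visits;
    // Second lookup of `k1`, only to finish updating x.
    map.get_mut(&k1)?.partner_visits = y_visits;
    Some(())
}
```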
Conclusion
Was the rewrite worth it? I suspect yes: improving the code does seem to be somewhat easier to do in Rust than with the original, where I could never be certain that I was not missing some edge case, and moving DMABUF handling to use Vulkan has significantly improved performance. I will know for certain in a few years, when I see what types of bugs I run into. Rewriting the code did take time; I did not precisely measure it, but would estimate a month of work so far (spread out over a longer period, since Waypipe was not my sole focus); this is similar to the time needed to develop the program to begin with. Could I have achieved the same effect with a month of work in C? Probably, but I would not have as much confidence that the project quality would remain stable in the future, when I will probably make many changes and spend less time testing them. (For example: I held off on parallelizing buffer diff message application with the C version, because I expected it to be a difficult task to do right.)
Overall, I think Waypipe was appropriate for a Rust rewrite: Waypipe is network facing code, needs to be efficient, does some parsing, and uses multiple threads; and was originally written in C. Interacting with existing libraries’ C APIs was, as expected, more tedious to do than in C, but I think the improvements to Waypipe’s core logic are worth it.
In general, I would pick Rust for new projects that do a lot of parsing or communication with other (untrusted or badly written) processes, are CPU limited and need to be fast or power-efficient, require fast startup, and do not deeply use large and irreplaceable libraries from some other language. I would want to switch from C or C++ to Rust if the project is something that I use and make changes to often enough for the cost of making the change to be worth it; but this is rare. Switching from existing memory safe languages is probably only worth it when performance is at stake, and it is not practical to convert just the hot code.
I would not currently use Rust for glue scripts, basic file conversion, data analysis, game scripting, or exploratory programming; languages with a garbage collector and a more compact syntax (like Python, Scheme, Haskell, Clojure) tend to be better there.
Often the choice of language is controlled by which libraries are available: I’ve used C++ for many things because it was the easiest interface for a major library (Qt, OpenCASCADE, CGAL, or SDL/OpenGL). C is OK for small programs where most of the content is interaction with C APIs, but the language itself is the limiting factor beyond a certain scale, when proper number handling, string operations, or nontrivial data structures are required.
Finally, a reminder: Waypipe has been available for five years, and
using it exposes one’s local Wayland compositor to an application
running on a different computer. Even though Waypipe makes some sanity
checks on the messages it receives, it cannot guard against bugs in a
Wayland compositor. As before, do not assume that Waypipe itself
possibly being more secure makes it safe to waypipe ssh
into a compromised computer and run GUI programs; Wayland compositors
are in general not well tested against adversarial clients.