PipeWire: The Linux audio/video bus

For more than a decade, PulseAudio has served the Linux desktop as its predominant audio mixing and routing daemon, and as its audio API. Unfortunately, PulseAudio's internal architecture does not fit the growing sandboxed-applications use case, even though there have been attempts to amend that. PipeWire, a new daemon created (in part) out of these attempts, will replace PulseAudio in the upcoming Fedora 34 release; that coming transition deserves a look.

Speaking of transitions, Fedora 8's own switch to PulseAudio in late 2007 was not a smooth one. Longtime Linux users still remember the daemon being branded as the software that would break your audio. After a bumpy start, PulseAudio emerged as the winner of the Linux sound-server struggles. It provided a native client audio API, but it also supported applications that used the common audio APIs of the time, including the raw Linux ALSA sound API, which typically allows only one application to access the sound card. PulseAudio mixed the different applications' audio streams and provided a central point for audio management, fine-grained configuration, and seamless routing to Bluetooth, USB, or HDMI. It positioned itself as the Linux desktop equivalent of the user-mode audio engine in Windows Vista and the CoreAudio daemon in macOS.

Cracks at PulseAudio

By 2015, PulseAudio was still enjoying its status as the de facto Linux audio daemon, but cracks were beginning to show. The gradual shift to sandboxed desktop applications was proving fatal to its design: under PulseAudio, an application can snoop on other applications' audio, access the microphone without mediation, or load server modules that interfere with other applications. Attempts were made at fixing PulseAudio, mainly through an access-control layer and a per-client memfd-backed transport; these steps were necessary, but they were not sufficient for isolating clients' audio.

Around that time, David Henningsson, one of the core PulseAudio developers, resigned from the project. He cited frustrations over the daemon's poor fit for the sandboxed-applications use case and its intermixing of mechanism and policy in audio-routing decisions. At the end of his message, he wondered whether the combination of these problems might be the birth pangs of a new, much-needed Linux audio daemon:

In software nothing is impossible, but to re-architecture PulseAudio to support all of these requirements in a good way (rather than to "build another layer on top" […]) would be very difficult, so my judgment is that it would be easier to write something new from scratch.

And I do think it would be possible to write something that took the best from PulseAudio, JACK, and AudioFlinger, and get something that would work well for both mobile and desktop; for pro-audio, gaming, low-power music playback, etc. […] I think we, as an open source community, could have great use for such a sound server.

PulseVideo to Pinos

Meanwhile, GStreamer co-creator Wim Taymans was asked to work on a Linux service to mediate web browsers' access to camera devices. Initially, he called the project PulseVideo. The idea behind the name was simple: similar to the way PulseAudio was created to mediate access to ALSA sound devices, PulseVideo was created to mediate and multiplex access to the Video4Linux2 camera device nodes.

A bit later, Taymans discovered a similarly named PulseVideo prototype [video], created by William Manley, and helped in upstreaming the GStreamer features required by its code. To avoid conflicts with the PulseAudio name, and because the project's scope had extended beyond camera access, Taymans later renamed it to Pinos, a reference to his town of residence in Spain.

Pinos was built on top of GStreamer pipelines, using some of the infrastructure that had earlier been refined for Manley's prototype. D-Bus with file-descriptor passing was used for interprocess communication. At the 2015 GStreamer Conference, Taymans described the Pinos architecture [PDF] to attendees and gave a demo of multiple applications accessing the system camera feed in parallel.

Due to its flexible, pipeline-based, file-descriptor-passing architecture, Pinos also supported media broadcasting in the other direction: applications could "upload" a media stream by passing a memfd or dma-buf file descriptor. The media stream could then be further processed and distributed to other applications and to system multimedia sinks like ALSA sound devices.
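Passing buffers around by file descriptor is ordinary Unix machinery rather than anything Pinos-specific: the sender creates a memfd (or obtains a dma-buf from the GPU driver) and ships the descriptor in an SCM_RIGHTS ancillary message over a Unix-domain socket, after which both processes can map the same pages. The send side can be sketched as follows; this is a minimal illustration built only on generic Linux APIs, not actual Pinos or PipeWire code:

    /* Illustrative sketch: share a memory-backed media buffer with a
     * peer process by passing its file descriptor over a Unix-domain
     * socket.  Generic Linux APIs only; not Pinos/PipeWire code. */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    static int send_fd(int sock, int fd)
    {
        char dummy = '\0';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;   /* pass an open file descriptor */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }

    /* Create a memory-backed buffer and hand it to the peer. */
    int share_buffer(int sock, size_t size)
    {
        int fd = memfd_create("media-buffer",
                              MFD_CLOEXEC | MFD_ALLOW_SEALING);

        if (fd < 0 || ftruncate(fd, size) < 0)
            return -1;
        /* The receiver mmap()s the same pages: zero-copy sharing. */
        return send_fd(sock, fd);
    }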

While only discussed in passing, the ability to send streams in both directions and across applications allowed Pinos to act as a generic audio/video bus — efficiently funneling media between isolated, and possibly sandboxed, user processes. The scope of Pinos (if properly extended) could thus overlap with, and possibly replace, PulseAudio. Taymans was explicitly asked that question [video, 31:35], and he answered: "Replacing PulseAudio is not an easy task; it's not on the agenda [...] but [Pinos] is very broad, so it could do more later."

As the PulseAudio deficiencies discussed in the earlier section became more problematic, "could do more later" was not a far-off target.

PipeWire

By 2016, Taymans started rethinking the foundations of Pinos, extending its scope to become the standard Linux audio/video daemon. This included the "plenty of tiny buffers" low-latency audio use case typically covered by JACK. There were two main areas that needed to be addressed.

First, the hard dependency on GStreamer elements and pipelines in the core daemon and client libraries proved problematic. GStreamer has plenty of behind-the-scenes logic to achieve its flexibility. When a GStreamer pipeline was processed within the context of the Pinos realtime threads, that flexibility came at the cost of implicit memory allocations, thread creation, and locking: all actions that are well known to hurt the predictability needed for hard-realtime code.

To achieve part of the GStreamer pipelines' flexibility while still satisfying hard-realtime requirements, Taymans created a simpler multimedia pipeline framework and called it SPA, the Simple Plugin API [PDF]. The framework is designed to be safely executed from realtime threads (e.g. the Pinos media-processing threads) within a specific time budget that should never be exceeded. SPA itself performs no memory allocations; those are the sole responsibility of the application using the SPA framework.

Each SPA node has a well-defined set of states. There is a state for configuring the node's ports, formats, and buffers, which is handled by the main (non-realtime) thread; a state in which the host allocates all the buffers the node requires after its configuration; and a separate state in which the actual processing is done in the realtime threads. During streaming, if any of the media pipeline nodes needs to change state (e.g. due to an event), the realtime threads can be notified so that control is switched back to the main thread for reconfiguration.
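As a rough mental model, that life cycle can be pictured as a small state machine in which only one state is ever entered from the realtime threads. The sketch below is purely illustrative; the type and function names are hypothetical and do not come from the SPA headers:

    /* Hypothetical sketch of the SPA node life cycle; the names are
     * illustrative, not the actual SPA API. */
    enum node_state {
        NODE_CONFIGURE,   /* main thread: negotiate ports, formats, buffers */
        NODE_ALLOCATE,    /* main thread: host allocates required buffers */
        NODE_STREAMING,   /* realtime thread: bounded-time processing only */
    };

    struct node {
        enum node_state state;
        void (*process)(struct node *n);   /* no allocation, no locking */
    };

    /* Stub: in a real host this could be an eventfd write picked up by
     * the main thread's event loop. */
    static void notify_main_thread(struct node *n) { (void)n; }

    /* Called from the realtime thread on every cycle. */
    static void rt_cycle(struct node *n)
    {
        if (n->state != NODE_STREAMING) {
            /* A reconfiguration is pending: hand control back to the
             * main thread instead of doing unbounded work here. */
            notify_main_thread(n);
            return;
        }
        n->process(n);   /* must finish within its time budget */
    }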

Second, D-Bus was replaced as the IPC protocol. In its place, a native, fully asynchronous protocol, inspired by Wayland's but without the XML serialization, was implemented over Unix-domain sockets. Taymans wanted a protocol that was simple and hard-realtime safe.

By the time the SPA framework was integrated and the native IPC protocol was developed, the project had long outgrown its original purpose: from a D-Bus daemon for sharing camera access to a full, realtime-capable audio/video bus. It was thus renamed again, to PipeWire, reflecting its new status as a pipeline-based engine for sharing and processing multimedia.

Lessons learned

From the start, the PipeWire developers applied an essential set of lessons from existing audio daemons like JACK, PulseAudio, and the Chromium OS Audio Server (CRAS). Unlike PulseAudio, which intentionally divided the Linux audio landscape into consumer-grade and professional realtime audio, PipeWire was designed from the start to handle both. To avoid PulseAudio's sandboxing limitations, security was baked in: a per-client permissions bitfield is attached to every PipeWire node (which wraps one or more SPA nodes). This security-aware design allowed easy and safe integration with Flatpak portals, the sandboxed-application permissions interface that has since been promoted to a freedesktop XDG standard.
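Conceptually, the enforcement is a bitmask test performed on every client request against that client's per-node entry. The sketch below is modeled on PipeWire's rwx-style permission bits, but the constants and helper shown here are illustrative rather than taken from the PipeWire headers:

    /* Illustrative sketch of per-client, per-node permissions, modeled
     * on PipeWire's rwx-style permission bits; these names are not the
     * actual PipeWire API. */
    #include <stdbool.h>
    #include <stdint.h>

    #define PERM_R (1u << 2)   /* client may see the node and its properties */
    #define PERM_W (1u << 1)   /* client may modify the node */
    #define PERM_X (1u << 0)   /* client may invoke node methods */

    struct client_node_perm {
        uint32_t node_id;   /* the node this entry refers to */
        uint32_t perms;     /* bitfield consulted on every request */
    };

    /* A sandboxed client whose entry lacks PERM_R cannot even
     * enumerate the node, so other applications' streams stay
     * invisible to it. */
    static bool may_read(const struct client_node_perm *p)
    {
        return (p->perms & PERM_R) != 0;
    }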

Like CRAS and PulseAudio, but unlike JACK, PipeWire uses timer-based audio scheduling: a dynamically reconfigurable timer schedules the wake-ups that fill the audio buffer, instead of depending on a constant rate of sound-card interrupts. Besides the power-saving benefits, this allows the audio daemon to provide dynamic latency: high for power saving and consumer-grade audio like music playback; low for latency-sensitive workloads like professional audio.
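On Linux, such a dynamically reconfigurable timer maps naturally onto a timerfd: compute how long the current quantum of audio will last, arm a one-shot timer for that deadline, and re-arm with different values whenever the latency target changes. The following simplified illustration uses the generic timerfd API and is not PipeWire code:

    /* Simplified illustration of timer-based audio scheduling with a
     * timerfd; not actual PipeWire code. */
    #include <stdint.h>
    #include <sys/timerfd.h>

    /* Arm a one-shot wakeup for when `quantum` frames at `rate` Hz
     * will have been consumed; called again with new values whenever
     * the latency target changes. */
    static int arm_wakeup(int tfd, uint64_t quantum, uint64_t rate)
    {
        uint64_t ns = quantum * 1000000000ull / rate;
        struct itimerspec its = {
            .it_value = { .tv_sec  = ns / 1000000000ull,
                          .tv_nsec = ns % 1000000000ull },
            /* no it_interval: re-armed each cycle, possibly with a
             * different quantum */
        };
        return timerfd_settime(tfd, 0, &its, NULL);
    }

    int make_timer(void)
    {
        int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);

        /* e.g. 256 frames at 48kHz: next buffer fill in ~5.3ms */
        if (tfd >= 0)
            arm_wakeup(tfd, 256, 48000);
        return tfd;
    }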

Similar to CRAS, but unlike PulseAudio, PipeWire is not modeled on top of audio-buffer rewinding. When timer-based audio scheduling is used with huge buffers, as in PulseAudio, responding with low latency to unpredictable events, like a new audio stream or a change in a stream's volume, requires rewriting the sound card's buffer: the large buffer already sent to the audio device must be revoked and a new one submitted. This resulted in significant code complexity and corner cases [PDF]. Both PipeWire and CRAS limit the maximum latency/buffering to much lower values, thus eliminating the need for buffer rewinding altogether.

Like JACK, PipeWire chose an external-session-manager setup. Professional audio users typically build their own audio pipelines in a session-manager application like Catia or QjackCtl, then let the audio daemon execute the final result. This has the benefit of separating policy (how the media pipeline is built) from mechanism (how the audio daemon executes the pipeline). At GUADEC 2018, developers explicitly asked Taymans [video, 23:15] to let GNOME, and possibly other external daemons, take control of that part of the audio stack. Several system integrators had already run into problems because PulseAudio embeds audio-routing policy decisions deep within its internal modules' code; this was also one of the pain points mentioned in Henningsson's resignation e-mail.

Finally, following the trend of multiple influential system daemons created in the last decade, PipeWire makes extensive use of Linux-kernel-only APIs. These include memfd, eventfd, timerfd, signalfd, epoll, and dma-buf, all of which make the file descriptor the primary identifier for events and shared buffers in the system. PipeWire's support for importing dma-buf file descriptors was key to implementing efficient Wayland screen capture and recording. For large 4K and 8K screens, the CPU does not need to touch any of the massive GPU buffers: GNOME mutter (or a similar application) passes a dma-buf descriptor that can then be integrated into PipeWire's SPA pipelines for further processing and capturing.
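This fd-centric design pays off in the event loop: a single epoll instance can wait on timers, cross-thread wake-ups, and sockets uniformly, because each of them is just a file descriptor. The sketch below uses only the generic Linux APIs named above and is not PipeWire's actual loop implementation:

    /* Generic illustration of an fd-centric event loop; not PipeWire
     * internals. */
    #include <stdint.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int run_loop(int timer_fd)
    {
        int ep = epoll_create1(EPOLL_CLOEXEC);
        int wake_fd = eventfd(0, EFD_CLOEXEC);   /* cross-thread wakeups */
        struct epoll_event ev = { .events = EPOLLIN };

        ev.data.fd = timer_fd;
        epoll_ctl(ep, EPOLL_CTL_ADD, timer_fd, &ev);
        ev.data.fd = wake_fd;
        epoll_ctl(ep, EPOLL_CTL_ADD, wake_fd, &ev);

        for (;;) {
            struct epoll_event out[8];
            int n = epoll_wait(ep, out, 8, -1);

            for (int i = 0; i < n; i++) {
                uint64_t count;
                /* Every source, timer expiration or wakeup, is drained
                 * the same way: an 8-byte read from a file descriptor. */
                read(out[i].data.fd, &count, sizeof(count));
            }
        }
    }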

Adoption

The native PipeWire API has been declared stable since the project's major 0.3 release. Existing raw ALSA applications are supported through a PipeWire ALSA plugin. JACK applications are supported through a re-implementation of the JACK client libraries, plus the pw-jack tool for when both the native and PipeWire JACK libraries are installed in parallel. PulseAudio applications are supported through a pipewire-pulse daemon that listens on PulseAudio's own socket and implements its native communication protocol; this way, even containerized desktop applications that ship their own copy of the PulseAudio client libraries remain supported. WebRTC, the communication framework (and code) used by all major browsers, includes native PipeWire support for Wayland screen sharing, mediated through a Flatpak portal.
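To give a sense of what the native API looks like, here is a skeleton client that opens a playback stream with the pw_stream API and queues silent buffers from its process callback. It follows the style of the upstream tutorial examples, but it is simplified and exact details may differ across PipeWire 0.3 versions:

    /* Minimal PipeWire playback client, adapted from the style of the
     * upstream tutorials and simplified. */
    #include <stdint.h>
    #include <string.h>
    #include <spa/param/audio/format-utils.h>
    #include <pipewire/pipewire.h>

    struct data {
        struct pw_main_loop *loop;
        struct pw_stream *stream;
    };

    /* With PW_STREAM_FLAG_RT_PROCESS this runs on the realtime thread. */
    static void on_process(void *userdata)
    {
        struct data *d = userdata;
        struct pw_buffer *b = pw_stream_dequeue_buffer(d->stream);

        if (b == NULL)
            return;
        struct spa_data *sd = &b->buffer->datas[0];
        if (sd->data != NULL) {
            memset(sd->data, 0, sd->maxsize);   /* render silence */
            sd->chunk->offset = 0;
            sd->chunk->stride = 2 * sizeof(int16_t);   /* 2ch, S16 */
            sd->chunk->size = sd->maxsize;
        }
        pw_stream_queue_buffer(d->stream, b);
    }

    static const struct pw_stream_events stream_events = {
        PW_VERSION_STREAM_EVENTS,
        .process = on_process,
    };

    int main(int argc, char *argv[])
    {
        struct data data = { 0 };
        uint8_t buffer[1024];
        struct spa_pod_builder sb =
            SPA_POD_BUILDER_INIT(buffer, sizeof(buffer));
        const struct spa_pod *params[1];

        pw_init(&argc, &argv);
        data.loop = pw_main_loop_new(NULL);
        data.stream = pw_stream_new_simple(
            pw_main_loop_get_loop(data.loop), "silence",
            pw_properties_new(PW_KEY_MEDIA_TYPE, "Audio",
                              PW_KEY_MEDIA_CATEGORY, "Playback",
                              PW_KEY_MEDIA_ROLE, "Music", NULL),
            &stream_events, &data);

        params[0] = spa_format_audio_raw_build(&sb, SPA_PARAM_EnumFormat,
            &SPA_AUDIO_INFO_RAW_INIT(.format = SPA_AUDIO_FORMAT_S16,
                                     .channels = 2, .rate = 48000));

        pw_stream_connect(data.stream, PW_DIRECTION_OUTPUT, PW_ID_ANY,
                          PW_STREAM_FLAG_AUTOCONNECT |
                          PW_STREAM_FLAG_MAP_BUFFERS |
                          PW_STREAM_FLAG_RT_PROCESS,
                          params, 1);

        pw_main_loop_run(data.loop);

        pw_stream_destroy(data.stream);
        pw_main_loop_destroy(data.loop);
        return 0;
    }

Such a client builds against the libpipewire-0.3 pkg-config package. Note the PW_STREAM_FLAG_RT_PROCESS flag: with it, the process callback runs on the realtime processing thread, so it must follow the same no-allocation, bounded-time rules discussed for SPA above.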

The graph below shows a PipeWire media pipeline, generated using pw-dot and then slightly beautified, on an Arch Linux system. A combination of PipeWire-native and PulseAudio-native applications is shown:

[PipeWire pipeline]

On the left, both GNOME Cheese and a GStreamer pipeline instance created with gst-launch-1.0 are accessing the same camera feed in parallel. In the middle, Firefox is sharing the system screen (for a Jitsi meeting) using WebRTC and Flatpak portals. On the right, the Spotify music player (a PulseAudio app) is playing audio, which is routed to the system's default ALSA sink — with GNOME Settings (another PulseAudio app) live-monitoring the Left/Right channel status of that sink.

On the Linux distribution side of things, Fedora has been shipping the PipeWire daemon (only for Wayland screen capture) since its Fedora 27 release. Debian offers PipeWire packages, but replacing PulseAudio or JACK is "an unsupported use case." Arch Linux provides PipeWire in its central repository and officially offers extra packages for replacing both PulseAudio and JACK, if desired. Similarly, Gentoo provides extensive documentation for replacing both daemons. The upcoming Fedora 34 release will be the first Linux distribution release in which PipeWire fully replaces PulseAudio by default, out of the box.

Overall, this is a critical period for the Linux multimedia scene. While open source is a story about technology, it is also a story about the people hard at work creating it. There has been notable agreement from both PulseAudio and JACK developers that PipeWire and its author are on the right track. The upcoming Fedora 34 release should provide a litmus test for PipeWire's adoption by Linux distributions moving forward.


Index entries for this article
GuestArticles: Darwish, Ahmed S.