This is the concluding follow-up to the article on ‘Arcan as Operating System Design’. Combined, they mark the end of trying to explain what Arcan actually ‘is’ at a higher level. After this we can revisit the topic in first order form as ‘Implementation’.
In the first slides from the only public presentation (2015) we claimed that the idea behind Arcan was to find the crunchy middle between a display server, a game engine and a multimedia processor.
The control plane to this “desktop engine” was designed around a scripting API targeting entry-level developers. Lua remains the weapon of choice for that role, as a better follow-up to the ‘BASIC’ of the home computer era, but also as an intermediate target for other tools to compile to.

To make a long story short: that, my friends, is a browser; just average in a few areas, yet creating something interesting out of orchestrated mediocrity, not unlike musical theatre.
In order to get anywhere on such a broad topic we will need to limit the scope a bit, or we will be here all day. The web itself is as much, or more, a set of social structures growing and evolving like bacteria on a technological substrate as it is any singular piece of technology.
Even if some cry out in doom and gloom about the dominance of Chrome ‘the browser’ (though also Chrome as ChromeOS and Chrome as Electron), do remember that it is still subject to other pieces like the server, the protocols, the app and the structural glue that is ‘the link’. If you want to displace one part, you ought to consider how well it plays with the others as well.
This article will only focus on the technical side of user-facing access to this substrate, that is, the browser software itself. An example of the makings of another ‘web’ to go with it, sponsored by NLnet (https://nlnet.nl/project/Arcan-A12/), is covered in ‘Arcan-a12: Weaving a Different Web’, but it is architecturally loosely coupled.
To make things a little easier than what is normal around here, I will split it into a simplified ‘short form’ and two heavier text sections.
The first section, Browser Breakdown, covers the problem space and some cherry-picked history. The second, Arcan as Browser Design, goes into how we account for it.
The browser broken down is the end result of “the document” trying to envelop the computing world. The browser as proposed is “the desktop” trying to reach out.
The short form can be found here:
To demonstrate that this is existing technology, the presentation is also provided as an Arcan appl, viewable through “arcan-net arcan.divergent-desktop.org explain” (it is running on a potato used for active development).
That one happens to have been published a while ago. According to the logs, about two people have managed to find it. It seems there are lessons from the past about other meanings of “Search” that might be useful to rediscover now that more and more people and places alike are all llmpty on the inside and the noise floor is reaching the ceiling.
Some suggested extended reading on the topics that will be covered here and in the next section:
- The History of PDF (historical, light read)
- Information Management: A Proposal (1990) (historical, intermediate)
- NeWS: A Networked and Extensible Windowing System (video, pdf) (historical, advanced)
- Inside look at a modern web browser (current, multiple parts, light read)
- SoK: On the Analysis of Web Browser Security (current, advanced)
- A File Structure for the Complex, the Changing and the Indeterminate (historical, advanced)
- Five Walled Gardens: Why Browsers are Essential to the Internet and How Operating Systems are Holding Them Back (current, political)
Browser Breakdown
The big question to unpack is whether the browser is supposed to ‘present and navigate interlinked documents’ in the spirit of the original much-too-simple hypertext, or to be ‘an operating system agnostic substrate for networked applications’.
Let’s start with a hypothetical speedrun through history.
You are back in the cosy times of the early 1990s when computing was slow, storage was small and floppy disks got corrupted simply by staring at them for too long. You are struck with the brilliant idea of developing a software product that is supposed to just show very simple documents with formatted text in various typefaces, sizes and styles. Throw in the odd image or two. Maybe some media clip, but that can come later.
These documents should naturally be hosted and shared across networks of computers as that is faster and more convenient than copying things to floppy disks.
It turns out great, with a lot of satisfied users who soon want more. At the same time, windowing systems become more common. These force your simple document viewer to support moving things around in reaction to the window being resized. Users naturally want to handle different outputs, such as printers.
The documents these users have created started to form packs, and people wanted ways for them to cross-reference each other and to navigate these references with the click of a button. You oblige by adding ‘links’ to other documents and the option to open them in a new window.
Networks at this time are slow and servers are quite busy, so the next logical step would be some caching to make sure you only retrieve a linked document or its embedded images when something has actually changed.
A ‘popular’ kind of document turned out to be ‘forms’ for people to fill out, print and post, because those are like cocaine to governments and businesses alike. With that you add some way for the user to type into parts of these forms to fill them out before printing. This will save many from wasting time deciphering messy handwriting. Both throughput and accuracy improve greatly.
Still, people manage to fill them out wrong; to err is human and all that. It would save everyone time if you just added some easy way to have presets to choose from; a list box here and a radio box there. Soon enough every UI ‘widget’ imaginable has managed to join in, but you still think of it as a ‘document’ in the end.
Why even go through the pains of printing and posting? We can just sneak in some magic to POST the filled form as an e-mail, or send it to the same place the document was hosted. Even better, we can embed some code so that some of the inputs the user provides are checked locally, with the user told about any errors.
If something is made programmable, it follows that some crazy hacker kids will figure out creative ways to take advantage of this to do all sorts of unintended things. It did not take long before the documents started spawning little cats chasing the mouse pointer around.
Documents reshaping themselves as they were being viewed became the new normal. That is a big change in expectations for all stakeholders.
You should be able to see where this is going by now: it doesn’t take that many feature requests for a static document to become a dynamic application, similar to how few wrong turns it takes to make a regular language grammar into a recursively enumerable one.
To keep up with new discoveries in everything from image compression to network communication, lots of new code got written in order to support new media formats and more advanced protocols.
The easiest way to make software do new things is to add more code, and the path of least resistance is typically the one people follow. This is where software architecture matters and stops being just boxes and arrows on a whiteboard somewhere; it sets the constraints for how new code combines with the old.
Just adding more code to solve for new features, rather than refactoring to be more clever about things, comes at a cost. More code means more potential bugs. Mix in economic incentives and some of those bugs turn into opportunities to exploit, which calls for mitigation strategies.
One of the more potent ones is sometimes called sandboxing (a terrible term, but that is another can of worms best left unopened) and comes with heavy requirements on security boundaries. Re-architecting to do that successfully is not for the faint of heart or light of wallet.
In the browser case, such a project comes with another interesting side effect: you are now compositing and routing inputs and outputs between a number of interactive sources and sinks across several privilege boundaries.
That is how you end up with something like this:

Congratulations, you’ve made a display server with the weakest windowing system in town. Instead of leveraging the IPC system for better ways of collaborating with local software, browsers stuck to crude leftovers from the outer windowing system like clipboards, drag and drop or, sigh, “portals”.
A historically interesting tangent is that Microsoft was on this trajectory way back when, and got close to a stronger solution through its COM/OLE/ActiveX path before falling into the pit of antitrust trials (not that the idea went away).
Every few years there is some new take on the “Java Applet” form of assimilation: forced cross-compilation to an embedded executable format, referenced from the document, targeting a virtual machine. This turns the document into a de facto format for linking and loading, yet with few of the insights embedded in proper executable formats. Shockwave Flash was a notable one, and “WebAssembly” is a more recent attempt.
What if, instead of trying to regress towards “simple” in the browser domain (just to see the same story repeat itself) or in the data model (and still be stuck with the same browser story), we assume the networked application endgame from the start and design for that?
This leads us to …
Arcan as a Browser Design
To summarise the previous section: the browser started out presenting linked documents and broke down into a networked application, with ‘the document’ organically turning into ‘an executable format’.
In a spider graph form they went from this:

To this:

These are, of course, rough estimates. Improvement along each feature axis gets exponentially more difficult and expensive.
The expected and required feature-space spans the entire gradient:

It is worthwhile to note that a page can transition from any point on the gradient to any other. You get no guarantees about where you start, or whether it will stay that way, unless you disable expected features, with unpredictable results.
We can say the processing is iterative and the linking (as in loading program code, not hyperlinks) is dynamic and mutable: events trigger code that pulls in remote code that hooks events.
In contrast, Arcan started at roughly this back in the early 2000s when hope was high and life worth living:

It is now here:

The processing looks like this:

We can say the processing is planar, static and recursive. With planar we mean that if there is a document stage, there is a discrete and decoupled transition from that into the app form. You know, a build system: compile, link/package, execute.
This means that the “as document” part of the runtime is handled by separate tools that are not part of the browser itself. The browser only concerns itself with being an execution environment for the networked application. This keeps it from having to walk an endless trail of outdated formats. Some are already conditioned to this approach through “static site generators”; it is just the output format that needs some change, moving away from document-to-document translation.
With static we mean that the compiled app is not capable of loading code outside of its own package, except for a set of preset helper scripts. This is the default, with a user-controlled opt-in rather than an opt-out as in “install extension to disable javascript”.
With recursive we mean that navigating a link starts another instance of the browser, which either embeds itself as any other data source in the parent, or takes its place.
The emphasis on the display server aspect throughout the years here is no coincidence: it is the weak point of the dominant browser model and one that has very little documentation. It is also the reason why the apps developed and demonstrated here have mainly been desktop-oriented; to politely point out that “you missed a spot”.
Chrome splits its app execution into a number of processes that are roughly “Network”, “Browser”, “UI”, “Storage”, “GPU”, “Device”, “Render” and “Plugin”. This is a logical breakdown of the internal division of responsibility. For privilege separation, each process role has a configured policy on which system resources it can access.
Arcan splits up its app execution into a number of processes by information transfer boundaries:
- Decode – computer to human representation (images, videos, music, voice synthesis, …) into GPU buffers, pixel transfers and audio sample streams.
- Encode – human to computer representation (recording, streaming, language models, optical character recognition, …).
- Network – name to address resolving, discovery and retrieving remote resources.
- Terminal – outer operating system space controls.
These come as a set of external clients that are picked for each instance, meaning that the supported formats, protocols and so on can be swapped out without any changes to the engine itself, and on a per-application basis. They are dependency and license “sponges”.
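To make the “external client” part concrete, here is a minimal sketch of a decode-style client written against the shmif C API. The connect, signal and event calls are the actual API as far as I know; the solid-colour “decoding” and the omitted error handling are placeholders.

```c
/* Sketch of a minimal decode-style shmif client. A real decoder would
 * parse untrusted input and write frames; here we fill a solid colour. */
#include <arcan_shmif.h>

int main(int argc, char** argv)
{
/* connect and register as a media provider; the server side decides
 * placement, presentation and lifetime */
	struct arcan_shmif_cont cont =
		arcan_shmif_open(SEGID_MEDIA, SHMIF_ACQUIRE_FATALFAIL, NULL);

/* write one frame into the shared video buffer */
	for (size_t row = 0; row < cont.h; row++)
		for (size_t col = 0; col < cont.w; col++)
			cont.vidp[row * cont.pitch + col] = SHMIF_RGBA(32, 128, 64, 255);

/* hand the frame over, lock-stepped with the consumer */
	arcan_shmif_signal(&cont, SHMIF_SIGVID);

/* block until told to shut down */
	struct arcan_event ev;
	while (arcan_shmif_wait(&cont, &ev))
		if (ev.category == EVENT_TARGET && ev.tgt.kind == TARGET_COMMAND_EXIT)
			break;

	arcan_shmif_drop(&cont);
	return 0;
}
```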
Their security model comes from a combination of least-privilege and capabilities. “Decode” is security-wise the most sensitive one, as that is where the parsing of untrusted inputs happens. “Encode” is privacy-wise the most sensitive one, as that is where the real ‘you’ distils into digital form.
After the connection is set up, such a client needs no file system access at all. The system calls required match an OpenBSD pledge of “stdio sendfd recvfd”, which is a hard surface to break through. Even then we have microarchitectural attacks to consider in the most hardened setups. This is where the ability to live-migrate any shmif client to a networked form comes in: you can compartmentalise onto cheap one-off devices like single board computers and the architecture still holds. Memory is never safe, wear a helmet.
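On OpenBSD, that lockdown is a one-liner once the connection exists. The pledge(2) call below is the real API; placing it right after connection setup in the client sketched above is the assumption:

```c
/* After arcan_shmif_open() has succeeded: restrict the process to
 * basic I/O on already-open descriptors plus descriptor passing over
 * the connection socket (OpenBSD only). */
#include <unistd.h>
#include <err.h>

static void lockdown(void)
{
	if (pledge("stdio sendfd recvfd", NULL) == -1)
		err(1, "pledge");
}
```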
The state store and the list of permitted integrations with local software live in an SQLite database that is also picked for each instance. That local software can, of course, also be a different browser.
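As a sketch of what consulting such a store could look like, using the standard sqlite3 C API; note that the table layout below is hypothetical, for illustration only, and not the actual arcan database schema:

```c
/* Hypothetical per-instance permission lookup; illustration only. */
#include <sqlite3.h>
#include <stdbool.h>

static bool integration_permitted(sqlite3* db, const char* app, const char* cap)
{
	static const char qry[] =
		"SELECT 1 FROM permitted_integrations "
		"WHERE applname = ? AND capability = ?;";
	sqlite3_stmt* stmt;

	if (sqlite3_prepare_v2(db, qry, -1, &stmt, NULL) != SQLITE_OK)
		return false;

	sqlite3_bind_text(stmt, 1, app, -1, SQLITE_STATIC);
	sqlite3_bind_text(stmt, 2, cap, -1, SQLITE_STATIC);
	bool ok = sqlite3_step(stmt) == SQLITE_ROW;
	sqlite3_finalize(stmt);
	return ok;
}
```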
This becomes a hierarchy that looks like this:

Not pictured here is the ‘hook-scripts’ facility, which corresponds to ‘browser extensions’ in the web sense as well as to ‘developer tools’.
The key component is the inter-process communication (IPC) system. The one in Chrome, Mojo, is highly generic: an interface description language (IDL), a binding code generator and there you go. The qualities of the IDL limit what can be communicated, and the binding generator affects the developer effort it takes to use. Part of the interface description links in ancillary specifications; an important one is the sandbox policy to apply.
We have seen it before: MIDL, CORBA, Fuchsia IDL, D-Bus, Wayland, Binder (AIDL) etc. ad nauseam. One may wonder how many generic ways we may need in order to eventually be specific.
The pitch for those is always ‘extendable / flexible’. I read that as ‘afraid to commit to anything’. It is a tempting model if you are unsure what you are supposed to be building (or for organisational decoupling, where you want formal barriers for communication between teams) and want to kick the can down the road. It is less tempting if there is ample history and plenty of examples to learn from.
The generic approach comes at a cost. Generated code creeps into, or even dictates, the design of both sides, and it ends up much bulkier and more verbose. It is rare for ancillary tooling support for debugging, report generation, testing, profiling etc. to be included.
In a previous life as a security auditor, a very successful path to victory was to locate any IDLs in use; patch to integrate fuzzing; deploy and wait for things to blow up. They typically do.
The IPC system in Arcan, SHMIF, is highly specific. It is a lock-stepped data model with a C API. The message types and fields came from the needs of a full desktop OS, needs which have changed very little over the years. There is no reason to be clever, flexible or extendable. Just look at what others have done before and generalise from that. Validate against existing examples and use cases. Easy peasy.
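To make “lock-stepped with a fixed set of message types” concrete, here is a sketch of the client-side event loop; the event names come from the shmif API, the handling is illustrative:

```c
/* Sketch: a shmif client event loop. The set of events is fixed by the
 * API itself; no IDL or generated binding layer sits in between. */
#include <arcan_shmif.h>

static void event_loop(struct arcan_shmif_cont* cont)
{
	struct arcan_event ev;
	while (arcan_shmif_wait(cont, &ev)){
		if (ev.category != EVENT_TARGET)
			continue;

		switch (ev.tgt.kind){
/* the server suggests a new size: renegotiate the shared buffer,
 * redraw and submit */
		case TARGET_COMMAND_DISPLAYHINT:
			if (ev.tgt.ioevs[0].iv > 0 && ev.tgt.ioevs[1].iv > 0){
				arcan_shmif_resize(cont, ev.tgt.ioevs[0].iv, ev.tgt.ioevs[1].iv);
				/* ... redraw into cont->vidp ... */
				arcan_shmif_signal(cont, SHMIF_SIGVID);
			}
		break;
		case TARGET_COMMAND_EXIT:
			return;
		default:
		break;
		}
	}
}
```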
Where we did put effort into being clever was in two parts: crash resilience and alternate representations.
Both are heavyweight topics in their own right, a dissertation or three, but for this context the motivation for resilience is easy enough: it allows us to redirect user workloads to relaunched instances (recovery), to others or other devices across the network (redirection) for collaboration or stronger security compartments, or to a future version of ourselves (upgrade).
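From the client side, resilience roughly reduces to handling one more event in the loop sketched earlier: the shmif library reconnects behind the scenes and the client is told to rebuild. A sketch; ‘redraw_everything’ is a hypothetical helper and the exact ioevs semantics live in the shmif headers:

```c
#include <arcan_shmif.h>

/* hypothetical helper that rebuilds all client-side visuals */
extern void redraw_everything(struct arcan_shmif_cont* cont);

/* Called from the event loop above when TARGET_COMMAND_RESET arrives:
 * after a crash, redirect or upgrade the connection has been
 * re-established, and the client resubmits its state. */
static void on_reset(struct arcan_shmif_cont* cont, struct arcan_event* ev)
{
	(void) ev; /* ioevs[0].iv distinguishes reset from recovery/migration */
	redraw_everything(cont);
	arcan_shmif_signal(cont, SHMIF_SIGVID);
}
```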
For alternate representations, well, we don’t all look at the world with the same eyes or desires now, do we? The one that should be obvious is accessibility, where the screen reader tradition is to snapshot composition state back into a document form and then navigate it through voice synthesis and keyboard controls.
A less obvious one is debugging: being able to visually inspect and interact with the dynamics of the software rather than with what it presents itself as. The other side of the same coin comes from malware reversing, as that kind of work leverages the same mechanisms. Process parasites (or, as some call it, ‘fileless malware’) have been a thing for a very long time now, yet the generalised tools to shake them out have been sorely lacking for just as long.
In closing, there is a lot of technical detail that had to be omitted for this piece not to be completely indigestible, but at the high level there is not much more to say about the Arcan project as such. There are things left to do, but the end is in sight. Midnight draws near.