Peak-load benchmarks for operating systems: Linux runs over 5,000× slower than our virtual machines

We applied place-transition nets (PTNs) defined by System V semaphores (https://doi.org/10.1080/17445760.2026.2615010) to benchmark Linux (Ubuntu 24.04.4 LTS, kernel 6.17.0-35).

Using PTNs for matrix multiplication and arrays of concurrent multiplications, we compared Linux kernel performance with that of our virtual machines (https://doi.org/10.1080/17445760.2025.2490148).

A PTN executing 1,024 parallel multiplications of 6-bit data completed in 0.912 seconds on our VM, compared with 5,673.597 seconds on Linux running on the same hardware (AMD Ryzen 7 6800H @ 4.8 GHz, 32 GB RAM). The application contains 9,216 semaphores (places) and 8,192 processes (transitions).

The Linux execution time is more than 5,000 times slower than that of our VM. We believe this gap cannot be explained solely by system-call and context-switching overhead. Instead, it points to the efficiency of the System V semaphore implementation in the Linux sem.c kernel module.

We are interested in collaborating on projects aimed at implementing semaphores with wait-for-all semantics in Linux, both at the kernel level for processes and as a runtime mechanism for fast, futex-like thread synchronization. While futex_waitv provides wait-for-any semantics, wait-for-all semantics could help eliminate many deadlocks caused by sequential resource acquisition.

For modeling, we use Tina (The TINA toolbox Home Page - TIme petri Net Analyzer - by LAAS/CNRS) as an IDE and generate large PTN models with our own toolchains. Models are exported through our NDRtoALL plugin as .h files for the PVZ machine, then recompiled and executed as Linux applications.

Our basic tools are available on GitHub:

bb69ecfb-dc30-44a9-a6cd-e213a97e2ffa

2

@Dima Welcome to our forums. Thanks for sharing.

3

Is your project strictly focused on the “scheduling” part of the Linux kernel?

Or are you also looking the actual “net” benefits, acknowledging that not all benefits can be realized because of context constraints?



I think that summary is comparing apples and oranges!

You need to provide some context parameters before putting out such extremely one-sided statements.

  • Is that a single computer setup or is it a network of peer computers sharing the workload?

  • How much memory is dedicated to the “engine” (the whole network) performing the work?

  • How many independant tasks are being performed simultaneously?

  • Are the tasks compute-oriented, memory-oriented, or I/O-oriented?

  • What is the intended market for such a compute paradigm: super-computing? multi-national multi-server computing? SME-server computing? desktop computing?

“Interesting” results … butLOTS of unknowns!

4

Thank you, dear Eric, for your interest in our R&D. We use PTNs as a graphical language of concurrent programming, developing virtual machines for MCU/GPU to run our applications in view of dedicated hardware (chips) implementation. Sleptsov and Salwicki amendments to Petri nets make them fast and massively parallel. Recently, as I proved that System V semaphores = inhibitor Petri nets, we compile our nets into data for a program called pvzm.c that uses semop() and fork() to run the net as Linux application. Today, to supplement our toolchains, I uploaded benchmark nets and applications to my GitHub. You can run them on your Linux. Thus, at present we benchmark a single computer though we can extend our benchmarks on clusters. The point is System V semaphores can be implemented much more efficiently both in the kernel and partially as runtime, like futexes. They resolve many deadlocks, and we can think of deadlock recognition on the fly, say with a system application started by a kernel. We also offered extended greedy semaphores, which have lots of advantages, and we think of the corresponding system call semopk() implementation. We worked in the kernel before implementing our novel stack of protocols e6 - https://www.ietf.org/archive/id/draft-zaitsev-e6-network-00.txt With warm regards, Dima, https://dimazaitsev.github.io/

5

Thank you for your kind welcome, dear Hydn – Dima

6

Still trying to wrap my head around what is done.

Are you creating this “scheduler” as a captive, single-task component sheduler, albeit massive, complex and parallel task?

7

It is an abstract machine based on a place-transition net. A place is represented by a semaphore. Transition is a child process created by fork that keeps repeating a single semop() system call, which sops correspond to the transition incident arcs. Thus, we use a System V array of semaphores as a computer because the system is Turing complete. Our benchmarks’ idea is using System V semaphores as a computer. And we can make this computer much faster – Dima

8

I’ve bookmarked this to revisit periodically.

I used semaphores at the “mini” scale, to avoid conflicts between 3 programs … way back!

I think I am good at abstract, but I’m afraid that this concept is too abstract for me, so I will leave it to others to give their interpretation regarding the intended “market” for this concept.

No intent to cast aspersions, but I’m out of my league here!

:face_with_spiral_eyes:

9

Thank you for starting the discussion, dear Eric. Without detail, it is rather simple - System V semaphores with wait-for-all semantics and 3 operations, P, V, and Z, is a computer, like a Turing machine. We can compile any program into it and compute using the Linux kernel as a computer. It inflicts a peak workload on the kernel to evaluate its reliability and performance. The market is similar to those for meters—we measure which OS, kernel, or kernel module is better. I mean for using it as a benchmark while we can do much more but it is another story :slight_smile: – Dima

10

I beg your pardon, I can not just take these numbers for granted because of the single fact that top 500 supercomputers in the world are fuelled by Linux

11

I invite you to check it yourself, dear Eugene. I uploaded benchmarks to my github. You can compare times for computing in Linux and on our VM application. Mind the current teaser contains 2 benchmarks; the biggest runs about 2 hours on my Linux machine. We plan to upload all generators of models and complete toolchains after publishing our next research paper. – Dima P.S. you will be tired of waiting for completion of the biggest benchmark, which is a good motivation to rewrite at least SVS efficiently; we dream of catching deadlocks and optimizing kernel work on-the-fly in the future.

12

@Dima
These are truly compelling results. It would be fascinating to explore whether Petri Nets could help identify and address inefficiencies in Linux’s parallelization mechanisms at a deeper level.

13

Welcome to the community @roman35,nice to meet you! :+1:

14

Thank you @Bombilla . Nice to meet you too!

15

Thank you, Roman. The test reveals lack of parallelization in SVS implementation. A single kernel thread processes sequentially sops items using 2 passages: (1) check and (2) apply or block. Then, sops of blocked semop() are checked and possibly resumed. No magic, just lookup, two nested levels—square time complexity with minor heuristics. I invite kernel programmers to the UoD#1 university of the year in the UK PhD program to address the issue and implement greedy semaphores. – Dima