Measuring Input-to-Photon Latency (Because ‘Wayland Feels Off’ Isn’t a Metric)


Am I Wayland Yet? #

I’ll admit, I’m a bit late to the party. While Wayland adoption was gaining traction and getting shipped by more and more distros, I was still relying on X11. By choice. I had tried out logging into a Wayland session a few years ago, but lots of my day-to-day use cases broke as I was running an NVIDIA card and that was known to be flaky with Wayland for a long time.

On a recent homelab housekeeping session I decided to distrohop and went for Fedora KDE. A bit nervous as I remembered my first experiences with Wayland, but whatever, plasma-workspace-x11 is only a few keystrokes away. A few minutes later, I was greeted by a fully functioning desktop on Wayland. I guess I was being too cautious as we’ve been Mostly Wayland for a pretty long time now.

Since I’m into competitive shooters and my main game even comes with a native Linux build, I recently moved most of my gaming activity onto Linux. As the years stack up, my reaction time certainly worsens by a few milliseconds each year, so I try to keep up with the kids with a fairly advantageous gaming environment using a 360 Hz display and matching that with stable FPS above the refresh rate. This isn’t a brag. I’d much rather go back to the heights of my gaming career when I was playing better on a much slower setup.

Of course I wouldn’t expect any perceptible system latency from a modern computer and OS. But still, something felt off when I first spawned on de_dust2. It felt like slight input lag, even though frametime was low without spikes. Back to installing X11 to make sure it’s not placebo. And of course, the game immediately felt as responsive as I was used to, either on Arch/X11 or Windows.

I knew I had some debugging and tweaking to do, but I didn’t want to do so without actual numbers to back up the subjective input lag.

One thing I tried was taking high-speed video with my phone and comparing the delay from click to screen change, but the frame rate was too coarse. I needed something repeatable, so I decided to build a device. Oh boy, what an amazing rabbit hole that turned out to be.

In this article I want to document the whole process, including every roadblock and every design decision in the build process. We’re building a device that allows us to measure system latency to compare different display server protocols and compositor configurations. Spoiler: I was too paranoid about Wayland.

Take this as lab notes with some prose. Scroll down if you’re just here for latency results comparing X11, Wayland and Windows.

Building a Latency Meter #

When talking about latency and trying to measure it, we first need to think about what kind of latency we care about and what is practical to measure. Getting things to happen on the screen is a complicated endeavor with a lot of steps along the pipeline from mouse click to pixels changing.

USB HID → CPU (OS, Game) → Render Queue → Composite → Scanout → Display

Note: this is a vast oversimplification.

Measuring the actual compositor latency is not possible for me. So let’s assume for a second that we’re grug:

grug move mouse, grug see action. but feels slow. is grug okay? or is pc not okay?

So that’s what we’re going to measure: the time from USB HID event to a detectable screen luminance change at the sensor. Of course, we’re not explicitly measuring Wayland’s suspected latency here, but every other parameter being the same, it provides a repeatable basis for comparison.

Microcontroller #

How would we time a mouse input to measure latency? Easy: our latency meter becomes a mouse and sends the input itself. There are countless dev boards that allow for HID interaction; luckily, I had a few Arduino Pro Micros lying around, and they are perfect for a project like this. With its ATmega32U4 at 16 MHz, the board is fast enough for millisecond-range measurements and comes with USB HID support, easy bootstrapping, and analog/digital interfaces for sensors. There is a ready-made library for emulating a mouse and sending inputs over the Arduino's USB port to the host. I verified a polling rate of 1000 Hz via evhz, so we have a potential peripheral input latency of 1 ms, which is acceptable and on par with modern mice.

Sensor #

We need a sensor that detects changes in the amount of photons emitted from an area of the screen. This is not the Large Hadron Collider; we're talking about simple light sensors that cost pennies. I grabbed a dusty photoresistor breakout from an old Arduino DIY kit I had lying around.

Photoresistor (LDR) sitting on a breadboard

First Sketch #

We hook the LDR to analog pin A3 and put the initial configuration constants in main.h. This keeps pin assignments and tuning parameters (thresholds, cycle count, delays) in one place, so iterating on the setup doesn't require hunting through the logic later.

To make the tool easier to use, the measurement run shouldn’t start immediately on boot. Instead, we wait until the user presses a button. I ran out of through-hole push buttons, so I used a Cherry MX switch and wired it between pin 7 (interrupt-capable on ATmega32U4) and ground.

#pragma once

#include <Arduino.h>
#include "Mouse.h"
#include <math.h>
#include "display.h"

/// @brief Analog pin connected to signal of light sensor
static const uint8_t SENSOR_PIN = A3;
/// @brief Pin shorted to ground via button
static const uint8_t BUTTON_PIN = 7;
/// @brief Sensor threshold for registering a screen change event in 
/// 40 mV increments of the analog readout.
static const uint16_t BRIGHTNESS_THRESHOLD = 10;
/// @brief Number of measurements before calculating summary
static const uint8_t NUM_CYCLES = 20;

/// @brief Delay in ms between measurements. Keep this at a value
/// that is not a multiple of the refresh period, so consecutive
/// measurements don't sync up with the screen's redraw cycle.
static const uint16_t MEASUREMENT_DELAY_MS = 287;
static const double MS_FACTOR = 1000.0;

With a simple Arduino sketch like this, a little global state is reasonable. Because we want the user to be able to abort a run mid-measurement and start over, we track two intents: “start a run” (when idle) and “restart” (when already running). An interrupt service routine (ISR) lets us capture the button press immediately, even while we’re busy polling the sensor.

In setup() we enable the internal pull-up on the button pin and attach an interrupt on the falling edge (i.e., when the button shorts the pin to ground). The ISR then sets either startRequested or restartRequested depending on whether we're currently measuring. A 200 ms debounce avoids the interrupt firing multiple times on a single press.

#include "main.h"

uint32_t latencies_us[NUM_CYCLES] = {0};
uint8_t cycle_index = 0;

volatile boolean startRequested = false;
volatile boolean restartRequested = false;
volatile boolean running = false;

/// @brief Inits pins and serial
void setup() {
    Serial.begin(115200);
    pinMode(BUTTON_PIN, INPUT_PULLUP);
    attachInterrupt(digitalPinToInterrupt(BUTTON_PIN), isr, FALLING);
}

void isr() {
    static unsigned long last_interrupt_time = 0;
    unsigned long interrupt_time = millis();
    if (interrupt_time - last_interrupt_time > 200) {
        (running ? restartRequested : startRequested) = true;
    }
    last_interrupt_time = interrupt_time;
}

Next is the core of the sketch: measure(). Each cycle begins by taking a baseline brightness reading from the sensor. We then timestamp the moment we trigger the left mouse click.

After the click, we continuously sample the sensor until the brightness differs enough from the baseline to plausibly indicate a screen update. When that happens, we take a second timestamp and compute the delta in microseconds, which is our measured end-to-end latency for that cycle.

A change in the sensor readout of course does not guarantee a screen action; we need to counter noise, ambient light, and PWM backlight flicker. So we define a threshold for what counts as a change. How high it should be depends heavily on the sensor used and on factors like lighting conditions or PWM speed. It's easy to find by trial and error: print raw readings to serial and trigger the expected screen change.

void measure() {
    // get reference brightness
    const uint16_t baseline = analogRead(SENSOR_PIN);
    running = true;

    // reset timer, click mouse
    const unsigned long start = micros();
    Mouse.press(MOUSE_LEFT);

    while (true) {
        if (restartRequested) {
            return;
        }

        const int delta = analogRead(SENSOR_PIN) - baseline;

        // loop until brightness delta is bigger than threshold
        if (abs(delta) > BRIGHTNESS_THRESHOLD) {
            // save and sum measured latency
            const unsigned long latency = micros() - start;
            Mouse.release();

            // store cycle in the array
            if (cycle_index < NUM_CYCLES) {
                latencies_us[cycle_index] = latency;
            }

            Serial.println(latency);

            delay(MEASUREMENT_DELAY_MS);

            cycle_index++;
            break;
        }
    }
}

Once we've collected a batch of latencies, we can summarize the results. The function below computes the mean and (sample) standard deviation from the latencies_us array and returns both values in milliseconds for easier reading.

void computeStatsMs(double *mean_ms, double *sd_ms) {
    double sum_us = 0.0;
    double sd_us = 0.0;
    double mean_us = 0.0;
    double variance_us = 0.0;

    if (NUM_CYCLES == 1) {
        *mean_ms = static_cast<double>(latencies_us[0]) / MS_FACTOR;
        *sd_ms = 0.0;
        return;
    }

    // calculate mean
    for (const unsigned long latency: latencies_us) {
        sum_us += static_cast<double>(latency);
    }
    mean_us = sum_us / static_cast<double>(NUM_CYCLES);

    // calculate sample standard deviation
    for (const unsigned long latency: latencies_us) {
        const double diff_us = static_cast<double>(latency) - mean_us;
        variance_us += diff_us * diff_us;
    }
    sd_us = sqrt(variance_us / (NUM_CYCLES - 1));

    *mean_ms = mean_us / MS_FACTOR;
    *sd_ms = sd_us / MS_FACTOR;
}

Finally, we tie everything together in the main loop(). When a start is requested, we initialize HID, run cycles until we’ve collected NUM_CYCLES samples, and compute statistics. After a reset of the state we wait for another measurement run.

void loop()
{
  if (startRequested)
  {
    startRequested = false;
    restartRequested = false;
    Mouse.begin();

    while (cycle_index < NUM_CYCLES)
    {
      measure();

      if (restartRequested)
      {
        Mouse.release();
        cycle_index = 0;
        startRequested = false;
        restartRequested = false;
        running = false;
        break;
      }
    }

    if (cycle_index == NUM_CYCLES)
    {
      double mean_ms = 0.0;
      double sd_ms = 0.0;
      computeStatsMs(&mean_ms, &sd_ms);

      Serial.println(mean_ms);
      Serial.println(sd_ms);

      cycle_index = 0;
      running = false;
      startRequested = false;
      restartRequested = false;
      Mouse.end();
    }
  }
}

Unstable Results #

There it was, a measurable difference between X11 and Wayland. At least sometimes? Latency fluctuated from run to run. Looking at 10 runs with 20 measurements each, X11 was indeed more responsive, but looking at individual cycles the difference was far from clear. The standard deviation was fairly high, and on some runs Wayland even measured faster, even though I could definitely still feel the input lag. It wasn't consistent or accurate enough to give me values I'd be confident publishing.

What follows is a deep dive into a few topics I was only vaguely familiar with, and I may digress a lot from simply showing a difference in input latency. There are many tutorials on building latency meters, ranging from very simple to extremely advanced, most of them similarly based on a brightness sensor attached to the screen. But I saw the potential to learn something new here, so we're building something more plug-and-play than initially planned.

As for the inconsistent first results, one culprit I suspected was the intrinsic latency of the program code itself. We take the starting timestamp, invoke the action via HID, access some registers, and use the ADC to turn a voltage reading into an integer. All of this costs time and isn't perfectly predictable. But it was easy to dismiss as minuscule: capturing the duration of sending the mouse click and invoking the ADC with micros() yielded an average of 140 μs.

While still an order of magnitude below our desired resolution, there is some room for improvement here. By default, the ATmega32U4's ADC runs with a prescaler of 128, and a conversion takes 13 ADC clock cycles:

$$ f_{ADC} = \frac{16 \space \mathrm{MHz}}{128} = 125 \space \mathrm{kHz}, \quad T = \frac{13}{125 \space \mathrm{kHz}} ≈ 104 \space \mu\mathrm{s} $$

The datasheet (pp. 300 & 390) tells us the ADC precision may drop from 10 bits to 8 bits when the ADC clock is set higher than 200 kHz, but with a prescaler of 16 I was still getting sufficiently precise results.

In setup(), adding ADCSRA = (ADCSRA & 0xf8) | 0x04; sets the lower 3 bits of the ADC control register to 0b100, which selects a prescaler of 16:

$$ f_{ADC} = \frac{16 \space \mathrm{MHz}}{16} = 1 \space \mathrm{MHz}, \quad T = \frac{13}{1 \space \mathrm{MHz}} = 13 \space \mu\mathrm{s} $$

An eightfold increase in ADC read speed, and sure enough, the internal latency decreased to about 40 μs. I suppose some of the remaining latency stems from the USB HID routine, but that is enough premature optimization for now.

Next up, I put some research into the sensor itself.

Optimizing Sensor Choice #

I assumed a passive component like a photoresistor would not add any significant latency and that the choice of sensor wouldn't really matter. Turns out it does, by several orders of magnitude. The most widespread components are photoresistors, phototransistors, and photodiodes, each with its own advantages and caveats.

Photoresistor #

As mentioned above and as they are abundant, my first choice was a photoresistor. Those things are very simple: the more light you shine on them, the less resistive they get.

Hooking up an oscilloscope to the signal pin revealed why they are not a good fit for a device like this: the resistance takes quite a while to fall after a rise in brightness, and the rise time is even slower. A change from bright to dark only reached its peak after more than 20 ms.

Oscilloscope showing rise time of a photoresistor
1000mV / 5ms photoresistor signal versus LED toggle

Although I don’t know the actual LDR component used on this breakout, a very widespread part that looks just like my unit is the VT90N2; its datasheet shows a typical response time of 78 milliseconds. This does not necessarily mean it would add the full rise time to the total latency, as our threshold is fairly tight and would probably trip well before the reading settles. But I wanted more precision and repeatability.

Photodiode #

Photodiode on a breadboard

Photodiodes are very similar to solar cells. They output a small current when exposed to light. That current can be used as a very fast and linear measurement of light intensity. A commonly used component for brightness reading of visible light is the Vishay BPW34. They offer a rise/fall time on the order of 100 ns in datasheet conditions.

Oscilloscope showing fall time of a photodiode
50mV / 100us photodiode signal versus LED toggle

Now that’s what I’m talking about. The time scale is set to 100 μs and we see a complete fall within three divisions. But as you can see, I had to turn the voltage scale down to 50 mV to catch the change at all. And this was just an LED; testing on an actual screen I faced another challenge:

Oscilloscope showing screen noise from photodiode
50mV / 500us photodiode signal of noise

After reaching the top, we start to see fluctuations at 360 Hz. This matches my screen's refresh rate exactly, so we're picking up something from the screen hardware itself rather than just the displayed action. This is too noisy for our static threshold, so to actually use the diode with the Arduino's 0–5 V ADC range, a transimpedance amplifier circuit would be required.

Analog readout of a photodiode
Unusable amplitude on a screen change

Luckily, there’s another option which provides a great middle ground and is good enough as a trade-off between accuracy and sensitivity for this project:

Phototransistor #

Phototransistor on a breadboard

The next contender was a phototransistor, namely the Vishay TEMT6000. Phototransistors are basically photodiodes with built-in signal amplification. That amplification comes at the cost of response time, but they are very convenient to use: no further active components are necessary, so they're just as easy to integrate as photoresistors.

Oscilloscope showing fall time of a phototransistor
100mV / 500us phototransistor signal versus LED toggle

On the scope, this already looked promising. The fall time is much shorter and steeper than the LDR's, coming close to the diode's. We still pick up some 360 Hz noise, but it is far less pronounced; within a 200 mV scale we can easily filter it out with a simple static threshold. And sure enough, I was now getting much more consistent latency readouts.

OLED Screen #

I want this thing to be portable and not rely on a serial connection. An SSD1306-based I²C OLED screen can be sourced for a few bucks, and there's an easy-to-use library from Adafruit. We initialize the screen and add helper functions to print measurements.

#include <Wire.h>
#include <Adafruit_SSD1306.h>

static constexpr uint8_t SCREEN_WIDTH = 128;
static constexpr uint8_t SCREEN_HEIGHT = 64;
static constexpr int16_t LOWER_CURSOR_Y = SCREEN_HEIGHT / 2 - 4;

/// @brief RX LED pin used to signal a fault
static constexpr uint8_t RX_LED_PIN = 17;

Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT);

void initScreen()
{
  if (!display.begin(SSD1306_SWITCHCAPVCC, 0x3C))
  {
    // Fatal error - blink RX LED
    pinMode(RX_LED_PIN, OUTPUT);
    while (true)
    {
      digitalWrite(RX_LED_PIN, HIGH);
      delay(200);
      digitalWrite(RX_LED_PIN, LOW);
      delay(200);
    }
  }

  display.clearDisplay();
  display.setTextSize(1);
  display.setTextColor(WHITE);

  display.print("m2p-latency");
  display.setCursor(0, LOWER_CURSOR_Y);
  display.setTextSize(1);
  display.print("Starting in 5 seconds...");
  display.display();
}

void drawMsValue(float ms)
{
  display.setTextSize(2);
  display.setCursor(0, LOWER_CURSOR_Y);

  // screen has space for 5 digits, so we set the decimal place 
  // accordingly
  int digits = log10(ms) + 1;
  display.print(ms, 5 - digits);

  display.print(" ms");
  display.display();
}

// ... some more screen helper functions to draw surrounding text 
// omitted for brevity

We can draw the value of the last measurement between cycles and a summary at the end. Make sure to keep display logic out of the actual measurement functions: the display's I²C transactions are fairly slow and would add extra delay.

SSD1306 screen on a breadboard

Designing a Case #

To keep ambient light off the sensor and allow for tighter thresholds, I decided to design a simple case to house the components. Everything should work with a snap/press fit and be printable with minimal supports.


The TEMT6000 sensor and OLED screen feature convenient hole cutouts, subtracting 0.1 mm in diameter allows for a tight fit. The Arduino is held in place by some rails next to its footprint with a similar tolerance.


To make printing easier, the small housing is separate and will be pushed into the lid. To avoid scratching the screen with the bare plastic, we add fleece pads to the sensor side. The housing sticks out about 0.3 mm less than the fleece pads, so the sensor is very close to the screen and is not affected by ambient light.


On the screen face, I’ve added a sensor plane symbol right where the sensor sits on the other side, similar to what’s found on cameras.


Results #

Holding the sensor to the screen at the muzzle of the gun in Counter-Strike, every click measured the latency between the click and the muzzle flash actually being rendered on screen.

Test conditions:

  • CPU: AMD Ryzen 9 9950X3D
  • GPU: NVIDIA RTX 4090
  • Motherboard: ASRock B650E Taichi
  • OS: Fedora 43
  • DE/Compositor: KDE/KWin 6.5.4
  • GPU Driver: Proprietary 580.119.02
  • Screen: Dell AW2725DF OLED
  • Refresh Rate: 360 Hz
  • VSync: Off
  • VRR: Off
  • Allow Tearing: On
  • Framerate: 400 FPS (engine capped)

And there it was, a clear difference: roughly double the input latency, more than I anticipated. Finally, I had hard numbers to back up my perception. The culprit was found very quickly, as I now had a fast way of verifying every change or tweak with 100-measurement runs.

On my machine, the extra lag came from running the game under XWayland rather than native Wayland. XWayland is a translation layer used for applications that aren't yet fully compatible with Wayland, and at the time of writing it is still ubiquitous in gaming: it's the default way to run Steam games via Proton, and even native games like CS2 use it. Valve once tried to flip the switch but quickly reverted, as you can tell from these comments in the start script cs2.sh:

# There is Wayland support in SDL but a recent (7/30/2025) attempt at
# allowing SDL to default to Wayland caused a number of customer issues so
# keep the default at X11 for now. Don't override any user setting so
# people can easily use Wayland if they want.
if [ "$UNAME" == "Linux" ]; then
	if [ -z "$SDL_VIDEO_DRIVER" ]; then
		export SDL_VIDEO_DRIVER=x11
	fi
fi

Which leads me to the fix. You can force the Wayland backend using environment variables: SDL_VIDEO_DRIVER=wayland for CS2, or, for Proton games, a more bleeding-edge build like ProtonGE with PROTON_ENABLE_WAYLAND=1.

Keep in mind, this is experimental. Wine only recently added native Wayland support. I have not encountered any issues except for the Steam overlay not working, so that’s what I am going to go forward with.

I’ll be honest here. Even with the inaccurate photoresistor merely dangling off a breadboard and taped to my screen, I could quantify the input lag, and that's when I found out I was wrongly blaming Wayland when the culprit was XWayland. But since the fix was, at least in my case, this simple, I decided to use the leftover cognitive capacity to build a fairly well-finished latency meter. If I always had to open a serial console and tape stuff to my screen, I'd probably be much more hesitant to do any measurements in the future.

Even though I still have plenty of ideas to improve the meter’s performance and usability, I consider the current revision polished enough to publish. Next, I want to explore additional ways to cut input lag like evaluating NVIDIA Reflex. I am more than happy with my current Wayland setup and ~7ms total system latency feels amazing, but that’s just my love for tinkering.

The complete sketch, schematic and case files can be found on my GitHub repo.

Bill Of Materials #

  • Arduino Pro Micro (Amazon | AliExpress): $6 USD
  • TEMT6000 Phototransistor (Amazon | AliExpress): $2 USD
  • SSD1306 OLED Screen (Amazon | AliExpress): $3 USD
  • Cherry MX Switch (Amazon | AliExpress): $0.21 USD
  • ~30 grams of PLA (Amazon | AliExpress): $0.35 USD
  • Total: $11.56 USD

Limitations #

As mentioned above, this measures end-to-end latency, not compositor latency in isolation. You have to be careful to keep all other parameters the same: VSync and VRR settings must match, the framerate should be capped at a number that is 100% stable, and you must always measure the exact same action on screen. Only then do you get a fair comparison between display servers/compositor stacks.

Next steps #

Right now, the latency meter does its job fine, but at runtime it's an unconfigurable black box… literally. A helpful tester (thanks Eli ♥) and I have tried it out on different panel types while varying things like screen brightness, and the basic absolute-threshold method worked well enough, but there are certainly more challenging environments in which to prove its stability; I'm thinking of technologies like backlight strobing and CRTs. And why stop at the mouse? Keyboard support is right there in the library, ready to be used. I know some people are very sensitive to terminal emulator input lag, so that's a use case begging to be explored. I also dislike the configuration being hard-coded: the end user shouldn't have to flash new firmware just because they need to change the threshold or some other value by a few units.

So my TODOs for the upcoming weeks are as follows:

  • Make threshold and baseline configurable
  • Make input trigger configurable
  • Add menu to screen
  • Add buttons for navigation
  • Design PCB

But most importantly, this thing fits my use case. What about you? If you have any suggestions or ideas, please do not hesitate to contact me!

V2: Sneak Peek #

March 2026 Update:

PCB version of the latency meter
Second iteration of deltaprobe, the spiritual successor of m2p-latency

You might have noticed the completed TODO list from above; I'm currently designing the third PCB revision. Proper menu navigation has been added, and testing parameters like thresholds can now be configured. An extension port also allows using your own sensor or triggering an external component: anything that can be driven via 3.3 V logic or a short (like a button) can be measured, think of camera flashes or similar equipment.

Since this takes a lot of time and effort, continuing this essay is not the highest priority right now. As soon as the final revision is done, I’ll come along with a proper update. Pinky promise.

Disclaimer #

Some links in this post are marked with a dollar sign to indicate they are affiliate links. These links help support my blog through a small commission if you make a purchase, at no extra cost to you. Please note that this post reflects my personal opinions and experiences, and the inclusion of affiliate links does not influence my content in any way.