Hey!
If, like me, you spend a lot of time debugging the Linux kernel, especially with
the goal of modifying the performance of user programs, you may often find
yourself in the silly position where you need to run perf inside of a virtual
machine (typically run using qemu) in order to obtain hardware performance
counters.
In x86(_64) CPUs, the component responsible for providing this information is called the Performance Monitoring Unit, or PMU. When your kernel boots, it figures out what brand and model your CPU is, and which PMU driver will be responsible for handling initialization and gathering of information.
In a virtualized environment, however, the hypervisor is responsible for providing a PMU to the guest! In QEMU, this is often done by enabling the KVM accelerator and copying the host CPU model, like so:
```
-enable-kvm -cpu host
```
And with that, you can have a guest CPU with features such as:
```
Architecture:           x86_64
CPU op-mode(s):         32-bit, 64-bit
Address sizes:          39 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 1
On-line CPU(s) list:    0
Vendor ID:              GenuineIntel
Model name:             Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
CPU family:             6
Model:                  78
Thread(s) per core:     1
Core(s) per socket:     1
Socket(s):              1
Stepping:               3
BogoMIPS:               5615.99
Flags:                  fpu vme de [...] arch_perfmon [...]
Virtualization:         VT-x
Hypervisor vendor:      KVM
Virtualization type:    full
[...]
```
Which clearly shows arch_perfmon (the x86 feature flag for architectural hardware performance
monitoring on Intel CPU cores, present in the p6 family starting with the Yonah cores) enabled!
On AMD, the feature has a different name (perfctr_core), and will be handled by the kvm-amd
Linux kernel module.
And, yeah! Hope that helped.
…
still here?
…
good.
Look, i would not know what a vPMU is, which p6-family Intel cores have hardware-backed PMUs, nor how KVM plays a role in exposing those to the guest, if i had managed to make it all work as easily as all the online documentation seems to imply.
This is a story about debugging, and bad development practices, and pain.
Mostly pain.
This is a blog post about how a single if statement guarding a code block
with no warning log, made me lose a whole day of work investigating a potential
bug.
This is a Blog Post About What Happens When You Fall Through the Cracks of People's Assumptions
## Part 1: This is the Part Where i Still Sound Somewhat Sane
The reason you don’t read my yapping about Linux as much these days is that most of the Linux-touching i do is actually for work. On a rare occasion, i happen upon a tangential problem, posed, for example, by a benchmark setup. Even more rarely, that problem becomes a time sinkhole that engulfs an entire day of my own time, and brings to light an interesting problem that will almost definitely trip up other people in the future.
And so, i feel compelled to write about it.
The story of today brings us to QEMU, KVM, and early CPU feature recognition.
For complicated reasons, i needed to sample some hardware performance counters
from within a QEMU instance running my own build of the Linux kernel. My first
thought went along the lines of “oh well, i might need a perf binary compiled
for my custom kernel, let me do that”, and i happily tweaked build scripts to
install the binary.
Running it, i got:
```
~ # perf stat -d echo
event syntax error: 'cpu_core/TOPDOWN.SLOTS,metric-id=cpu_core!3TOPDOWN.SLOTS!3/,cpu_core/topdown-retiring,metric-id=cpu_core!3topdown!1retiring!3/,cpu_atom/TOPDOWN_RETIRING.ALL,metric-id=cpu_atom!3TOPDOWN_RETIRING.ALL!3/,cpu_core/topdown-bad-spec,metric-id=cpu_core!3t..'
                     \___ Bad event or PMU

Unable to find PMU or event on a PMU of 'cpu_core'

 Performance counter stats for 'echo':

              0.45 msec task-clock                #    0.215 CPUs utilized
                 1      context-switches          #    2.231 K/sec
                 0      cpu-migrations            #    0.000 /sec
                60      page-faults               #  133.875 K/sec
   <not supported>      cycles
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses
   <not supported>      L1-dcache-loads
   <not supported>      L1-dcache-load-misses
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses

       0.002081276 seconds time elapsed

       0.000392000 seconds user
       0.000392000 seconds sys
```
That’s odd. Okay, maybe getting hardware counters from within a VM is a bit weirder.
i then spent about five minutes searching online for combinations of the words
qemu, perf, and hardware counters. Eventually, the answers i found in various
places told me that the feature i am looking for is called “Virtualized
Performance Monitoring Unit”, or “Virtualized PMU”, or “vPMU”. They also told
me the same information i relayed at the top of the post:

Enabling vPMU in QEMU is done by enabling KVM and copying the host CPU information:
`-enable-kvm -cpu host`.
So i tried.
And it didn’t work. It. Didn’t. Work. Nothing changed. What was going on? Why was i falling into a case that was documented nowhere? Oh no, it’s happening again isn’t it, oh NO, NOT AGAI-
## Part 2: My Sanity Slowly Drains as I Skim DMesg and Kernel Code
In order to start debugging, i had to figure out what was happening in the
kernel. For those who don’t know, perf interacts with an entire subsystem in
the Linux kernel (the perf subsystem) to retrieve hardware events. Thus, if
there was a bit of software that should know about the problems with the PMU, it
should be the kernel.
Running `dmesg | grep -i perf`, i got the following:

```
[    0.133409] Performance Events: unsupported p6 CPU model 183 no PMU driver, software events only.
```
Oh, wait. So my CPU is not supported? Why? My kernel is a bit old (6.12), and my CPU is, as you can guess, a Raptor Lake (13th gen Intel, a heterogeneous architecture with Raptor Cove performance cores and Enhanced Gracemont efficiency cores). Maybe a driver was missing from older Linux versions? That’s what i thought, until i also ran a Linux 6.18 build and obtained the same result.
On Linux v6.12, i hit this line (arch/x86/events/intel/p6.c#L272) in the following code:
```c
__init int p6_pmu_init(void)
{
    x86_pmu = p6_pmu;

    switch (boot_cpu_data.x86_model) {
    case  1: /* Pentium Pro */
        x86_add_quirk(p6_pmu_rdpmc_quirk);
        break;

    case  3: /* Pentium II - Klamath */
    case  5: /* Pentium II - Deschutes */
    case  6: /* Pentium II - Mendocino */
        break;

    case  7: /* Pentium III - Katmai */
    case  8: /* Pentium III - Coppermine */
    case 10: /* Pentium III Xeon */
    case 11: /* Pentium III - Tualatin */
        break;

    case  9: /* Pentium M - Banias */
    case 13: /* Pentium M - Dothan */
        break;

    default:
        pr_cont("unsupported p6 CPU model %d ", boot_cpu_data.x86_model);
        return -ENODEV;
    }

    memcpy(hw_cache_event_ids, p6_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));

    return 0;
}
```
That code initially had me guess that something about models not listed here made them incompatible with vPMU. Maybe only older versions of p6 CPUs could have virtualized PMUs? No.
Looking at v6.18, that function changed dramatically:
```c
__init int p6_pmu_init(void)
{
    x86_pmu = p6_pmu;

    if (boot_cpu_data.x86_vfm == INTEL_PENTIUM_PRO)
        x86_add_quirk(p6_pmu_rdpmc_quirk);

    memcpy(hw_cache_event_ids, p6_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));

    return 0;
}
```
And the new line i hit is located in the v6.18 version of intel_pmu_init:
```c
__init int intel_pmu_init(void)
{
    // ...

    /* Architectural Perfmon was introduced starting with Core "Yonah" */
    if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
        switch (boot_cpu_data.x86) {
        case 6:
            if (boot_cpu_data.x86_vfm < INTEL_CORE_YONAH)
                return p6_pmu_init();
            break;
        case 11:
            return knc_pmu_init();
        case 15:
            return p4_pmu_init();
        }

        pr_cont("unsupported CPU family %d model %d ",
                boot_cpu_data.x86, boot_cpu_data.x86_model);
        return -ENODEV;
    }

    // ...
```
So, “Architectural Perfmon was introduced starting with Core Yonah”, huh? This
bit of code seems to be responsible for initializing support for the PMU on an
Intel CPU at boot. i skipped the variable declarations, because the first bit of
code in that function is a check for CPU capabilities on the CPU the kernel
booted on. Specifically, we’re looking for an x86 feature called arch_perfmon.
If that feature is absent, we then call the p6_pmu_init function that
previously spewed the error log. The new organization of these functions clears
up the earlier confusion: p6_pmu_init is only run for pre-Yonah p6 family
cores in order to try and initialize the feature if it wasn’t already available.
Failing that (and it will! Even if you comment out the core version check), we
run into our pr_cont right there above, and return -ENODEV. Oops.
If you’re on an Intel CPU and run lscpu, you may find arch_perfmon in the
list of Flags. This signals that you have architectural support for your PMU.
If you’re on an AMD CPU, you will have something called perfctr_core.
So, wait, hold on: where is that X86_FEATURE_ARCH_PERFMON capability even
enabled during boot? Is the CPU indicating, even at boot, that it can’t support
a (v)PMU? Surely not. Surely that’s a mistake. Right?

There are three places that use X86_FEATURE_ARCH_PERFMON and that, at first
glance, look relevant to this situation:

- arch/x86/events/intel/core.c, where it’s checked to potentially run a *_pmu_init function.
- arch/x86/kernel/cpu/intel.c
- arch/x86/kvm/cpuid.c
Now, if you do not know what cpuid is, don’t worry. We’ll come back to it. All
you need to know, and what i knew at the time, is that it’s a CPU instruction
used to probe features in, and identify CPUs. Let’s investigate the other file
first, okay?
So, what’s going on in intel.c? Well,
```c
static void init_intel(struct cpuinfo_x86 *c)
{
    early_init_intel(c);

    intel_workarounds(c);

    init_intel_cacheinfo(c);

    if (c->cpuid_level > 9) {
        unsigned eax = cpuid_eax(10);

        /* Check for version and the number of counters */
        if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
            set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
    }

    // ...
```
Okay. Deep breaths. We’re gonna have to do it. We’re gonna have to-
## Part 3: We Have To Talk About CPUID
*inhales* (i’ve always wanted to do this)
The Intel(R) 64 and IA-32 Architectures Software Developer’s Manual,
Volume 1, Chapter 21, explains the processor identification and feature
determination instruction, called cpuid.
CPUID is complex to explain, but, essentially, you interact with it by setting
CPU register eax to a value, and then you observe the output values in eax,
ebx, ecx and potentially edx. For the purpose of probing PMU features, we
set eax=10, as shown above with cpuid_eax(10). That function is not named
cpuid_eax because it sets eax to 10 by the way, but because it returns only
the value of the eax register after running cpuid.
The notation used within the manual to denote “call to CPUID with EAX=X” is
CPUID.XH. You can then write the different values in the output registers as
CPUID.XH:REGISTER.FIELD_NAME, or with a binary range after the register name.
We’ll be using this notation here too, for coherence with the Intel Software
Developer Manual.
You’re still reading a blog post about QEMU and perf support by the way.
Table 21-30 of the manual presents the list of fields in eax and ebx for
CPUID.0AH. Not all fields really matter here; we are interested in
CPUID.0AH:EAX[7:0] (aka VERSION, aka eax & 0xff), and
CPUID.0AH:EAX[15:8][^1] (aka NUM_GP_CTRS, aka (eax >> 8) & 0xff), the number of general-purpose hardware counters per core. The manual
says “This leaf is valid if CPUID.0AH:EAX[7:0] > 0 and MAX_LEAF ≥ 09H”.
Those map onto the checks in the code above.
So, yeah, that code checks if we have a PMU. Great! Cool. Wait. So, if
X86_FEATURE_ARCH_PERFMON is not set, it means that CPUID.0AH:EAX.VERSION is
null, or there are no general-purpose hardware counters, or the maximum CPUID leaf
is 9 or below. i actually know that it’s the first two, because i then
modified the call as follows:
```c
if (c->cpuid_level > 9) {
    unsigned eax = cpuid_eax(10);

    /* Check for version and the number of counters */
    if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
        set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
    else {
        unsigned ebx = cpuid_ebx(10);
        unsigned ecx = cpuid_ecx(10);
        unsigned edx = cpuid_edx(10);

        pr_warn("THERE'S NO PMU AAAAH EAX=%x, EBX=%x ECX=%x EDX=%x",
                eax, ebx, ecx, edx);
    }
}
```
And what i got as a result was:

```
THERE'S NO PMU AAAAH EAX=0, EBX=0 ECX=0 EDX=0
```
Fuck.
Ok, quick thinking. Who’s making those responses? Who’s the mastermind behind it all? That’s right: The Hypervisor.
## Part 4: KVM is a Little Liar
We get to bring in KVM now. Remember the third place where
X86_FEATURE_ARCH_PERFMON was mentioned in that list above? The code is as
follows:
```c
case 0xa: { /* Architectural Performance Monitoring */
    union cpuid10_eax eax;
    union cpuid10_edx edx;

    if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
        entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
        break;
    }

    eax.split.version_id = kvm_pmu_cap.version;
    eax.split.num_counters = kvm_pmu_cap.num_counters_gp;
    eax.split.bit_width = kvm_pmu_cap.bit_width_gp;
    eax.split.mask_length = kvm_pmu_cap.events_mask_len;
    edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed;
    edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed;

    if (kvm_pmu_cap.version)
        edx.split.anythread_deprecated = 1;
    edx.split.reserved1 = 0;
    edx.split.reserved2 = 0;

    entry->eax = eax.full;
    entry->ebx = kvm_pmu_cap.events_mask;
    entry->ecx = 0;
    entry->edx = edx.full;
    break;
}
```
So this is a software implementation of CPUID.0AH! Neat. This is in a function
called __do_cpuid_func in kvm/cpuid.c. If you follow the chain of calls all
the way back to
/virt/kvm/kvm_main.c
(__do_cpuid_func ← do_cpuid_func ← get_cpuid_func ←
kvm_dev_ioctl_get_cpuid ← kvm_arch_dev_ioctl ← kvm_dev_ioctl),
you will find the definition of the ioctl handler on the device that becomes
/dev/kvm.
I hope you already know what an ioctl is.
You wouldn’t be so deep into this post otherwise, probably.
That tracks with my knowledge of how virtualization works in KVM: some
operations are really performed by catching an instruction/interrupt/whatever
from the VM context, getting out of the VM context, and poking the hypervisor to
query the result that should be given. Here, we’re just doing it with
CPUID.0AH.
And so, that implementation of CPUID.0AH, how does it synthesize the answer?
Well, as you can see, it checks for enable_pmu and X86_FEATURE_ARCH_PERFMON,
and if either are false/disabled, it just sets everything to 0 and returns;
otherwise, it shoves the information we want inside eax, ebx and edx.
i know that all of my registers were 0 when i printed them earlier, and
kvm_pmu_cap.version must be above 0, because otherwise the host would have
no PMU, right? But my host has a PMU? i checked at boot:
```
[    0.116715] Performance Events: XSAVE Architectural LBR, PEBS fmt4+-baseline, AnyThread deprecated, Alderlake Hybrid events, 32-deep LBR, full-width counters, Intel PMU driver.
[    0.116715] core: cpu_core PMU driver:
[    0.116715] ... version:                5
[    0.116715] ... bit width:              48
[    0.116715] ... generic counters:       8
[    0.116715] ... generic bitmap:         00000000000000ff
[    0.116715] ... fixed-purpose counters: 4
[    0.116715] ... fixed-purpose bitmap:   000000000000000f
[    0.116715] ... value mask:             0000ffffffffffff
[    0.116715] ... max period:             00007fffffffffff
[    0.116715] ... global_ctrl mask:       0001000f000000ff
```
Yeah, version is 5! So something’s off. Either enable_pmu is false, or
static_cpu_has fails to find X86_FEATURE_ARCH_PERFMON.
enable_pmu is defined at
arch/x86/kvm/x86.c#L185
in v6.18. It is a parameter for the kvm kernel module that defaults to true.
i couldn’t check the actual value, but, considering i had never touched the
default settings on kvm, i assumed it was true? It couldn’t be otherwise. But,
X86_FEATURE_ARCH_PERFMON is also enabled, right? But then that’s impossible. i
mean, the if block should not trigger otherwise, so, like, what’s going on?
What’s going on??
## Part 5: The Meltdown
So, to summarize what we know so far:

- My host machine boots.
- The init_intel function runs, detects a maximum leaf count above 9, a non-zero CPUID.0AH:EAX.VERSION, and more than 1 general-purpose counter per core, so it enables the X86_FEATURE_ARCH_PERFMON capability on the boot CPU.
- Later, intel_pmu_init on the host does not trip on the check for that capability, and the PMU is properly initialized.
- The virtual machine boots.
- In init_intel, it calls cpuid_eax(10).
- On the host, an ioctl is fired to /dev/kvm to synthesize the results of CPUID.0AH.
- On the host, kvm’s __do_cpuid_func trips on the if that checks whether the PMU is virtualized, and, after setting all registers to 0, returns.
- The guest machine receives the answer to cpuid_eax(10), and does not enable X86_FEATURE_ARCH_PERFMON on the boot CPU.
- The guest machine runs intel_pmu_init, which trips on the if block that checks for the arch_perfmon capability.
- The family of the CPU being p6, the guest kernel tries to run p6_pmu_init.
- On Linux 6.12, the “unsupported CPU” error happens directly inside p6_pmu_init, which returns -ENODEV, which fails upwards to callers and fails the PMU initialization as a whole, leaving no access to hardware counters for perf.
- On Linux 6.18, the “unsupported CPU” error happens after we check the model of the CPU and realize it’s Yonah or newer, so the p6_pmu_init call would be useless, and we fail right there, leaving no hardware counter available to perf.
At this point, i tried to fake the calls to cpuid_eax by hard-coding values i
expected, and it did not work. i tried intense googling of more and more obscure
keywords, and it did not work. i tried forcing the enable_pmu parameter on
kvm and reloading kvm_intel too while we’re at it. it didn’t work.
During my googling, i had found a couple of KVM patches from folks trying to introduce changes to the way KVM handles hybrid architectures, sometimes outright disabling it. Remember how my computer has a 13th gen Intel CPU? Most consumer-grade Intel CPUs starting with the 12th gen are heterogeneous[^2], that is to say that even though the CPU appears as one coherent unit, the actual cores inside behave differently: some are optimized for fast tasks but consume more energy, and others are optimized for tasks that don’t need to be fast, but can comfortably run slower or sparsely if that means reducing the power draw.
One patch i found discussed issues with KVM and heterogeneous architectures: unless you properly pin the virtualizing processes on the host, the guest could receive incoherent messages about hardware counters on its CPUs. Until KVM had a better way of reasoning about heterogeneous vCPUs and real heterogeneous CPUs[^3], they argued, virtualized PMU support should be disabled.
That patch seemed to make a point, but, checking for its name in the git log of
my version of the kernel, there’s no trace of it.
Out of desperation, i end up grepping the git log for the individual words KVM and
PMU, and i skim the output, until i find:

```
4d7404e5ee00 KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
```
OH COME ON-
## Part 6: This is Where i Complain. A Lot.
Let’s inspect commit 4d7404e5ee00:
```
commit 4d7404e5ee0066e9a9e8268675de8a273b568b08
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Feb 8 20:42:29 2023 +0000

    KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)

    Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
    KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or
    gains a mechanism to let userspace opt-in to the dangers of exposing a
    hybrid vPMU to KVM guests.  Virtualizing a hybrid PMU, or at least part of
    a hybrid PMU, is possible, but it requires careful, deliberate
    configuration from userspace.

    E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
    prevent migrating a vCPU between a big core and a little core, userspace
    must enumerate a reasonable topology to the guest, and guest CPUID must be
    curated per vCPU to enumerate accurate vPMU capabilities.

    The last point is especially problematic, as KVM doesn't control which
    pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
    i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form.

    Alternatively, userspace could enable vPMU support by enumerating the
    set of features that are common and coherent across all cores, e.g. by
    filtering PMU events and restricting guest capabilities.  But again, that
    requires userspace to take action far beyond reflecting KVM's supported
    feature set into the guest.

    For now, simply disable vPMU support on hybrid CPUs to avoid inducing
    seemingly random #GPs in guests, and punt support for hybrid CPUs to a
    future enabling effort.
```
And so on and so on. i’m sparing you the Signed-off-by’s and the Cc’s and everything.
The code of the commit is:
```diff
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cdb91009701d..ee67ba625094 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void)
 {
 	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 
-	perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
-	/*
-	 * For Intel, only support guest architectural pmu
-	 * on a host with architectural pmu.
-	 */
-	if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+	/*
+	 * Hybrid PMUs don't play nice with virtualization without careful
+	 * configuration by userspace, and KVM's APIs for reporting supported
+	 * vPMU features do not account for hybrid PMUs. Disable vPMU support
+	 * for hybrid PMUs until KVM gains a way to let userspace opt-in.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
 		enable_pmu = false;
 
+	if (enable_pmu) {
+		perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+		/*
+		 * For Intel, only support guest architectural pmu
+		 * on a host with architectural pmu.
+		 */
+		if ((is_intel && !kvm_pmu_cap.version) ||
+		    !kvm_pmu_cap.num_counters_gp)
+			enable_pmu = false;
+	}
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;
```
It replaced a check for kvm_pmu_cap.version and num_counters_gp (akin to
what you do with CPUID.0AH) with a cute little if block that checks
X86_FEATURE_HYBRID_CPU, and disables vPMU behind your back if that feature is present.
Without warning.

In 6.18, there is still no warning there; they only added a memset to zero out
the kvm_host_pmu structure.
In short, my options are few:

- If i feel like tweaking the host’s kernel, i could comment out the line that disables enable_pmu, and try to work with that (for multiple reasons, it’s impractical as hell).
- Move my experiments to another machine.
So, let’s recap together what went wrong:

- When perf showed unsupported hardware counters, i searched documentation on how to enable vPMU, which had no visible date nor indication of assumptions regarding architecture (in all fairness, a retroactive decision like that, disabling vPMU for all heterogeneous x86 CPUs, is not something people track to then go update blog posts).
- The error message in dmesg was originally in a function with confusing semantics, which was fixed some time between 6.14 and 6.15 when it moved from p6_pmu_init to intel_pmu_init.
- The error message talks about an unsupported CPU. It assumes the CPU not having a PMU exposed via CPUID.0AH is a problem with the model, so it exposes information related to your CPU model, even when that CPU model should, by all other indicators, have a PMU.
- When i found patch sets on the Linux Kernel Mailing List that discussed vPMU in KVM, i had no way to track whether they had been merged or not. In fact, i ended up finding an earlier draft of the commit that bit my ass, but because it had slightly different commit name formatting, i gave up and assumed it had not been merged.
- There was no error message about vPMU being forcefully disabled in KVM. This decision carries several assumptions: the user does not need to know KVM is running, or they do not know what a hybrid x86 architecture is, or they will not know why it is disabled and will complain because they can’t fix it. It still baffles me that nobody thought to at least print something.
Look, i am used to this. i am used to falling through the cracks of people’s assumptions: i daily-drive Arch Linux, i run shit on QEMU via the command line only, i build my own kernel modules sometimes, i need old software, i jam programs together that were never meant to work together. There’s glue, though: things we, as people who touch computers, have agreed on in order for communication to work properly across implementations. CPUID is one such example, and so are ioctls, kernel module parameters, network protocols, perf sample representations, or even logging messages.
Logging is the most person-oriented way of conveying that something went wrong,
or any complex information that cannot be shoved into a bitfield for easy access:
when i read the message about an “unsupported CPU”, that was a programmer, 14 years
ago, telling me “Hey, if you hit this line of code, it means you plugged a CPU
into your motherboard that does not have a PMU yet!”. For a second, someone’s
thought process was conveyed to me, alongside their assumptions about the
meaning of an Intel CPU reporting no arch_perfmon capability.
Look, i’m not blaming the Linux devs. Entirely. There may be reasonable arguments for why they did not add a log line in that if block, or why nobody thought of adding one at any point while reviewing the patch sets. Maybe they had assumptions about the users, about the systems, that they did not communicate.
i’m more mad about how we keep miscommunicating at each other, especially about assumptions we make about hardware, software, their interactions, and how i keep losing my time and my mind debugging the friction between all of them.
It’s tiring, frankly.
…
Oh and fuck Intel, really while we’re at it. That’s all. Bye!
[^1]: Notice how the Intel Software Developer manual does bit ranges backwards? Ah, little endian architectures…

[^2]: To be more precise, mainstream desktop 12th gen Alder Lake CPUs are still homogeneous. Thirteenth gen, aka Raptor Lake, are heterogeneous except for i3s, and similarly for the 14th gen.

[^3]: So, i’m not a KVM fox. i’m not even that much of an architecture fox; i just know people, and i hate hardware with a passion so intense that occasionally i end up learning way too much about it. My assumption about the more precise problems with KVM and heterogeneous architectures, from what i understood, is that the available performance counters exposed on the host are those of the performance CPUs, which are typically more advanced. Those are also the sets exposed to the guest. However, if the guest is scheduled on an efficiency CPU, and tries to probe a counter that is only available on a performance CPU, it will fail, or there will be undefined behaviour that has never been accounted for anywhere so far in the kernel or perf. Another assumption is that exposing which CPU you’re currently scheduled on to the guest is… a gamble. Knowing your position in a topology, especially if you’re a malicious actor, especially if you’re scheduled with other users of a system, opens up a lot of possibilities to probe your cache neighbours, or even mess with them.