Hey!
If, like me, you spend a lot of time debugging the Linux kernel, especially with
the goal of modifying the performance of user programs, you may often find
yourself in the silly position where you need to run perf inside of a virtual
machine (typically run using qemu) in order to obtain hardware performance
counters.
In x86(_64) CPUs, the component responsible for providing this information is called the Performance Monitoring Unit, or PMU. When your kernel boots, it figures out what brand and model your CPU is, and which PMU driver will be responsible for handling initialization and gathering of information.
In a virtualized environment, however, the hypervisor is responsible for providing a PMU to the guest! In QEMU, this is often done by enabling the KVM accelerator and copying the host CPU model, like so:
```
-enable-kvm -cpu host
```
And with that, you can have a guest CPU with features such as:
```
Architecture:           x86_64
CPU op-mode(s):         32-bit, 64-bit
Address sizes:          39 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 1
On-line CPU(s) list:    0
Vendor ID:              GenuineIntel
Model name:             Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
CPU family:             6
Model:                  78
Thread(s) per core:     1
Core(s) per socket:     1
Socket(s):              1
Stepping:               3
BogoMIPS:               5615.99
Flags:                  fpu vme de [...] arch_perfmon [...]
Virtualization:         VT-x
Hypervisor vendor:      KVM
Virtualization type:    full
[...]
```
Which clearly shows arch_perfmon (the x86 feature flag for architectural hardware performance
monitoring on Intel CPU cores, present in the p6 family starting with the Yonah cores) enabled!
On AMD, the feature has a different name (perfctr_core), and will be handled by the kvm-amd
Linux kernel module.
And, yeah! Hope that helped.
…
still here?
…
good.
Look, i would not know what a vPMU is, which p6-family Intel cores have hardware-backed PMUs, nor how KVM plays a role in exposing those to the guest, if i had managed to make it all work as easily as all the online documentation seems to imply.
This is a story about debugging, and bad development practices, and pain.
Mostly pain.
This is a blog post about how a single if statement guarding a code block
with no warning log, made me lose a whole day of work investigating a potential
bug.
This is a Blog Post About What Happens When You Fall Through the Cracks of People's Assumptions
## Part 1: This is the Part Where i Still Sound Somewhat Sane
The reason you don’t read my yapping about Linux as much these days is that most of the Linux-touching i do is actually for work. On a rare occasion, i happen upon a tangential problem, posed, for example, by a benchmark setup. Even more rarely, that problem becomes a time sinkhole that engulfs an entire day of my own time, and brings to light an interesting problem that will almost definitely trip up other people in the future.
And so, i feel compelled to write about it.
The story of today brings us to QEMU, KVM, and early CPU feature recognition.
For complicated reasons, i needed to sample some hardware performance counters
from within a QEMU instance running my own build of the Linux kernel. My first
thought went along the lines of “oh well, i might need a perf binary compiled
for my custom kernel, let me do that”, and i happily tweaked build scripts to
install the binary.
Running it, i got:
```
~ # perf stat -d echo
event syntax error: 'cpu_core/TOPDOWN.SLOTS,metric-id=cpu_core!3TOPDOWN.SLOTS!3/,cpu_core/topdown-retiring,metric-id=cpu_core!3topdown!1retiring!3/,cpu_atom/TOPDOWN_RETIRING.ALL,metric-id=cpu_atom!3TOPDOWN_RETIRING.ALL!3/,cpu_core/topdown-bad-spec,metric-id=cpu_core!3t..'
                     \___ Bad event or PMU

Unable to find PMU or event on a PMU of 'cpu_core'

 Performance counter stats for 'echo':

              0.45 msec task-clock                #    0.215 CPUs utilized
                 1      context-switches          #    2.231 K/sec
                 0      cpu-migrations            #    0.000 /sec
                60      page-faults               #  133.875 K/sec
   <not supported>      cycles
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses
   <not supported>      L1-dcache-loads
   <not supported>      L1-dcache-load-misses
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses

       0.002081276 seconds time elapsed

       0.000392000 seconds user
       0.000392000 seconds sys
```
That’s odd. Okay, maybe getting hardware counters from within a VM is a bit weirder.
i then spent about five minutes searching online for combinations of the words
qemu, perf, and hardware counters. Eventually, the answers i found in various
places told me that the feature i am looking for is called “Virtualized
Performance Monitoring Unit”, or “Virtualized PMU”, or “vPMU”. They also told
me the same information i relayed at the top of the post:

Enabling vPMU in QEMU is done by enabling KVM and copying the host CPU information:
`-enable-kvm -cpu host`.
So i tried.
And it didn’t work. It. Didn’t. Work. Nothing changed. What was going on? Why was i falling into a case that was documented nowhere? Oh no, it’s happening again isn’t it, oh NO, NOT AGAI-
## Part 2: My Sanity Slowly Drains as I Skim DMesg and Kernel Code
In order to start debugging, i had to figure out what was happening in the
kernel. For those who don’t know, perf interacts with an entire subsystem in
the Linux kernel (the perf subsystem) to retrieve hardware events. Thus, if
there was a bit of software that should know about the problems with the PMU, it
should be the kernel.
Running `dmesg | grep -i perf`, i got the following:

```
[    0.133409] Performance Events: unsupported p6 CPU model 183 no PMU driver, software events only.
```
Oh, wait. So my CPU is not supported? Why? My kernel is a bit old (6.12), and my CPU is, as you can guess, a Raptor Lake (13th gen Intel, a heterogeneous architecture with Raptor Cove performance cores and Enhanced Gracemont efficiency cores). Maybe a driver was missing from older Linux versions? That’s what i thought, until i also ran a Linux 6.18 build and obtained the same result.
On Linux v6.12, i hit this line (arch/x86/events/intel/p6.c#L272) in the following code:
```c
__init int p6_pmu_init(void)
{
    x86_pmu = p6_pmu;

    switch (boot_cpu_data.x86_model) {
    case  1: /* Pentium Pro */
        x86_add_quirk(p6_pmu_rdpmc_quirk);
        break;

    case  3: /* Pentium II - Klamath */
    case  5: /* Pentium II - Deschutes */
    case  6: /* Pentium II - Mendocino */
        break;

    case  7: /* Pentium III - Katmai */
    case  8: /* Pentium III - Coppermine */
    case 10: /* Pentium III Xeon */
    case 11: /* Pentium III - Tualatin */
        break;

    case  9: /* Pentium M - Banias */
    case 13: /* Pentium M - Dothan */
        break;

    default:
        pr_cont("unsupported p6 CPU model %d ", boot_cpu_data.x86_model);
        return -ENODEV;
    }

    memcpy(hw_cache_event_ids, p6_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));

    return 0;
}
```
That code initially had me guess that something about models not listed here made them incompatible with vPMU. Maybe only older versions of p6 CPUs could have virtualized PMUs? No.
Looking at v6.18, that function changed dramatically:
```c
__init int p6_pmu_init(void)
{
    x86_pmu = p6_pmu;

    if (boot_cpu_data.x86_vfm == INTEL_PENTIUM_PRO)
        x86_add_quirk(p6_pmu_rdpmc_quirk);

    memcpy(hw_cache_event_ids, p6_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));

    return 0;
}
```
And the new line i hit is located in the v6.18 version of intel_pmu_init:
```c
__init int intel_pmu_init(void)
{
    // ...

    /* Architectural Perfmon was introduced starting with Core "Yonah" */
    if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
        switch (boot_cpu_data.x86) {
        case 6:
            if (boot_cpu_data.x86_vfm < INTEL_CORE_YONAH)
                return p6_pmu_init();
            break;
        case 11:
            return knc_pmu_init();
        case 15:
            return p4_pmu_init();
        }

        pr_cont("unsupported CPU family %d model %d ",
                boot_cpu_data.x86, boot_cpu_data.x86_model);
        return -ENODEV;
    }

    // ...
```
So, “Architectural Perfmon was introduced starting with Core Yonah”, huh? This
bit of code seems to be responsible for initializing support for the PMU on an
Intel CPU at boot. i skipped the variable declarations, because the first bit of
code in that function is a check for CPU capabilities on the CPU the kernel
booted on. Specifically, we’re looking for an x86 feature called arch_perfmon.
If that feature is absent, we then call the p6_pmu_init function that
previously spewed the error log. The new organization of these functions clears
up the earlier confusion: p6_pmu_init is only run for pre-Yonah p6 family
cores in order to try and initialize the feature if it wasn’t already available.
Failing that (and it will! Even if you comment out the core version check), we
run into our pr_cont right there above, and return -ENODEV. Oops.
If you’re on an Intel CPU and run lscpu, you may find arch_perfmon in the
list of Flags. This signals that you have architectural support for your PMU.
If you’re on an AMD CPU, you will have something called perfctr_core.
So, wait, hold on: where is that X86_FEATURE_ARCH_PERFMON capability even
enabled during boot? Is the CPU indicating, even at boot, that it can’t support
a (v)PMU? Surely not. Surely that’s a mistake. Right?

There are three places that use X86_FEATURE_ARCH_PERFMON and that, at first
glance, look relevant to this situation:

- arch/x86/events/intel/core.c, where it’s checked to potentially run a *_pmu_init function.
- arch/x86/kernel/cpu/intel.c
- arch/x86/kvm/cpuid.c
Now, if you do not know what cpuid is, don’t worry. We’ll come back to it. All
you need to know, and what i knew at the time, is that it’s a CPU instruction
used to probe features in, and identify CPUs. Let’s investigate the other file
first, okay?
So, what’s going on in intel.c? Well,
```c
static void init_intel(struct cpuinfo_x86 *c)
{
    early_init_intel(c);

    intel_workarounds(c);

    init_intel_cacheinfo(c);

    if (c->cpuid_level > 9) {
        unsigned eax = cpuid_eax(10);

        /* Check for version and the number of counters */
        if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
            set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
    }

    // ...
```
Okay. Deep breaths. We’re gonna have to do it. We’re gonna have to-
## Part 3: We Have To Talk About CPUID
*inhales* (i’ve always wanted to do this)
The Intel(R) 64 and IA-32 Architectures Software Developer’s Manual,
Volume 1, Chapter 21, explains the processor identification and feature
determination instruction, called cpuid.
CPUID is complex to explain, but, essentially, you interact with it by setting
CPU register eax to a value, and then you observe the output values in eax,
ebx, ecx and potentially edx. For the purpose of probing PMU features, we
set eax=10, as shown above with cpuid_eax(10). That function is not named
cpuid_eax because it sets eax to 10 by the way, but because it returns only
the value of the eax register after running cpuid.
The notation used within the manual to denote “call to CPUID with EAX=X” is
CPUID.XH. You can then write the different values in the output registers as
CPUID.XH:REGISTER.FIELD_NAME, or with a binary range after the register name.
We’ll be using this notation here too, for coherence with the Intel Software
Developer Manual.
You’re still reading a blog post about QEMU and perf support by the way.
Table 21-30 of the manual presents the list of fields in eax and ebx for
CPUID.0AH. Not all fields really matter here; we are interested in
CPUID.0AH:EAX[7:0] (aka VERSION, aka eax & 0xff), and
CPUID.0AH:EAX[15:8][^1] (aka NUM_GP_CTRS, aka (eax >> 8) & 0xff), the number of general-purpose hardware counters per core. The manual
says “This leaf is valid if CPUID.0AH:EAX[7:0] > 0 and MAX_LEAF ≥ 09H”.
Those map onto the checks in the code above.
So, yeah, that code checks if we have a PMU. Great! Cool. Wait. So, if
X86_FEATURE_ARCH_PERFMON is not set, it means that CPUID.0AH:EAX.VERSION is
null, or there are no general-purpose hardware counters, or the maximum CPUID leaf
is 9 or below. i actually know that it’s the first two, because i then
modified the call as follows:
```c
if (c->cpuid_level > 9) {
    unsigned eax = cpuid_eax(10);

    /* Check for version and the number of counters */
    if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
        set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
    else {
        unsigned ebx = cpuid_ebx(10);
        unsigned ecx = cpuid_ecx(10);
        unsigned edx = cpuid_edx(10);

        pr_warn("THERE'S NO PMU AAAAH EAX=%x, EBX=%x ECX=%x EDX=%x",
                eax, ebx, ecx, edx);
    }
}
```
And what i got as a result was:

```
THERE'S NO PMU AAAAH EAX=0, EBX=0 ECX=0 EDX=0
```
Fuck.
Ok, quick thinking. Who’s making those responses? Who’s the mastermind behind it all? That’s right: The Hypervisor.
## Part 4: KVM is a Little Liar
We get to bring in KVM now. Remember the third place where
X86_FEATURE_ARCH_PERFMON was mentioned in that list above? The code is as
follows:
```c
case 0xa: { /* Architectural Performance Monitoring */
    union cpuid10_eax eax;
    union cpuid10_edx edx;

    if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
        entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
        break;
    }

    eax.split.version_id = kvm_pmu_cap.version;
    eax.split.num_counters = kvm_pmu_cap.num_counters_gp;
    eax.split.bit_width = kvm_pmu_cap.bit_width_gp;
    eax.split.mask_length = kvm_pmu_cap.events_mask_len;
    edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed;
    edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed;

    if (kvm_pmu_cap.version)
        edx.split.anythread_deprecated = 1;
    edx.split.reserved1 = 0;
    edx.split.reserved2 = 0;

    entry->eax = eax.full;
    entry->ebx = kvm_pmu_cap.events_mask;
    entry->ecx = 0;
    entry->edx = edx.full;
    break;
}
```
So this is a software implementation of CPUID.0AH! Neat. This is in a function
called __do_cpuid_func in kvm/cpuid.c. If you follow the chain of calls all
the way back to
/virt/kvm/kvm_main.c
(__do_cpuid_func ← do_cpuid_func ← get_cpuid_func ←
kvm_dev_ioctl_get_cpuid ← kvm_arch_dev_ioctl ← kvm_dev_ioctl),
you will find the definition of the ioctl handler on the device that becomes
/dev/kvm.
I hope you already know what an ioctl is.
You wouldn’t be so deep into this post otherwise, probably.
That tracks with my knowledge of how virtualization works in KVM: some
operations are really performed by catching an instruction/interrupt/whatever
from the VM context, getting out of the VM context, and poking the hypervisor to
query the result that should be given. Here, we’re just doing it with
CPUID.0AH.
And so, that implementation of CPUID.0AH, how does it synthesize the answer?
Well, as you can see, it checks for enable_pmu and X86_FEATURE_ARCH_PERFMON,
and if either are false/disabled, it just sets everything to 0 and returns;
otherwise, it shoves the information we want inside eax, ebx and edx.
i know that all of my registers were 0 when i printed them earlier, and
kvm_pmu_cap.version must be above 0, because otherwise the host would have
no PMU, right? But my host has a PMU? i checked at boot:
```
[    0.116715] Performance Events: XSAVE Architectural LBR, PEBS fmt4+-baseline, AnyThread deprecated, Alderlake Hybrid events, 32-deep LBR, full-width counters, Intel PMU driver.
[    0.116715] core: cpu_core PMU driver:
[    0.116715] ... version:                5
[    0.116715] ... bit width:              48
[    0.116715] ... generic counters:       8
[    0.116715] ... generic bitmap:         00000000000000ff
[    0.116715] ... fixed-purpose counters: 4
[    0.116715] ... fixed-purpose bitmap:   000000000000000f
[    0.116715] ... value mask:             0000ffffffffffff
[    0.116715] ... max period:             00007fffffffffff
[    0.116715] ... global_ctrl mask:       0001000f000000ff
```
Yeah, version is 5! So something’s off. Either enable_pmu is false, or
static_cpu_has fails to find X86_FEATURE_ARCH_PERFMON.
enable_pmu is defined at
arch/x86/kvm/x86.c#L185
in v6.18. It is a parameter for the kvm kernel module that defaults to true.
i couldn’t check the actual value, but, considering i had never touched the
default settings on kvm, i assumed it was true? It couldn’t be otherwise. But,
X86_FEATURE_ARCH_PERFMON is also enabled, right? But then that’s impossible. i
mean, the if block should not trigger otherwise, so, like, what’s going on?
What’s going on??
## Part 5: The Meltdown
So, to summarize what we know so far:

- My host machine boots.
- The init_intel function runs, detects a maximum leaf count above 9, a non-zero CPUID.0AH:EAX.VERSION, and more than 1 general-purpose counter per core, so it enables the X86_FEATURE_ARCH_PERFMON capability on the boot CPU.
- Later, intel_pmu_init on the host does not trip on the check for that capability, and the PMU is properly initialized.
- The virtual machine boots.
- In init_intel, it calls cpuid_eax(10).
- On the host, an ioctl is fired to /dev/kvm to synthesize the results of CPUID.0AH.
- On the host, kvm’s __do_cpuid_func trips on the if that checks whether the PMU is virtualized, and, after setting all registers to 0, returns.
- The guest machine receives the answer to cpuid_eax(10), and does not enable X86_FEATURE_ARCH_PERFMON on the boot CPU.
- The guest machine runs intel_pmu_init, which trips on the if block that checks for the arch_perfmon capability.
- The family of the CPU being p6, the guest kernel tries to run p6_pmu_init.
- On Linux 6.12, the “unsupported CPU” error happens directly inside p6_pmu_init, which returns -ENODEV, which fails upwards to callers and fails the PMU initialization as a whole, leaving no access to hardware counters for perf.
- On Linux 6.18, the “unsupported CPU” error happens after we check the model of the CPU and realize it’s Yonah or newer, so the p6_pmu_init call would be useless, and we fail right there, leaving no hardware counter available to perf.
At this point, i tried to fake the calls to cpuid_eax by hard-coding values i
expected, and it did not work. i tried intense googling of more and more obscure
keywords, and it did not work. i tried forcing the enable_pmu parameter on
kvm and reloading kvm_intel too while we’re at it. it didn’t work.
During my googling, i had found a couple of KVM patches from folks trying to introduce changes to the way KVM handles hybrid architectures, sometimes outright disabling it. Remember how my computer has a 13th gen Intel CPU? Most consumer-grade Intel CPUs starting with the 12th gen are heterogeneous[^2], that is to say that even though the CPU appears as one coherent unit, the actual cores inside behave differently: some are optimized for fast tasks but consume more energy, and others are optimized for tasks that don’t need to be fast, but can comfortably run slower or sparsely if that means reducing the power draw.
One patch i found discussed issues with KVM and heterogeneous architectures: unless you properly pin the virtualizing processes on the host, the guest could receive incoherent messages about hardware counters on its CPUs. Until KVM had a better way of reasoning about heterogeneous vCPUs and real heterogeneous CPUs[^3], they argued, virtualized PMU support should be disabled.
That patch seemed to make a point, but, checking for its name in the git log of
my version of the kernel, there’s no trace of it.
Out of desperation, i end up grepping the git log for the individual words KVM and
PMU, and i skim the output, until i find:

```
4d7404e5ee00 KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
```
OH COME ON-
## Part 6: This is Where i Complain. A Lot.
Let’s inspect commit 4d7404e5ee00:
```
commit 4d7404e5ee0066e9a9e8268675de8a273b568b08
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Feb 8 20:42:29 2023 +0000

    KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)

    Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
    KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or
    gains a mechanism to let userspace opt-in to the dangers of exposing a
    hybrid vPMU to KVM guests.  Virtualizing a hybrid PMU, or at least part of
    a hybrid PMU, is possible, but it requires careful, deliberate
    configuration from userspace.

    E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
    prevent migrating a vCPU between a big core and a little core, userspace
    must enumerate a reasonable topology to the guest, and guest CPUID must be
    curated per vCPU to enumerate accurate vPMU capabilities.

    The last point is especially problematic, as KVM doesn't control which
    pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
    i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form.

    Alternatively, userspace could enable vPMU support by enumerating the
    set of features that are common and coherent across all cores, e.g. by
    filtering PMU events and restricting guest capabilities.  But again, that
    requires userspace to take action far beyond reflecting KVM's supported
    feature set into the guest.

    For now, simply disable vPMU support on hybrid CPUs to avoid inducing
    seemingly random #GPs in guests, and punt support for hybrid CPUs to a
    future enabling effort.
```
And so on and so on. i’m sparing you the Signed-off-by’s and the Cc’s and everything.
The code of the commit is:
```diff
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cdb91009701d..ee67ba625094 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void)
 {
 	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 
-	perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
-	/*
-	 * For Intel, only support guest architectural pmu
-	 * on a host with architectural pmu.
-	 */
-	if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+	/*
+	 * Hybrid PMUs don't play nice with virtualization without careful
+	 * configuration by userspace, and KVM's APIs for reporting supported
+	 * vPMU features do not account for hybrid PMUs. Disable vPMU support
+	 * for hybrid PMUs until KVM gains a way to let userspace opt-in.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
 		enable_pmu = false;
 
+	if (enable_pmu) {
+		perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+		/*
+		 * For Intel, only support guest architectural pmu
+		 * on a host with architectural pmu.
+		 */
+		if ((is_intel && !kvm_pmu_cap.version) ||
+		    !kvm_pmu_cap.num_counters_gp)
+			enable_pmu = false;
+	}
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;
```
It replaced a check for kvm_pmu_cap.version and num_counters_gp (akin to
what you do with CPUID.0AH) with a cute little if block that checks
X86_FEATURE_HYBRID_CPU, and disables vPMU behind your back if that feature is present.
Without warning.

In 6.18, there is still no warning there; they only added a memset to zero out
the kvm_host_pmu structure.
In short, my options are few:

- If i feel like tweaking the host’s kernel, i could comment out the line that disables enable_pmu, and try to work with that (for multiple reasons, it’s impractical as hell).
- Move my experiments to another machine.
So, let’s recap together what went wrong:

- When perf showed unsupported hardware counters, i searched documentation on how to enable vPMU, which had no visible date nor indication of assumptions regarding architecture (in all fairness, a retroactive decision like that, disabling vPMU for all heterogeneous x86 CPUs, is not something people track to then go update blog posts).
- The error message in dmesg was originally in a function with confusing semantics, which was fixed some time between 6.14 and 6.15 when it moved from p6_pmu_init to intel_pmu_init.
- The error message talks about an unsupported CPU. It assumes the CPU not having a PMU exposed via CPUID.0AH is a problem with the model, so it exposes information related to your CPU model, even when that CPU model should, by all other indicators, have a PMU.
- When i found patch sets on the Linux Kernel Mailing List that discussed vPMU in KVM, i had no way to track whether they had been merged or not. In fact, i ended up finding an earlier draft of the commit that bit my ass, but because it had slightly different commit name formatting, i gave up and assumed it had not been merged.
- There was no error message about vPMU being forcefully disabled in KVM. This decision carries several assumptions: the user does not need to know KVM is running, or they do not know what a hybrid x86 architecture is, or they will not know why it is disabled and will complain because they can’t fix it. It still baffles me that nobody thought to at least print something.
Look, i am used to this. i am used to falling through the cracks of people’s assumptions: i daily-drive Arch Linux, i run shit on QEMU via the command line only, i build my own kernel modules sometimes, i need old software, i jam programs together that were never meant to work together. There’s glue, though: things we, as people who touch computers, have agreed on in order for communication to work properly across implementations. CPUID is one such example, and so are ioctls, kernel module parameters, network protocols, perf sample representations, or even logging messages.
Logging is the most person-oriented way of conveying that something went wrong,
or any complex information that cannot be shoved into a bitfield for easy access:
when i read the message about an “unsupported CPU”, that was a programmer, 14 years
ago, telling me “Hey, if you hit this line of code, it means you plugged a CPU
into your motherboard that does not have a PMU yet!”. For a second, someone’s
thought process was conveyed to me, alongside their assumptions about the
meaning of an Intel CPU reporting no arch_perfmon capability.
Look, i’m not blaming the Linux devs. Entirely. There may be reasonable arguments for why they did not add a log line in that if block, or why nobody thought of adding one at any point while reviewing the patch sets. Maybe they had assumptions about the users, about the systems, that they did not communicate.
i’m more mad about how we keep miscommunicating at each other, especially about assumptions we make about hardware, software, their interactions, and how i keep losing my time and my mind debugging the friction between all of them.
It’s tiring, frankly.
…
Oh and fuck Intel, really while we’re at it. That’s all. Bye!
[^1]: Notice how the Intel Software Developer manual does bit ranges backwards? Ah, little endian architectures…

[^2]: To be more precise, mainstream desktop 12th gen Alder Lake CPUs are still homogeneous. Thirteenth gen, aka Raptor Lake, are heterogeneous except for i3s, and similarly for the 14th gen.

[^3]: So, i’m not a KVM fox. i’m not even that much of an architecture fox; i just know people, and i hate hardware with a passion so intense that occasionally i end up learning way too much about it. My assumption about the more precise problems with KVM and heterogeneous architectures, from what i understood, is that the available performance counters exposed on the host are those of the performance CPUs, which are typically more advanced. Those are also the sets exposed to the guest. However, if the guest is scheduled on an efficiency CPU, and tries to probe a counter that is only available on a performance CPU, it will fail, or there will be undefined behaviour that has never been accounted for anywhere so far in the kernel or perf. Another assumption is that exposing which CPU you’re currently scheduled on to the guest is… a gamble. Knowing your position in a topology, especially if you’re a malicious actor, especially if you’re scheduled with other users of a system, opens up a lot of possibilities to probe your cache neighbours, or even mess with them.