Portal 2 crashes if you have 128 cores, due to 15-year-old assumptions
MAX_THREAD_IDS is defined as 128 in a file with a copyright notice of 2005, which was when the very first consumer dual-core processors started getting released:
https://github.com/perilouswithadollarsign/cstrike15_src/blo...
Entirely reasonable that no one in 2005 would even consider the possibility of 128-core processors. I bet nobody reconsidered or even looked at that code since, until it started breaking because today's extremely parallel CPUs are starting to get somewhat affordable for consumers.
It's weird because they used to use regular TLS variables and then swapped to this at some point; pretty sure TF2 uses regular ones.
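Roughly, the two schemes look like this. This is just a sketch; the names and the slot-allocation mechanism are my guesses for illustration, not the engine's actual code:

    #include <atomic>

    // "Regular" TLS: the runtime gives each thread its own copy, with no
    // engine-imposed cap on how many threads can exist.
    thread_local int g_tlsScratch = 0;

    // Hand-rolled alternative: each thread gets an index into fixed-size
    // per-thread tables. Fine until more than kMaxThreadIds threads show up.
    constexpr int kMaxThreadIds = 128;        // stand-in for MAX_THREAD_IDS
    int g_perThreadData[kMaxThreadIds];
    std::atomic<int> g_nextThreadId{0};

    int AllocThreadIndex() {
        // Nothing here stops the index from running past g_perThreadData on a
        // machine with more than 128 hardware threads.
        return g_nextThreadId.fetch_add(1);
    }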
> "The solution?"

# echo 0 > /sys/devices/system/cpu/cpu{121..127}/online

> "Disabling just one core makes the game start, but then it segfaults when the menu opens. Disabling six cores makes the menu work. Disabling seven makes the rest of the game work too."

What path do you walk to be able to figure out this kind of black magic by yourself?
It's just like the end of Portal 1: you have to remove some cores in order to win the game. ;)
For anyone curious/hopeful, this only disables the cores as far as the kernel is concerned.
They're still drawing about the same power they normally would when idle :)
Edit: cpusets would probably handle this better dynamically, though I'm not sure whether that changes the count visible to a process.
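Whether a cpuset helps depends on how the game counts cores: on Linux, sysconf(_SC_NPROCESSORS_ONLN) reports CPUs that are online system-wide, while sched_getaffinity() reports the mask that cpusets/taskset actually restrict. A quick sketch to see both numbers (nothing Portal-specific here):

    #include <sched.h>    // sched_getaffinity, CPU_* macros (glibc; may need _GNU_SOURCE)
    #include <unistd.h>   // sysconf
    #include <cstdio>

    int main() {
        // CPUs the kernel has online system-wide (what offlining via /sys changes).
        long online = sysconf(_SC_NPROCESSORS_ONLN);

        // CPUs this particular process is allowed to run on
        // (what cpusets / taskset restrict).
        cpu_set_t mask;
        CPU_ZERO(&mask);
        sched_getaffinity(0, sizeof(mask), &mask);

        std::printf("online: %ld, affinity: %d\n", online, CPU_COUNT(&mask));
        return 0;
    }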
This isn't really black magic.
Just reading a book on the Linux kernel and its structure should be enough to get you going.
To be fair, the issue is not that there are too many physical cores, or that someone set that constant to 128. The issue is that the code that spawns one thread per core doesn't check that constant to limit how many threads it spawns.
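Something along these lines (the names are made up, not the engine's actual code) would at least have kept the spawn count inside the per-thread arrays:

    #include <algorithm>
    #include <thread>

    constexpr unsigned kMaxThreadIds = 128;   // stand-in for MAX_THREAD_IDS

    unsigned WorkerThreadCount() {
        unsigned cores = std::thread::hardware_concurrency();
        if (cores == 0) cores = 1;            // hardware_concurrency() may legally return 0
        // Clamp to the per-thread bookkeeping capacity instead of blindly
        // spawning one worker per logical core.
        return std::min(cores, kMaxThreadIds);
    }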
One would wonder what this game is planning to do with 128 threads running simultaneously.
Does it matter? The 128 assumption for this kind of thing is there just to allocate a fixed array of 128 thread IDs, and going outside that array is out-of-bounds pointer magic.
It's totally reasonable to have a limit. It's totally reasonable to make that limit way higher than you'd think you'd ever need. It's only a bug if you don't enforce that limit and fail to check for it. And it's really easy to forget to do that when the easiest way to verify these bugs is "wait 15 years until hardware improves".
It’s totally not reasonable to have a variable number of things and store them in a fixed-length array. That’s how you get vulnerabilities.
At least check whether it’s going to fit, and crash otherwise.
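Even keeping the fixed array, a guard along these lines (hypothetical names, just to illustrate the point) turns a mystery segfault into an actionable error:

    #include <cstdio>
    #include <cstdlib>
    #include <thread>

    constexpr unsigned kMaxThreadIds = 128;   // stand-in for MAX_THREAD_IDS

    void CheckThreadLimitOrDie() {
        unsigned cores = std::thread::hardware_concurrency();
        if (cores > kMaxThreadIds) {
            std::fprintf(stderr,
                         "Fatal: %u logical CPUs, but at most %u thread slots.\n",
                         cores, kMaxThreadIds);
            std::abort();                     // fail loudly instead of scribbling past the array
        }
    }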
I’m not super familiar with cgroups, but I have to assume there’s a parameter for limiting the number of cores available to a process, rather than offlining them system-wide in /sys.