I’ve recently told a few people that I write, that I have a blog, and then I’ve tried to describe what I write about. I’m kinda proud of some of the stuff that I’ve covered here on randomascii over the years but I struggle when trying to summarize it to a non-technical audience. So here goes:
I’ve got a few human interest stories such as sharing my grief in 2024, sharing my loss of my right ear in 2016, and sharing the fun of commute-challenge 2017 (2017 video here) and commute-challenge 2018 (2018 video here), plus the time I spoke about the commute challenge at Ignite Seattle. I’ve also got some quirky posts such as how to make a warm weather snowman, how to unicycle faster, and the time I unicycled a really long way. And I’ve got a summary of my entire career (part one and part two).
But what about the investigative reporting, the revealing of hidden stability or performance problems (usually in Windows) that I’m particularly proud of discovering? What about the bug fixes that have happened because of my work? How can I share (brag?) about those to my non-technical friends?
I decided to try writing one-paragraph summaries of some of the stories that I’m most proud of, that made an impact, and which can just possibly be understood by non-experts. Here goes.
Everybody on Windows has some desktop icons so this first issue is potentially relevant to all Windows users:
In 2021 I saw a post on twitter (back when I still used twitter) from a software developer who had a powerful computer that would frequently hang for 10 seconds or much longer. It sounded interesting so I investigated. It turns out that Windows had a bug where it would spend a lot of time rearranging icons on the desktop. If you doubled the number of icons it would spend four times as much time. This is called a quadratic algorithm and it meant that Windows Explorer collapsed under its own weight if you had just a few hundred icons, and the person reporting the problem had about a thousand – actually images that they had dropped there. The hilarious thing was that this would happen even if you had your desktop configured to not show desktop icons! This was reported to Microsoft and they have fixed it in Windows 11. My investigation means that most users are now safe from this problem. Here’s the full writeup that explains how I identified what the problem was – note that I was able to scientifically analyze the problem despite the fact that it was happening on a different computer on a different continent.
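For anyone curious what “quadratic” looks like in practice, here is a tiny Python sketch. It is not the actual Windows code (I am making up the function name `pairwise_steps` purely for illustration); it just counts the work done by a made-up routine that compares every icon against every other icon, which is one common way to end up with this kind of cost.

```python
def pairwise_steps(icon_count: int) -> int:
    """Count the steps taken if every icon is checked against every icon.

    Hypothetical stand-in for a quadratic algorithm; not Explorer's real logic.
    """
    steps = 0
    for _ in range(icon_count):      # for each icon...
        for _ in range(icon_count):  # ...look at every icon again
            steps += 1
    return steps


if __name__ == "__main__":
    for icons in (100, 200, 400, 1000):
        print(icons, "icons ->", pairwise_steps(icons), "steps")
```

Doubling the icon count quadruples the step count, so a thousand icons costs a hundred times as much as a hundred icons, not ten times as much.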
This next problem was some rare gmail hangs I noticed, and the fix ended up saving hundreds of MB of memory for all gmail users on Windows:
I had a very powerful computer (24-core CPU) that was mostly idle and yet I found that Chrome/gmail would frequently hang for several seconds at a time. I eventually traced this to a performance bug in Windows that Chrome and gmail were tickling, which caused problems when our IT department ran a scan. I made some changes to Chrome so that it wouldn’t tickle the bug and the problem went away. Microsoft also fixed their performance bug, but since my tweak also saved hundreds of MB of memory we kept it. And once again Chrome users were protected from weird performance issues. Here is the full writeup for “24-core CPU and I can’t type an email” which ended up being read over 125,000 times.
The next two problems are particularly esoteric. One was a bug deep inside Windows that many developers had hit but that I was the first to correctly diagnose, and the other was a Windows performance problem that was affecting Chrome developers – that blog post is one of my most popular:
For many years Chrome’s build system on Windows – the thing that turns source-code into a version of Chrome that you can run – would fail about 3% of the time. This is not how computers are supposed to work. Similar failures were happening at other companies but nobody was able to understand the problem well enough to do anything. Through some combination of persistence and luck and good intuition I realized that the crashes were caused by a disk-caching bug deep inside the Windows kernel. I worked with a friend at Microsoft to gather more information and he was able to find and fix the exact problem, making high-performance Windows computers around the world more reliable. Here is the full writeup for “Compiler bug? Linker bug? Windows Kernel Bug!”
Finally, my first “big hit” was an article I wrote when I noticed that when I was using my extremely powerful computer to build Chrome I often couldn’t even move my mouse, despite the fact that the machine was barely 50% busy. My machine was made useless, and I knew that this wasn’t how things were supposed to work. It turns out that destruction of processes that load gdi32.dll causes heavy contention on the same lock that is needed to update the mouse position. I know that sounds like gobbledygook but I don’t know how to get rid of any more of the jargon. The good news is that we were able to work around the issue by being very careful not to load gdi32.dll into the many processes we create when building Chrome, and this resolves the issue. Microsoft also slightly reduced their overhead. Here is the full writeup for “24-core CPU and I can’t move my mouse” which has been read almost 300,000 times, making it my second most popular blog post ever.
And in the number one spot…
My number one post, at over 400,000 readers and still read 20,000 times a year, is a recipe book for different ways to compare numbers on a computer to see if they are “close”. Doing this well is surprisingly tricky.
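To give a taste of why this is tricky, here is a minimal Python sketch. It is not taken from the post itself; the helper name `nearly_equal` and its tolerance values are my own assumptions for illustration, and Python’s built-in `math.isclose` does much the same job.

```python
import math


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Return True if a and b are within a relative or absolute tolerance.

    Illustrative helper with assumed default tolerances; pick values that
    suit your own data.
    """
    diff = abs(a - b)
    return diff <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


if __name__ == "__main__":
    total = 0.1 + 0.2
    print(total == 0.3)                             # exact comparison fails
    print(nearly_equal(total, 0.3))                 # tolerant comparison succeeds
    print(math.isclose(total, 0.3))                 # standard-library equivalent
    print(nearly_equal(1e-12, 0.0))                 # relative tolerance alone fails near zero
    print(nearly_equal(1e-12, 0.0, abs_tol=1e-9))   # an absolute tolerance handles that case
```

The last two lines show the catch: a purely relative tolerance never treats a tiny number as “close” to zero, so a comparison that works well for ordinary values needs an absolute tolerance too when zero is involved.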