If there’s one thing that has become clear to me after decades of programming (and countless discussions in different communities), it’s that I don’t think like most people. I believe programmers in general don’t think like most people, but I’m not talking just about that: even among programmers I clearly think differently. And before people start throwing around claims of arrogance or narcissism: “different” doesn’t necessarily mean superior, just different.
The examples I’m going to provide show cases in which I turned out to be right, but obviously there is a selection bias; this doesn’t mean my way of thinking is always superior, or even superior in the majority of cases. Simply that sometimes it is.
Some people are inevitably going to focus on my persona, rather than the code or the arguments, because that’s what humans do: as social animals it’s in our nature. I would ask you, dear reader, to try to avoid that, but inevitably I foresee some of you uttering the famous The Big Lebowski line: “you’re not wrong, you are just an asshole”.
Fine, but truth matters, doesn’t it? So am I right?
I’m going to start with very simple examples, but eventually they’ll get more complex and controversial.
Rookies
Let’s look into a simple question from StackOverflow: Pretty file size in Ruby? (e.g. MiB, GiB, etc.)
The top answer:
class Integer
  def to_filesize
    {
      'B'  => 1024,
      'KB' => 1024 * 1024,
      'MB' => 1024 * 1024 * 1024,
      'GB' => 1024 * 1024 * 1024 * 1024,
      'TB' => 1024 * 1024 * 1024 * 1024 * 1024
    }.each_pair { |e, s| return "#{(self.to_f / (s / 1024)).round(2)}#{e}" if self < s }
  end
end
My answer:
def filesize(size)
  units = %w[B KiB MiB GiB TiB PiB EiB ZiB]
  return '0.0 B' if size == 0
  exp = (Math.log(size) / Math.log(1024)).to_i
  exp = units.size - 1 if exp > units.size - 1
  '%.1f %s' % [size.to_f / 1024 ** exp, units[exp]]
end
By using mathematics we can find the exponent explicitly, and then if it’s bigger than the list of units allows, simply use the biggest one. The top answer breaks when the number exceeds its biggest unit (the method falls through and returns the hash itself), while mine works correctly for any number, and it’s easy to use more units (or fewer), so it’s more maintainable.
But actually… there’s an extra line:
exp += 1 if (size.to_f / 1024 ** exp >= 1024 - 0.05)
This ensures that numbers that round up to the next exponent are better represented, for example 1023.95 GiB is shown as 1.0 TiB.
But because I’m a completionist, I wasn’t happy with a solution that just seemed to work: I wrote tests and benchmarks for all the solutions to ensure mine not only worked better, but was also the most efficient, in some cases by orders of magnitude.
Seconds to process 1 million numbers:
- filesize (mine): 2.15
- Filesize: 15.53
- number_to_human: 139.63
- to_filesize: 2.41
Real
You might think my code is superior only in comparison to code written by rookies, but what about real world-class code? Let’s look at one of the first patches I sent to the git project: completion: simplify __git_remotes.
The original shell code for a helper function in the completion script basically did this:
for i in "$d/remotes"/*; do
echo ${i#$d/remotes/}
done
The output of this is the list of files in a particular directory, with the directory prefix removed; for example, if a file was “$HOME/dev/libfoo/.git/remotes/origin”, the output would be “origin”.
The problem is that when no files match, bash outputs the glob pattern itself unless the nullglob option is set (it isn’t by default). For example “echo /tmp/foobar*” will output “/tmp/foobar*” instead of nothing.
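You can see this for yourself in any bash session (assuming nothing in /tmp matches the pattern):

echo /tmp/foobar*    # prints: /tmp/foobar*
shopt -s nullglob
echo /tmp/foobar*    # now prints an empty line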
The solution git developers thought of was to record whether nullglob was unset, set it, run the code, and afterwards unset it again if it originally wasn’t set:
local ngoff
shopt -q nullglob || ngoff=1
shopt -s nullglob
# code
[ "$ngoff" ] && shopt -u nullglob
That works… in bash, but it doesn’t work in zsh, which was my main motivation.
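The reason is simple: zsh has no shopt builtin at all (it uses setopt instead), so in a stock zsh the dance falls over immediately:

shopt -s nullglob    # zsh: command not found: shopt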
The second solution they came up with was to emulate the behavior of bash’s shopt into zsh’s setopt:
if [[ -n ${ZSH_VERSION-} ]]; then
	__git_shopt () {
		local option
		if [ $# -ne 2 ]; then
			echo "USAGE: $0 (-q|-s|-u) <option>" >&2
			return 1
		fi
		case "$2" in
		nullglob)
			option="$2"
			;;
		*)
			echo "$0: invalid option: $2" >&2
			return 1
		esac
		case "$1" in
		-q) setopt | grep -q "$option" ;;
		-u) unsetopt "$option" ;;
		-s) setopt "$option" ;;
		*)
			echo "$0: invalid flag: $1" >&2
			return 1
		esac
	}
else
	__git_shopt () {
		shopt "$@"
	}
fi
Presumably now the code worked for both bash and zsh, but I thought it was too convoluted and there had to be a simpler way to do it.
Turns out in UNIX there’s an easy way of listing the files inside a directory: ls:
test -d "$d/remotes" && ls -1 "$d/remotes"
Not only does this get rid of the need for the __git_shopt hack, but all the nullglob logic, and even the three-line for loop, are reduced to a single line.
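If you want to convince yourself the two behave the same, here’s a throwaway sketch (the path and remote names are obviously just for illustration):

d=/tmp/t
mkdir -p "$d/remotes"
touch "$d/remotes/origin" "$d/remotes/backup"

# the old loop and the new one-liner print the same list
for i in "$d/remotes"/*; do echo ${i#$d/remotes/}; done
test -d "$d/remotes" && ls -1 "$d/remotes"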
37 lines of code turn into 1. How is this not objectively superior?
contrib/completion/git-completion.bash | 39 ++-------------------------------------
Yet somehow nobody else thought of this solution for six years. Nobody but me.
Linux
Although git’s code is world-class, I think there’s nothing of higher quality than linux, so let’s look at a simple patch I sent for linux.
Original code in kernel/params.c:
/* Lazy bastard, eh? */
#define STANDARD_PARAM_DEF(name, type, format, tmptype, strtolfn)	\
	int param_set_##name(const char *val, const struct kernel_param *kp) \
	{								\
		tmptype l;						\
		int ret;						\
									\
		ret = strtolfn(val, 0, &l);				\
		if (ret < 0 || ((type)l != l))				\
			return ret < 0 ? ret : -EINVAL;			\
		*((type *)kp->arg) = l;					\
		return 0;						\
	}								\
	int param_get_##name(char *buffer, const struct kernel_param *kp) \
	{								\
		return scnprintf(buffer, PAGE_SIZE, format,		\
				 *((type *)kp->arg));			\
	}								\
	struct kernel_param_ops param_ops_##name = {			\
		.set = param_set_##name,				\
		.get = param_get_##name,				\
	};								\
	EXPORT_SYMBOL(param_set_##name);				\
	EXPORT_SYMBOL(param_get_##name);				\
	EXPORT_SYMBOL(param_ops_##name)
STANDARD_PARAM_DEF(byte, unsigned char, "%hhu", unsigned long, kstrtoul);
STANDARD_PARAM_DEF(short, short, "%hi", long, kstrtol);
STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, kstrtoul);
STANDARD_PARAM_DEF(int, int, "%i", long, kstrtol);
STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, kstrtoul);
STANDARD_PARAM_DEF(long, long, "%li", long, kstrtol);
STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, kstrtoul);
If you are not familiar with C macros, this code may seem a little daunting, but all it is doing is copy-pasting text from one place to another. For example, the first call to STANDARD_PARAM_DEF uses byte as “name”, so param_set_##name is going to be replaced with param_set_byte, and inside that function type is going to be replaced with “unsigned char” and tmptype with “unsigned long”.
So we get:
int param_set_byte(const char *val, const struct kernel_param *kp)
{
	unsigned long l;
	int ret;

	ret = kstrtoul(val, 0, &l);
	if (ret < 0 || ((unsigned char)l != l))
		return ret < 0 ? ret : -EINVAL;
	*((unsigned char *)kp->arg) = l;
	return 0;
}
You may think that if linux developers put those checks there, it must be for a reason, and you would be kind of right, but in particular I would like to focus on the check (unsigned char)l != l. If the string is “0x100”, that check fails, because (unsigned char)0x100 is 0: the value is above UCHAR_MAX (0xff) and gets truncated. So param_set_byte would fail for that value.
Which does make sense: param_set_byte cannot set more than a byte.
However, if we look at the internal implementation of kstrtoul, it looks very similar:
int _kstrtoul(const char *s, unsigned int base, unsigned long *res)
{
	unsigned long long tmp;
	int rv;

	rv = kstrtoull(s, base, &tmp);
	if (rv < 0)
		return rv;
	if (tmp != (unsigned long)tmp)
		return -ERANGE;
	*res = tmp;
	return 0;
}
It’s basically doing the same check, except it’s checking for unsigned long instead of unsigned char, which makes sense given the name: ul = unsigned long.
That’s a strong hint that if there’s a function for unsigned long, maybe there’s a function for unsigned char, and there is:
int kstrtou8(const char *s, unsigned int base, u8 *res)
{
	unsigned long long tmp;
	int rv;

	rv = kstrtoull(s, base, &tmp);
	if (rv < 0)
		return rv;
	if (tmp != (u8)tmp)
		return -ERANGE;
	*res = tmp;
	return 0;
}
So instead of calling kstrtoul — which indirectly calls kstrtoull — we can just call kstrtou8, and that would do the (unsigned char)l != l check itself (u8 = unsigned char).
int param_set_byte(const char *val, const struct kernel_param *kp)
{
	return kstrtou8(val, 0, (unsigned char *)kp->arg);
}
But remember that the macro was actually param_set_##name, and in that macro we can specify which strto* function to call:
STANDARD_PARAM_DEF(byte, unsigned char, "%hhu", unsigned long, kstrtoul);
So we simply change that to:
STANDARD_PARAM_DEF(byte, unsigned char, "%hhu", unsigned char, kstrtou8);
And in fact we do the same for all the other types:
STANDARD_PARAM_DEF(byte, unsigned char, "%hhu", unsigned char, kstrtou8);
STANDARD_PARAM_DEF(short, short, "%hi", short, kstrtos16);
STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned short, kstrtou16);
STANDARD_PARAM_DEF(int, int, "%i", int, kstrtoint);
STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned int, kstrtouint);
STANDARD_PARAM_DEF(long, long, "%li", long, kstrtol);
STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, kstrtoul);
We can see now that the tmptype is the same as the type, so we can get rid of that argument, which is in fact not used anymore, and the final function ends up being:
int param_set_##name(const char *val, const struct kernel_param *kp) \
{								      \
	return strtolfn(val, 0, (type *)kp->arg);		      \
}								      \
Which is much simpler.
Altogether it ends up as:
/* Lazy bastard, eh? */
#define STANDARD_PARAM_DEF(name, type, format, strtolfn)		\
	int param_set_##name(const char *val, const struct kernel_param *kp) \
	{								\
		return strtolfn(val, 0, (type *)kp->arg);		\
	}								\
	int param_get_##name(char *buffer, const struct kernel_param *kp) \
	{								\
		return scnprintf(buffer, PAGE_SIZE, format,		\
				 *((type *)kp->arg));			\
	}								\
	const struct kernel_param_ops param_ops_##name = {		\
		.set = param_set_##name,				\
		.get = param_get_##name,				\
	};								\
	EXPORT_SYMBOL(param_set_##name);				\
	EXPORT_SYMBOL(param_get_##name);				\
	EXPORT_SYMBOL(param_ops_##name)
STANDARD_PARAM_DEF(byte, unsigned char, "%hhu", kstrtou8);
STANDARD_PARAM_DEF(short, short, "%hi", kstrtos16);
STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", kstrtou16);
STANDARD_PARAM_DEF(int, int, "%i", kstrtoint);
STANDARD_PARAM_DEF(uint, unsigned int, "%u", kstrtouint);
STANDARD_PARAM_DEF(long, long, "%li", kstrtol);
STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", kstrtoul);
Not only is the code simpler, more straightforward, and more maintainable, it’s also more efficient: the range check is now done once instead of twice.
Why did nobody see this for so many years? I don’t know. I’m not an expert in this code, yet I was able to find this low-hanging fruit very easily.
Here’s the patch: params: improve standard definitions.
Troubleshooting
My different approach shows not only when writing code, but also when troubleshooting. In the git mailing list somebody reported a regression in the testing suite, and after multiple people speculated about the issue, I decided to get my hands dirty.
I was able to reproduce the issue, but curiously enough only when running the test scripts directly, not when running them through prove. After spending an ungodly number of hours trying to write a simple script that did the same thing as prove, I realized that perl can run shell scripts, so the issue only happened when bash was used directly, and I managed to strip all the noise from the test case until the problem reproduced in a few lines:
export COLUMNS=80
echo "COLUMNS: $COLUMNS"
touch /tmp/foo
echo "COLUMNS: $COLUMNS"
I reported this finding to the mailing list: the COLUMNS variable is changed. I didn’t know why, only that it did happen.
It took me a few more minutes of investigation to realize that this behavior is actually explained in bash’s manpage:
checkwinsize
If set, bash checks the window size after each external (non‐builtin) command and, if necessary, updates the values of LINES and COLUMNS. This option is enabled by default.
I think this is a mistake on bash’s part, but either way, the reason it didn’t cause problems before is that it wasn’t enabled by default before: that changed in bash 5.
My solution was to simply disable checkwinsize for git’s entire test suite: test: fix for COLUMNS and bash 5. Other people proposed different solutions, but mine was the simplest and least intrusive.
Now, you can argue that this is very trivial and anybody could have found the issue, but that’s the thing: for some reason nobody else did. Before my solution people speculated that perhaps it depended on the terminal used, that maybe some terminals used LD_PRELOAD trickery for getenv(), or that maybe some shells were setting COLUMNS wrongly.
I cut through all the noise and arrived directly at the culprit.
Indeed, running shopt -u checkwinsize right after exporting COLUMNS in test-lib.sh fixes the tests. Great work!
Alex Henrie
Explaining
My thinking also shows when explaining complex issues. One example is an update of GRUB that prevented the booting of many machines running Arch Linux.
There was a lot of discussion about what the actual issue was, who was potentially affected, what to do about it, and when it was completely fixed. The problem was exacerbated because the moderators of the reddit community r/archlinux made the false statement that the problem had been “fixed”.
The grub issue has been “fixed” and was never really a problem on Arch.
Morten Linderud
That’s a lie: the issue had not been fixed, and it was a problem on Arch Linux. To demonstrate that I decided to install GRUB on my machine, even though I had not used it in more than a decade, and started to dig deep.
The result was one of my top posts on reddit: Explaining the actual GRUB reboot issue in detail, with 85,000 views and a 97% upvote rate.
In that post I explained the problem step by step with a simplification of the different versions of the code written in shell script (which in my opinion is the easiest language to understand).
I’m not going to address every detail of the explanation — which you can read very easily on reddit — I’m only going to point out the crux of the problem: a one-line change that added the command “fwsetup --is-supported”. The reason that line is problematic is that the option --is-supported was new: if you call “fwsetup --is-supported” in an older version of GRUB, it is the same as calling “fwsetup”, which does start the firmware setup UI (aka the BIOS UI).
The way I explained the issue was simple. Initially the menu was like this:
menuentry "1. Arch Linux"
menuentry "2. UEFI Firmware Settings"
Then, it was like this:
menuentry "1. Arch Linux"
fwsetup --is-supported &&
menuentry "2. UEFI Firmware Settings"
The code of fwsetup was initially like this:
fwsetup() {
	echo "Starting UEFI firmware"
	echo "REBOOTING" >&2
	exit
}
Then it was updated to:
fwsetup() {
	if [[ "$1" = --is-supported ]]; then
		test -n "$efi_ui_supported"
		return $?
	fi
	if [[ -z "$efi_ui_supported" ]]; then
		echo "ERROR: EFI UI not supported" >&2
		return 1
	fi
	echo "Starting UEFI firmware UI"
	echo "REBOOTING" >&2
	exit
}
So a menu created with a newer version of GRUB would call “fwsetup --is-supported” to check if there is a firmware setup UI, but if the installation of GRUB had an older fwsetup, that would unconditionally “reboot” the machine to the firmware setup UI.
One would think that shouldn’t be a problem, because nobody would have a menu script from the new version of GRUB with an old installation of GRUB, but GRUB is so complex that I jokingly call it an operating system, and that’s precisely what happened: GRUB requires two steps to be properly updated, grub-install and then grub-mkconfig.
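For reference, a full GRUB update on a UEFI system looks roughly like this; the exact flags and paths vary per distribution, so take these as illustrative:

# reinstall the bootloader binary, then regenerate the menu
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB
grub-mkconfig -o /boot/grub/grub.cfg

Run only the second step against an installation from years earlier and you get exactly the mismatch described above.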
To make matters worse, some distributions based on Arch Linux did call grub-mkconfig automatically, but not grub-install (in my opinion it should be the other way around).
That’s why Morten Linderud — an Arch Linux developer — argued that “it wasn’t really a problem on Arch Linux”: because vanilla Arch Linux doesn’t call grub-mkconfig automatically. But that’s a rationalization; one use case is a setup where grub-install is issued once, and after that grub-mkconfig is issued multiple times to update the configuration. Not to mention that one could write a hook to do that automatically.
So vanilla Arch Linux was affected as well. A user who manually called grub-mkconfig but not grub-install would end up with an unbootable machine, and Arch Linux’s documentation did not explain when users should call one command and when the other; it only explained what each command did.
The “fix” that Linderud mentioned is a message, added after an update, recommending that users run both commands. In my view that’s not a fix; that’s a poor workaround at best.
Once I wrote my reddit post, everyone understood the problem: that it was present on vanilla Arch Linux too, and that it wasn’t yet “fixed”.
But that’s not all. I provided a patch to actually fix the problem on Arch Linux; it was rejected because, according to Arch Linux developers, the problem was already “fixed” by the update message. I also provided a patch to GRUB so that the menu worked correctly on all systems (even ones running an old version of GRUB). The GRUB patch was also rejected, on the basis that there were other “better” solutions already proposed, but those solutions didn’t actually fix the problem.
So neither Arch Linux nor GRUB ever fixed the problem. Arch Linux users had to learn how to make their machines bootable again, and eventually everyone somehow ran grub-install, rendering it a non-issue (although people still report GRUB updates making their machines unbootable).
In my view both Arch Linux developers and GRUB developers were wrong, and you may think that because I’m not an expert in either, I’m probably the one who is wrong, but as you will see in the next sections: experts are often wrong.
However, if I don’t use GRUB, why did I spend so much time on this issue? In my view GRUB is unnecessary shit. Something like GRUB might have been useful in BIOS times, but since the introduction of UEFI, bootloaders like GRUB should be obsolete, especially since linux supports EFI stubs. I created a video in 2012 showing a linux UEFI machine booting directly, without any bootloader, in seconds: Linux EFI stub boot. So why are people still using unnecessarily complex software like GRUB in 2022? I don’t know.
This GRUB debacle prompted me to write another popular reddit post in which I explained how to boot directly into a linux kernel without any bootloader, because apparently people didn’t know you could do that: Why use a bootloader? Just boot directly into a unified kernel image.
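If you’re curious what that looks like, here’s a minimal sketch using efibootmgr, assuming a kernel built with an EFI stub sitting on the EFI system partition (the disk, partition, file names, and root= device are placeholders you’d adapt):

# create a firmware boot entry that loads the kernel directly
efibootmgr --create --disk /dev/sda --part 1 \
	--label "Linux" --loader /vmlinuz-linux \
	--unicode 'root=/dev/sda2 rw initrd=\initramfs-linux.img'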
Top brass
Some may think that perhaps these solutions were not found by top developers simply because they weren’t paying attention, so how about an example where Junio Hamano — the maintainer of the git project himself — was directly involved?
But your own stats disagrees with your opinion, so don’t invent a new thing, period.
Junio C Hamano
In this discussion in the git mailing list I proposed a patch where I added some information: “Comments-by: Jeff King“. In git lingo these are called “commit message trailers” and help identify people who aided in the development of the patch in some way.
The problem was that Junio wanted me to use a trailer that in his opinion was superior and more common: Helped-by.
Junio may be a good programmer, but he doesn’t understand statistics. He thought that because Comments-by was not among the top trailers, nobody used it and I was inventing a “new thing”. That’s why he ordered me not to do it.
However, that’s not how power law distributions work. The top 10% of anything is not representative of the majority of the population. In this case 65% of unique trailers appeared only one time.
In fact, Junio’s own behavior undermined his argument: he ordered me not to do something he has regularly done himself. He has invented multiple non-standard trailers that were used only once:
- Brought-to-attention-by
- Discussion-triggered-by
- Heavylifting-by
- Initial-fix-by
- Tested-on-{AIX/FreeBSD/Darwin}-by
- Tested-on-MinGW-by
I don’t think any reasonable person would side with Junio in ordering me to not do something he actually did multiple times.
If that wasn’t enough, the coding guidelines stated unequivocally that you could do that:
You can also create your own tag or use one that’s in common usage such as “Thanks-to:”, “Based-on-patch-by:”, or “Mentored-by:”.
SubmittingPatches
So Junio — the maintainer of git — was objectively wrong about something in the git project, and I was right.
The Comments-by trailer was perfectly fine.
Now, I know what many of you are thinking: why didn’t you just do what Junio said? Sure, I could have done that, but also, Junio could have changed the commit message and written whatever the fuck he wanted — as he has done countless times in the past, rewriting what I wrote in whatever way he liked and completely ignoring what I said.
At the end of the day, Junio was not my boss; he can’t order me to do anything. I do whatever I want in my own free time, and this was a voluntary contribution: nobody paid me to do it. Moreover, even a boss cannot order me to hold a specific opinion. If Junio wants a different message, he has the power to change it, over my objection.
But my opinion is my opinion. I think “Comments-by” is better than “Helped-by”, because it explains how this person helped. Junio is free to disagree. I’m not going to pretend I don’t have that opinion just because a powerful person thinks I’m wrong (especially when he is the one that is wrong).
I bow to no one.
What was particularly devastating in my response to Junio is that I decided to take statistical code I had been developing for analyzing income distributions and use it to generate statistics of commit message trailers. The result followed a Pareto distribution pretty closely, which wouldn’t be surprising to people who understand information and statistics. Many people liked my message showing these stats: The top 1% of commit trailers.

In the end it remained Comments-by. It had only been used 3 times, but then again 75% of trailers had been used 3 times or fewer. So that was fine.
And yes, I understand this isn’t code. I could provide an example where I objectively proved Junio wrong in code, but that would require an entire blog post on its own, and I’m not sure how many people would understand it. This is easier.
Founder found wrong
The git maintainer wasn’t the first or the only “expert” I objectively proved wrong.
Edward Hervey was a co-founder of Collabora — the company behind GStreamer, the multimedia framework used in GNOME. When I reported a bug regarding an audio clip that wasn’t playing properly in GStreamer, Edward took issue with a pleb like me suggesting a core component of GStreamer could possibly have a problem.
Congratulations, you’ve just proven that the default 40ms is a better choice than 100microseconds. I applause your logic. Wait, you had a point ?
The moment you’ll understand that drift-tolerance is called as such because it’s the threshold at which we consider timestamp inconsistencies as drift and not as jitter you might be able to [censored] the [PG rated]. Both Olivier and Mark have proven their understanding of this problem in their comments, but you insist in childing around.
P.S. Now please, as a courtesy to those actually trying to figure out an acceptable way to counteract this minor odd issue, just stop spamming these bug reports repeating yourself 500 times in as many different ways of you disliking how GStreamer works despite the fact that it handles A/V sync for 99.99999999% of the files/use-cases out there in a much better way than any of the tools you’ve mentioned. WE GOT THE POINT, WE UNDERSTAND THE ISSUE AT HAND ! It just doesn’t help, makes up grumpy, and makes us not want to help you. Unlike kids, we prefer to ponder the issue at hand to try to find an acceptable fix which won’t introduce regressions (and yes, we’re thinking about it, the answer isn’t in screwing up audio sync for everyone else but rather in detecting the busted files at the demuxer level). Would you (or your employer) prefer a rapid fix that fixes this marginal braindead file playback while screwing up all the other use-cases ? Good night (or morning).
Edward Hervey
I’m not even going to bother pointing out all the falsehoods Edward claimed; I’m just going to explain what I call “drift” and what I call “jitter”.
Drift: If some audio is supposed to play at second 1 but plays at second 1.1, there’s a diff of 0.1 seconds. If this drift is consistent, the audio that should play at second 2 will come at second 2.2, so the diff is 0.2 seconds. Over time the gap increases, making the audio more and more out of sync. You might have noticed this behavior in a movie where the audio gradually drifts out of sync with the video. Even if the drift is corrected at some point, that is only a temporary fix, as it will eventually become noticeable again.
Jitter: Jitter, on the other hand, refers to timing variations that are inconsistent. For example, if the first audio cue comes at second 1.1 but the second one comes at second 1.9, the first diff is +0.1 and the second one is -0.1, cancelling each other out. While drift accumulates over time, jitter results in fluctuating timing that can cause momentary desynchronization but doesn’t necessarily worsen continuously over a long period.
Edward’s problem is that he didn’t attempt to understand what I meant by “jitter”; he simply assumed that I didn’t understand what they meant by “drift-tolerance” (I did).
To show that he didn’t even read what I said, here’s the comment he replied to:
Note that this is not drift, it’s jitter, but basesink is wrongly detecting it as drift, messing up all the playback.
With my patch now the playback would not be affected by jitter, you can even set a drift-tolerance of 1, and clips with any jitter would play fine.
Felipe Contreras
When he said “you’ve just proven that the default 40ms is a better choice than 100microseconds”, he demonstrated he didn’t even read “you can even set a drift-tolerance of 1” (1 μs). I was saying that with my patch a jitter of 100 μs does work, not fail. I was not saying 40 ms is a better drift-tolerance default; I was saying that with my patch the default doesn’t matter. He assumed I had said a 100 μs drift-tolerance always works worse than 40 ms, when I said the opposite: he didn’t care what I was actually saying at all.
Reports like this are usually ignored. It wouldn’t have mattered how much work I spent trying to explain the issue, or even writing code to reliably reproduce it, because the relevant people were not going to spend the minute it takes to actually read what I was saying. But in this case something fortuitous happened.
Håvard Graff from Cisco explained that their live-streaming users were experiencing glitches, and that based on real-life data they had decided to increase the drift-tolerance to 225 ms, which is something I had argued some people might need to do, though it wasn’t an ideal solution. Months later he reported issues with diffs of up to 500 ms, which I had argued could happen.
So everything I had theorized, he had data showing it was actually happening. The diff looked almost exactly like my synthetic test: two jumps forward, two jumps backward.
Once Håvard realized that what he was observing was precisely what I described as “jitter”, he made the following comment:
I think this is a great feature for BaseAudioSink to have, and would be very useful for all use of GStreamer involving networks.
Håvard Graff
This is what he had to say about the patch:
So basically this patch offers something as rare as both higher accuracy, it terms of allowing smaller alignment‑thresholds, as well as much smoother, less glitchy playback!
Håvard Graff
Note that he didn’t say “drift-tolerance”; he said “alignment-threshold”, because now the “drift” detection is split into three properties: “drift-tolerance”, “alignment-threshold”, and “discont-wait”.
So Edward was wrong: the issue wasn’t “marginal”, it wasn’t relegated only to “braindead” files, and my fix didn’t “screw up all the other use-cases”. The issue was that GStreamer was incorrectly detecting jitter as drift and correcting something that didn’t need correcting. Real-world data from Cisco users showed that.
Eventually the patch was merged: baseaudiosink: delay the resyncing of timestamp vs ringbuffertime, and to this day the property “discont-wait” is still there on all audio sinks.
I don’t know how other programmers would have reacted to ad hominem attacks like this from a co-founder, but I knew I was right, so that was all there was to it.
And for the record, I’m not and I wasn’t an expert on audio sync; I wasn’t even familiar with the terms “drift” and “jitter”. All I knew was the data I was seeing, the data that was possible, the behavior I was seeing, and the behavior I considered desirable. I based my conclusion on what was obvious to me, not on any industry-wide accepted notions or any “expertise”. My conclusion turned out to be right.
Edward, on the other hand, was wrong: the “experts” did not understand the issue at hand.
Misunderstanding time
Yukihiro Matsumoto — the creator of the Ruby language, also known as matz — got involved in a discussion regarding a bug report I made about how Ruby handled a specific time format.
%s means ‘time_t value from the epoch’. The epoch is fixed time point (1979-01-01 00:00 UTC). Offsetting it according to %z seems nonsense, or plain wrong. That’s the reason behid his rejection. If you were lucky to read Japanese, you’d have understood it.
Yukihiro Matsumoto
Judging from the previous examples, you can already guess: Yukihiro is wrong.
First of all, the year is 1970, not 1979, but the real issue is his misunderstanding of time moments. This moment, while you are reading this article, is not happening just wherever you are; it’s happening in Tokyo and London as well. So the epoch is not just “1970-01-01T00:00:00 UTC”; it’s also “1970-01-01T01:00:00 +0100”, since the epoch also happened in Berlin: both are different representations of the same moment in time.
Yukihiro assumed his expert in datetime — Tadayoshi Funaba — knew what he was talking about and tried to defend his reasoning, but Tadayoshi was just wrong. The more Yukihiro tried to justify Tadayoshi’s decision to reject the change, the more he realized it was unreasonable.
The issue was whether “%z” (timezone) made sense in conjunction with “%s” (seconds since the epoch). What both Tadayoshi and Yukihiro argued is that because %s assumes UTC, the timezone makes no sense. But %s does not assume UTC: UTC is only used to describe the epoch, and “1970-01-01T01:00:00 +0100” is also the epoch.
So, if you are in Berlin, and the time is “1970-01-01T01:00:00 +0100“, how many seconds since the epoch have passed? Zero. So %s is 0. But you can add the timezone, so “%s %z” of “1970-01-01T01:00:00 +0100” is “0 +0100“.
DateTime.parse('1970-01-01T00:00:00 UTC') ==
DateTime.parse('1970-01-01T01:00:00 +0100')
=> true
Where is the “nonsense”?
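You don’t even need Ruby to see this: GNU date handles the combination without complaint (assuming GNU coreutils and the Europe/Berlin zoneinfo entry):

TZ=UTC date -d @0 +'%s %z'              # prints: 0 +0000
TZ=Europe/Berlin date -d @0 +'%s %z'    # prints: 0 +0100

Same moment, same %s, different %z. Exactly what I was asking for.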
I wrote an entire post about this issue: My tone doesn’t make me wrong, or how I convinced the Ruby project to fix an inconsistency, because once again people focused on me, and not on the technical issue at hand.
In the end Tadayoshi was forced to do the sensible thing: ext/date/date_core.c (rt_rewrite_frags): a new feature (not a bug fix) of strptime. applies offset even if the given date is not local time (%s and %Q). This is an exceptional feature and I do NOT recommend to use this at all. Thank you git community.
But even if you don’t find my understanding of time moments compelling, every programming language deals with “%s %z” correctly. Even different implementations of Ruby deal with this correctly (e.g. Rubinius); even Ruby’s main implementation dealt with it correctly in the Time class. It was only the DateTime class that had a problem, and only when parsing, not when formatting. So I don’t even know why this became a contentious issue in the first place.
Either way, now DateTime parsing also works correctly, but it doesn’t really matter: Tadayoshi left the project after this, and now DateTime is deprecated.
I don’t know how else to say it, but I was right. Even when the creator of the language thought I was wrong, and many members of the community started attacking me personally, and even calling me racist for bringing up cultural differences, I was still right.
The CEO has no clue
When I was working at Nokia developing the Nokia N9, the new CEO back then — Stephen Elop — made the disastrous choice to switch to Windows Phone. We already had an operating system that could compete with Android: MeeGo, but Elop — who knew nothing of the mobile industry — thought Windows Phone was better.
As a result of some internal drama, Elop decided to send me a personal email (I don’t know why), and in that email he explained the reason why he chose Windows Phone, basically: MeeGo would take too long to deliver the next three products.
That was true, but why? The reason was that the people at the top had decided to switch platforms for every single product. If one product uses OMAP, another Snapdragon, and another Intel, then yes: those products would take time, because developing for a different platform takes time. But if they had decided to stay on the same platform, like OMAP, the products could have come very fast.
Elop probably didn’t even know who I was, but I was working on the hardware adaptation team, so that was precisely my expertise. Elop on the other hand came from Microsoft Office and knew nothing about the topic.
I explained to Elop why he was wrong and all he replied with was “I’ll have to respectfully disagree”. That was the end of it.
That is, until the Nokia N9 was officially released and started to receive a lot of praise. That’s when I decided to make my disagreement with Elop public in a blog post, and it started to receive a lot of attention, even in the Finnish press: Nokia engineer calls out Stephen Elop for killing MeeGo, says he has no idea what he’s talking about (I didn’t actually say he had no idea, merely that he was wrong).
A lot of people thought this was an error on my part, but just because I work for a company doesn’t mean I have to agree with everything the CEO says. In my view Nokia couldn’t fire me because they would look bad to the public, and they couldn’t ask me to change what I said because they can’t force me to hold certain personal opinions.
Now we know that Stephen Elop was hired precisely to facilitate the sale of Nokia to Microsoft, and the way he decided to do that was by driving the price of the company into the ground, making it easier for Microsoft to buy it. He achieved his goal, and was monetarily compensated for it.
Ironically, in the entire history of Windows Phone it only ever worked on a single platform: Snapdragon. By the time Elop killed MeeGo, MeeGo already had support for more platforms than that.
So Elop was wrong, I was right.
The whole story is quite sad because even in 2024 it’s very obvious the market needs a third option to compete with Android and iOS, and Nokia already had one that was much more open than Android, and aligned with the Linux philosophy. In some alternate universe where a certain company listened to a certain engineer the story would have been very different.
Drepper and Fedora obtuseness
This is a long and complex story that involved Ulrich Drepper, Linus Torvalds, Fedora maintainers, and me. It involved multiple layers of software, from machine-specific assembly code up to philosophy of software, but I’m not going to explain every detail of it, just the important stuff.
The story starts with people noticing strange audio when playing media with Flash software (yes, people still used that in 2010), but only after updating to Fedora 14, and the culprit was quickly determined to be glibc 2.13.
The change that broke Flash was a set of optimizations to memcpy, specifically for SSSE3. Technically what glibc did was correct (according to the C standard), but it did change existing behavior, and Flash was not the only software relying on it.
According to Ulrich Drepper, the software that relied on the old behavior was “written by people who should never have been allowed to touch a keyboard”, so it wasn’t a problem in glibc.
This is where the difference in philosophy shows, because according to Linus Torvalds it shouldn’t matter that Flash was badly written, glibc should still not intentionally cause regressions, even if the code was technically correct. I agree with this philosophy, and I’ve written extensively about it: The Linux way: never ever break user experience.
I’ve debated people with the mentality of Drepper before, and in my experience it’s impossible to convince them that users matter. All they care about is that their code is “correct”, they don’t care how many users suffer as a result of their changes.
Drepper pointed the finger at Flash and refused to do anything to fix the issue other than a small mitigation tactic that only delayed the problem.
Fedora maintainers also pushed the blame and refused to do anything. They claimed Adobe software was broken, and they wouldn’t do anything about it, which infuriated Linus:
Quite frankly, I find your attitude to be annoying and downright stupid.
Linus Torvalds
I agree with Linus, and in fact this is one of the reasons I stopped using Fedora. It would have been very easy for Fedora maintainers to add a patch restoring the old behavior, at least temporarily; in fact, I provided a very simple patch to do so: x86_64: fix for new memcpy behavior. They refused any responsibility towards their users and simply said “not our problem”.
Why should I use software that doesn’t care about me (the user) at all?
To me software that doesn’t care about its users is like food that doesn’t contain any calories: not really software.
I tried to make Drepper see reason:
Sure, code should use memcpy correctly, and if glibc has a compelling reason to break these programs, it should. As Ulrich mentions in comment #4 “There are going to be more and different implementations in the future and they shouldn’t be prevented from being used because of buggy programs.”
But today that’s not the case. Today, the regression can be fixed easily with a patch like what Linus is proposing, and there will be no downside whatsoever.
How about glibc breaks the behavior only when there’s an upside to breaking it?
Felipe Contreras
But you can’t force people to be reasonable. Drepper decided to knowingly introduce a regression and break existing software with no upside other than saving a couple of cycles per call.
Neither Drepper nor Fedora maintainers listened to Linus, me, or anybody. Linus kept trying to explain that even if Flash was badly written, glibc should still try to make it work, especially since there’s practically no cost, because that’s what users would expect. But those people don’t care about the users, Linus was wasting his breath.
I pursued a different approach: I argued that this had nothing to do with Flash, and that in theory there should be other software doing the same thing. Fedora maintainers laughed at this possibility and insisted it was only hypothetical.
So I decided to prove them wrong and wrote a simple memcpy checker.
#include <string.h>
extern void __fortify_fail(const char *msg);
void *memcpy(void *dst, const void *src, size_t n)
{
if (dst >= src + n || src >= dst + n)
goto ok;
__fortify_fail("memcpy overlap");
ok:
return memmove(dst, src, n);
}
This can be compiled into a shared library, and then that library added in /etc/ld.so.preload. That’s a tip I got from Drepper.
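For reference, wiring the checker up looks something like this (the file names are mine):

gcc -shared -fPIC -o memcpy-check.so memcpy-check.c
echo "$PWD/memcpy-check.so" | sudo tee -a /etc/ld.so.preload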
With this checker I found issues in PulseAudio, zsh, and other software. Other people found issues in ImageMagick and unzip. Problems in GStreamer and Squashfs also emerged. And I bet there were many many others I’m not aware of.
So, once again, Fedora maintainers were wrong. This change in glibc not only broke Flash, but other software as well. They could have worked around it very easily with virtually no cost, at least temporarily, but instead they decided to screw their users for no gain.
This is neither the first nor the last time glibc decided to fuck their users unilaterally. Recently they simply removed DT_HASH, causing a lot of software to break, especially Steam games using Easy Anti-Cheat. The glibc developers did not care how many users they screwed; they went ahead with the change and claimed it was up to the distributions to decide what to do about it. Fortunately, this time I was using Arch Linux, and they decided to patch glibc and restore the old behavior by building with --hash-style=both.
POSIX terminator
Let’s go for a highly technical example.
IFS=,
str='foo,bar,,roo,'
printf '"%s"\n' $str
You may think this seemingly innocuous shell code has some obvious, predictable outcome, but its behavior is precisely the opposite of what most people expect. Not only does the behavior depend on the shell, but the shells that claim to conform to the POSIX standard arguably aren’t compliant — at least not if you interpret the standard as a reasonable human being would — and the standard doesn’t explain how it should be interpreted.
At least that’s what I argued, and once again the “experts” disagreed with me:
How can you say that the current implementation that bash, dash, etc. use is not compliant to the POSIX specification?
And why do you not acknowledge that the logic on which you base your claim “‘,’ can terminate a field individually and end-of-string can terminate a field individually, so two of them in a row must have an empty field between them, and this negates the possibility that at the end of the string can be considered a single terminator” is flawed?
Emanuele Torre
Torre never made the effort to understand what I was actually saying, so let me clarify: I wasn’t saying bash wasn’t compliant with the POSIX standard, I was saying it wasn’t compliant if we interpret the POSIX standard in a certain way.
And that was my whole point: the POSIX standard could be interpreted in two ways.
This is what the standard said about Field Splitting (IFS):
Each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field, as described previously.
The Open Group Base Specifications Issue 7, 2018 edition
The key word here is “delimit”. If the string is “,foo,”, we can clearly see that the word “foo” is delimited, but what we don’t know is how it is delimited, and there are two interpretations: a) separators or b) terminators.
Most modern programmers don’t even think about this because everyone uses separators, as they are the most supported (split method) and useful:
',foo,'.split(',')
[ '', 'foo', '' ]
This is what many people argue shells should do, and that’s what zsh does, but virtually every other shell delimits in terms of terminators.
The difference is that a separator not only ends the current field but also starts the next one, while a terminator doesn’t start a new field. So “,foo,” is three fields with separators, and two fields with terminators. A common terminator delimiter is the semicolon: “one;two;three;”.
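To make the difference concrete, this is what bash (a terminator shell) prints for the snippet at the top of this section; under a separator interpretation you would expect a fifth, empty field at the end:

IFS=,
str='foo,bar,,roo,'
printf '"%s"\n' $str
# "foo"
# "bar"
# ""
# "roo"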
The “experts” argued that obviously the POSIX standard meant terminators, but I know how to read and it didn’t say that explicitly, nor was it implied anywhere.
Normally this would just be another pointless thread with a difference of opinion, except a famous programmer — Kevin Robert Elz (kre) — read this thread, agreed the standard was not clear, and filed a defect on the Austin Group Defect Tracker:
I didn’t really believe this when it was pointed out on a mailing list, but nowhere in XCU 2.6.5 (Field Splitting) does it say what happens when the expansion being split is not empty but contains no IFS characters.
kre
Believe it or not, as a result of this the POSIX standard was updated in 2024:
The shell shall use the byte sequences that form the characters in the value of the IFS variable as delimiters. Each of the characters <space>, <tab>, and <newline> which appears in the value of IFS shall be a single-byte delimiter. The shell shall use these delimiters as field terminators to split the results of expansions, along with other adjacent bytes, into separate fields, as described below. Note that these delimiters terminate a field; they do not, of themselves, cause a new field to start—subsequent bytes that are not from the results of an expansion, or that do not form IFS white-space characters are required for a new field to begin.
The Open Group Base Specifications Issue 8
Now the standard is clear: the delimiters in the IFS variable are terminators: 2.6.5 Field Splitting.
Once again the “experts” were wrong: the POSIX standard was not clear. Even the IEEE Computer Society agreed with that. By digging my heels in, I indirectly got the POSIX standard updated.
How many people can claim to have won an argument by getting the POSIX standard updated?
Woke nonsense
Although I have tried to avoid political differences, I believe at least one example is warranted.
The Git project has been hijacked by companies with an obvious woke agenda like Microsoft and Google. One clear example of Microsoft pushing this agenda is Derrick Stolee’s attempt to insert woke language into the project:
This patch series officially adopts singular “they” as a contributor guideline; see Patch 4 for the change to the guidelines and the reasoning for the change. Before modifying the guidelines, documentation and comments are updated to not use gendered pronouns, which provides examples of how to use it.
Derrick Stolee
Now, if you agree with Derrick that the singular “they” is less awkward, by all means: use it. I’m not going to tell you how to speak. But Derrick didn’t express this as his personal opinion, but as established fact, and that is simply untrue.
Yes, some organizations do endorse the use of the singular “they”, but not all. As a counterexample I provided a usage note from the American Heritage Dictionary: Updated Usage Note: They. They made it very clear that support for this is not universal among linguists:
Resistance remains strongest when the sentence refers to a specific individual whose gender is unknown, rather than to a generic individual representative of anyone: in our 2015 survey, 58 percent of the Panel found We thank the anonymous reviewer for their helpful comments unacceptable. A sentence with a generic antecedent, A person at that level should not have to keep track of the hours they put in, was rejected by 48 percent (a substantial change from our 1996 survey, in which 80 percent rejected this same sentence). As for the use of they with antecedents such as anyone and everyone, pronouns that are grammatically singular but carry a plural meaning, by 2008, a majority of the Panel accepted such sentences as If anyone calls, tell them I can’t come to the phone (56 percent) and Everyone returned to their seats (59 percent).
American Heritage Dictionary of the English Language
Derrick wants to forgo all the nuance that goes into the discipline of linguistics and pretend the singular “they” is fine in every situation, despite the fact that linguists disagree, in particular when the sentence doesn’t have a semantic plural antecedent.
I did give examples of what “semantic plural antecedent” means and I went to great lengths to explain why some people don’t find the singular “they” natural and why it’s not a settled debate.
But most importantly, I argued this was a debate for linguists, not something git developers should be weighing in on.
What did Derrick reply? Nothing. Not a single person in what I would consider the “woke camp” deigned to reply to even one of my messages. Not one. And for the record, I wasn’t the only one pushing back against this.
One of the ironies of the “tolerant” woke people is that they don’t actually tolerate differing opinions. Once you express an opinion that is contrary to the woke agenda, you are placed in the bin of deplorables and you are not even worthy of being listened to ever again. Which is kind of intolerant if you ask me.
I guess most programmers would have given up in frustration, but I knew I didn’t actually need them to listen to me. I decided to short-circuit their woke nonsense and work around the language they deemed “noninclusive” by just changing the words:
I changed sentences such as “the user has to decide if he wants to free” to “the caller decides whether to free”; that avoids the gendered pronoun and any woke language nonsense in a way that is not controversial.
Perfect, I like this. This style is clearer, more neutral, and more on point.
Robert Karszniewicz
The irony of it all is that only three changes to the documentation were needed; 99.9% of the documentation was already gender neutral. So we spent all this time arguing about changing the coding guidelines for nothing.
Even more ironic is that in one of the changes I did in fact use the singular “they”: I changed “with her commit” to “with their commit”. But why? Because the antecedent was plural. The sentence began with “if somebody else”, which includes a lot of potential people, so when we are talking about a commit that any of those somebodies could have made, “they” is natural, because we are talking about an individual drawn from a pool of multiple people. Just like the example from the American Heritage Dictionary:
Everyone returned to their seats.
Example of semantic plural antecedent
Because everyone liked my approach and the maintainer was going to apply my non-political patch, Derrick was forced to eventually reply:
Felipe Contreras:
> Some people have a problem with using a female reviewer or a female developer as an example, and since this is an irrelevant detail, let’s say goodbye to our illustrative female colleagues.
I find this message to be snarky and underhanded instead of actually describing the goals at hand. Citing the reason as stated is not the purpose of these gender-neutral recommendations.
Derrick Stolee
Derrick believes that whatever purpose he had in mind for these changes is the purpose. Then he makes unwarranted assumptions, such as that anyone avoiding gendered pronouns must have had an issue with “her commit”, which is simply not true.
Junio Hamano took the commit message that I wrote, dumped it into the trash, replaced it with what Derrick wrote, and didn’t say a word about it. If you read the commit message in doc: avoid using the gender of other people, it would appear that I wrote Derrick’s nonsense. In such cases Junio should add the trailer “Commit-message-by: Derrick Stolee”, but he didn’t.
This is one example of why I say it doesn’t really matter if I disagree with Junio on a commit message: he just does whatever the fuck he wants.
In my view the commit message doesn’t matter much (although it does matter some); the important thing is that the language of the documentation doesn’t include woke nonsense. So I won the battle.
However, the singular “they” eventually made it into the coding guidelines with the note that it might appear “ungrammatical and unnatural” to some people.
Ultimately this proves my intuition that the objective of the woke camp was never to improve the documentation, because the documentation was already “fixed”. Their true objective was to advance the woke agenda. By including the notion of the singular “they” into the coding guidelines, they can then use this as ammunition on other projects by claiming: “the Git project already embraces inclusive language in their guidelines [link]”.
Now, some people who consider themselves woke might be annoyed by this whole section, perhaps even offended, but I did get rid of gendered pronouns, didn’t I? In fact, I even used the singular “they”, which is what Derrick claimed he wanted in the first place. So what’s the problem?
I’m not going to speculate any further as to why woke people have a problem with my dissenting opinion, even though I implemented exactly what they claimed they wanted. I’ll leave that as an open question for the reader, but it is curious.
At the end of the day the git official documentation doesn’t contain gendered pronouns, and I wrote the relevant patch (ignore the apocryphal invalid commit message). Although we shouldn’t have had that debate in the first place.
Conclusion
Progress requires people who think outside the box and aren’t afraid of expressing their opinion, even if it rocks the boat.
The modern tendency of shunning dissenting opinions achieves precisely the opposite: it stifles progress.
In this post I shared only a few instances in which my different way of thinking was proven right. You may think that these achievements are so small that you don’t need a person like me on your team, but of course these achievements have to be small if I am to explain them succinctly in a blog post. My biggest achievements would require a lot more explanation, and likely few would understand what’s so good about them.
Some people say they would rather have a person who is easy to work with than a really good programmer, but those people probably haven’t worked on a team doing really challenging work (which isn’t necessarily a bad thing). When a problem is really challenging, you want people on your team who are logical and rational, and therefore often right (or at least not wrong).
You want people on your team who are capable of saying “you are wrong”, not just to you, but to the boss, or even the entire team, because sometimes everyone is wrong.
Sometimes you need people who aren’t afraid of being called “asshole”.
Do you think that when the CrowdStrike outage crashed millions of machines, anybody cared how “easy to work with” the developer who made the mistake was? How about when the Space Shuttle Challenger exploded? Do you think anyone cared how likable the people who dared to raise red flags were? No, all that mattered was the truth.
Truth is like poetry.
Some guy
And most people fucking hate poetry.
One of the beautiful unique aspects of code is that it’s objective: either it works, or it doesn’t. When the code of someone you dislike works in more use cases or is more efficient/simple/maintainable, there’s only one rational conclusion to draw: he wasn’t wrong.
We tend to like people who think like us, but sometimes progress needs precisely the opposite.
Tell me how I’m wrong.