Linux kernel swear counts

vidarholen.net

65 points by crashoverdrive 13 years ago · 25 comments

gizmo686 13 years ago

It's worth looking at the scale of the swear count. Linux has about 16,629,976 lines of code, and I'd estimate from the graph that it has about 370 swear words (excluding penguin). If you look at the second graph, that is less than 1 swear in 300,000 lines.

I checked this on the source tree for 3.8.0. The numbers appear to be inflated by allowing the swear words to be part of other words.

For example, "shit" appears in 121 lines, but " shit " only appears in 10 lines. Looking at the offending lines, only one genuine swear word is missed by requiring the surrounding spaces.

"fuck" appears 29 times, all of which are some conjugation of the verb (and some lines have duplicates I'm not counting).

"crap" appears 161 times, 20 of which are part of "scrap".

"bastard" appears 17 times, 6 of which are email addresses hosted at "lazybastard.org" and "you-bastards.com".

"penguin" appears 99 times, two of which are jokes.

  • kleiba 13 years ago

    If you want to check the various words in isolation, surrounding spaces might cost you some matches, e.g. at the end of a sentence ("It's a piece of shit.") or when followed by a comma. Also, did you ignore case ("Shit happens.")?

    How about trying \b[Ss][Hh][Ii][Tt]\b and the like?

    • gizmo686 13 years ago

      There were few enough matches that I manually checked the output without the space requirement. Regarding case sensitivity, it looks like I missed 12 instances of swearing because of that. Also, grep has a "-i" parameter, which makes it case insensitive.
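A minimal Python sketch of the three matching rules discussed in this subthread (plain substring, space-padded, and case-insensitive whole word), run on a handful of made-up comment lines. The sample lines and the `count_lines` helper are hypothetical and are not the commands gizmo686 actually ran; the grep equivalents in the comments are approximate.

```python
import re

# Hypothetical sample lines standing in for kernel comments; not real kernel text.
SAMPLE_LINES = [
    "/* scrap the old buffer */",
    "/* scrapped in v2 */",
    "/* this is crap, but gcc makes us do it */",
    "/* Crap. The hardware lies about its size. */",
    "/* what a load of crap */",
]


def count_lines(lines, word, mode):
    """Count the lines that contain `word` under one of three matching rules."""
    if mode == "substring":
        # Roughly `grep word`: case-sensitive, and also matches inside "scrap".
        return sum(word in line for line in lines)
    if mode == "spaces":
        # Roughly `grep " word "`: misses "crap," and "Crap." next to punctuation.
        return sum(f" {word} " in line for line in lines)
    if mode == "boundary":
        # Roughly GNU `grep -iw word`: whole word only, any letter case.
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        return sum(1 for line in lines if pattern.search(line))
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("substring", "spaces", "boundary"):
    print(mode, count_lines(SAMPLE_LINES, "crap", mode))
# substring 4
# spaces 1
# boundary 3
```

For ASCII text, `\b` combined with `re.IGNORECASE` matches the same lines as kleiba's explicit `\b[Ss][Hh][Ii][Tt]\b`-style character classes; the flag just avoids spelling out every case pair.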

DarMontou 13 years ago

I'm curious about the motivations for coding profanity. I've occasionally included comments like "don't f* with this unless you understand x, y, and z" in an attempt to protect fragile sections of code from careless collaborators. Nearly identical comments without profanity seemed ineffective. Within common conversations I know that profanity often carries an implication of violence, usually for the purpose of intimidation. I also find that profanity is frequently used for comedic relief.

Hopefully these counts don't indicate increasing fragility or violent disagreements within the kernel. Does anyone with kernel experience have any insight into common purposes for kernel profanity?

  • azernik 13 years ago

    Looking through this, the two biggest categories seem to be complaints about buggy/weird hardware (that driver writers have to work around) and complaints about compiler quirks. I can imagine other projects having similarly-motivated cursing at annoying library weirdnesses.

    The motive seems to be to acknowledge to the reader that, yes, this code is ugly, and it's not the writer's fault; it's the product of some bugginess external to the codebase that really isn't possible to fix. Blame "fucking gcc", or Sun for having the nerve to "take such nice parts and fuck up the programming interface" (to quote two examples from the Linux code). The target of the anger is most definitely not the intended audience.

  • gizmo686 13 years ago

    From a quick grep of the source, it looks like a contributing factor is Matsushita Electric Industrial. If you are interested, I posted the output of grep to pastebin [0].

    [0]http://pastebin.com/MNZF1Vz0

    EDIT: This is against linux-3.8.0 from Mint's repository.

    • anabis 13 years ago

      The group's name has changed from Matsushita to Panasonic, so this would bring the count down in the future.

      • gizmo686 13 years ago

        I doubt they retroactively change code/comments because the original organization changed its name. The Matsushita references will probably stay there until they bitrot and get removed/rewritten.

shaggyfrog 13 years ago

Can anyone explain the inclusion of "penguin"? I know Tux is the mascot and all, but is it some kind of inside-joke-swear-thing?

  • DarMontou 13 years ago

    If you look at the grep output that gizmo686 posted to pastebin (see below), you can search for penguin. There are a lot of web addresses, email addresses, maintainer roles (chief penguin), and logo references. There are a few instances of variable names containing penguin as well.

    http://pastebin.com/MNZF1Vz0

foobarbazqux 13 years ago

I like jwz's classic post about swear words in Mozilla:

http://www.jwz.org/doc/censorzilla.html

ColinWright 13 years ago

When this was submitted some years ago there was some discussion. It might be worth comparing that with the comments here:

https://news.ycombinator.com/item?id=850761

It has been submitted a few more times, none with comments:

https://news.ycombinator.com/item?id=2070056

https://news.ycombinator.com/item?id=4045103

https://news.ycombinator.com/item?id=6307849

The last of these was just 14 hours ago; the trailing slash defeated the HN dup detector.

m_ram 13 years ago

A casual search of code in Debian [1] shows that this is not limited to the Linux kernel. Thankfully there's no equivalent to the Parents Television Council [2] or Focus on the Family [3] for open source projects.

[1] http://codesearch.debian.net/search?q=fuck

[2] https://en.wikipedia.org/wiki/Parent%27s_Television_Council

[3] https://en.wikipedia.org/wiki/Focus_on_the_Family

aylons 13 years ago

I wonder what caused the peak of shit just before 3.0.9, and what reversed it.

The spike just before 3.2.17 must be a glitch, but if it isn't, it's very intriguing.

  • ucarion 13 years ago

    It seems as if a lot of lines of code were removed or not measured at that particular measurement; despite the drop in the usage of 'shit', its occurrence per line in fact jumps up at that moment.

forkrulassail 13 years ago

I love how penguin is halfway between bastard and crap; I'll update my swear word dictionary.

NAFV_P 13 years ago

There is an area around 2.4.36.5 where the fall and rise of "fuck" and "shit" closely resemble each other. I'm guessing it's a macro which concatenates the two expletives for use in the code.

  • ajkjk 13 years ago

    or a variable named 'fuckshit'

    • NAFV_P 13 years ago

      That's what I'm thinking: they used the ## preprocessor operator to stick "fuck" and "shit" together. Oh, if this is Linux, then the identifier is more likely to be "fuck_shit".

chatman 13 years ago

With people like Linus Torvalds at the helm of the project, what else can be expected?

  • primelens 13 years ago

    That's a little harsh. Yes he has a tendency to be - erm, 'vehement' about the quality of code, but there is no one else anyone would rather have at the helm of the kernel project.

  • gizmo686 13 years ago

    An increase in the rate of swear words.

  • frozenport 13 years ago

    Not sure what you mean. Are there a lot or a little?
