Destroying C with 20 lines of Haskell: wc
0xd34df00d.me

This doesn't seem to be comparing anything like the same thing. Does the Haskell version really do the same thing as the C version? Does it handle all of the same error cases, providing the same quality of error messages if they occur? Does it handle localization? If not, that makes the comparison very skewed, as unhammer already pointed out. Sure, if you strip out all of the things that the people who wrote wc actually spent all that time on then you can go faster, but failing to note the differences is simply dishonest.
Sup, author here.
> Does the Haskell version really do the same thing as the C version?
It counts bytes, words and lines and, modulo intended Unicode space handling and unintended bugs, does the same thing.
Indeed, it does not count things like max line length or char count, but those can be plugged in without significant performance overhead (and that's what the second part is gonna be about).
> Does it handle all of the same error cases, providing the same quality of error messages if they occur?
There are no error messages at this point. Although I don't really see how this should affect performance.
> Does it handle localization?
If you mean counting multi-byte characters, then not yet. Although I'm pretty convinced it does not require doing something much more complicated — but maybe I'm wrong, we'll see in the next part.
Also, thanks for the feedback, those are important questions! Something to keep in mind when writing subsequent posts.
Reproducing what your code does in the most simple, naive C program possible, I can beat the existing wc utility, taking only around 40% of the time that wc takes.
So until you put in locale handling, alternate line endings, option handling, and error handling, I don't see that your post is at all convincing.
Quite the opposite.
So I look forward to a Haskell version that supports everything that wc has, so we can get a fair comparison.
> So until you put in locale handling, alternate line endings, option handling, and error handling, I don't see that your post is at all convincing.
That's precisely what the second part would be about.
And, if I succeed, IMO, that's where Haskell would really shine (because of composability and local reasoning), and where I'd be able to claim to have achieved something — the stuff in the post we're discussing is indeed trivial (I didn't want to say that in the post itself, though, as I think it'd look like I'm belittling the guy who did the original post), while modularizing this is _fun_.
FWIW, my version, now compiled with -O3, is 8 times faster than wc. And I haven't even tried to optimise it.
I look forward to your results.
Here's the source code for GNU wc - https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;... .
It's easy to see that it's not the result of 10+ years of low-level optimizations to eke out the most performance.
Your test code probably hits the MB_CUR_MAX>1 path at line 361. (Check your locale setting!)
The main loop is:
    402           if (!in_shift && is_basic (*p))
    403             {
    404               /* Handle most ASCII characters quickly, without calling
    405                  mbrtowc().  */
    406               n = 1;
    407               wide_char = *p;
    408               wide = false;
    409             }
    ...
    443               switch (wide_char)
    444                 {
    445                 case '\n':
    446                   lines++;
    447                   FALLTHROUGH;
    448                 case '\r':
    449                 case '\f':
    450                   if (linepos > linelength)
    451                     linelength = linepos;
    452                   linepos = 0;
    453                   goto mb_word_separator;
    454                 case '\t':
    455                   linepos += 8 - (linepos % 8);
    456                   goto mb_word_separator;
    457                 case ' ':
    458                   linepos++;
    459                   FALLTHROUGH;
    460                 case '\v':
    461                 mb_word_separator:
    462                   words += in_word;
    463                   in_word = false;
    464                   break;
    465                 default:
    466                   if (wide && iswprint (wide_char))
    ....
    480                   else if (!wide && isprint (to_uchar (*p)))
    481                     {
    482                       linepos++;
    483                       if (isspace (to_uchar (*p)))
    484                         goto mb_word_separator;
    485                       in_word = true;
    486                     }
    487                   break;
    488                 }

This is a much more complicated implementation than your code. Among other things, note how it uses isprint/iswprint on each character, and how these are locale dependent.

Even when character = byte, the main loop uses the same logic:

    555             default:
    556               if (isprint (to_uchar (p[-1])))
    557                 {
    558                   linepos++;
    559                   if (isspace (to_uchar (p[-1]))
    560                       || isnbspace (to_uchar (p[-1])))
    561                     goto word_separator;
    562                   in_word = true;
    563                 }
    564               break;
    565             }

Your benchmark only uses the characters:

    "\n\r ',-./0123456789ABCDEFGHIJKLMNOPQRSTUVYZabcdefghijklmnopqrstuvwxyz"

which means it comes nowhere near being a good test which verifies the two programs do the same thing.

The following should be a more difficult test set to reproduce wc's output. I create the test set with Python:

    with open("testset.dat", "wb") as f:
        for b1 in range(256):
            for b2 in range(256):
                for b3 in range(256):
                    _ = f.write(bytes((b1, b2, b3)))

then print the output using two different locales:

    % env LANG=en_US.UTF-8 wc testset.dat
      196608 1523713 50331648 testset.dat
    % env LANG=C wc testset.dat
      196608 1152001 50331648 testset.dat

This is a great point about handling printable vs non-printable characters that I originally missed when I read the wc code. Thank you for pointing this out!
Could you elaborate on how you read the wc code?
I ask because the essay and your comments until now show no insight from reading the code.
I find it difficult to understand how anyone could miss that (rather significant) part of the core algorithm, and then assert the differences are due to only "modulo intended Unicode space handling" and the like.
Until now I had assumed you had a lay understanding of wc, and had not read the code.
I was mostly curious about how wc handles spaces and whether ignoring non-ASCII spaces brings me closer to or farther from what wc does. So I focused on that, and this specific printable-character handling didn't catch my eye.
On a meta level, I wasn't even considering that the notion of a word might be different from "a sequence of characters that aren't space characters".
Live and learn indeed.
Thank you for the clarification.
Good point. I was thinking the same thing too.
I will wait for what the next article presents, as the end of this article states:
> Stay tuned for a second part, where we will investigate shaping all this into an actual wc substitute, where different statistics can be turned on or off, all while not computing what the user hasn’t requested.
> This doesn't seem to be comparing anything like the same thing.
This is a fair point, and I believe this whole series of 'Beating C with foo' posts could have been better named.
But I'm of the opinion that the whole series is about showcasing various languages' strengths and weaknesses, while using GNU wc as a benchmark.
From this perspective, I've learnt a bit about several languages I knew nothing about, so I rather like these articles, despite the fact that their titles might be a bit misleading.
It's all about tone.
Everyone remembers being a brash know-nothing teenager, so just by the headline "DESTROYING C" you get that vibe.
I remember having a blog about Haskell circa 2002 where I would smugly enumerate the ways in which Guido van Rossum was wrong. "GUIDO IS WRONG! PART 4" my post titles would say.
He's not handling unicode and it's been well demonstrated in the past that unicode handling adds a lot to computation time in tests like these (it often comes up in grep benchmarks too).
I'd be interested to see the same tests but with a non-Unicode locale set, IIRC:

    LC_ALL=C

I'd wager the C version would perform much better in that test.

Don't forget to export the variable, or run it like "LC_ALL=C wc ...", otherwise the locale won't apply.
Does it handle multiple files, and stdin?
Anyway, destroying Haskell with 100 lines of C :-)
https://raw.githubusercontent.com/gdevic/minix1/master/comma...
I see and raise, to one line:
https://www.ioccc.org/2019/burton/prog.c
It is the Burton entry from The International Obfuscated C Code Contest, but it seems to handle only ASCII.
https://www.ioccc.org/years-spoiler.html
Limits:
"Requires the C locale and ASCII character set. Input should be less than ten million octets to avoid this problem."
Isn't 100 lines too much for a simple utility like wc? I get that it has many edge cases to cover, but edge cases usually require some different handling when you run into them anyway. I'd rather use my one-liner (including calculation in a single pass, and even parallel processing!):
    wc:{sum (({1};ceiling 0.5*sum differ " "=;{1+count x})@\:) peach read0 x}

Do you get a speed-up if you use getc_unlocked() instead of getc()? And if you write your own isspace()? As far as I know, isspace() is locale sensitive.
isspace is defined as a macro at the top of the file, and it's definitely not locale sensitive ;)
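For illustration, a locale-independent isspace can be a plain byte comparison; a minimal sketch (my guess at the shape, not necessarily the exact macro in that file):

    /* ASCII-only whitespace test; ignores the locale entirely. */
    #define ISSPACE(c) ((c) == ' '  || (c) == '\t' || (c) == '\n' || \
                        (c) == '\r' || (c) == '\f' || (c) == '\v')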
Off-topic but tangentially related. There was Reddit post[0] about a clothes dryer's eco mode being a scam since it consumes more power than a regular drying cycle. Here[1] is a single comment thread noting that the user was measuring power output of the entire house. The user quickly changed their tune when this was highlighted.
[0]: https://old.reddit.com/r/dataisbeautiful/comments/eyevca/my_...
[1]: https://old.reddit.com/r/dataisbeautiful/comments/eyevca/my_...
They explicitly say in the introduction that this is a toy version, and then in the conclusion that a future post will look at a more complete substitute.
> failing to note the differences is simply dishonest.
Like yourself, I think any reader could understand that this is meant to be a toy implementation. The author even says so in the very first paragraph and at several points in the article thereafter. How is that dishonest?
Because that makes the programs sufficiently different that the conclusion about "smashing" seems dishonest.
Apples smash Oranges, etc.
So I just wrote the most naive version of wc I could think of in C, matching the capability of this Haskell version, and I smoked the system wc ... my code, unoptimised, was over twice as fast.
    $ time wc Backups/Tera2/files.txt
      1123699   2283439 161361844 Backups/Tera2/files.txt

    real    0m2.010s
    user    0m1.964s
    sys     0m0.020s

    $ time naive Backups/Tera2/files.txt
    L: 1123699
    W: 2283439
    C: 161361844

    real    0m0.864s
    user    0m0.835s
    sys     0m0.028s

Smoked !!

(See my comment elsewhere[0] for a more sensible comment)
[0] https://news.ycombinator.com/item?id=22234673
--------
Update: I just compiled with -O3 and got user time of 0.24 secs. This version is 8 times faster than the system wc.
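For reference, a byte-at-a-time counter along these lines fits in a few dozen lines of C. Here is a hypothetical sketch of the idea (not the exact program benchmarked above): it counts lines, words and bytes with ASCII-only whitespace and no locale handling.

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *f = (argc > 1) ? fopen(argv[1], "rb") : stdin;
        if (!f) { perror(argv[1]); return 1; }

        static unsigned char buf[1 << 16];
        long lines = 0, words = 0, bytes = 0;
        int in_word = 0;
        size_t n;

        while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
            bytes += (long) n;
            for (size_t i = 0; i < n; i++) {
                unsigned char c = buf[i];
                if (c == '\n')
                    lines++;
                /* ASCII-only whitespace; no locale, no multi-byte handling. */
                if (c == ' ' || c == '\t' || c == '\n' ||
                    c == '\r' || c == '\f' || c == '\v') {
                    in_word = 0;
                } else if (!in_word) {
                    in_word = 1;
                    words++;
                }
            }
        }

        printf("L: %ld\nW: %ld\nC: %ld\n", lines, words, bytes);
        if (f != stdin) fclose(f);
        return 0;
    }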
Just put together a naive version of wc in Go and got similar perf gains over the wc that ships with macOS.
That being said, unlike the OP, I don't think it means anything, as wc likely supports more features than my program.
https://gist.github.com/felixge/aa70fc97e893a7eb0bd4c801f8f5...

    $ time wc test.txt
     16500000 49252094 2059004431 test.txt
    real    0m5.930s
    user    0m5.491s
    sys     0m0.374s

    $ go build wc.go && time ./wc test.txt
    16500000 49252094 2059004431
    real    0m2.947s
    user    0m2.620s
    sys     0m0.299s

(obviously in both cases with prewarmed filesystem cache)

I got 5.6x speedup on just wc alone with:

    export LANG=C

That doesn't seem to help on macOS. I suspect you're on Linux?
Not being great at reading Haskell, I have some questions I was hoping people here could answer:
* Does this cope with different whitespace, such as tabs?
* Does this cope with different settings of locale?
* Does this include the option of the "longest line"?
* Does this perform the character counts?
I'm pretty sure wc does all these, and that stripping them out would make it faster. If this Haskell version doesn't do that, and yet still compares against a fully-featured version of wc, the comparison hardly seems fair.
* isSpace handles tabs, but looking at a single byte at a time it won't handle all the multibyte space symbols you can have in unicode. If you read further down, they rip out the remains of unicode handling for further speed improvements.
* Looking at a single byte at a time, it presumably only handles the "C" locale :) They don't say what locale GNU wc was tested with (if it's not LANG=C, that benchmark should be re-run)
* --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on (can't find the invocation in the blog post though)
* data State { ws, bs, ls } keeps count of words, bytes (more honest than calling it characters) and lines.
Thanks for the reply ...
> ... further down, they rip out the remains of unicode handling ...
Ah. Well, that makes it a little unfair, surely.
> Looking at a single byte at a time, it presumably only handles the "C" locale ...
Again.
> --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on
I wonder if wc does the work anyway, and only reports it if asked, or if it actually changes the code path if it's not needed.
So this entire post feels ... intellectually dishonest. Personally, I'm all in favour of Haskell, and I wish I had the chance to use it "in anger" rather than just doing the occasional toy thingie that I do. But this post doesn't do it or its community any favours.
Disappointing.
Yes. "Destroying C". The whole post feels like youthful bravado untempered by experience.
The source code of GNU wc:
https://github.com/coreutils/coreutils/blob/master/src/wc.c
The word counting algorithm does seem to be much more complex.
But how would the Haskell version of wc compare with the C version of wc running with the LC_ALL=C environment variable? The UTF-8 locale is much slower than the C locale in coreutils (a well-known fact), and this Haskell version of wc is already using fixed 8-bit characters.
wc was actually slower with LC_ALL=C as opposed to ru_RU.UTF-8 that my system normally runs with (about 10 s against 7.2 s).
Which actually raises a good question of whether I should have been comparing with that one — but that'd probably raise more questions and lead to more people accusing me of cheating in favour of Haskell.
The only way I can think of this could be true is if you made a mistake in setting an env var.
Well, counting characters is obviously different from counting bytes when the characters are UTF-8 encoded, or UTF-32.
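As a concrete illustration (my own sketch, not from the article): in UTF-8 the character count can be obtained without full decoding by skipping continuation bytes, whereas the byte count is just the buffer length.

    #include <stddef.h>

    /* Count UTF-8 code points by ignoring continuation bytes (10xxxxxx).
       Assumes well-formed UTF-8 input. */
    static size_t utf8_char_count(const unsigned char *buf, size_t nbytes)
    {
        size_t chars = 0;
        for (size_t i = 0; i < nbytes; i++)
            if ((buf[i] & 0xC0) != 0x80)
                chars++;
        return chars;
    }

    /* e.g. the UTF-8 encoding of "héllo" is 6 bytes but 5 characters. */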
Write me a Haskell program that will init DRAM on the PinePhone, set up the PMIC and eMMC, load 25 MiB of Linux kernel and initramfs from eMMC to DRAM, fit in a hard limit of 32 kB, and do all of this in 300 ms max.
Then I'll consider C destroyed.
Destroying Haskell with 1 line of C: https://www.ioccc.org/2019/burton/prog.c
I don't think it is very "hard" to "destroy" most of these programs: they were written a long time ago, they evolved with backwards compatibility and portability in mind, and they can run on pretty much any system. It does seem a bit unfair to compare them to a quickly hacked-together program that you test against one use case.
Like the others said, the second article will probably be more interesting.
But multiple people have been working full time on wc for decades! At least, if you believe the article.
Decades yes, but who said “full time”?
I've been working full time on 'wc' for only the last five years, since being promoted from full time working on 'cat'.
Once I've wrung every last drop of performance out of wc, I can move on to the next largest limiting factor in GNU performance, 'tail' :-)
That's what "decades of man-hours" means.
I would really like to see fewer of these clickbait post titles. The author honestly admits here (props!) https://news.ycombinator.com/item?id=22235536 that they only used the title because it encourages discussion.
In that case, I would say that it's up to the community to not take clickbait titles like this seriously if we want to encourage reasoned, detailed content rather than borderline flaming with words like "destroy" and "smash". I personally don't enjoy this new Buzzfeed style of technical post at all.
Therefore, I have some comments about the actual code here, but I'm going to keep them to myself so I don't encourage more people to follow OP's example.
I don't like these "Destroying C" titles.
Me neither. But experience shows that these titles lead to more folks looking at the post, leading to more feedback, leading to better writing/experimenting/etc in the long run.
Also, I was trying to make a reference to the original post that was "beating C", with the connotation of further improving on that. I'm not a native speaker so my language model might be terribly flawed.
Ah, the weekly "I beat C using X language".
I don't understand why LOC was even brought up. You can put all your code on a single line in most languages. The linked GitHub code even has 26 lines of Haskell, which makes this even more nonsensical.
Not unimpressive, but this is a bit of a hot take. If I'm reading it right, a single-file benchmark (Big O who?) against a fully-fledged production C program is not a complete measurement.
Related: I have an F# snippet I run from LINQPad to keep or remove lines (based on keywords) from huge log files, and it takes just a few seconds to breeze through the log and it's done.
I take it this is “destroying” in the sense of presidential debates or late-night satirical monologues, i.e. diverting but meaningless.
He is comparing a multi-threaded Haskell version against a single-threaded C version???
"...There’s also a parallel version that relies on the monoidal structure of the problem a lot, and that one actually beats C"
coreutils wc is single-threaded, just checked.
From the introduction:

> But that post left me wondering: is it possible to do better without resorting to parallel processing? Turns out the answer is yes.

Yes, your skimming was inaccurate.
Happy 33rd anniversary of Haskell devs claiming to destroy C.
os: CentOS Linux release 7.7.1908 (Core)
cpu: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Average time for wc on the same test file: 0.8 seconds
> So we’ve managed to just smash a C program that was looked at by thousands of eyes of quite hardcore low-level Unix hackers over a few decades. We did this with a handful of lines of pure, mutation-less, idiomatic Haskell, achieving about 4 to 5 times of throughput of the C version and spending less than an hour on all the optimizations.
I've done many very arrogant things in my life, because I've been a strange guy with lack of self-esteem who doesn't have a clue how many things he doesn't know. I hope I've never been this arrogant, though.
Those are fairly trivial and well-known optimizations that I did (and I by no means am an expert in writing high-performant code), so all the honors go to GHC authors.
Thanks for replying. I want to tell you that I feel deep regret for my impulse to publicly shame you - even though I've gotten a lot of points for this comment, and did not really receive criticism for it.
Hey - if you make performance optimizations and compare implementations, it's probably best not to jump to quick conclusions. I would advise brushing up on C to get a feel for performance. Or, in times when it's popular to shit on C and claim that it's not really a low-level language anymore, I would advise writing some assembler (which I've barely done). In practice it's unlikely that you'll find yourself in a situation where you can write code in a high-level language that runs considerably faster than what you could realistically write in C. The only situation where that can happen, as far as I can see, is when the program does something so complicated yet seemingly so arbitrary that you can't bring yourself to invest more time than it takes to hack together a quick Python script, and what you would be willing to write in C wouldn't even be the asymptotically best approach you can see.
For a simple program like wc, that situation does not apply, and if you find surprising results, it's best to first check for other possible reasons than "thousands of graybeards were wrong".
No worries, that's a natural reaction!
> In practice it's unlikely that you find yourself in a situation where you can write code in a high-level language that runs considerably faster than what you could realistically write in C.
I do way more C++ (in fact, I don't do pure C at all), and aliasing has bitten me and my code performance more often than I'd like. While there are workarounds, I'd probably consider spending time and effort on them as rather unrealistic in a sense. So it surely doesn't contradict my world model if a language with a stricter type system (Haskell? Rust? ATS anyone?) achieves better results on at least some of the tasks with less effort and less dependence on implementation details.
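A toy example of the kind of thing I mean, sketched in C with restrict (the same issue bites C++, where restrict is only available as a compiler extension):

    #include <stddef.h>

    /* Because dst and scale may alias, the compiler has to reload *scale
       on every iteration and is limited in how it can vectorize. */
    void scale_into(float *dst, const float *src, const float *scale, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * *scale;
    }

    /* With restrict, the load of *scale can be hoisted out of the loop. */
    void scale_into_restrict(float *restrict dst, const float *restrict src,
                             const float *restrict scale, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * *scale;
    }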
Although, ironically, I'm going to write something low-level for the Haskell bytestring library this evening, in C with intrinsics (so almost assembly, modulo stuff like register allocation).
The github repo description is equally distasteful too:
> wc implemented in Haskell (significantly faster than GNU coreutils version — oops I did it again
For reference, I'm referring to the "oops I did it again" part. It's really hard to take that comment as "honours go to GHC authors".
Also, I suggest you try running the GNU wc with Unicode turned off, because Unicode is computationally expensive and you're deliberately disabling Unicode support in your own code anyway. I appreciate you said you'd add in the edge cases that GNU handles in your next blog post, but disabling Unicode in GNU wc for this benchmark would show good faith that you're at least trying to compare like for like. And if GHC still outperforms, then you can at least legitimately say:
> My code outperforms GNU for non-unicode strings
Which currently you cannot because your claim is based on incorrect benchmarks.
> For reference, I'm referring to the "oops I did it again" part. It's really hard to take that comment as "honours go to GHC authors".
Was overly excited when I created the repo after obtaining the first results. Childish indeed, thanks for reminding me, fixed.
> Also, I suggest you try running the GNU wc with unicode turned off because unicode is computationally expensive and you're deliberately disabling unicode support in your own code anyway.
I tried running wc as `LC_ALL=C wc file.txt`, and it (surprisingly for me) resulted in worse run time for wc (my default locale is ru_RU.UTF-8 for comparison). This reproduced on two machines of mine and also on a machine of a friend of mine who also gave my code a shot.
My bad for omitting this in the post, I'll update it accordingly.
wc running slower seems wrong. I wonder if it's doing additional sanitising because your LANG differs from your LC_ALL. I'd need to read through the code to get a handle on what it does and doesn't expect, but I definitely wouldn't expect wc to run slower.
Perhaps - just possibly maybe perhaps - he was suggesting it is his superior tool and technology that allowed his team to do this.
Do you think you took the most generous interpretation possible?
Given that I was reminded of myself, I concede it might be closer to the least generous one.
He is comparing apples and oranges and his conclusion can only be characterized as wildly misleading and off the mark.
The most generous possible interpretation is to say that he's not being intentionally misleading but actually believes what he writes. I fail to see how that improves matters.
> he was suggesting its is his superior tool and technology that allowed his team to do this
He really thinks it makes sense that a "C program that was looked at by thousands of eyes of quite hardcore low-level Unix hackers over a few decades" is that much slower than something he threw together? That's like doing some back of the napkin math in a bar and declaring relativity wrong (while also not being Edward Witten). Any reasonable person would suspect that this isn't a very likely outcome and dig deeper to figure out if that's really true or not. My guess is he just implemented a very trivial and naive version of the C code and that's why it is faster.
It would be great if your post added value to the discussion.
The post is very insightful and well written. The title is provocative which is very common nowadays. However, there should be a "serious" section that puts things into perspective. As others have pointed out it leaves a bit of a taste otherwise.
Those are serious flaws. I think it’s misleading, not insightful at all.