Destroying C with 20 lines of Haskell: wc

0xd34df00d.me

27 points by develop7 6 years ago · 73 comments

notacoward 6 years ago

This doesn't seem to be comparing anything like the same thing. Does the Haskell version really do the same thing as the C version? Does it handle all of the same error cases, providing the same quality of error messages if they occur? Does it handle localization? If not, that makes the comparison very skewed, as unhammer already pointed out. Sure, if you strip out all of the things that the people who wrote wc actually spent all that time on then you can go faster, but failing to note the differences is simply dishonest.

  • 0xd34df00d 6 years ago

    Sup, author here.

    > Does the Haskell version really do the same thing as the C version?

    It counts bytes, words and lines and, modulo intended Unicode space handling and unintended bugs, does the same thing.

    Indeed, it does not count things like max line length or char count, but those can be plugged in without significant performance overhead (and that's what the second part is gonna be about).

    > Does it handle all of the same error cases, providing the same quality of error messages if they occur?

    There are no error messages at this point. Although I don't really see how this should affect performance.

    > Does it handle localization?

    If you mean counting multi-byte characters, then not yet. Although I'm pretty convinced it does not require doing something much more complicated — but maybe I'm wrong, we'll see in the next part.

    Also, thanks for the feedback, those are important questions! Something to keep in mind when writing subsequent posts.

    • ColinWright 6 years ago

      Reproducing what your code does in the most simple, naive C program possible, I can beat the existing wc utility, taking only around 40% of the time that wc takes.

      So until you put in locale handling, alternate line endings, option handling, and error handling, I don't see that your post is at all convincing.

      Quite the opposite.

      So I look forward to a Haskell version that supports everything that wc has, so we can get a fair comparison.

      • 0xd34df00d 6 years ago

        > So until you put in locale handling, alternate line endings, option handling, and error handling, I don't see that your post is at all convincing.

        That's precisely what the second part would be about.

        And, if I succeed, IMO, that's where Haskell would really shine (because composability and local reasoning), and where I would be able to claim to achieve something — the stuff in the post we're discussing is indeed trivial (I didn't want to say that in the post itself though as I think it'll look like I'm belittling the guy who did the original post), while modularizing this is _fun_.

        • ColinWright 6 years ago

          FWIW, my version, now compiled with -O3, is 8 times faster than wc. And I haven't even tried to optimise it.

          I look forward to your results.

        • eesmith 6 years ago

          Here's the source code for GNU wc - https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;... .

          It's easy to see that it's not the result of 10+ years of low-level optimizations to eke out the most performance.

          Your test code probably hits the MB_CUR_MAX>1 path at line 361. (Check your locale setting!)

          The main loop is:

             402   if (!in_shift && is_basic (*p))
             403     {
             404       /* Handle most ASCII characters quickly, without calling
             405          mbrtowc().  */
             406       n = 1;
             407       wide_char = *p;
             408       wide = false;
             409     }
              ...
             443   switch (wide_char)
             444     {
             445     case '\n':
             446       lines++;
             447       FALLTHROUGH;
             448     case '\r':
             449     case '\f':
             450       if (linepos > linelength)
             451         linelength = linepos;
             452       linepos = 0;
             453       goto mb_word_separator;
             454     case '\t':
             455       linepos += 8 - (linepos % 8);
             456       goto mb_word_separator;
             457     case ' ':
             458       linepos++;
             459       FALLTHROUGH;
             460     case '\v':
             461     mb_word_separator:
             462       words += in_word;
             463       in_word = false;
             464       break;
             465     default:
             466       if (wide && iswprint (wide_char))
                            ....
             480       else if (!wide && isprint (to_uchar (*p)))
             481         {
             482           linepos++;
             483           if (isspace (to_uchar (*p)))
             484             goto mb_word_separator;
             485           in_word = true;
             486         }
             487       break;
             488     }
          
          This is a much more complicated implementation than your code. Among other things, note how it uses isprint/iswprint on each character, and how these are locale dependent.

          Even when character = byte, the main loop uses the same logic:

             555     default:
             556       if (isprint (to_uchar (p[-1])))
             557         {
             558           linepos++;
             559           if (isspace (to_uchar (p[-1]))
             560               || isnbspace (to_uchar (p[-1])))
             561             goto word_separator;
             562           in_word = true;
             563         }
             564       break;
             565     }
          
          Your benchmark only uses the characters:

              "\n\r ',-./0123456789ABCDEFGHIJKLMNOPQRSTUVYZabcdefghijklmnopqrstuvwxyz"
          
          which means it comes nowhere near being a good test that verifies the two programs do the same thing.

          The following should be a more difficult test set to reproduce wc's output. I create the test set with Python:

              with open("testset.dat", "wb") as f:
                for b1 in range(256):
                  for b2 in range(256):
                    for b3 in range(256):
                      _ = f.write(bytes((b1, b2, b3)))
          
          then print the output using two different locales:

              % env LANG=en_US.UTF-8 wc testset.dat
                196608 1523713 50331648 testset.dat
              % env LANG=C wc testset.dat
                196608 1152001 50331648 testset.dat
          • 0xd34df00d 6 years ago

            This is a great point about handling printable vs non-printable characters that I originally missed when I read wc code. Thank you for pointing this out!

            • eesmith 6 years ago

              Could you elaborate on how you read the wc code?

              I ask because the essay and your comments until now show no insight from reading the code.

              I find it difficult to understand how anyone could miss that (rather significant) part of the core algorithm, and then assert the differences are due to only "modulo intended Unicode space handling" and the like.

              Until now I had assumed you had a lay understanding of wc, and had not read the code.

              • 0xd34df00d 6 years ago

                I was mostly curious about how wc handles spaces, and whether ignoring non-ASCII spaces brings me closer to or farther from what wc does. So I focused on that, and this specific printable-characters handling didn't catch my eye.

                On a meta level, I wasn't even considering that the notion of a word might be different from "a sequence of characters that aren't space characters".

                Live and learn indeed.

  • acl777 6 years ago

    Good point. I was thinking the same thing too.

    I will wait for what the next article presents, as the end of this article states:

    > Stay tuned for a second part, where we will investigate shaping all this into an actual wc substitute, where different statistics can be turned on or off, all while not computing what the user hasn’t requested.

  • phonebucket 6 years ago

    > This doesn't seem to be comparing anything like the same thing.

    This is a fair point, and I believe this whole series of 'Beating C with foo' posts could have been better named.

    But I'm of the opinion that the whole series is about showcasing various languages' strengths and weaknesses, while using GNU wc as a benchmark.

    From this perspective, I've learnt a bit about several languages I knew nothing about, so I rather like these articles, despite the fact that their titles might be a bit misleading.

    • thanatropism 6 years ago

      It's all about tone.

      Everyone remembers being a brash know-nothing teenager, so just by the headline "DESTROYING C" you get that vibe.

      I remember having a blog about Haskell circa 2002 where I would smugly enumerate the ways in which Guido van Rossum was wrong. "GUIDO IS WRONG! PART 4" my post titles would say.

  • hnlmorg 6 years ago

    He's not handling unicode and it's been well demonstrated in the past that unicode handling adds a lot to computation time in tests like these (it often comes up in grep benchmarks too).

    I'd be interested to see the same tests but with a non-Unicode locale set, IIRC:

        LC_ALL=C
    
    I'd wager the C version would perform much better in that test.

    • jstimpfle 6 years ago

      Don't forget to export the variable, or run it like "LC_ALL=C wc ..."; otherwise the locale won't apply.

  • jacobush 6 years ago

    Does it handle multiple files, and stdin?

    Anyway, destroying Haskell with 100 lines of C :-)

    https://raw.githubusercontent.com/gdevic/minix1/master/comma...

    • hrgiger 6 years ago

      I see and raise, to one line:

      https://www.ioccc.org/2019/burton/prog.c

      It is the Burton entry from The International Obfuscated C Code Contest, but it seems to handle only ASCII.

      https://www.ioccc.org/years-spoiler.html

      Limits:

      "Requires the C locale and ASCII character set. Input should be less than ten million octets to avoid this problem."

      https://www.ioccc.org/2019/burton/hint.html

    • _v7gu 6 years ago

      Isn't 100 lines too much for a simple utility like wc? I get that it has many edge cases to cover, but edge cases usually require some different handling when you run into them anyway. I'd rather use my one-liner (including calculation in a single pass, and even parallel processing!):

        wc:{sum (({1};ceiling 0.5*sum differ " "=;{1+count x})@\:) peach read0 x}

    • jstimpfle 6 years ago

      Do you get a speed-up if you use getc_unlocked() instead of getc()? And if you write your own isspace()? As far as I know, isspace() is locale sensitive.

      • sauerbraten 6 years ago

        isspace is defined as a macro at the top of the file, and it's definitely not locale sensitive ;)

  • noxToken 6 years ago

    Off-topic but tangentially related: there was a Reddit post[0] about a clothes dryer's eco mode being a scam, since it appeared to consume more power than a regular drying cycle. Here[1] is a single comment thread noting that the user was measuring the power draw of the entire house. The user quickly changed their tune when this was highlighted.

    [0]: https://old.reddit.com/r/dataisbeautiful/comments/eyevca/my_...

    [1]: https://old.reddit.com/r/dataisbeautiful/comments/eyevca/my_...

  • dddbbb 6 years ago

    They explicitly say in the introduction that this is a toy version, and then in the conclusion that a future post will look at a more complete substitute.

  • shawnz 6 years ago

    > failing to note the differences is simply dishonest.

    Like yourself, I think any reader could understand that this is meant to be a toy implementation. The author even says so in the very first paragraph and at several points in the article thereafter. How is that dishonest?

    • ckastner 6 years ago

      Because that makes the programs sufficiently different that the conclusion about "smashing" seems dishonest.

      Apples smash Oranges, etc.

ColinWright 6 years ago

So I just wrote the most naive version of wc I could think of in C, matching the capability of this Haskell version, and I smoked the system wc ... my code, unoptimised, was over twice as fast.

$ time wc Backups/Tera2/files.txt

  1123699   2283439 161361844 Backups/Tera2/files.txt

  real 0m2.010s
  user 0m1.964s
  sys 0m0.020s
$ time naive Backups/Tera2/files.txt

  L: 1123699
  W: 2283439
  C: 161361844

  real 0m0.864s
  user 0m0.835s
  sys 0m0.028s
Smoked !!

(See my comment elsewhere[0] for a more sensible comment)

[0] https://news.ycombinator.com/item?id=22234673

--------

Update: I just compiled with -O3 and got user time of 0.24 secs. This version is 8 times faster than the system wc.

  • felixge 6 years ago

    Just put together a naive version of wc in Go and got similar perf gains over the wc that ships with macOS.

    That being said, unlike the OP, I don't think it means anything, as wc likely supports more features than my program.

        $ time wc test.txt
         16500000 49252094 2059004431 test.txt
    
        real 0m5.930s
        user 0m5.491s
        sys 0m0.374s
    
        $ go build wc.go && time ./wc test.txt
        16500000 49252094 2059004431
    
        real 0m2.947s
        user 0m2.620s
        sys 0m0.299s
    
    https://gist.github.com/felixge/aa70fc97e893a7eb0bd4c801f8f5...

  • zzzcpan 6 years ago

    I got 5.6x speedup on just wc alone with:

      export LANG=C
    
    (obviously in both cases with prewarmed filesystem cache)

ColinWright 6 years ago

Not being great at reading Haskell, I have some questions I was hoping people here could answer:

* Does this cope with different whitespace, such as tabs?

* Does this cope with different settings of locale?

* Does this include the option of the "longest line"?

* Does this perform the character counts?

I'm pretty sure wc does all these, and that stripping them out would make it faster. If this Haskell version doesn't do that, and yet still compares against a fully-featured version of wc, the comparison hardly seems fair.

  • _pvxk 6 years ago

    * isSpace handles tabs, but looking at a single byte at a time it won't handle all the multibyte space symbols you can have in unicode. If you read further down, they rip out the remains of unicode handling for further speed improvements.

    * Looking at a single byte at a time, it presumably only handles the "C" locale :) They don't say what locale GNU wc was tested with (if it's not LANG=C, that benchmark should be re-run)

    * --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on (can't find the invocation in the blog post though)

    * data State { ws, bs, ls } keeps count of words, bytes (more honest than calling it characters) and lines.

    • ColinWright 6 years ago

      Thanks for the reply ...

      > ... further down, they rip out the remains of unicode handling ...

      Ah. Well, that makes it a little unfair, surely.

      > Looking at a single byte at a time, it presumably only handles the "C" locale ...

      Again.

      > --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on

      I wonder if wc does the work anyway, and only reports it if asked, or if it actually changes the code path if it's not needed.

      So this entire post feels ... intellectually dishonest. Personally, I'm all in favour of Haskell, and I wish I had the chance to use it "in anger" rather than just doing the occasional toy thingie that I do. But this post doesn't do it or its community any favours.

      Disappointing.

      • inimino 6 years ago

        Yes. "Destroying C". The whole post feels like youthful bravado untempered by experience.

  • matheusmoreira 6 years ago

    The source code of GNU wc:

    https://github.com/coreutils/coreutils/blob/master/src/wc.c

    The word counting algorithm does seem to be much more complex.

mkup 6 years ago

But how would the Haskell version of wc compare with the C version of wc running with the LC_ALL=C environment variable? A UTF-8 locale is much slower than the C locale in coreutils (it's a well-known fact), and their Haskell version of wc is already using fixed 8-bit characters.

  • 0xd34df00d 6 years ago

    wc was actually slower with LC_ALL=C as opposed to ru_RU.UTF-8 that my system normally runs with (about 10 s against 7.2 s).

    Which actually raises a good question of whether I should have been comparing with that one — but that'd probably raise more questions and lead to more people accusing me of cheating in favour of Haskell.

    • zzzcpan 6 years ago

      The only way I can think of that this could be true is if you made a mistake in setting an env var.

    • classified 6 years ago

      Well, counting characters is obviously different from counting bytes when the characters are UTF-8 encoded, or UTF-32.

megous 6 years ago

Write me a haskell program that will init DRAM on PinePhone, setup PMIC and eMMC, load 25MiB of linux kernel and initramfs from eMMC to DRAM and will fit in hard limit of 32kB, and will do all of this in 300ms max.

Then I'll consider C destroyed.

GlitchMr 6 years ago

Destroying Haskell with 1 line of C: https://www.ioccc.org/2019/burton/prog.c

tsukurimashou 6 years ago

I don't think it is very "hard" to "destroy" most of these programs: they were written a long time ago and evolved with backwards compatibility and portability in mind, so they can run on pretty much any system. It does seem a bit unfair to compare them to a quickly hacked-together program that you test against one use case.

Like the others said, the second article will probably be more interesting.

  • StavrosK 6 years ago

    But multiple people have been working full time on wc for decades! At least, if you believe the article.

    • iainmerrick 6 years ago

      Decades yes, but who said “full time”?

      • joosters 6 years ago

        I've been working full time on 'wc' for only the last five years, since being promoted from full time working on 'cat'.

        Once I've wrung every last drop of performance out of wc, I can move on to the next largest limiting factor in GNU performance, 'tail' :-)

      • StavrosK 6 years ago

        That's what "decades of man-hours" means.

quadrifoliate 6 years ago

I would really like to see fewer of these clickbait post titles. The author honestly admits here (props!) https://news.ycombinator.com/item?id=22235536 that they only used the title because it encourages discussion.

In that case, I would say that it's up to the community to not take clickbait titles like this seriously if we want to encourage reasoned, detailed content rather than borderline flaming with words like "destroy" and "smash". I personally don't enjoy this new Buzzfeed style of technical post at all.

Therefore, I have some comments about the actual code here, but I'm going to keep them to myself so I don't encourage more people to follow the OP's example.

bobowzki 6 years ago

I don't like these "Destroying C" titles.

  • 0xd34df00d 6 years ago

    Me neither. But experience shows that these titles lead to more folks looking at the post, leading to more feedback, leading to better writing/experimenting/etc in the long run.

    Also, I was trying to make a reference to the original post that was "beating C", with the connotation of further improving on that. I'm not a native speaker, so my language model might be terribly flawed.

jeen02 6 years ago

Ah, the weekly "I beat C using X language".

I don't understand why LOC was even brought up. You can put all your code on a single line in most languages. The GitHub code linked actually has 26 lines of Haskell, which makes this even more nonsensical.

mhh__ 6 years ago

Not unimpressive, but this is a bit of a hot take. If I'm reading it right, a single-file benchmark (Big O who?) against a fully fledged production C program is not a complete measurement.

throwawa66 6 years ago

Related: I have an F# snippet I run from LINQPad to keep or remove lines (based on keywords) from huge log files, and it takes just a few seconds to breeze through the log.

iainmerrick 6 years ago

I take it this is “destroying” in the sense of presidential debates or late-night satirical monologues, i.e. diverting but meaningless.

TickleSteve 6 years ago

He is comparing a multi-threaded haskell version against a single-threaded C version???

"...There’s also a parallel version that relies on the monoidal structure of the problem a lot, and that one actually beats C"

coreutils wc is single-threaded, just checked.

  • divisonby0 6 years ago

    > But that post left me wondering: is it possible to do better without resorting to parallel processing? Turns out the answer is yes.

    From the introduction.

  • akdor1154 6 years ago

    Yes, your skimming was inaccurate.

divan 6 years ago

Happy 33rd anniversary of Haskell devs claiming to destroy C.

deathnoto 6 years ago

os: CentOS Linux release 7.7.1908 (Core) cpu: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

Average time for wc on the same test file: 0.8 seconds

jstimpfle 6 years ago

> So we’ve managed to just smash a C program that was looked at by thousands of eyes of quite hardcore low-level Unix hackers over a few decades. We did this with a handful of lines of pure, mutation-less, idiomatic Haskell, achieving about 4 to 5 times of throughput of the C version and spending less than an hour on all the optimizations.

I've done many very arrogant things in my life, because I've been a strange guy with lack of self-esteem who doesn't have a clue how many things he doesn't know. I hope I've never been this arrogant, though.

  • 0xd34df00d 6 years ago

    Those are fairly trivial and well-known optimizations that I did (and I am by no means an expert in writing high-performance code), so all the honors go to the GHC authors.

    • jstimpfle 6 years ago

      Thanks for replying. I want to tell you that I feel deep regret for my impulse to publicly shame you, even though I've gotten a lot of points for this comment and did not really receive criticism for it.

      Hey - if you make performance optimizations and compare implementations, it's probably best not to jump to quick conclusions. I would advise brushing up on C to get a feel for performance. Or, in times where it's popular to shit on C and claim that it's not really a low-level language anymore, I would advise writing some assembler (which I've barely done). In practice it's unlikely that you find yourself in a situation where you can write code in a high-level language that runs considerably faster than what you could realistically write in C. The only situation I can see where that can happen is when the program does something so complicated, yet seemingly so arbitrary, that you can't bring yourself to invest more time than it takes to hack together a quick Python script, and what you would be willing to write in C would not even be the asymptotically best approach you can see.

      For a simple program like wc, that situation does not apply, and if you find surprising results, it's best to first check for other possible reasons than "thousands of graybeards were wrong".

      • 0xd34df00d 6 years ago

        No worries, that's a natural reaction!

        > In practice it's unlikely that you find yourself in a situation where you can write code in a high-level language that runs considerably faster than what you could realistically write in C.

        I do way more C++ (in fact, I don't do pure C at all), and aliasing has bitten me and my code performance more often than I'd like. While there are workarounds, I'd probably consider spending time and effort on them as rather unrealistic in a sense. So it surely doesn't contradict my world model if a language with a stricter type system (Haskell? Rust? ATS anyone?) achieves better results on at least some of the tasks with less effort and less dependence on implementation details.

        Although ironically I'm going to write something low-level for the Haskell bytestrings library today evening, in C with intrinsics (so almost assembly modulo stuff like register allocation).

    • hnlmorg 6 years ago

      The github repo description is equally distasteful too:

      > wc implemented in Haskell (significantly faster than GNU coreutils version — oops I did it again

      For reference, I'm referring to the "oops I did it again" part. It's really hard to take that comment as "honours go to GHC authors".

      Also, I suggest you try running GNU wc with Unicode turned off, because Unicode is computationally expensive and you're deliberately disabling Unicode support in your own code anyway. I appreciate you said you'd add in the edge cases that GNU wc handles in your next blog post, but disabling Unicode in GNU wc for this benchmark would show good faith that you're at least trying to compare like for like. And if GHC still outperforms, then you can at least legitimately say:

      > My code outperforms GNU for non-unicode strings

      Which currently you cannot because your claim is based on incorrect benchmarks.

      • 0xd34df00d 6 years ago

        > For reference, I'm referring to the "oops I did it again" part. It's really hard to take that comment as "honours go to GHC authors".

        Was overly excited when I created the repo after obtaining the first results. Childish indeed; thanks for the reminder, fixed.

        > Also, I suggest you try running the GNU wc with unicode turned off because unicode is computationally expensive and you're deliberately disabling unicode support in your own code anyway.

        I tried running wc as `LC_ALL=C wc file.txt`, and it (surprisingly for me) resulted in worse run time for wc (my default locale is ru_RU.UTF-8 for comparison). This reproduced on two machines of mine and also on a machine of a friend of mine who also gave my code a shot.

        My bad for omitting this in the post, I'll update it accordingly.

        • hnlmorg 6 years ago

          wc running slower seems wrong. I wonder if it's doing additional sanitising because your LANG differs from your LC_ALL. I'd need to read through the code to get a handle on what it does and doesn't expect, but I definitely wouldn't expect wc to run slower.

  • pinkfoot 6 years ago

    Perhaps - just possibly maybe perhaps - he was suggesting it is his superior tool and technology that allowed his team to do this.

    Do you think you took the most generous interpretation possible?

    • jstimpfle 6 years ago

      Given that I was reminded of myself, I concede it might be closer to the least generous one.

    • armitron 6 years ago

      He is comparing apples and oranges, and his conclusion can only be characterized as wildly misleading and off the mark.

      The most generous possible interpretation is to say that he's not being intentionally misleading but actually believes what he writes. I fail to see how that improves matters.

    • AnIdiotOnTheNet 6 years ago

      > he was suggesting its is his superior tool and technology that allowed his team to do this

      He really thinks it makes sense that a "C program that was looked at by thousands of eyes of quite hardcore low-level Unix hackers over a few decades" is that much slower than something he threw together? That's like doing some back of the napkin math in a bar and declaring relativity wrong (while also not being Edward Witten). Any reasonable person would suspect that this isn't a very likely outcome and dig deeper to figure out if that's really true or not. My guess is he just implemented a very trivial and naive version of the C code and that's why it is faster.

  • dependenttypes 6 years ago

    It would be great if your post added value in the discussion.

b-3-n 6 years ago

The post is very insightful and well written. The title is provocative, which is very common nowadays. However, there should be a "serious" section that puts things into perspective; as others have pointed out, it leaves a bit of a bad taste otherwise.
