Go enjoy Python3

blog.surgut.co.uk

106 points by crncosta 10 years ago · 75 comments

crawshaw 10 years ago

There are several ways to solve this in Go. The first that comes to mind, assuming you want to truncate to the first 12 runes, not bytes:

        func main() {
            v := []rune(os.Args[1])
            if len(v) > 12 {
                v = v[:12]
            }
            fmt.Println(string(v))
        }
Or more in the spirit of the C example in the post:

        func main() {
                res := make([]rune, 12)
                copy(res, []rune(os.Args[1]))
                fmt.Println(string(res))
        }
Note that res will stay on the stack, just like C.

I expect the author is trying to say something about Go that I'm not quite getting. Perhaps that it is not an expression-based language, so to make code readable you need to make use of multiple statements. That's by design, but I understand it may be unappealing if you want to program in an expression-heavy style.
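For comparison, a minimal Python sketch of why the length check in the first snippet has no Python counterpart: slicing a `str` clamps to the string's length, while plain indexing is still bounds-checked. The helper names `first_n` and `nth` are made up for illustration.

```python
def first_n(s, n=12):
    """Return at most the first n codepoints of s; slicing clamps to len(s)."""
    return s[:n]


def nth(s, i):
    """Indexing, unlike slicing, is bounds-checked and raises IndexError."""
    try:
        return s[i]
    except IndexError:
        return None


short = first_n("hello")       # shorter than 12: returned unchanged
long_ = first_n("a" * 20)      # longer than 12: cut to 12 codepoints
missing = nth("hello", 12)     # out of range: None from the except branch
```

Whether the silent clamping is a convenience or a trap is exactly what the replies below argue about.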

  • jerf 10 years ago

    "I expect the author is trying to say something about Go that I'm not quite getting."

    I assume "Go sucks because, look, this one weird case is a bit ugly." (that is, as rhetoric, not dialectic; it is not literally claiming "one case bad" -> "Go is bad" in the logical sense.) A weird case that I've programmed many thousands of lines of Go code in but never once encountered. Taking a slice out of a string blind like that is actually a bit rare; usually in some way it turns out you actually have length information somewhere in the environment. It's hardly like "slice index out of bounds" is some sort of terrible error... it is, at least, arguable that Python is in the wrong here for being so willing to return a string generated by [0:12] that is not 12 bytes/characters in length, which seems like a reasonable assumption to make of such an operation.

    Now, if we want to talk about little examples like this, let's talk about sending on something like a channel in Python, to say nothing of Python's implementation of the "go" keyword... oh, yes, I see, suddenly this is an unfair way to compare languages.

    Yes, it is.

    • bsaul 10 years ago

      This post shows two very common issues that programmers have with the Go language when they start using it (that includes me), especially since Go is advertised as compiled with the feeling of a dynamic language:

      A low-level feeling when manipulating arrays (or slices), and poor support for generic functions (that would be math.min in this example).

      • jerf 10 years ago

        If it said that explicitly, I'd be fine with it.

        But given the last paragraph, I don't think that's the most likely interpretation.

        And it's still a terrible way to judge languages without a lot more context. All languages have gotchas that fit into 3-5 lines. Python's got a pretty decent set: https://www.google.com/search?q=python%20gotchas It's still a good language.

        And let me be very clear: I'm not "defending" Go here... I quite like both Python and Go. I've got no trouble saying Python is incrementally easier than Go when it comes to dealing with strings (but both are beaten by Perl). (Especially since the incremental advantage comes at a stiff performance price. Sometimes that's fine, sometimes that's not.) I'm specifically saying, as a computer language polyglot, that this metric for measuring languages is terrible. It's a rationalization, not a rational argument.

        • bsaul 10 years ago

          I see your point, but after having coded a full (minor) project in Go, I can assure you that those two points alone (cumbersome array data structure and lack of generic code) made me think twice about using this language for the common "web service for CRUD to DB" use.

          Then I looked at what Go's data access layer libraries look like, and that finished convincing me not to use it unless performance and memory usage were a crucial matter.

    • rdtsc 10 years ago

      > , let's talk about sending on something like a channel in Python:

        import Queue; q=Queue.Queue(); q.put(1)
      
      > to say nothing of Python's implementation of the "go" keyword...

      Why would Python have a go keyword? Go doesn't have the "except" keyword that Python has, not sure what the point is?

    • pekk 10 years ago

      Go is frequently presented as a replacement of Python. When people hear that, it sets up an expectation that Go will have the same pleasant qualities of Python, when it doesn't, any more than Python has goroutines.

      • nkozyra 10 years ago

        Go is more frequently presented as a replacement for C++/Java with a syntax that feels more like an interpreted language like Python or Ruby.

        I find that to be totally true. It's certainly lighter to write in than C++ or Java, "go run" effectively feels like running the interpreter and eschewing {} and ; lends to the latter, as well.

        And Python has concurrency options as well - "goroutines" is, obviously, relegated to Go.

        • pekk 10 years ago

          Interviews with Pike et al. have always made clear that Go actually was made to compete with Java and C++. I wouldn't argue with that.

          But the rank and file as represented on HN among other places presents Go as a replacement for Python all the time. It's one of the most common memes about Go. And this sets up expectations Go wasn't designed to fulfill. When a new user honestly reports that Go doesn't fulfill those expectations, we yell at him for making a dishonest and unfair comparison when really, we set up the dishonest and unfair comparison ourselves when we promoted Go as a replacement for Python. As long as we continue to promote Go that way, we should expect people to compare them, and we shouldn't yell at them for making honest reports that Go and Python are different in ways they are designed to be.

          • derefr 10 years ago

            I don't think anyone explicitly marketed Go as a replacement for Python; instead, Go was marketed as, for some use-cases (low-level-ish software), what you should have been using in the first place: places where you should have been using C++/Java, not Python, but where Python was used anyway because the alternatives were too unwieldy.

  • Jabbles 10 years ago

    fmt.Printf("%.12s", os.Args[1])

  • pjmlp 10 years ago

    I assume it has to do with Unicode support.

masklinn 10 years ago

> Simple enough, in essence given first argument, print it up to length 12. As an added this also deals with unicode correctly

That's not true, Python 3 uses codepoint-based indexing but it will break if combining characters are involved. For instance:

    > python3 test.py देवनागरीदेवनागरी
    देवनागरीदेवन
Because there is no precomposed version of these multi-codepoint grapheme clusters, some of the 10 user-visible characters take more than a single codepoint, so you end up with 8 user-visible characters rather than the expected 10.

edit: the original version used the input string "ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ" where clusters turn out to have precomposed versions after all. Replaced it by devanāgarī repeated once (in the devanāgarī script)
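A minimal sketch of the difference, using only the standard library. The clustering here is a rough approximation (each base codepoint plus any following codepoints in a mark category `M*`), not the full Unicode grapheme cluster algorithm, and the helper names `approx_clusters` and `truncate_clusters` are made up for illustration. The test string is written with escapes so the codepoints are unambiguous.

```python
import unicodedata

# "devanagari" in the Devanagari script: 8 codepoints, 5 user-visible characters.
WORD = "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940"
TEXT = WORD * 2  # 16 codepoints, 10 user-visible characters


def approx_clusters(s):
    """Attach each mark (general category M*) to the codepoint before it.

    Approximation only: this is NOT the full UAX #29 segmentation algorithm.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def truncate_clusters(s, n):
    """Keep at most the first n approximate clusters of s."""
    return "".join(approx_clusters(s)[:n])


by_codepoint = TEXT[:12]                  # cuts inside the eighth cluster
by_cluster = truncate_clusters(TEXT, 12)  # only 10 clusters, so nothing is cut
```

For real text, a proper segmentation library is a safer choice than this category check, which will mis-handle cases such as emoji sequences.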

  • Veedrac 10 years ago

    The easy Python way:

        import sys
        import regex
        print(regex.match("\X{,12}", sys.argv[1]).group())
    
    with the regex[1] package that should be in the stdlib Any Day Now™.

    [1]: https://pypi.python.org/pypi/regex

    • Spiritus 10 years ago

      Interesting, I had no idea the `re` module was getting revamped. Scheduled for 3.5 or later?

      • Veedrac 10 years ago

        Certainly not 3.5, although a few years ago I would have told you almost the exact opposite.

        I wouldn't hold your breath. The issue tracker[1] suggests 3.7 or 3.8 as optimistic. Guido made some comment somewhere relatively recently, but I can't find where. It's entirely possible it will never actually happen; time doesn't seem to have made people more enthusiastic.

        It's a shame, because the new module is awesome.

        [1] http://bugs.python.org/msg230846

  • stevenbedrick 10 years ago

    Yup. A long time ago, while working on a project with some particularly gnarly Unicode issues, I got in the habit of thinking in terms of grapheme clusters instead of code points (or "characters", for whatever definition of "character" one wishes to use), and it has served me very well. Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

    Ruby's unicode_utils gem has a nice implementation of the standard grapheme cluster segmentation algorithm, and Python's wrapper around ICU works quite well. Go's concept of runes is certainly an improvement, but it doesn't handle combining characters out of the box...

    • masklinn 10 years ago

      > Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

      The good news is Unicode 8 will make them way more frequent! (alternate emoji skin colors are specified via combining characters) much as Unicode 6 made astral characters way more "in your face" (by standardising emoji in the SMP)

  • hahainternet 10 years ago

    That's a shame, it works as you'd expect in perl6:

      sub MAIN($s) { say $s.substr(0,12) }
    
      $ perl6 test.p6 ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇȋȏȗ
      ǎěǐǒǔa̐e̐i̐o̐u̐ȃȇ
    • masklinn 10 years ago

      Turns out there are precomposed versions of these clusters, so your system might just be using these.

      Could you retry with the input "देवनागरीदेवनागरी"?

      • hahainternet 10 years ago

        I'm not quite sure how to interpret the output as it doesn't render particularly kindly in my terminal:

          sub MAIN($s) {
          	say "{$s.chars}: $s";
          	my $b =  $s.substr(0,12);
          	say "{$b.chars}: $b";
          }
        
          $ perl6 hn-test2.p6 देवनागरीदेवनागरी
          16: देवनागरीदेवनागरी
          12: देवनागरीदेवन
        • masklinn 10 years ago

          So apparently perl6 is also "wrong" and operates on codepoints, your system composed my original string and each (base, diacritic) pair was pasted as a single precomposed character (I expect that if you try out the Python version on your system you'll also get the "right" answer).

          The new string is composed of 10 user-visible characters (5 characters repeated twice) but 16 codepoints (and this time I carefully checked that there was no precomposed version):

              DEVANAGARI LETTER DA
              DEVANAGARI VOWEL SIGN E
              DEVANAGARI LETTER VA
              DEVANAGARI LETTER NA
              DEVANAGARI VOWEL SIGN AA
              DEVANAGARI LETTER GA
              DEVANAGARI LETTER RA
              DEVANAGARI VOWEL SIGN II
              DEVANAGARI LETTER DA
              DEVANAGARI VOWEL SIGN E
              DEVANAGARI LETTER VA
              DEVANAGARI LETTER NA
              DEVANAGARI VOWEL SIGN AA
              DEVANAGARI LETTER GA
              DEVANAGARI LETTER RA
              DEVANAGARI VOWEL SIGN II
          
          Operating on codepoints, both versions cut after the second DEVANAGARI LETTER NA (न) breaking that grapheme cluster (it should be ना) and not displaying the final two clusters ग and री.
          • raiph 10 years ago

            > So apparently perl6 is also "wrong" and operates on codepoints

            Yes and no. Yes, because the in-development Rakudo compiler is clearly currently giving the wrong result, and no because it operates on grapheme clusters (but has bugs).

            (You can work with codepoints if you really want to but the normal string/character functions that use the normal string type, Str, work -- or more accurately are supposed to work -- on the assumption that "character" == grapheme cluster; afaik it's supposed to match the Unicode default Extended Grapheme Cluster specification.)

            Fwiw I've filed a bug: https://rt.perl.org/Ticket/Display.html?id=125927

          • hahainternet 10 years ago

            Yeah you're right, a caveat in the docs says that current implementations aren't finished with this. I was under the impression the NFG work was done but I'll catch up with people on irc.

          • raiph 10 years ago

            > I expect that if you try out the Python version on your system you'll also get the "right" answer.

            I don't think so. In my tests standard python (2.7 and 3.5) ignores grapheme clusters.

            • masklinn 10 years ago

              Python ignores grapheme clusters. That point was about my original test case, which used grapheme clusters that I later found out had precomposed equivalents, so a transfer chain performing NFC would leave the test case with no combining characters (or multi-codepoint grapheme clusters) left in it.

  • bmn_ 10 years ago

    Languages that cannot deal with graphemes are lame. I daresay this solution below should score 20 in OP's imaginary scale.

        $ perl -CADS -E'say $ARGV[0] =~ /(\X{5})/' देवनागरीदेवनागरी
        देवनागरी
    
    Length of input string is: 10 graphemes, 16 codepoints, 48 octets (UTF-8).

    Length of output string is: 5 graphemes, 8 codepoints, 24 octets (UTF-8).

flohofwoe 10 years ago

Doesn't the C version have a serious bug? If the input string has 12 or more characters, the destination string will not be zero-terminated.

From the strncpy docs:

"No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow)."
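To make the quoted behaviour concrete, here is a small Python model of it. This is a simulation on a `bytearray`, not a call into libc; `strncpy_model` and `c_string` are hypothetical helpers, and the `0xff` filler stands in for uninitialised memory.

```python
def strncpy_model(dest, src, n):
    """Model C's strncpy: write exactly n bytes, NUL-padding if src is shorter."""
    for i in range(n):
        dest[i] = src[i] if i < len(src) else 0


def c_string(buf):
    """Return the bytes before the first NUL, or None if there is no terminator."""
    end = buf.find(0)
    return None if end < 0 else bytes(buf[:end])


SOURCE = b"a string longer than twelve bytes"

padded = bytearray(b"\xff" * 12)
strncpy_model(padded, b"hello", 12)   # short source: the rest is NUL-filled

unterminated = bytearray(b"\xff" * 12)
strncpy_model(unterminated, SOURCE, 12)  # long source: no NUL anywhere

fixed = bytearray(b"\xff" * 13)       # one extra byte of storage
strncpy_model(fixed, SOURCE, 12)
fixed[12] = 0                         # terminate explicitly
```

The last case is the usual fix: reserve one more byte than the copy length and write the terminator yourself.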

  • ansible 10 years ago

    I'm usually sticking +1s to the storage for any strings for this purpose. So if I want to operate on MAXLEN number of characters, I'll allocate MAXLEN+1 for the character array.

    And oftentimes I'll be memset()'ing the destination to all NULs when doing a string copy operation. I'm not real happy with string handling in C... as if that should be surprising to anyone.

    Say, is there a nice, small string library in C, suitable for embedded use, that anyone would care to recommend? I just want a nice string type that carries around its length and storage length, handles copies properly, and has the usual utilities. I suppose I could just write one...

Ianvdl 10 years ago

The author awards some arbitrary points to C even though his implementation of the solution is broken. His similarly poor Go implementation receives zero of these arbitrary points.

Why does this deserve the attention of everyone here? The author did not compare languages, he compared his aptitude with these languages, and considered broken implementations to somehow be comparable.

A more meaningful comparison would be to implement simple, efficient, working solutions to these problems and comparing them. This, as it stands, does not lead to any useful discussion.

BinaryIdiot 10 years ago

I'm not sure what the takeaway is from this blog entry. Is it that Python 3 can do substrings easier than the other languages therefore we should use Python 3? That was what I thought it was, anyway.

Seems silly to pick a language based on this single, silly criterion; otherwise why not JavaScript, or probably other languages that can make the code even smaller?

console.log(mystring.substring(0, 12));

So it just seems arbitrary and weak in my opinion.

  • steeleduncan 10 years ago

    The entire scenario seems to have been constructed to highlight the runtime panic caused by out of bounds slices in Go. Either that or the well-known and well-discussed lack of generics.

_kst_ 10 years ago

There are at least three major flaws in the 7-line C program, even ignoring character set issues. (main returns int, argv[1] can be null, and strncpy doesn't always null-terminate the target). If you're going to compare languages, you should find someone who knows each of them well.

Daishiman 10 years ago

The Unicode situation in most languages is dismal.

Honestly though, the lack of generics for that Math.min function makes me happy I'm not programming in Go.

BossHogg 10 years ago

Article content aside, the slide out side menu that covers the scroll bar is incredibly annoying. Is that Blogger? Whatever it is needs to stop. Now.

Sir_Cmpwn 10 years ago

The C code there fails if the unicode string includes characters whose width is greater than one octet.
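A short Python sketch of that failure mode, assuming UTF-8 input: cutting the encoded bytes at a fixed offset can land in the middle of a multi-byte character. The sample string is an arbitrary one chosen so that the 12-byte cut falls inside the last character.

```python
# "ab" followed by four CJK ideographs; each ideograph is 3 bytes in UTF-8.
text = "ab\u8eca\u8cc8\u6ed1\u8c48"
raw = text.encode("utf-8")   # 2 + 4 * 3 = 14 bytes
cut = raw[:12]               # ends one byte into the last ideograph

try:
    cut.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False            # the truncated bytes are not valid UTF-8

# Dropping the incomplete trailing sequence yields a valid, shorter string.
safe = cut.decode("utf-8", errors="ignore")
```

With input whose characters happen to align with the cut, as in the post's own test string, the byte truncation looks fine, which is what makes the bug easy to miss.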

  • zokier 10 years ago

    Which is noted right in the post:

    > This treats things as byte-array instead of unicode, thus for unicode test it will end up printing just 車賈滑豈.

    • rakoo 10 years ago

      Which is useless then, because the output can't safely be considered a string anymore. I don't really see the point of writing the C "equivalent" and giving it any point when it doesn't even do the right thing.

      • masklinn 10 years ago

        None of the snippets comes even remotely close to doing the right thing so it doesn't really matter.

darkstalker 10 years ago

Rust version:

    fn main()
    {
        if let Some(arg) = std::env::args().nth(1)
        {
            println!("{}", arg.chars().take(12).collect::<String>()); // chars() iterates over codepoints
        }
    }
  • Veedrac 10 years ago

    Idiomatic Rust would probably avoid allocations, which means something more like

        fn main() {
            if let Some(arg) = std::env::args().nth(1) {
                println!("{}", {
                    match arg.char_indices().nth(12) {
                        Some((idx, _)) => &arg[..idx],
                        None => &*arg
                    }
                });
            }
        }
    
    With the `unicode-segmentation` crate[1], you can just swap `char_indices()` with `grapheme_indices(true)`.

    [1] https://crates.io/crates/unicode-segmentation

Skunkleton 10 years ago

How is this on the front page of hacker news? What a shit post.

edofic 10 years ago

A mandatory smart-ass Haskell response

    import System.Environment (getArgs)
    main = do
      [str] <- getArgs
      putStrLn $ take 12 str
  • nicolast 10 years ago

    Now with more operators!

        import System.Environment (getArgs)
        main = putStrLn =<< take 12 . head <$> getArgs
    
    ;-)
  • joeyh 10 years ago

    The actual smart-ass Haskell response is simply "take 12". The spec didn't specify this needed to be an impure shell command, so a pure function is obviously better.

  • coldtea 10 years ago

    Well, for a smart-ass response (and I know you meant it as a joke) it is not very impressive. It doesn't do anything more than the others, and the syntax is not so great either.

    • Veedrac 10 years ago

      On the contrary, his is the only one that crashes when more arguments than expected are passed. Hooray progress!

_pmf_ 10 years ago

Of course, the C version could be just

    printf("(%.12s)\n", argv[1]);
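The same precision trick exists in Python's string formatting, sketched below. One difference worth noting: for a Python 3 `str` the precision counts codepoints, whereas C's `%.12s` counts bytes. The sample inputs are arbitrary.

```python
arg = "abcdefghijklmnopqrstuvwxyz"

old_style = "(%.12s)" % arg     # printf-style precision on %s
new_style = f"({arg:.12})"      # format-spec precision on a str

short_arg = "hi"
short_out = "(%.12s)" % short_arg   # shorter inputs pass through unchanged
```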
  • pjmlp 10 years ago

    Assuming using 7 bit ASCII

    • _kst_ 10 years ago

      No, it merely assumes one byte per character. For example, it would work correctly in Latin-1 or EBCDIC.

      In any case, the problem statement (though it's a bit vague) requires building a truncated string, not just printing it.

jackielii 10 years ago

why can't I downvote this!!! erhhhh

IshKebab 10 years ago

Now try distributing your Python code as a single statically linked exe.

chapium 10 years ago

Completely off topic, so if you are looking for discussion about the article skip this.

The low contrast ratio and bright colors on this blog are a bit hard to read. I normally switch to readability mode in Safari when I encounter this, but the site's layout prevents this from working.

  • jofer 10 years ago

    The text is black on white... Am I missing something?

  • BinaryIdiot 10 years ago

    Hmm, are you referring to something very specific? The contrast ratio is incredibly high (black text on white background). The navigation bar has terrible contrast but that's all I saw.
