Go Data Structures (2009)
research.swtch.com

> As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too.
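The gotcha in the quoted aside is easy to reproduce. Here is a small sketch (the helper names are mine) that keeps 8-byte slices of many ~1 MB strings and then checks the live heap; because a Go string slice shares its parent's backing array, every full 1 MB buffer stays reachable:

    package main

    import (
        "fmt"
        "runtime"
        "strings"
    )

    // retainSmallSlices keeps an 8-byte slice of each of n big strings.
    // The slices share their parents' backing arrays, so all n full
    // buffers remain reachable through them.
    func retainSmallSlices(n int) []string {
        keep := make([]string, 0, n)
        for i := 0; i < n; i++ {
            big := strings.Repeat("x", 1<<20) // ~1 MB
            keep = append(keep, big[:8])      // retains the whole 1 MB
        }
        return keep
    }

    func liveHeapMB() uint64 {
        runtime.GC()
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        return m.HeapAlloc >> 20
    }

    var sink []string

    func main() {
        sink = retainSmallSlices(100)
        fmt.Printf("live heap after keeping 100 8-byte slices: ~%d MB\n", liveHeapMB())
        // Copying breaks the tie: string([]byte(big[:8])) keeps only 8 bytes.
    }
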
That is no longer the case in Java. String.substring() now makes a copy. I think it doesn't matter much which of the two approaches a language takes as long as everybody knows it. This needs to be in the language spec and can't be an implementation issue.
As a historical note, this change was made in May 2012, in Java 7u6.
That's kind of a big deal for a point release...
There is a good reason for it however. http://www.javaadvent.com/2012/12/changes-to-stringsubstring...
Still, it is a change that could dramatically affect the behavior of some programs. A program that takes many references to one string might consume a lot of memory when those references become copies.
Here's more nitty-gritty from Oracle: http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-Ma...
If I'm following, a Java string used to be a char[] plus offset/count ints, and this change let them drop those ints. You saved RAM if you had a lot of little strings, but paid for extra copying if you took lots of substrings.
Go slices/strings don't have a pointer to the "original" backing array, just a pointer to the first byte in this (sub)string. It doesn't need extra fields to do substrings by reference.
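The sharing the parent describes is observable with a byte slice. A sub-slice is just a pointer into the middle of the backing array plus a length/capacity, so a write through it shows up in the original (a small sketch; the function name is mine):

    package main

    import "fmt"

    // shareDemo shows that a sub-slice is pointer+len+cap into the
    // same backing array: writing through it is visible in the original.
    func shareDemo() string {
        b := []byte("hello, world")
        sub := b[7:] // interior pointer, 7 bytes past the start of b's array
        sub[0] = 'W'
        return string(b)
    }

    func main() {
        fmt.Println(shareDemo()) // hello, World
    }
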
I think part of the technical reason for the different string headers is that the Java designers didn't want their GC to have to handle "internal pointers" into strings/objects (maybe for performance reasons?), whereas the Go designers decided to support 'em (maybe to support more C-like code in Go?).
Go does not support internal pointers into strings. You have to use slicing for that.
Sorry, I mean that there's an internal pointer in Go's in-memory representation of the string, not that there's a naked byte pointer directly visible to the programmer.
Go's GC's support for internal pointers means it can use a pointer-and-length representation for substring references. Java's lack of support for them means its string representation needs a pointer to the start of the char array and a separate offset and count in order to do the same substring-reference trick. (And, I'm saying, that helps explain why Java and Go now do substrings differently.)
There are other places where Go's ability to use internal pointers is exposed more directly to the programmer: for example, Go lets you take the address of an array element or struct field and pass around the resulting pointer.
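For example, taking the address of an array element hands you an interior pointer, and that pointer alone keeps the whole array reachable for the GC (a minimal sketch; the function name is mine):

    package main

    import "fmt"

    // setViaInteriorPointer takes the address of one array element and
    // writes through it; that interior pointer keeps the whole array alive.
    func setViaInteriorPointer() [3]int {
        a := [3]int{10, 20, 30}
        p := &a[1] // interior pointer into a
        *p = 99
        return a
    }

    func main() {
        fmt.Println(setViaInteriorPointer()) // [10 99 30]
    }
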
>Go's GC's support for internal pointers means it can use a pointer-and-length representation for substring references
Only if the String class is implemented in pure Java, which it currently is. But it doesn't have to be that way. Oracle could go around the Java language features and implement the String class in native code just as Go does with several builtin types. You may be right that it would be more difficult to do than in Go because of garbage collector specifics.
But I guess the real issue is a philosophical one. Is it a good idea to let the standard library use features that are not available to users of the language?
I'm saying I think it would take GC rearchitecting for Java to be able to use a pointer into the middle of the string in its internal String representation, because Java's GC, unlike Go's, is currently not built such that a pointer into the middle of anything keeps that thing "alive" for GC purposes; you have to have a pointer to the beginning. Sun made that choice in hopes they could write a faster GC that way, I suspect.
Given that GC design, Go's two-word substring references (pointer into middle of string + count) wouldn't work; even if String were a builtin, with the no-internal-pointers GC design it would need to be at least three words (pointer to start of string, offset, count).
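The two header shapes being compared can be sketched as structs (the type names are mine, not real runtime types; sizes assume a 64-bit platform):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // goStyleHeader mirrors Go's string header: an interior pointer
    // plus a length -- two words.
    type goStyleHeader struct {
        data unsafe.Pointer // may point into the middle of a shared array
        len  int
    }

    // javaStyleHeader mirrors what a no-interior-pointer GC forces:
    // a pointer to the START of the array, plus offset and count -- three words.
    type javaStyleHeader struct {
        data   unsafe.Pointer // must point to the array's first element
        offset int
        count  int
    }

    func main() {
        fmt.Println(unsafe.Sizeof(goStyleHeader{}))   // 16 on 64-bit
        fmt.Println(unsafe.Sizeof(javaStyleHeader{})) // 24 on 64-bit
    }
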
tl;dr of my larger point is--I think Java needed a few extra bytes/String to support substrings by reference because of how its GC works differently from Go's, and I think that explains why Java decided to remove its substring-by-reference trick while Go didn't. (And I'm not trying to say either way is worse, just trying to really grok why they're different.)
Note that this is from 2009. Although the main details have not changed, the int type is more commonly 64 bits now (since 64-bit architectures are much more common).
Do you know what version that happens in? I tested on my 32- and 64-bit platforms with Go 1.1, and a static definition of an integer results in type int (which is explicitly 32-bit).
It's my understanding that this is intentional and won't change; only explicit declarations of int64 are 64-bit.

    package main

    import "fmt"
    import "reflect"

    func main() {
        i := 3
        z := reflect.ValueOf(i)
        fmt.Printf("%s\n", z.Kind())
    }

    $ ./test
    int
    $

It is implementation-specific (from the spec):
There is also a set of predeclared numeric types with implementation-specific sizes:
    uint     either 32 or 64 bits
    int      same size as uint
    uintptr  an unsigned integer large enough to store the uninterpreted bits of a pointer value
The size of int on 64-bit systems was increased to 64 bits as of Go 1.1: http://golang.org/doc/go1.1#int
Cool, thanks for the clarification, this makes sense!
i := 3 means declare i to be an "int", which is the default numeric type. The size of that int will vary from platform to platform. See http://golang.org/ref/spec#Numeric_types
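The platform's int width can be checked directly with strconv.IntSize, which reports the size in bits of int and uint:

    package main

    import (
        "fmt"
        "strconv"
    )

    func main() {
        // 64 on most modern platforms, 32 on 32-bit ones.
        fmt.Println(strconv.IntSize)
    }
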
Awesome, thanks!
I wish there were a way to create custom data structures without casting to and from interface{} all the time. Heck, it would already help if there were a shorthand for interface{}, like "any" or something.
The usual pattern is to use a type that requires the thing you pass in to have methods that you use for the data structure, like sort.Interface[1]. It's faster, safer, and better than using interface{}.
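The sort.Interface pattern looks like this in practice: the container defines the methods the algorithm needs, and no element is ever boxed in an interface{} (the byLen type is mine, for illustration):

    package main

    import (
        "fmt"
        "sort"
    )

    // byLen implements sort.Interface so a []string sorts by length,
    // with no interface{} boxing of the elements.
    type byLen []string

    func (s byLen) Len() int           { return len(s) }
    func (s byLen) Less(i, j int) bool { return len(s[i]) < len(s[j]) }
    func (s byLen) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

    func main() {
        words := byLen{"banana", "fig", "apple"}
        sort.Sort(words)
        fmt.Println(words) // [fig apple banana]
    }
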
As for shorthand, behold!
[1]: http://golang.org/pkg/sort/#Interface

    type any interface{}

That introduces a new named type though, i.e., the "any" in your package is different from the "any" in mine, which is not what I want.
(Unless I'm mistaken here, which might very well be the case.)
Check it out: http://play.golang.org/p/l9yn0PRbrd
Anyway, that's the point of Go's implicit interface satisfaction: if the object implements the necessary methods of the interface, it counts as that kind of object.
make(*Point) seems much better than having a separate new keyword. Surprised to hear that was changed after just a few days.
The "new" keyword is practically unused in modern Go development, but is kept for backwards compatibility. The usual way to make a point is "p := &Point{}", without using any keyword.
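For a zero-valued struct the two forms are interchangeable, and the composite literal additionally allows initial field values (a minimal sketch; the Point type is from the surrounding discussion):

    package main

    import "fmt"

    type Point struct{ X, Y int }

    func main() {
        p1 := new(Point)         // built-in new: zeroed Point, returns *Point
        p2 := &Point{}           // composite literal: same result
        p3 := &Point{X: 1, Y: 2} // literal also allows initial values
        fmt.Println(*p1, *p2, *p3) // {0 0} {0 0} {1 2}
    }
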
Not true. I count "new" being used about half as often as "&Point{}" in the Go standard library. That's not "practically unused".
So 430,482 non-blank lines of code, 1485 lines with new, 3051 lines that look like a struct pointer literal.

    g% cg -c -f 'g/go/src/pkg.*\.go' '\bnew\(' | total 2
    1485
    g% cg -c -f 'g/go/src/pkg.*\.go' '\&[A-Za-z0-9_.]+\{' | total 2
    3051
    g% cg -c -f 'g/go/src/pkg.*\.go' . | total 2
    430482
    g%

If I had to guess, I think they meant that anyone writing new code will avoid using 'new'.