Go Data Structures (2009)
research.swtch.com

> As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too.
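The gotcha in the quoted aside is easy to reproduce. Here is a small sketch (the helper names are mine) that keeps 8-byte slices of many ~1 MB strings and then checks the live heap; because a Go string slice shares its parent's backing array, every full 1 MB buffer stays reachable:

    package main

    import (
        "fmt"
        "runtime"
        "strings"
    )

    // retainSmallSlices keeps an 8-byte slice of each of n big strings.
    // The slices share their parents' backing arrays, so all n full
    // buffers remain reachable through them.
    func retainSmallSlices(n int) []string {
        keep := make([]string, 0, n)
        for i := 0; i < n; i++ {
            big := strings.Repeat("x", 1<<20) // ~1 MB
            keep = append(keep, big[:8])      // retains the whole 1 MB
        }
        return keep
    }

    func liveHeapMB() uint64 {
        runtime.GC()
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        return m.HeapAlloc >> 20
    }

    var sink []string

    func main() {
        sink = retainSmallSlices(100)
        fmt.Printf("live heap after keeping 100 8-byte slices: ~%d MB\n", liveHeapMB())
        // Copying breaks the tie: string([]byte(big[:8])) keeps only 8 bytes.
    }
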
That is no longer the case in Java. String.substring() now makes a copy. I think it doesn't matter much which of the two approaches a language takes as long as everybody knows it. This needs to be in the language spec and can't be an implementation issue.
As a historical note, this change was made in May 2012, in Java 7u6.
That's kind of a big deal for a point release...
There is a good reason for it however. http://www.javaadvent.com/2012/12/changes-to-stringsubstring...
Still, it is a change that could dramatically affect the behavior of some programs. A program that takes many references to one string might consume a lot of memory when those references become copies.
Here's more nitty-gritty from Oracle: http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-Ma...
If I'm following, a Java string used to be a char[] plus offset/count ints, and this change let them drop those ints. You saved RAM if you had a lot of little strings, but paid for extra copying if you took lots of substrings.
Go slices/strings don't have a pointer to the "original" backing array, just a pointer to the first byte in this (sub)string. It doesn't need extra fields to do substrings by reference.
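The sharing the parent describes is observable with a byte slice. A sub-slice is just a pointer into the middle of the backing array plus a length/capacity, so a write through it shows up in the original (a small sketch; the function name is mine):

    package main

    import "fmt"

    // shareDemo shows that a sub-slice is pointer+len+cap into the
    // same backing array: writing through it is visible in the original.
    func shareDemo() string {
        b := []byte("hello, world")
        sub := b[7:] // interior pointer, 7 bytes past the start of b's array
        sub[0] = 'W'
        return string(b)
    }

    func main() {
        fmt.Println(shareDemo()) // hello, World
    }
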
I think part of the technical reason for the different string headers is that the Java designers didn't want their GC to have to handle "internal pointers" into strings/objects (maybe for performance reasons?), whereas the Go designers decided to support 'em (maybe to support more C-like code in Go?).
Go does not support internal pointers into strings. You have to use slicing for that.
Sorry, I mean that there's an internal pointer in Go's in-memory representation of the string, not that there's a naked byte pointer directly visible to the programmer.
Go's GC's support for internal pointers means it can use a pointer-and-length representation for substring references. Java's lack of support for them means its string representation needs a pointer to the start of the char array and a separate offset and count in order to do the same substring-reference trick. (And, I'm saying, that helps explain why Java and Go now do substrings differently.)
There are other places where Go's ability to use internal pointers is exposed more directly to the programmer: for example, Go lets you take the address of an array element or struct field and pass around the resulting pointer.
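For example, taking the address of an array element hands you an interior pointer, and that pointer alone keeps the whole array reachable for the GC (a minimal sketch; the function name is mine):

    package main

    import "fmt"

    // setViaInteriorPointer takes the address of one array element and
    // writes through it; that interior pointer keeps the whole array alive.
    func setViaInteriorPointer() [3]int {
        a := [3]int{10, 20, 30}
        p := &a[1] // interior pointer into a
        *p = 99
        return a
    }

    func main() {
        fmt.Println(setViaInteriorPointer()) // [10 99 30]
    }
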
>Go's GC's support for internal pointers means it can use a pointer-and-length representation for substring references
Only if the String class is implemented in pure Java, which it currently is. But it doesn't have to be that way. Oracle could go around the Java language features and implement the String class in native code just as Go does with several builtin types. You may be right that it would be more difficult to do than in Go because of garbage collector specifics.
But I guess the real issue is a philosophical one. Is it a good idea to let the standard library use features that are not available to users of the language?
I'm saying I think it would take GC rearchitecting for Java to be able to use a pointer into the middle of the string in its internal String representation, because Java's GC, unlike Go's, is currently not built such that a pointer into the middle of anything keeps that thing "alive" for GC purposes; you have to have a pointer to the beginning. Sun made that choice in hopes they could write a faster GC that way, I suspect.
Given that GC design, Go's two-word substring references (pointer into middle of string + count) wouldn't work; even if String were a builtin, with the no-internal-pointers GC design it would need to be at least three words (pointer to start of string, offset, count).
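The two header shapes being compared can be sketched as structs (the type names are mine, not real runtime types; sizes assume a 64-bit platform):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // goStyleHeader mirrors Go's string header: an interior pointer
    // plus a length -- two words.
    type goStyleHeader struct {
        data unsafe.Pointer // may point into the middle of a shared array
        len  int
    }

    // javaStyleHeader mirrors what a no-interior-pointer GC forces:
    // a pointer to the START of the array, plus offset and count -- three words.
    type javaStyleHeader struct {
        data   unsafe.Pointer // must point to the array's first element
        offset int
        count  int
    }

    func main() {
        fmt.Println(unsafe.Sizeof(goStyleHeader{}))   // 16 on 64-bit
        fmt.Println(unsafe.Sizeof(javaStyleHeader{})) // 24 on 64-bit
    }
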
tl;dr of my larger point is--I think Java needed a few extra bytes/String to support substrings by reference because of how its GC works differently from Go's, and I think that explains why Java decided to remove its substring-by-reference trick while Go didn't. (And I'm not trying to say either way is worse, just trying to really grok why they're different.)
Note that this is from 2009. Although the main details have not changed, the int type is more commonly 64 bits now (since 64-bit architectures are much more common).
Do you know what version that happens in? I tested on my 32- and 64-bit platforms with Go 1.1, and a static definition of an integer results in type int (which is explicitly 32-bit).
It's my understanding that this is intentional and won't change; only explicit declarations of int64 are 64-bit.

    package main

    import "fmt"
    import "reflect"

    func main() {
        i := 3
        z := reflect.ValueOf(i)
        fmt.Printf("%s\n", z.Kind())
    }

    $ ./test
    int
    $

It is implementation-specific (from the spec):
There is also a set of predeclared numeric types with implementation-specific sizes:
    uint     either 32 or 64 bits
    int      same size as uint
    uintptr  an unsigned integer large enough to store the uninterpreted bits of a pointer value
The size of int on 64-bit systems was increased to 64 bits as of Go 1.1: http://golang.org/doc/go1.1#int
Cool, thanks for the clarification, this makes sense!
i := 3 means declare i to be an "int", which is the default numeric type. The size of that int will vary from platform to platform. See http://golang.org/ref/spec#Numeric_types
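The platform's int width can be checked directly with strconv.IntSize, which reports the size in bits of int and uint:

    package main

    import (
        "fmt"
        "strconv"
    )

    func main() {
        // 64 on most modern platforms, 32 on 32-bit ones.
        fmt.Println(strconv.IntSize)
    }
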
Awesome, thanks!
I wish there were a way to create custom data structures without casting to and from interface{} all the time. Heck, it would already help if there were a shorthand for interface{}, like "any" or something.
The usual pattern is to use a type that requires the thing you pass in to have methods that you use for the data structure, like sort.Interface[1]. It's faster, safer, and better than using interface{}.
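The sort.Interface pattern looks like this in practice: the container defines the methods the algorithm needs, and no element is ever boxed in an interface{} (the byLen type is mine, for illustration):

    package main

    import (
        "fmt"
        "sort"
    )

    // byLen implements sort.Interface so a []string sorts by length,
    // with no interface{} boxing of the elements.
    type byLen []string

    func (s byLen) Len() int           { return len(s) }
    func (s byLen) Less(i, j int) bool { return len(s[i]) < len(s[j]) }
    func (s byLen) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

    func main() {
        words := byLen{"banana", "fig", "apple"}
        sort.Sort(words)
        fmt.Println(words) // [fig apple banana]
    }
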
As for shorthand, behold!
[1]: http://golang.org/pkg/sort/#Interface

    type any interface{}

That introduces a new named type though, i.e., the "any" in your package is different from the "any" in mine, which is not what I want.
(Unless I'm mistaken here, which might very well be the case.)
Check it out: http://play.golang.org/p/l9yn0PRbrd
Anyway, that's the point of Go's implicit interface satisfaction: if the object implements the necessary methods of the interface, it counts as that kind of object.
make(*Point) seems much better than having a separate new keyword. Surprised to hear that was changed after just a few days.
The "new" keyword is practically unused in modern Go development, but is kept for backwards compatibility. The usual way to make a point is "p := &Point{}", without using any keyword.
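For a zero-valued struct the two forms are interchangeable, and the composite literal additionally allows initial field values (a minimal sketch; the Point type is from the surrounding discussion):

    package main

    import "fmt"

    type Point struct{ X, Y int }

    func main() {
        p1 := new(Point)         // built-in new: zeroed Point, returns *Point
        p2 := &Point{}           // composite literal: same result
        p3 := &Point{X: 1, Y: 2} // literal also allows initial values
        fmt.Println(*p1, *p2, *p3) // {0 0} {0 0} {1 2}
    }
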
Not true. I count "new" being used about half as often as "&Point{}" in the Go standard library. That's not "practically unused".
So 430,482 non-blank lines of code, 1485 lines with new, 3051 lines that look like a struct pointer literal.

    g% cg -c -f 'g/go/src/pkg.*\.go' '\bnew\(' | total 2
    1485
    g% cg -c -f 'g/go/src/pkg.*\.go' '\&[A-Za-z0-9_.]+\{' | total 2
    3051
    g% cg -c -f 'g/go/src/pkg.*\.go' . | total 2
    430482
    g%

If I had to guess, I think they meant that anyone writing new code will avoid using 'new'.