memory regions · golang go · Discussion #70257

I'm starting this discussion to collect early feedback on a draft design for a kind of region-based memory management in Go. There is no prototype yet, only a design and a preliminary evaluation.

Please read everything below before replying, especially the design discussion section.

(Feel free to skip the detailed design, unless you're interested.)

Background

The arena experiment adds a package consisting of a single type to the standard library: Arena. This type allows one to allocate data structures into it directly, and allows early release of the arena memory in bulk. In other words, it adds a form of region-based memory management to Go. The implementation is memory-safe insofar as use-after-frees will never result in memory corruption, only a potential crash. Arenas have achieved real performance wins, almost entirely due to earlier memory reuse and staving off GC execution.

Unfortunately, the proposal to add arenas to the standard library is on indefinite hold because they compose poorly with the language and standard library.

For example, builtin types all need special cases in the implementation, and require explicit slice- and map-related methods. Additionally, choosing to arena-allocate a variable means that it can never be stack-allocated, not without more complexity in the compiler.

Furthermore, for an API to make use of arenas, it must accept an additional argument: the arena to allocate into. There are far too many APIs that would need to be updated to make this integrate well with the language, and it would make those APIs worse.

The text below proposes a composable replacement for arenas in the form of user-defined goroutine-local memory regions.

Goals

First and foremost, our main goal is to reduce resource costs associated with the GC. If we can't achieve that, then this proposal is pointless.

The second most important goal is composability. Specifically:

APIs should not need to be changed to take advantage of arena-like memory allocation patterns.
Regions must compose with standard library features, like sync.Pool and unique.Handle.
Regions must compose with existing optimizations, like stack allocation via escape analysis.

Finally, whatever we implement must be relatively easy to use and intuitive for intermediate-to-advanced Go developers. We must offer tools for discovering where regions might be worth it, and where they aren't working out.

Design

The core of this design revolves around a pair of functions that behave like annotations of function calls. It's useful to think of them as annotations, because crucially, they do not affect the correctness of code, bugs notwithstanding.

The annotations indicate whether the user expects most or all of the memory allocated by some function call (and its callees) to stay local to that function (and its callees), and to be unreachable by the time that function returns. If these expectations hold, then that memory is eagerly reclaimed when the function returns, bypassing the garbage collector. If these expectations do not hold for some memory, then that memory is opted out of this early reclaim; management is passed on to the garbage collector as normal.

Below is the proposed new API which explains the semantics in more detail.

package region

// Do creates a new scope called a region, and calls f in
// that scope. The scope is destroyed when Do returns.
// 
// At the implementation's discretion, memory allocated by f
// and its callees may be implicitly bound to the region.
//
// Memory is automatically unbound from the region when it
// becomes reachable from another region, another goroutine,
// the caller of Do, its caller, or from any other memory not
// bound to this region.
//
// Any memory still bound to the region when it is destroyed is
// eagerly reclaimed by the runtime.
//
// This function exists to reduce resource costs by more
// effectively reusing memory, reducing pressure on the garbage
// collector.
// However, when used incorrectly, it may instead increase
// resource costs, as there is a cost to unbinding memory from
// the region.
// Always experiment and benchmark before committing to
// using a region.
//
// If Do is called within an active region, it creates a new one,
// and the old one is reinstated once f returns.
// However, memory cannot be rebound between regions.
// If memory created by an inner region is referenced by an outer
// region, it is not rebound to the outer region, but rather unbound
// completely.
// Memory created by an outer region and referenced by an inner region
// does not unbind anything, because the outer region always outlives
// the inner region.
//
// Regions are local to the goroutines that create them,
// and do not propagate to newly created goroutines.
//
// Panics and calls to [runtime.Goexit] will destroy region
// scopes the same as if f returned, if they unwind past the
// call to Do.
func Do(f func())

// Ignore causes g and its callees to ignore the current
// region on the goroutine.
//
// Calling Ignore when not in an active region has no effect.
//
// The primary use-case for Ignore is to exclude memory that
// is known to outlive a region, to more effectively make use
// of regions. Using Ignore is less expensive than the unbinding
// process described in the documentation for [region.Do].
func Ignore(g func())

For some very basic examples, see the detailed design doc, or the next section.
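As a minimal sketch of how the two calls might be used together: since no prototype exists, the code below substitutes hypothetical no-op stand-ins (`regionDo`, `regionIgnore`) for the proposed `region.Do` and `region.Ignore`. This substitution is reasonable only because regions are meant to behave like annotations that do not change program behavior. The names `countWords` and `cache` are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// regionDo and regionIgnore are hypothetical no-op stand-ins for the proposed
// region.Do and region.Ignore. No implementation exists yet; because regions
// are designed as annotations, calling f directly preserves program behavior.
func regionDo(f func())     { f() }
func regionIgnore(g func()) { g() }

// cache outlives any single call, so memory stored in it always outlives
// the region that populated it.
var cache = map[string]int{}

// countWords returns the number of whitespace-separated words in line.
func countWords(line string) int {
	var n int
	regionDo(func() {
		// words is scratch memory: nothing references it once f returns,
		// so under the proposal it would be eligible for eager reclamation.
		words := strings.Fields(line)
		n = len(words)

		// Storage added to the cache is known to outlive the region.
		// Allocating it under Ignore would avoid unbinding it afterwards.
		regionIgnore(func() {
			cache[line] = n
		})
	})
	return n
}

func main() {
	fmt.Println(countWords("regions are goroutine local")) // 4
	fmt.Println(len(cache))                                // 1
}
```

Only `n`, an integer captured from the caller's frame, crosses the region boundary here, so in the best case nothing in the region would need to be unbound.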

Comparison with arenas

Where an arena might be used like...

func myFunc(buf []byte) error {
	a := arena.NewArena()
	defer a.Free()

	data := new(MyBigComplexProto)
	// The arena must be threaded through explicitly, here via an Arena option.
	if err := proto.UnmarshalOptions{Arena: a}.Unmarshal(buf, data); err != nil {
		return err
	}
	use(data)
	return nil
}

... regions would be used like so:

func myFunc(buf []byte) error {
	var topLevelErr error
	region.Do(func() {
		data := new(MyBigComplexProto)
		if err := proto.Unmarshal(buf, data); err != nil {
			topLevelErr = err
			return
		}
		use(data)
	})
	return topLevelErr
}
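The `topLevelErr` pattern above could be factored into a small helper. The sketch below is one possible shape, not part of the proposal: `doErr` and `process` are hypothetical names, and `regionDo` is a no-op stand-in for the proposed `region.Do`, which has no implementation yet.

```go
package main

import "fmt"

// regionDo is a hypothetical no-op stand-in for the proposed region.Do.
func regionDo(f func()) { f() }

// doErr runs f inside a region and hands f's error back to the caller.
// An error allocated inside f stays reachable after the region ends, so
// under the proposal it would be unbound and left to the garbage collector.
func doErr(f func() error) error {
	var err error
	regionDo(func() { err = f() })
	return err
}

// process copies buf into scratch space and rejects empty input.
func process(buf []byte) error {
	return doErr(func() error {
		if len(buf) == 0 {
			return fmt.Errorf("empty input")
		}
		// scratch never leaves the closure, so it is a candidate for
		// eager reclamation when the region is destroyed.
		scratch := make([]byte, len(buf))
		copy(scratch, buf)
		return nil
	})
}

func main() {
	fmt.Println(process(nil))          // empty input
	fmt.Println(process([]byte("ok"))) // <nil>
}
```

The error is the only allocation expected to leave the region in this sketch, which matches the case the design is optimized for: most memory stays local and a small remainder escapes.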

You can think of a region as an implicit goroutine-local arena that lives for the duration of some function call. That goroutine-local arena is used for allocating all the memory needed by that function call and its callees (including maps, slices, structs, etc.). Thanks to some compiler and runtime magic (see below), if any of that memory would cause a use-after-free issue, it is automatically removed from the arena and handed off to the garbage collector instead.

In practice, we've found that the vast majority of arena uses tightly limit the arena's lifetime to that of a particular function, usually the one they are created in, like the example above. This fact suggests that regions will most likely be usable in most of the same circumstances as arenas.

Summary of benefits and costs

The core benefit is the potential for reduced GC overheads. An additional, more minor benefit is the potential for more efficient memory allocation. If the application code follows the region discipline, it makes much more sense to introduce a bump-pointer allocator for that memory (something like Immix; see the detailed design).

As alluded to in the previous section, some "magic" is required to dynamically escape memory from the region to the general heap. The magic is a goroutine-local write barrier (goroutine-local because it is only enabled on that goroutine, inside the region). We believe that we have a write barrier design that is cheap enough to make this worthwhile, incurring between 1–4% worst-case overhead when enabled globally, depending on the application (so it will be less in practice, limited to the goroutines that use it). We believe that this can be easily won back and then some in GC-heavy applications, provided their memory usage patterns line up with the region's assumptions.

However, this assumes that most or all memory in a region does not escape. The cost of promoting memory is higher, approximately the same cost as reallocating that memory on the heap (that is not how it would be implemented, but it gives you a sense of the cost).
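To make the bind/unbind behavior concrete, here is a toy single-goroutine model. It is not the proposed runtime design: the `sim` type, its methods, and the choice to promote everything reachable from an escaping object are all assumptions made for illustration. The `store` method plays the role of the write barrier, and the counters separate memory reclaimed at region exit from memory promoted to the general heap.

```go
package main

import "fmt"

// object is a toy heap object. depth is the nesting depth of the region it
// is bound to; 0 means it belongs to the general heap.
type object struct {
	depth int
	refs  []*object
}

// sim is a toy model of one goroutine's region stack (an assumption for
// illustration, not the runtime's actual data structure).
type sim struct {
	stack     [][]*object // objects allocated in each active region, outermost first
	reclaimed int         // objects still bound when their region was destroyed
	promoted  int         // objects unbound and handed to the general heap
}

// alloc binds a new object to the innermost active region, if any.
func (s *sim) alloc() *object {
	o := &object{depth: len(s.stack)}
	if o.depth > 0 {
		s.stack[o.depth-1] = append(s.stack[o.depth-1], o)
	}
	return o
}

// do runs f in a new region and reclaims whatever is still bound afterwards.
func (s *sim) do(f func()) {
	s.stack = append(s.stack, nil)
	f()
	top := s.stack[len(s.stack)-1]
	s.stack = s.stack[:len(s.stack)-1]
	for _, o := range top {
		if o.depth != 0 {
			s.reclaimed++
		}
	}
}

// store models the write barrier for storing a pointer to src into dst.
// If dst is on the heap or in an outer region, src is unbound completely.
func (s *sim) store(dst, src *object) {
	if src.depth > dst.depth {
		s.promote(src)
	}
	dst.refs = append(dst.refs, src)
}

// promote unbinds o and, conservatively, everything reachable from it.
func (s *sim) promote(o *object) {
	if o.depth == 0 {
		return
	}
	o.depth = 0
	s.promoted++
	for _, c := range o.refs {
		s.promote(c)
	}
}

func main() {
	s := &sim{}
	global := s.alloc() // allocated outside any region
	s.do(func() {
		a, b := s.alloc(), s.alloc()
		s.store(a, b) // both stay in the region
		c, d := s.alloc(), s.alloc()
		s.store(c, d)
		s.store(global, c) // c and d become reachable from the heap
	})
	fmt.Println("reclaimed:", s.reclaimed, "promoted:", s.promoted)
	// prints "reclaimed: 2 promoted: 2"
}
```

In this model the cost concern from the text shows up as the `promoted` counter: every promoted object pays the barrier slow path and gains nothing from the region.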

Detailed design and implementation

For more details, please see the complete design document, which includes:

A preliminary performance evaluation and cost/benefit analysis.
Several proposed diagnostics to monitor use of regions.
Alternatives considered.
Prior art.

Detailed draft design.

(Note that the full design doc introduces a new term, "fading", for memory 'escaping' a region, to avoid confusion with the compiler's 'escape analysis'. "Fading" and "escaping a region" mean the same thing.)

Design discussion

Below are a few discussion points that have come up often in early feedback, as well as my responses to those discussion points.

Goroutine-local region state seems problematic. Why is it OK?

Enabling region-based allocation for all variables created by a goroutine delivers a clear win if the vast majority of the memory you allocate adheres to the region discipline. It's really OK if a small percentage (say, under 5%) of memory allocations escape from the region to the heap.

Also, note that the idea of implicitly opting-in memory was discarded for arenas, but that's because arenas can possibly introduce use-after-free crashes. If you use regions incorrectly, your program will not crash.

Will code owners need to consider applying `region.Ignore` everywhere?

One concern that was raised multiple times early in the design was whether region.Ignore would encourage tightly controlling allocations within regions so heavily that users would start pestering library owners to wrap certain portions of code in region.Ignore.

While this is something that could happen, I hope it would be rare, and I would encourage maintainers to push back on such requests if they occur. As mentioned in the previous discussion point, it's really OK if a small percentage of memory allocations escape from the region to the heap.

For example, I would explicitly advocate for not wrapping (*sync.Pool).New with region.Ignore in the standard library. Why? Because if you're using a Pool effectively, the number of steady-state allocations made should be quite small in practice, and easily overtaken by region allocations.

Given the concern however, perhaps we should remove region.Ignore from the design until we get more experience with it.

Possible extensions

Using PGO to automatically disable costly regions

If at compile time we see from a profile that the region slow paths are "hot" inside a particular region, the compiler can disable that region and potentially report that it did so. This technique has the potential to make monitoring more automatic.

Provide a `GOEXPERIMENT` to make every goroutine implicitly a region

This GOEXPERIMENT makes it easy to quickly turn regions on and off for an entire application. I suspect the majority of performance-sensitive Go applications, such as web services, would benefit from wrapping entire requests (usually captured by a single goroutine) in a region.

This idea is equivalent to enabling the request-oriented collector, an experimental garbage collector from early in Go's life, designed by Rick Hudson and Austin Clements. The difference between that design and this one is in the details: separately managed memory, and a much cheaper write barrier fast path.

This may also combine well with dynamically disabling regions with PGO.

Provide a `GODEBUG` to disable all regions

This allows for quicker rollback and experimentation. We can also extend this GODEBUG to work with compile-time hash bisection to identify costly regions efficiently. This is possible because regions do not change the semantics of the program.

Next steps

Although fairly fleshed out, this design does not yet have a prototype. Before making such an investment, we wish to gauge interest from the community.

Once we feel that broad interest exists, we may prioritize it. This would then involve building a prototype, available as a GOEXPERIMENT, which would then be used to steer the design, possibly far enough to reach approval. Note that we plan to remove arenas from the standard library once this prototype is created.