Error handling with errgroups

37 points by reltuk 4 years ago · 17 comments

> When we Wait() on the errgroup, it will cancel its context even when every spawned go routine has returned nil.

I wonder why. It's documented [1] but seems strange to me. It enforces that errgroup is single-use (rather than cyclical), but I might do that by outright panicking if Go is called after Wait returns. Goroutines in the group might launch others in the group, but presumably not after they've finished, so I'd expect after Wait returns that there are no more calls to Go. Unless folks are passing the group to existing goroutines and then those launch more tasks in the group? That seems like an undesirable pattern.

[1] though not on Wait, just on WithContext: https://pkg.go.dev/golang.org/x/sync/errgroup#WithContext

Laremere 4 years ago

The context derived from WithContext is scoped to the lifetime of the errgroup: it is canceled after all the functions return, or as soon as any of them returns an error. This way the function passed to Group.Go can spawn other goroutines that will get properly cleaned up.
Why would those other goroutines spawn work that they don't otherwise clean up? They probably shouldn't, but canceling the context should always be safe, and is more likely to safely handle what otherwise would be a bug, than it is to cause a bug.
- scottlamb 4 years ago
  
  > canceling the context should always be safe, and is more likely to safely handle what otherwise would be a bug, than it is to cause a bug.
  They gave an example of a bug it caused, and I haven't seen any examples of bugs it would have prevented, so I'm not sure I agree.
  - reltuk (OP) 4 years ago
    
    I think the contract in the context package is that WithCancel may leak resources if cancel is not called and the parent context is cancelable but never canceled. At least, that seems to be the behavior I see in the implementation here:
    https://github.com/golang/go/blob/master/src/context/context...
    So in this case, errgroup probably doesn't have a choice and needs to call cancel() at some point.
    
    scottlamb 4 years ago
    
    Ahh. Yeah, that's unfortunate but explains this choice.
    btw, I realized I'm not being entirely fair in saying it caused a bug. There would have been one anyway. If conflictsBuilder.Wait() failed, it wouldn't cancel the other two operations. It'd wait them out even though the overall operation was doomed. Maybe not a very noticeable bug under normal conditions but still not right.
jdoliner 4 years ago

Contexts spawn a goroutine to block on the Done channel. You always need to call cancel, otherwise you'll leak that goroutine. It's an annoying thing that comes from the fact that contexts are implemented in user space and there's no way to block on a channel without blocking a goroutine.

tomcam 4 years ago

While we’re at it, can someone explain Go contexts for me? I suspect they are a way to keep thread-safe data that would otherwise be global or, in the C world, static, but I’m not quite sure. The Go documentation just describes how to use them, not why they should be used.

beltsazar 4 years ago

Contexts are a workaround for the inability to terminate a goroutine from the outside: instead, you cancel the tasks running in the goroutine.
Suppose that you spawn a goroutine to handle an HTTP request from client A. While the request is being processed (e.g. calling the DB, external services, etc.), client A drops the TCP connection. With contexts we can notify the handler to cancel any ongoing or pending task. Otherwise, the handler will still be running the remaining tasks, wasting resources.
Note that if we pass a context to a function, it depends entirely on the function when, or whether, it cancels its tasks. It's cooperative multitasking, after all.
In Rust, it's easy to cancel a future: just drop it. That's not to say Rust's approach is perfect. Rust has the opposite problem: since a future can be dropped at any await point, dropping a future in the middle of execution might lead to an inconsistent state if the future isn't properly implemented.
kubb 4 years ago

They're for cancellation. Commonly, when your goroutine is blocked waiting on I/O, like an HTTP response, you want to set a deadline. You can do that by creating a suitable context and passing it to the API that makes the request. When the deadline is reached, the API will return an error. You can also cancel contexts yourself, e.g. in response to user input.
It can be any blocking API, not just I/O; for example, you can acquire a semaphore.Weighted in a cancellable way. The API has to be designed to support context cancellation, and most of the standard library is.
- TheDong 4 years ago
  
  > The API has to be designed to support context cancellation, and most of the standard library is.
  Important parts of the standard library are not designed for contexts. Let's say I want to write data to a file and then cancel that. How do I do that? `os.File.Write`, `io.Copy`, `ioutil.WriteFile`... none of these let me cancel a write.
  The `io.Writer` interface not supporting contexts also means things like the compress and archive stdlib packages don't support contexts, among several others.
  Most of the rest of it has had contexts bolted on in nonstandard ways. If you want to use contexts, you can no longer use "net.Dial" or "net.DialTimeout", but must instead use the awkward "(&net.Dialer{}).DialContext" method. The http package has similar issues, including non-idiomatic context usage.
  The Go stdlib was mostly built before contexts existed, has promised backwards compatibility, and it really shows.
  - kubb 4 years ago
    
    Good points!
tptacek 4 years ago

You get a request. To handle it, you fork off a bunch of goroutines, some of them doing lots of heavy lifting. Some trivial part of the request handler fails. You report the error back to the requestor, aborting the request. But you still have all these goroutines wandering around doing pointless work for a now-dead request.
Contexts are how you address that.
Groxx 4 years ago

It's a semantically better thread-local var[1] + a slightly more one-way thread interruption of sorts[2].
It's a performance nightmare when you put too many layers on it, but to some degree: meh. If that's your performance bottleneck, you have many options.
[1]: In that it's request-local and explicit, which makes it both clearer and much easier to control. For pure "optimize for fewer memory movements or cross-thread locks" purposes, Go has nothing; you have to trust the runtime to schedule efficiently. Except maybe runtime.LockOSThread().
[2]: You can always make a new context that's not cancelled, or just not look at the Done channel. But by accident you get better cancellation behavior than e.g. Java's thread interruption, because everyone still uses `catch (Exception e) { println(e) }` despite decades of education to not do so.
Tea418 4 years ago

They’re commonly used at API boundaries to pass cancellation signals and deadlines.
The Go blog always has good insights and details on things like this: https://go.dev/blog/context
conradludgate 4 years ago

The other replies are correct, but the data they contain is also useful.
In our work, we make use of contexts for logging and tracing. When a request comes in, we update the context to contain a request ID. Any logs we emit in our functions use this context to extract that information, so that related logs can be connected.
Tracing also makes use of these contexts. When you create a new trace, you wrap a context. That context contains the trace parent. That way, any new traces made using that context will be linked to the parent.
But that's the limit. It's just a processing context, not processing data.
openasocket 4 years ago

They can be used to store data scoped to a particular request or unit of work, but most of the time that's not why you would use them. Mostly they are used to control tasks and stop things when needed.
Imagine you have a server receiving requests. For each request there are multiple things you may have to do: logging metadata, checking the cache, pulling data from your database, etc. Some of that may be done serially, and other parts might be done in parallel (maybe you've got an SQL database and also a KV store with some ancillary data, and you want to grab data from both places at once). But you want to make sure this request doesn't block forever: you want to return a response to the user within a given maximum time frame (even if that is just a 504 Gateway Timeout response), and if you reach the time limit, or one of those things you are doing in parallel fails, you want to quickly shut everything else down.
That's what contexts are for, in essence. You make a context object and have each of your independent tasks use it. The context object provides a couple of ways for you to check whether the context has been cancelled. The Done() method gives you a channel you can poll on. The Err() method returns non-nil if the context was cancelled.
So yeah if you have a bunch of different tasks you want to do and you want to be able to stop all of them at once, you want to use a context.
tomcam 4 years ago

Every one of these answers was golden! Thanks to you I became an appreciably better programmer in less than 5 minutes.
