Go Error Propagation and API Contracts

May 23, 2024

Zürich, Schweiz

I’ve been thinking about errors in Go quite a bit lately and what has been bothering me about the practice of error design and usage in the community. The critique starts with this code:

if err != nil {
    return err
}

It’s such a common piece of code that it has become a trope thanks to folks feeling like the apparent verbosity is an affront to their humanity. The verbosity’s not what I am going to be writing about today, nor is it something I have frankly ever given two shits about. What I care more about is the meaning of that code. Let’s contextualize that code again in func F:

func F() error {
    if err := G(); err != nil {
        return err
    }
    return nil
}

What really happens when someone calls func F and func F returns an error? The answer is not all that complex: the error from func G is propagated to func F’s caller. This might appear somewhat innocuous at first, but it may not be all that innocent.

The Problem

Let’s reframe that code in Java¹ to see what I mean. Suppose the code starts out as shown below, where method f does not yet call method g:

// Before
package my.example;

public class DoesNotMatter {
    public void g() throws DomainSpecificGException { ... }

    public void f() { ... }
}

Then later the method f is to call method g. In order for that to work correctly in the language², the signature of f has to be adapted as follows:

// After
package my.example;

public class DoesNotMatter {
    public void g() throws DomainSpecificGException { ... }

    public void f() throws DomainSpecificGException {
        g();
    }
}

Note: Method f’s throws declaration gained DomainSpecificGException³ from method g.

That’s essentially what is happening in the Go code above with func F in the example, and it invites an important question: is it appropriate for whatever errors func G returns to also be emitted by func F? The thing is, there is no single answer to this question. The answer is situationally specific and will have to be adjudicated each and every time you confront a situation where an inward error could be propagated outwards.

Sounds a bit annoying, right? Well, it’s not entirely Go’s fault here; this same problem applies in other languages⁴, too. So what is this problem, then?

Ultimately it’s about error domains. func G may have a different domain than func F, and it could be inappropriate for func F to return func G’s errors unadulterated to its callers.

A good way to see this is to imagine layering like the following in an application. Consider the case of a policy engine that determines the privileges of users based on records stored somewhere:

policy engine: (*policy.Authorizer).MayPerform(ctx context.Context, user string, p Privilege) (bool, error)
storage engine: (*policystore.Store).LoadGrants(ctx context.Context, user string) ([]Grant, error)
database library: (*sql.DB).ExecContext(ctx context.Context, query string, args ...any) (Result, error)

We can easily imagine method (*policy.Authorizer).MayPerform calling method (*policystore.Store).LoadGrants to see which types of permissions have been granted to the user.

package policy

import (
    "context"
    "fmt"

    "policystore"
)

type Authorizer struct {
    store policystore.Store
}

func (a *Authorizer) MayPerform(ctx context.Context, user string, p Privilege) (bool, error) {
    grants, err := a.store.LoadGrants(ctx, user)
    if err != nil {
        return false, fmt.Errorf("storage: %v", err) // Make error from policystore opaque.
    }
    return p.GrantsSufficient(grants), nil
}

We can also imagine method (*policystore.Store).LoadGrants loading these permissions from a database using method (*sql.DB).ExecContext.

package policystore

import (
    "context"
    "database/sql"
    "errors"
)

type Store struct {
    db *sql.DB
}

var ErrNotFound = errors.New("policystore: user not found")

// Error models generic error conditions.
type Error string

func (err Error) Error() string { return string(err) }

func (s *Store) LoadGrants(ctx context.Context, user string) ([]Grant, error) {
    const q = `omitted`
    res, err := s.db.ExecContext(ctx, q, user)
    if err != nil {
        return nil, Error(err.Error())
    }
    // omitted: if empty result set is returned, return ErrNotFound.
    // omitted: produce grants from results and return.
}

So let’s imagine that the database library returns an error in that code flow. Would it be appropriate for the storage engine to return the raw SQL error from package sql to the policy engine? It’s likely not, because what is package policy going to do with a specific error returned from package sql by way of package policystore? Not very much if it cares about maintaining API stability⁵. Instead, the storage engine’s package policystore might have its own domain-specific errors into which it translates the local errors from package sql.

Note: Go tends to be conducive to multiple design disciplines. That said, the community generally values eschewing extraneous abstraction, and I am a big subscriber to that value myself. It’s perfectly reasonable to imagine some Go developers authoring the policy engine without a policy storage engine in between (i.e., interacting directly with SQL).
To me, letting the policy engine accept multiple storage engines sounds like a reasonable design choice and not too much of a premature abstraction. You might, for instance, want an in-memory or empty store for testing or for certain production run modes.
The point of saying this is that what I described is a rhetorical example, not a canonical solution. There are multiple ways of slicing the metaphorical onion.

The Solution

Let’s try reframing that code in Java again in case it helps illuminate a potential solution to the original problem:

// Recoding the error domains
package my.example;

public class DoesNotMatter {
    public void g() throws DomainSpecificGException { ... }

    public void f() throws FException {
        try {
            g();
        } catch (final DomainSpecificGException ex) {
            throw new FException();
        }
    }
}

In this reworking, DomainSpecificGException is recoded to FException. This same thing can be done in Go:

With error sentinels, it looks like this:

func F() error {
    err := G()
    switch {
    case errors.Is(err, ErrSomethingDomainSpecificForG):
        return ErrFCondition
    case err == nil:
        return nil
    default:
        return fmt.Errorf("unknown: %v", err)
    }
}

With structured error values, it looks like this:

func F() error {
    err := G()
    var errDomainSpecific *SomethingDomainSpecificForGError
    switch {
    case errors.As(err, &errDomainSpecific):
        return ErrFCondition
    case err == nil:
        return nil
    default:
        return fmt.Errorf("unknown: %v", err)
    }
}

With opaque errors:

func F() error {
    if err := G(); err != nil {
        return fmt.Errorf("unknown: %v", err)
    }
    return nil
}

Now, this invites a fun discussion that is beyond the scope of this article:

when should we create an error domain?
when should we convert one error domain to another?

I’ll leave it to you as a practitioner to answer⁶ these questions, but the error handling options are effectively these:

Propagate the error along the call chain: return err
Make the error domain opaque: return fmt.Errorf("unknown: %v", err)
Convert the error from one domain to another: return ErrSentinel or return &MyError{ /* omitted */ }
This includes possibly interrogating the original error value with errors.Is and errors.As.

There’s no one-size-fits-all approach. Propagating the error along is super convenient for prototyping, but production-grade code must consider what is correct on a case-by-case basis.

Error Wrapping

Now, the astute reader among you may wonder: why has error wrapping not been discussed yet? Well, the truth of the matter is that error wrapping presents essentially the same problems that the original code did:

func F() error {
    if err := G(); err != nil {
        // Same problem as the original, except with the addition of some
        // debugging text.
        return fmt.Errorf("something: %w", err)
    }
    return nil
}

And that’s exactly what the blog post for error wrapping reminds the reader: a wrapped error becomes part of the outer function’s API.⁷

Closing Words

So, if your API does any of the following, you need to consider whether the returned error is situationally correct for the caller of your API to consume:

returning the error raw

if err := G(); err != nil {
    return err
}

wrapping the error with extra text

if err := G(); err != nil {
    return fmt.Errorf("something: %w", err)
}

wrapping the error inside another error type that implements Unwrap() error and similar

if err := G(); err != nil {
    return &MyError{
        Err: err,
    }
}

where *MyError implements

interface {
    error

    Unwrap() error  // or Unwrap() []error
}

If your API does any of those things above, you need to consider Hyrum’s Law and all of its attendant worries. It’s usually easier to go from an opaque or local sentinel or structured error type to a propagated error value or type (be it wrapped or not) later than it is the other way around.

I hope nobody comes away from this thinking that I am against raw error propagation or error wrapping. The only thing I am against is doing either rotely (thoughtlessly). Just make sure you understand the context in which you are developing and the error contract of the APIs you are using, and document your contract correctly. This is especially critical in APIs that consume or call user-implemented interface values or function values.

The act of maintaining situational awareness and making an active decision while programming is the very nature of software engineering. There are some things you can delegate to an IDE or other facility or tool to help you design or solve a small problem in isolation, but the act of design and considering constraints in totality requires deliberate human choice. Nothing can take that away.
