Announcing pggen: A Golang ORM Alternative

By: Ethan Pailes

At Opendoor, when we began building Go services, we chose GORM as our database library. As these services grew in complexity, we found that GORM did not meet all of our needs. Many engineers were spending more time than they liked translating SQL queries into GORM method calls, and they had to continually put effort into the error-prone process of keeping Golang model structs in sync with the database schema. To fix these problems, we built pggen (short for “Postgres Generator”). We’ve been using pggen internally in many services for over a year and are happy to announce that we are releasing it under the MIT license.

Why Current Options Didn’t Work For Us

When adding database functionality to a service using GORM, engineers encountered friction in a few different places:

  • Developers fluent in SQL must spend extra time translating a target query into the query builder calls that generate the SQL they already know they want.
  • Every time a new column or table is added to the database, the Go structs that mirror the structure of the table must be manually kept up to date.
  • Because it uses reflection, the interface that GORM presents is not type safe.

For complex queries it can often take significantly more time to translate the query into GORM calls than to just write the query directly in SQL. In several instances, engineers reported writing a SQL query in ten minutes or so and then spending hours translating it into the right query builder method calls.

GORM does offer the ability to execute raw SQL directly, but it does not offer many benefits over just using the database/sql package directly. You still must manually maintain the structs that the data gets loaded into, and the interface is not type safe. This is a pretty reasonable design decision in the ORM context, since SQL is not meant to be front-and-center for ORMs. However, it does mean that GORM’s raw SQL support is not a good alternative to a full database-first code generator.

Many of these problems are a feature of any ORM, but we think that a language with a fairly inexpressive static type system like Go is a particularly poor fit for the ORM model of database access.

Why Use pggen?

pggen was written as an attempt to address these pain points when writing database access code in our Go services. Rather than using a baroque embedded DSL to dynamically generate SQL, pggen uses SQL as its query language directly, reducing the translation step to copy and paste. pggen automatically generates model structs by reading the database schema, so your database layer Go code can never fall out of sync with your database. Query wrappers are generated using Postgres’s own type inference, ensuring that there is no mismatch at the seams between Go and SQL. Finally, pggen generates model structs with GORM tags, allowing it to be used as a model struct generator for GORM.

Using SQL as a Query Language

At its core, pggen is a tool which takes a SQL query and generates a Go routine for executing that query. This means that when using pggen to work with the database you spend most of your mental effort thinking about SQL rather than the API of pggen itself.

If you need to go beyond trivial interactions with a SQL database, you are going to need to know SQL so that you can reason about performance and understand the exact semantics of the code your database library is generating. If you have to know SQL to effectively work with the database anyway, why not just focus on a single domain-specific language rather than having to work with two? On top of being less work, focusing on SQL is good for your personal development as a programmer, since it is much more broadly applicable than even the most ubiquitous ORM DSL.

In our experience at Opendoor, especially for those developers who like to think about queries primarily in terms of SQL rather than in terms of host language constructs, pggen makes for a much smoother query authoring experience. To write a new query in pggen, you can arrive at the SQL you want in psql and then paste the result into a toml file. There are a few configuration options that allow you to tune the interface that pggen generates, but for the most part all the mental effort of adding a query with pggen is taken up by writing SQL.

Many database libraries start from the idea that databases are data repositories, so developers should be able to work with database objects in the same way they work with data objects in the host language. pggen instead starts from the idea of calling into a foreign language environment, so it feels more like an RPC framework than an object mapper. pggen does include plenty of features for mapping database tables to Go structs, but these are convenience features, not the primary programming paradigm.

As an example to show the advantages of using raw SQL over a query builder, consider this GORM based subroutine we have for fetching a list of security system installations attached to a given address:

func (store *Store) getInstallationsByAddress(ctx context.Context, addressUUID string, active bool) ([]models.Installation, error) {
	dbInstalls := []models.Installation{}
	err := store.DB.WithContext(ctx).
		Where("address_uuid = ?", addressUUID).
		Where("is_active = ?", active).
		Order("created_at DESC").
		Find(&dbInstalls).
		Error
	if err != nil {
		return nil, err
	}
	return dbInstalls, nil
}

Writing this kind of code isn’t too bad, but the first time you do it you need to understand three different query building methods. It’s not entirely obvious how the two Where calls combine (from context it seems like they are AND’d together), so you’ll probably want to pull up the docs just to be sure; without an example to work off of, you’ll definitely need them. In SQL, the order of query operators like ORDER BY matters, so it’s natural to wonder whether the order of these method calls matters too. Since Find uses reflection, you need to read the docs to figure out what it expects as its output variable, and it is not obvious that the returned error needs to be pulled off of a field. Finally, you need to go through some ceremony around initializing the result variable, binding the context to the ORM handle, and checking the error.

This routine can be replaced by the following entry in your pggen.toml file:

[[query]]
name = "GetInstallationsByAddress"
return_type = "Installation"
body = "SELECT * FROM installations WHERE address_uuid = $1 AND is_active = $2 ORDER BY created_at DESC"

Based on the entry in the pggen.toml file, pggen will generate a PGClient struct with a method for this query. In this case, the query can be executed with pgClient.GetInstallationsByAddress(ctx, addressUUID, active).

Even for relatively simple queries like this, we think pggen has a pretty clear advantage, but this advantage only grows as queries become more complex. Lowering the barrier to writing queries doesn’t just save time. It means that we end up writing more complex queries that we would have otherwise stitched together with multiple calls to simpler queries. An ergonomic database library makes it much easier to move computation to where the data is since it becomes easier to express the computation. This can result in much better performance than architectures that must pull large quantities of data into the application for processing in the application programming language.

Automatically Generating Model Structs

While a minimal query shim generator that just allows you to quickly wrap calls to queries or statements would be flexible enough to accomplish any database interactivity task, it is often useful to have a single canonical struct that represents a row for a particular table in the database. In our GORM-based services, we need to manually maintain these structs, and developers often leave fields out or get the nullability of a field wrong, which can lead to subtle bugs. To eliminate this manual maintenance, pggen allows you to ask for a struct to be automatically derived from the table. pggen will figure out a good Go name for the struct as well as the names, types, and nullability of the fields.

These model structs come with a number of default methods on the generated database client. pggen will generate a handful of basic CRUD methods for each struct so you don’t have to worry about writing custom queries to do things like inserting new records, fetching a single row by the primary key, getting a list of records, updating a subset of fields in a record, or even performing an UPSERT. pggen will also automatically infer all relationships between tables directly from the database schema based on foreign key constraints, including differentiating between 1-many and 1-1 relationships based on the presence of UNIQUE constraints, and generate a few methods to easily fill in related tables.

These automatically generated model structs can interoperate with custom queries as well. Custom queries can be configured to return a model struct, which makes it easy for application code to mix custom query logic with logic using the automatically generated code for a given model.

Type Safe Queries

A major advantage of using pggen over a reflection-based database library is that the generated query wrappers have their argument and return types checked against the database. At Opendoor, we love the way that even Go’s relatively inexpressive static type system eliminates a large class of bugs, so it’s always sad when we need to rely too heavily on reflection to solve a problem. Normally, there are reasons to think static types could be good or bad for developer productivity: they help in reading code and figuring out how to work with unfamiliar APIs, but they require more time to write. When an external program like pggen is the one doing all the work to write the types, though, the cost side of this tradeoff disappears, and developers simply get the benefits: good autocompletion in their IDEs and well-documented data formats.

GORM Compatibility

We have a lot of existing services written using GORM as the database layer, and those are not going away overnight. Big migrations are seldom a good idea, so it was important for us that any database-first code generator we adopted had a good story for gradually migrating away from GORM. When pggen generates model structs, it includes struct tags on all the fields to tell GORM about the structure of the table. While we didn’t go out of our way to ensure that pggen generates every possible GORM tag, we found that the subset that we did generate was enough to cover all the use cases in the services that we have migrated. Additionally, pggen includes an escape hatch configuration option allowing you to add arbitrary struct tags to any field in the generated struct, so migrations from GORM won’t be blocked by updates to the pggen tool.

The process of migrating a service from GORM over to pggen involves adding a table entry for the given table to the pggen.toml file and then switching all the database layer methods to use the new generated model struct rather than a manually maintained one. At this point, new code can seamlessly use pggen to talk to the database, and the returned values interface smoothly with all the existing code that works with the model using GORM. Old GORM query building code can be translated to raw SQL or the automatically generated CRUD methods at the leisure of the development team. It is perfectly feasible to continue using the generated model struct with GORM for a long time if desired; in fact, if you chose to, you could keep using GORM and use pggen only to generate the structs, avoiding manual updates when you add new columns to the database.

Battle Tested

We’ve been using pggen in production for over a year now. At this point, it is very stable, and we’ve got plenty of experience with operating it in critical environments. While the core features described above were there from early in pggen’s life cycle, running pggen in production has given us the opportunity and impetus to flesh it out with several ancillary features such as:

  • support for transaction and connection manipulation
  • support for custom middleware to facilitate logging and performance tuning
  • support for multiple database drivers
  • support for using pggen as a library in a complex build system like Bazel
  • support for flexibly adjusting to changing database schemas in a running application

pggen comes with a number of examples showing how to accomplish common and not-so-common tasks, and if you adopt it you can be confident in its reliability.

Why Not Use pggen?

We think pggen is a great database library, but like everything in engineering it comes with a set of tradeoffs. Here we’ll try to acknowledge and address some of the drawbacks of pggen.

More Complex Builds

The biggest drawback to using pggen is the complexity it adds to builds. To adopt pggen, you must update your build process to have a connection to a database that has the current schema. Without this, pggen is not able to generate structs or methods. At Opendoor, we already had a test database available in our build pipelines that could be used for this purpose. However, if that isn’t the case for your project, this would require some upfront work.

Only Works with Postgres

Another drawback to pggen is its exclusive focus on Postgres as the database backend. Some applications, such as those that will be deployed in customer environments where the application developer does not have full infrastructure control, need to be able to support working with multiple databases. pggen is not the right tool for these kinds of applications.

We primarily use Postgres at Opendoor, so we haven’t prioritized support for any other databases, though we would welcome efforts to re-use the pggen code to apply to other databases. We are not likely to add support for other databases directly to pggen (if for no other reason than the name of the tool would no longer fit), but we are quite open to factoring all of the important parts of it out into a library that can be used to build similar tools that consume the same configuration format.

Dynamic SQL

While pggen makes it really easy to work with static SQL, it does not have the same kind of first class support for dynamic SQL that ORMs and query builders have. Dynamic SQL generation is generally the dominant paradigm for querying a database when using an ORM or query builder, but pggen needs a pre-written query to generate a shim for. For almost all use cases, static SQL is expressive enough, but in some cases, such as faceted search, determining the exact SQL and query parameters you will use is most naturally done at run time. The only support for dynamic SQL that pggen offers is the generated Scan methods on model structs that can still be used for loading result sets. Within Opendoor, some engineers find the string templating required when working with dynamic SQL for pggen about as easy to reason about as chained GORM calls, and others find it less elegant.
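
The string templating mentioned above looks roughly like the following sketch for a faceted search: each optional facet appends a predicate and a positional parameter. The builder function is our own illustration; in practice the resulting SQL and arguments would be executed with database/sql and the rows loaded with a generated Scan method:

```go
package main

import (
	"fmt"
	"strings"
)

// buildInstallationFilter assembles a WHERE clause from whichever
// facets the caller supplied, numbering parameters as it goes so the
// query stays safely parameterized.
func buildInstallationFilter(addressUUID *string, active *bool) (string, []interface{}) {
	preds := []string{}
	args := []interface{}{}
	if addressUUID != nil {
		args = append(args, *addressUUID)
		preds = append(preds, fmt.Sprintf("address_uuid = $%d", len(args)))
	}
	if active != nil {
		args = append(args, *active)
		preds = append(preds, fmt.Sprintf("is_active = $%d", len(args)))
	}
	q := "SELECT * FROM installations"
	if len(preds) > 0 {
		q += " WHERE " + strings.Join(preds, " AND ")
	}
	return q, args
}

func main() {
	addr := "abc-123"
	q, args := buildInstallationFilter(&addr, nil)
	fmt.Println(q)         // SELECT * FROM installations WHERE address_uuid = $1
	fmt.Println(len(args)) // 1
}
```

Whether this reads better or worse than chained query builder calls is, as noted, a matter of taste.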

What About Other Database-First Code Generators?

At the time that we started working on pggen, we were aware of two other database-first code generators, xo and gnorm. Since then, an exciting new tool, sqlc, has come on the scene. At the time, neither xo nor gnorm seemed like a good fit for us, largely because we wanted to make sure that our new tool would be able to offer a gradual offramp from GORM and because we wanted to make sure we could offer good support for associations between tables. For a more in-depth comparison between pggen and some of these tools, you can take a look at the pggen README.

Next Steps

pggen is fairly mature at this point, but we plan on continuing to work on it over the coming months and years. We have a number of ideas for quality-of-life enhancements such as:

  • teaching it to stand up a Postgres database on its own in order to make integration easier
  • adding utilities to make working with dynamic SQL easier

We also have some more ambitious ideas that might require changes to upstream Postgres:

  • exploring ways to get Postgres to allow us to support RETURNING
  • automatically inferring nullability for queries

Our hope, though, is that the wider software community will adopt pggen, in which case we expect to spend some of our initial maintenance effort on fixing any issues that new users encounter and answering any questions or feedback. For more information, you can take a look at the pggen repo on GitHub.

Interested in joining the Opendoor team? We’re hiring! Check out open roles here.