Currently, Faire’s backend codebase has over 15,000 tests. One of our core principles in testing is making sure that our tests reflect reality as much as possible. For this reason, most of our tests have a MySQL database running, which provides us with a real implementation of our persistence. We discovered that we achieve much higher fidelity with our test coverage by taking this approach. Given that Faire works with small businesses and independent entrepreneurs that rely on our platform for their livelihoods, it is of paramount importance to us to ship high quality software that minimizes bugs and downtime to better serve our community.
We also call web APIs directly in our tests to simulate the ways users would actually interact with the system. Combined with the real MySQL database instance running, our tests run in an environment that very closely resembles how we expect the system to behave in production. We have the satisfaction of knowing, for example, that we can test the API calls required to sign up as a retailer on the platform, add some products to their cart, and place an order.
We are able to test everything from the server receiving the request, to persisting changes made by the API, to finally passing through response filters and returning a response to the caller, which in this case would be our test.
One of the challenges when using a database during tests is that it needs to be completely cleared of data between test executions and then repopulated with test data. Populating our initial test state by executing API calls into the locally-running backend instance can be slow. It can take seconds just to get a test into a state where the right data exists to exercise the behavior under test. If we assume setup takes 1 second on average, that works out to at least 250 minutes of test setup time across our more than 15,000 tests (15,000 tests × 1 second ≈ 250 minutes).
Running our tests in parallel allows us to keep the total time of our Continuous Integration (CI) builds down, but more optimization is always needed. We strive to always have our compile and test suite complete in under 15 minutes.
You might be wondering: why is setting up our tests slow to begin with? Even 1 second may seem excessive when running a local server and database, but consider that we need to perform several API calls to create the initial brands and retailers, which are the two different kinds of users in our marketplace, and then perform several more API calls to create products. For each of these API requests, we need to do the following.
- Check the user session and authenticate. (Plus some other miscellaneous start-of-request overhead).
- Open database transaction(s).
- Get the current relevant state from the database. This is primarily going to be SQL SELECT statements.
- Execute changes to the database. This may be INSERT statements or UPDATE statements.
- Commit the transaction.
- Return the results, running through any API response filters.
If we could skip directly to the resulting state of the system, we would save the overhead of each step listed above for every API request.
Test Scenarios
We came up with an optimization to quickly replay test setups. We named the pieces of our system that store what to replay Test Scenarios.
The main idea of Test Scenarios is the following.
- When a test runs, it can run a Test Scenario as its first operation using a Test Scenario Runner.
- The Test Scenario Runner will check if that particular Test Scenario has already been run before.
- If it has not run before, we execute the API calls for the test setup, just like we would for a normal test. We record the application and database state after running the setup code.
- If it has run before, we restore the recorded application and database state.
As long as at least 2 tests use the same Test Scenario, only the first test will need to run the expensive API calls and we will get some savings in the total time it takes to run tests.
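To make that flow concrete, here is a minimal sketch of what the run-or-replay decision could look like. This is an illustrative outline rather than our actual runner: it reuses the TestScenario and TestScenarioFactory types shown below, while the Snapshot type and the capture/restore callbacks are hypothetical stand-ins for the database and application state handling described later in this post.

// A minimal sketch of the run-or-replay decision, not our actual runner.
// Snapshot and the capture/restore callbacks are hypothetical stand-ins.
class Snapshot(val databaseRows: Any, val fakeStates: Any)

class TestScenarioRunner(
  private val captureSnapshot: () -> Snapshot,
  private val restoreSnapshot: (Snapshot) -> Unit,
) {
  // One cached scenario instance and its recorded state per factory class.
  private val cache = mutableMapOf<Class<*>, Pair<TestScenario, Snapshot>>()

  @Suppress("UNCHECKED_CAST")
  fun <T : TestScenario> run(factory: TestScenarioFactory<T>): T {
    val cached = cache[factory::class.java]
    if (cached != null) {
      // Replay: restore the recorded database and application state
      // instead of executing the expensive API calls again.
      restoreSnapshot(cached.second)
      return cached.first as T
    }
    // First run: execute the real API calls, then record the resulting state.
    val scenario = factory.create()
    cache[factory::class.java] = scenario to captureSnapshot()
    return scenario
  }
}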
To give a more concrete example of what a Test Scenario might look like, here is one of the most basic ones we are currently using.
data class BrandAndAdminTestScenario(
  val brandUserTester: BrandUserTester,
  val adminUserTester: AdminUserTester,
) : TestScenario {
  @Singleton
  class Factory @Inject constructor(
    private val brandUserTesterFactory: BrandUserTester.Factory,
    private val adminUserTesterFactory: AdminUserTester.Factory,
  ) : TestScenarioFactory<BrandAndAdminTestScenario> {
    override fun create(): BrandAndAdminTestScenario {
      return BrandAndAdminTestScenario(
        brandUserTester = brandUserTesterFactory.signUp(),
        adminUserTester = adminUserTesterFactory.getOrCreateDefault()
      )
    }
  }
}

The first time this Test Scenario is used, we will create a BrandUserTester and an AdminUserTester, each of which represents a fully signed-up and logged-in user in our system. Setting up these users requires several API calls, which carries a non-negligible amount of overhead. The result of the API calls is a database state with a few dozen rows across several tables representing all the data related to this brand and admin user in the system.
When we replay the Test Scenario, we are able to run a series of database INSERTs to restore the database state to have the exact same brand and admin user. In this way, a test that executes the test scenario will be in an identical starting state regardless of whether we are running the actual API calls for the first execution or just running database inserts on the second execution.
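To illustrate how a test might consume a scenario through the runner, here is a hypothetical usage sketch building on the runner outline above. How the runner and factory instances are obtained (for example via dependency injection in the test harness) is omitted, and the names are illustrative rather than our exact API.

// Hypothetical usage sketch; names are illustrative, not our actual test API.
fun exampleTestBody(
  testScenarioRunner: TestScenarioRunner,
  scenarioFactory: BrandAndAdminTestScenario.Factory,
) {
  // The first test to request this scenario pays for the real API calls;
  // subsequent tests get the same starting state replayed from recorded INSERTs.
  val scenario = testScenarioRunner.run(scenarioFactory)
  val brand = scenario.brandUserTester
  val admin = scenario.adminUserTester
  // ... exercise the API under test with the already signed-up users.
}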
Capturing database state
In order to be able to replay tests, we need to be able to take a snapshot of the current state of the database. There are a couple of ways we could approach creating this snapshot.
- Scan every single table using a SELECT * and keep a copy of any data we find.
- Snapshot the entire database locally and restore it for each test replaying the Test Scenario.
We opted for #1 with an added optimization. We use a query interceptor to record which tables were written to such that we only have to scan the tables that have rows. We chose this approach because we were already using a similar process to discover which tables to truncate before running the next test.
Option #2 would also require more coordination with the local MySQL server and its place in the file system. That complexity did not seem worth pursuing compared to the ease-of-use with option #1.
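A simplified sketch of that capture step is below. It assumes a set of written tables supplied by a hypothetical query interceptor and a JDBC connection to the local test database; it is not our actual capture code.

// Simplified capture sketch; writtenTables comes from a hypothetical query
// interceptor, and CapturedTable is an illustrative container type.
import java.sql.Connection

data class CapturedTable(val name: String, val rows: List<Map<String, Any?>>)

fun captureDatabaseState(connection: Connection, writtenTables: Set<String>): List<CapturedTable> {
  return writtenTables.map { table ->
    // Only tables the interceptor saw writes to are scanned with SELECT *.
    connection.createStatement().use { statement ->
      statement.executeQuery("SELECT * FROM $table").use { resultSet ->
        val columnCount = resultSet.metaData.columnCount
        val rows = mutableListOf<Map<String, Any?>>()
        while (resultSet.next()) {
          rows += (1..columnCount).associate { i ->
            resultSet.metaData.getColumnName(i) to resultSet.getObject(i)
          }
        }
        CapturedTable(table, rows)
      }
    }
  }
}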
As we developed our Test Scenario Runner and the related test infrastructure, we quickly realized that it must be the first code to run that modifies the state of the system, for two reasons.
- The database state we capture records the entire row data, including the IDs. If we insert any data into the database before replaying the SQL INSERTs from the Test Scenario, we could have a primary key collision.
- It’s likely that the execution of the Test Scenario could have different outcomes with a different starting database state. We can’t guarantee the replayed SQL would actually be correct when added on to any possible existing application state.
In order to resolve this issue, we use the same query interceptor to ensure no queries have inserted data before replaying the Test Scenario.
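Conceptually, the interceptor only needs to remember which tables have been written to, which is enough both to limit the capture scan and to assert that nothing has touched the database before a replay. A rough sketch of that idea, with a hypothetical hook into the query execution path:

// Rough sketch of the write-tracking idea; onStatement is a hypothetical hook
// called for every SQL statement the application executes during a test.
class WriteTrackingInterceptor {
  private val writtenTables = mutableSetOf<String>()

  fun onStatement(sql: String, tableName: String) {
    val isWrite = sql.trimStart().uppercase().let {
      it.startsWith("INSERT") || it.startsWith("UPDATE") || it.startsWith("DELETE")
    }
    if (isWrite) writtenTables += tableName
  }

  // Used to decide which tables to scan when capturing, and which to truncate
  // before the next test.
  fun writtenTables(): Set<String> = writtenTables

  // Called before replaying a Test Scenario: the replay must be the first
  // thing to modify database state, otherwise primary keys could collide.
  fun checkNoWritesBeforeReplay() {
    check(writtenTables.isEmpty()) {
      "Test Scenario must run before any other database writes; already wrote to: $writtenTables"
    }
  }
}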
Capturing application state
In order to capture application state, we need to specify which classes need to have their state stored and then restored. We have an interface we use for this purpose which has functions to save state and restore state.
/**
 * If your test fake contains state, in order for a test scenario instance to be executed correctly, it needs to be
 * able to restore that state. This interface defines the contract test fakes must implement in order to do this.
 */
interface StatefulFake<T> {
  fun captureState(): T
  fun restoreState(state: T)
}
Currently, the only classes that need to save and restore state have been fake implementations of integrations with external systems. We often want to set up default responses from these systems, and it is useful to have that setup covered by a Test Scenario.
An example of where we capture the application state would be our TestClock. We use a Clock interface to provide the current time and use the interface across the application. In tests, we use the TestClock as the concrete implementation so that we can have the test advance time forward by specific increments, which allows us to keep any time-based behaviors deterministic in the tests.
/** A version of the Clock that can be used to change the system time without blocking. */
class TestClock(
  now: ReadableInstant = DateTime(2016, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC),
) : Clock(), StatefulFake<TestClock.State> {
  private val currentTimeMillis = AtomicLong(now.millis)

  // ... TestClock implementation

  data class State(
    val currentTimeMillis: Long,
  )

  override fun captureState(): State {
    return State(currentTimeMillis())
  }

  override fun restoreState(state: State) {
    currentTimeMillis.set(state.currentTimeMillis)
  }
}
By restoring the TestClock’s state during the replay, we can ensure that tests using a Test Scenario will have a consistent starting application state.
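During capture and replay, the runner (or an equivalent helper) can simply iterate over every registered StatefulFake. The following is a minimal sketch assuming the fakes are collected somewhere, for example through dependency injection; the FakeStateManager name is illustrative.

// Minimal sketch; FakeStateManager is illustrative. The unchecked cast is safe
// here because each captured state is only restored into the fake it came from.
class FakeStateManager(private val fakes: List<StatefulFake<*>>) {
  fun captureAll(): List<Any?> = fakes.map { it.captureState() }

  @Suppress("UNCHECKED_CAST")
  fun restoreAll(states: List<Any?>) {
    fakes.zip(states).forEach { (fake, state) ->
      (fake as StatefulFake<Any?>).restoreState(state)
    }
  }
}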
This approach is somewhat cumbersome, however, and we may consider replacing it with something with more magic. We don’t actually have many parts of our application that need to store state as we aim to keep the database as our single source of truth, so it has not been a priority to-date.
Results and next steps
We are able to measure the time saved by utilizing Test Scenarios by first recording the amount of time it takes to run the scenario, and then recording the amount of time each replay of the scenario takes. The time saved is the difference between these amounts.
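In other words, for a single scenario the savings are roughly the number of replays multiplied by the difference between the initial runtime and the average replay time. A tiny illustrative helper (not our actual reporting code) makes the bookkeeping explicit:

// Illustrative only: total time saved is the sum, over every replay, of how
// much faster the replay was than the original run of the scenario.
fun estimatedTimeSavedMillis(initialRunMillis: Long, replayRunsMillis: List<Long>): Long =
  replayRunsMillis.sumOf { initialRunMillis - it }

For the scenario quoted below, 93 replays averaging 296ms against an initial run of 1437ms works out to roughly 93 × (1437 − 296) ≈ 106,000ms, consistent with the reported savings.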
A recent test run shows roughly 13 minutes of test runtime reduction. One example of a Test Scenario that is heavily reused, and therefore benefits greatly from this optimization, is the following.
BrandOrderApiTest$BrandOrderApiTestScenario$Factory ran 93 replays with an initial runtime of 1437ms, an average time of 296ms, and 106s (106082ms) time saved

Given the very rough estimate of at least 250 minutes of test setup, the 13 minutes we are currently saving with this system shows we have not adopted it nearly as widely as we could. However, it's exciting to see examples like the BrandOrderApiTestScenario above because it shows how to utilize this pattern effectively and what kind of time savings we can achieve as we adopt Test Scenarios more widely.