Continuously Integrating a Major Rails Upgrade

How we upgraded from Rails 4 to 5 zero commits ahead of master.

For all intents and purposes, Rails 4.x has reached its end-of-life. The Rails maintenance guide explains the release series support while the Security Policy explains which versions will receive security patches and new versions.

For major security issues, all releases in the current major series and the last release in the previous major series will receive patches and new versions. This is currently 6.0.x and 5.2.x.

If you’re running a Rails 4 application, now’s the time to seriously consider upgrading.

This is a guide outlining how a few engineers upgraded the Adwerx application from Rails 4.2 to 5.2 one changelog item, one configuration option, one process, and one fraction of the production workload at a time without interrupting the product engineering teams (~15 engineers) while they shipped new features.

We practice continuous delivery at Adwerx, so we opted to integrate all changes during the upgrade to master as they were made and approved. We created custom RuboCop rules to automate finding Rails-5-incompatible code in pull requests and autocorrect application code to be forward-compatible. We made the application dual-bootable and introduced the Rails 5 application to production with canary rollouts.


Our upgrade plan at a high level

Continue reading if you’d like to learn more about how we did so and what we learned along the way.

The Application Footprint

To give you some context about the size of our application and why a framework upgrade might be challenging, here are some statistics current as of writing this:

Our Rails application:

About 7 years old
Started on Rails 3.2.8
146K lines
4,949 files
CodeClimate “B” maintainability
75% test coverage

In a typical week, our application processes:

~20M Sidekiq jobs
~75K Resque jobs
~50K pub/sub messages
~10M HTTP requests across the website, API, and pixels

The devil is in the details of a Rails upgrade—sometimes between the lines. In an application this size and age, you’re lucky if you don’t have any monkey patches, private Rails/ActionPack API usages, forked/outdated gems, and configuration baggage from previous Rails versions. Like any team, we had a little bit of it all.

Our Approach

We borrowed from the playbooks of Shopify and GitHub for this upgrade. This meant dual-booting our application—the ability to run the application in either the old Rails version or newer Rails version from the same revision.

When dual-booting, the application uses separate Gemfiles and Gemfile.locks. Combine this with a global method to check the Rails version and you can upgrade your application to run both versions of Rails at once successfully. You can run tests against 4 and 5 at the same time in CI or run each version simultaneously in development. This allowed the upgrade team to make continuous changes to the application while other teams were rapidly shipping code and writing new features. At no point was the upgrade team holding a branch of upgrade work all at once to merge and reconcile conflicts.

If you’re wondering if this is the right approach for you, you might ask yourself the following questions:

How much of your code is covered by your test suite?
What’s at risk for your organization if something goes wrong?
Do you have the engineering capacity to follow through with this?

If you don’t have a thorough test suite, you might want to keep your changes separate from master for a longer time. If you’re not worried about the risk an upgrade like this poses to your organization, this might be overkill! If you don’t have the engineering capacity to execute on an approach like this, do something more pragmatic and don’t fret about it.

Execution

Outlining our milestones for this upgrade in broad strokes was critical to establishing a baseline of progress and sizing up the work. The section below contains our list of milestones and any challenges we encountered while completing them.

Approximately 9/10ths of this upgrade plan went swimmingly. We detected a nontrivial rate of cookie and/or session interoperability errors between the Rails versions in the last mile of rolling this out. Rolling back was a tactic that we used when we detected early signs of instability during rollout. I highly recommend a similar rollout plan for an application of this size and scope.

Do backwards-compatible upgrades

Step one for us was to identify all backwards-compatible work that can be done to the existing application (hint: use the Rails changelog!). This included setting up ApplicationRecord as the model superclass, optimistically updating gems, starting to educate team members on impending changes, and replacing _filter callbacks with _action alternatives. We found a good bit of work here and it all could be done before splitting our app in two.

Dual boot the app

Dual booting is the process of introducing a second Gemfile and Gemfile.lock pair. The new Gemfile contained updated versions of Rails and its dependencies. To point Bundler at the new Gemfile, set the BUNDLE_GEMFILE environment variable to the relative path of that file.

We copied our Gemfile into Gemfile_Rails_5 and started upgrading gems. Indicating to bundler to use this new Gemfile looked like BUNDLE_GEMFILE=Gemfile_Rails_5. Here’s a video by Rafael França about doing so. Shopify now has a gem to aid in this process if you’re interested.

We use docker-compose locally, so we passed the ENV var through to the container like this:

# docker-compose.yml
services:
  app:
    entrypoint: script/entrypoint.sh
    ...
    environment:
      RAILS_5: ${RAILS_5}
      ...

And our docker-entrypoint.sh had a couple lines added to it:

# script/docker-entrypoint.sh
if [ ! -z "$RAILS_5" ]; then
  export BUNDLE_GEMFILE="Gemfile_Rails_5"
fi
# ...
exec "$@"

We could then start the application in Rails 5 with RAILS_5=1 docker-compose up and so on.
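The same switch could also be made in Ruby instead of the shell entrypoint. The following is a minimal sketch, assuming the RAILS_5 flag and the Gemfile_Rails_5 file name described above; gemfile_for is a hypothetical helper, not part of Bundler.

```ruby
# Hypothetical helper: picks the Gemfile for this process from the environment.
# Mirrors the shell check above: any non-empty RAILS_5 value selects Rails 5.
def gemfile_for(env, root)
  name = env.fetch("RAILS_5", "").empty? ? "Gemfile" : "Gemfile_Rails_5"
  File.join(root, name)
end

# In config/boot.rb this could replace the hard-coded default while still
# letting an explicitly set BUNDLE_GEMFILE win:
#   ENV["BUNDLE_GEMFILE"] ||= gemfile_for(ENV, File.expand_path("..", __dir__))
puts gemfile_for({ "RAILS_5" => "1" }, "/app") # prints "/app/Gemfile_Rails_5"
puts gemfile_for({}, "/app")                   # prints "/app/Gemfile"
```

Taking the environment as an argument keeps the helper easy to test, since both ENV and a plain Hash respond to fetch.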

Code splitting

We were initially wary of introducing a predicate method in the code to split code paths between Rails 4 and 5, but ultimately this was essential and helped us get the application building successfully in both versions. Here’s what the code looked like:

# config/version.rb
module Version
  def self.rails_5?
    Rails::VERSION::MAJOR == 5
  end
end

and in config/boot.rb

ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../Gemfile', __dir__)
require_relative 'version'
...

Now our application had a method to check the Rails version, which meant we could conditionally execute code paths intended for only Rails 4 or only Rails 5. This became crucial wherever we could not alter the Rails 4 code without changing its behavior and had to write the code differently for Rails 5 to get it working correctly.

# config/application.rb
if Version.rails_5?
  config.load_defaults 5.2
end

We also used this predicate in the config / environment files to conditionally set newer config values and load_defaults 5.2.


The diff to remove the conditional code was +397 −1,342—we ended up needing this a lot and never had an issue with using this approach.

Start testing on the new version

Next, we were interested in starting our CI build on the Rails 5 application. We use CircleCI, so some details here will pertain only to that platform in particular.

We decided adding a second identical workflow for the Rails 5 build would be unobtrusive to current product development teams. We added a second workflow and labeled it accordingly. Since we needed to set BUNDLE_GEMFILE to Gemfile_Rails_5 on all processes in CI, we leveraged CircleCI Contexts. Contexts were a great way to introduce a new set of ENV variables specifically for Rails 5 and apply those ENV variables to jobs within our workflow. A couple lines of YAML allowed us to have all of our existing commands and jobs reused in the Rails 5 build.

Early on, we limited this build to branches matching our team’s branch name prefix: ENG-. Here’s an example of some of the CircleCI config:

# .circleci/config.yml
workflows:
  ...
  rails5:
    jobs:
      - spec: &rails5config
          context: rails5
          filters:
            branches:
              only:
                - /^ENG-.*/
      - spec_slow:
          <<: *rails5config
      - spec_redux:
          <<: *rails5config
      - compile_assets:
          <<: *rails5config

Later, when we had a passing build in Rails 5, we consolidated the Rails 5 jobs into the same workflow as the Rails 4 jobs. A single workflow was easier to reason about, and deploys could depend on all jobs.

Rule of Generation—automate fixes

Some of the upgrades to the application were formulaic: belongs_to now has a different default value for optional, migrations must inherit from a versioned migration superclass, and the positional-argument signature of the controller test helpers is deprecated. Making these changes by hand at scale would have been tedious, so we decided to leverage RuboCop’s autocorrect functionality.

Since we were already writing RuboCop rules, we decided it would also be helpful to write some rules that could help developers not write Rails 5 incompatible code while we were upgrading to Rails 5. When possible, we wrote a rule to instruct a developer to account for deprecations and new option defaults in code identified by our RuboCop Cops. Most of the time we accompanied this cop with an autocorrector and in some cases we used the autocorrect on the entire codebase to automate fixes.

Here’s a cop we used to ensure all models that were added inherited from ApplicationRecord (this also includes an autocorrector):

inherit_from_application_record.rb
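To illustrate the rewrite this kind of autocorrection performs, here is a plain-Ruby stand-in. It is a sketch under stated assumptions, not the cop itself: a real RuboCop cop works on the parsed AST, while this version uses a regular expression on the source text. The helper name and the Campaign model are hypothetical.

```ruby
# Plain-Ruby stand-in for the autocorrection (not a RuboCop cop):
# rewrite `class X < ActiveRecord::Base` to `class X < ApplicationRecord`.
LEGACY_MODEL = /^([ \t]*class[ \t]+(\S+)[ \t]*<[ \t]*)(?:::)?ActiveRecord::Base\b/

def inherit_from_application_record(source)
  source.gsub(LEGACY_MODEL) do
    match = Regexp.last_match
    # ApplicationRecord itself must keep inheriting from ActiveRecord::Base.
    match[2] == "ApplicationRecord" ? match[0] : "#{match[1]}ApplicationRecord"
  end
end

puts inherit_from_application_record("class Campaign < ActiveRecord::Base\nend\n")
```

A text-based rewrite like this misses unusual formatting, which is one reason an AST-based cop is the more robust choice for a whole codebase.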

Another that did some heavy lifting for us was an autocorrect to update request spec method signatures to use keyword arguments.

use_new_request_spec_method_signature.rb

This was accompanied by a module patch for all request specs in Rails 4. We have a decent number of request specs, so the diff for this autocorrect was around 95 files. It didn’t work perfectly and resulted in some infinite loops here and there trying to reposition hashes. Your mileage may vary.
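A patch of that kind might look roughly like the sketch below. Everything in it is an assumption for illustration: Rails4StyleSession stands in for the Rails 4 request helpers (which take params and headers positionally), and KeywordRequestArgs is a hypothetical shim name. Only get is shown; other verbs would follow the same pattern.

```ruby
# Stand-in for the Rails 4 integration-test session, which takes params and
# headers positionally. In an app the real target would be Rails' own class.
class Rails4StyleSession
  def get(path, params = nil, headers = nil)
    { path: path, params: params, headers: headers }
  end
end

# Hypothetical shim: accept the Rails 5 `params:` / `headers:` keywords and
# forward them positionally, so one spec file runs on both versions.
module KeywordRequestArgs
  NEW_STYLE_KEYS = %i[params headers].freeze

  def get(path, options = nil, legacy_headers = nil)
    if new_style?(options)
      super(path, options[:params], options[:headers])
    else
      super(path, options, legacy_headers)
    end
  end

  private

  # True when the hash contains only Rails 5 keyword names.
  def new_style?(options)
    options.is_a?(Hash) && !options.empty? && (options.keys - NEW_STYLE_KEYS).empty?
  end
end

Rails4StyleSession.prepend(KeywordRequestArgs)

session = Rails4StyleSession.new
p session.get("/ads", params: { id: 1 }, headers: { "Accept" => "application/json" })
p session.get("/ads", { id: 1 }) # legacy positional call still works
```

One known limitation of this sketch: a legacy params hash whose only keys are params or headers would be misread as the new style.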

Work through the changelog

We had an itemized list of changes from the changelog we’d need to address in our application. We spent a handful of sprints here working through these changes and leveraging the Version.rails_5? predicate where necessary.

In several cases, we couldn’t utilize Version.rails_5? because the issue we were addressing was too systemic to wrap with an if else. In these cases we introduced conditional monkey patches to wrap the behavior in Rails 4 and seamlessly upgrade the behavior of the code. Once we were fully Rails 5 we deleted these patches.

Get the CI build green

This was the hardest milestone to hit. Our test suite is around 8,000 tests and after making anticipated changes from the changelog, we were left with about 500 failing tests.

We burned down this list as fast as possible. Some failures were solved easily (params used as a hash, passing true to association methods to reload associations) and others were hard-fought.

During this time, our QA team ran the exhaustive automation test suite against the application in a Rails 5 QA environment and combed the application thoroughly for any errors.

Gotchas we encountered:

Request specs method signature has changed.
mysql2 suddenly started setting the right STRICT sql_modes on connections.
Params no longer inherits from Hash.
ActiveRecord is time-precision-aware of columns and truncates the sub-second precision during attribute writing instead of truncating at the MySQL level during the INSERT/UPDATE.
ActiveRecord suddenly understood native MySQL json columns and those no longer required serialization.
Marshal serialization for cookies is removed.
Helper specs don’t seem to share the same CookieJar as the view itself.
Never override .new on a Rails model.
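The time-precision gotcha is easy to hit in specs that compare an in-memory timestamp against an expected value. The sketch below only mimics the truncation in plain Ruby; truncate_to_precision is a hypothetical helper and is not ActiveRecord's implementation.

```ruby
# Illustrative only: mimics truncating a timestamp to a column's fractional
# second precision (0 = whole seconds). This is not ActiveRecord's code.
def truncate_to_precision(time, precision = 0)
  scale = 10**precision
  Time.at(Rational((time.to_r * scale).floor, scale))
end

written = Time.at(1_500_000_000, 123_456) # 123,456 microseconds past the second
stored  = truncate_to_precision(written)  # what a precision-0 column keeps

puts stored == written                # prints "false"
puts stored == Time.at(1_500_000_000) # prints "true"
```

A spec that previously compared the attribute to the original timestamp can pass on one Rails version and fail on the other, because the sub-second part is dropped at a different point.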

Define performance indicators

Rolling out a new framework version can be intimidating. If your business relies on your application performing well, you’ll want to identify some key metrics you’ll be watching while the rollout occurs—especially so if you can’t roll back.

After some discussion, it was evident we didn’t have quite the visibility we needed from our metrics to compare Rails 4 and 5 side-by-side, so we made a few updates so we could verify the success of our rollout.

First, we automatically tagged all error reports with the Rails version. Honeybadger made this somewhat easy, though we did need to hook into the before_notify for it to work correctly because global context was cleared frequently.

Next, we added tags to all EC2 hosts with the Rails version. This meant that all metrics sent via our Datadog Agent were tagged with the Rails version. Now all monitors, dashboards, and stats in Datadog could be attributed to a Rails version.

The last step was creating a rollout dashboard with comparative KPIs for Rails 4 vs. Rails 5 side by side.

For workers, we focused on job failure rate and latency. As long as the failure rates and latency were similar between Rails versions, we continued the rollout. For web, we focused on response time, the rates of 3xx, 4xx, and 5xx responses, and the error rate from the web tier.
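The "similar enough to continue" check for workers can be written down as a small gate. This is a sketch only: the Kpis struct, the thresholds, and the sample numbers are all made up for illustration and are not the values from the dashboard.

```ruby
# Hypothetical rollout gate: field names and thresholds are assumptions.
Kpis = Struct.new(:jobs, :failures, :p95_latency_ms) do
  def failure_rate
    jobs.zero? ? 0.0 : failures.to_f / jobs
  end
end

# The candidate passes when its failure rate and latency stay close to the
# baseline's, within the given tolerances.
def similar?(baseline, candidate, max_rate_delta: 0.005, max_latency_ratio: 1.2)
  rate_ok    = candidate.failure_rate - baseline.failure_rate <= max_rate_delta
  latency_ok = candidate.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
  rate_ok && latency_ok
end

rails4 = Kpis.new(800_000, 1_600, 240) # made-up sample numbers
rails5 = Kpis.new(200_000, 500, 255)   # made-up sample numbers
puts similar?(rails4, rails5) # prints "true"
```

Comparing rates rather than raw counts matters here, because the two versions handle very different shares of the workload during a canary rollout.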

Rollout

Workers

We started our rollout with the lowest-risk service: Sidekiq. Sidekiq powers millions of jobs per day for us and all of those jobs retry on failure.

When we started to roll out Rails 5 Sidekiq processes to 20% of our capacity, some failures started appearing, but the jobs that failed had an 80% chance of being retried on a Rails 4 worker. Things went smoothly and we had very few issues. Issues that did come up were handled quickly and retries were successful.
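The arithmetic behind that safety margin can be made explicit. The sketch assumes each attempt lands on a uniformly random worker process, which is a simplification; the helper name is hypothetical.

```ruby
# Assumes each attempt lands on a uniformly random worker process, which is a
# simplification; real queues and retry backoff are not modelled here.
RAILS_5_SHARE = Rational(20, 100)

# Probability that a job runs on Rails 5 for `attempts` consecutive attempts.
def all_attempts_on_rails_5(attempts, share = RAILS_5_SHARE)
  share**attempts
end

puts all_attempts_on_rails_5(1) # prints "1/5"
puts all_attempts_on_rails_5(3) # prints "1/125"
```

Under that assumption, a job broken only on Rails 5 has less than a 1% chance of failing three attempts in a row at 20% capacity.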

One thing we overlooked during this rollout was that asset compilation had slightly changed for our Rails 5 application assets. Some of our image assets generated different digest fingerprints during compilation than their Rails 4 Sprockets-compiled counterparts. This meant that when a Sidekiq worker sent an email with a link to an image in it, that link included a digest path for an asset that our Rails 4 server did not have. Our CDN was configured with our Rails 4 load balancer as an origin, so these requests resulted in 404s.

We quickly set up a new CDN distribution and conditionally sent requests to the Rails 5 origin when they matched that host. After that, Rails 5 assets were isolated from Rails 4 assets and things worked correctly.

We rolled out our Resque and Sneakers workers next. This was a bit more problematic, as we had some major issues with constant resolution in the lib directory. Namespaces that clashed with global constants, combined with eager loading, produced cryptic, race-condition-like errors when finding constants. Some of the behavior we noticed was due to Ruby’s eager constant resolution from within classes: an expression such as Array::String::Integer resolves by falling back to top-level constants. This was later addressed in Ruby 2.5, which removed top-level constant lookup.
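The constant lookup behavior is easy to check on whatever Ruby is running. In this small sketch, scoped_lookup_result is a hypothetical helper name.

```ruby
# Reports how `Array::String` resolves on the running Ruby. Before 2.5 the
# lookup fell back to the top-level ::String (with a warning); from 2.5 on it
# raises NameError.
def scoped_lookup_result
  resolved = Array::String
  resolved.equal?(::String) ? :top_level_fallback : :nested_constant
rescue NameError
  :name_error
end

puts scoped_lookup_result # prints "name_error" on Ruby 2.5 and later
```

On older Rubies the fallback means a typo or a missing nested constant can silently resolve to an unrelated global, which makes load-order problems hard to spot.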

Web

Web and API tier were the last to roll out because they’re higher risk. Web transactions are typically coming from customers and their experience on the site is paramount.

To canary roll out a new web tier, we used weighted DNS routing with a low TTL. This allowed us to set a weight on the DNS records that pointed to both the Rails 4 and 5 load balancers and slowly send traffic to the Rails 5 application.
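Weighted routing boils down to weighted random selection. The sketch below illustrates the idea only and does not call any DNS provider's API; pick_backend, the backend names, and the weights are assumptions.

```ruby
# Illustration of weighted selection, the idea behind weighted DNS records.
# `roll` is a number in [0, 1); names and weights are assumptions.
def pick_backend(weights, roll)
  threshold = roll * weights.values.sum
  running = 0
  weights.each do |backend, weight|
    running += weight
    return backend if threshold < running
  end
  weights.keys.last
end

weights = { rails4: 90, rails5: 10 }
puts pick_backend(weights, 0.5)  # prints "rails4"
puts pick_backend(weights, 0.95) # prints "rails5"
```

In practice roll would come from rand. Each resolution is an independent draw, so with a low TTL the same user can land on different backends over the course of a session.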

We began with 10% and started to see an increased rate of 422 responses. The errors were Invalid Authenticity Token. What we noticed is that many users received sporadic authenticity token errors at various points of the website and some users were having trouble logging in (or staying logged in).

We immediately rolled back. Upon inspection we theorized that per-form CSRF tokens could be to blame—when a user bounced between Rails 4 and 5, a subsequent request would not yield valid authenticity tokens because Rails 4 was unaware of the new CSRF scheme. Disabling this functionality did not significantly reduce the error rate, so we decided to take a different approach.

We verified that there were session interoperability issues between Rails 4 and 5 when users bounced back and forth, so our modified approach was to make a swift DNS change and serve 100% of traffic from Rails 5. This went as predicted and we had almost no subsequent issues.