In the previous post we discussed the “why” — we went over some of the benefits of integrating automatic testing into your development flow. In this post, we’ll go over the “how” — some guidelines for forming a healthy, safe and rapid development process around your test suite.
Continuous Integration (CI)
The first and most important thing you can do when dealing with tests is integrating them into your development and shipping process.
To fully enforce the test suite we need to make sure two conditions are satisfied:
- Code cannot be pushed directly into master — only pull requests should be used to introduce new code to the master branch.
- Pull requests must be approved by the CI server after running all tests and verifying they all pass.
Under the assumption that any new code includes tests (more on that in the next section), enforcing these two simple rules in your development process ensures that changes don’t break existing behavior and adds confidence in new functionality added to the system.
CI lays the foundation for having a robust and safe development process around your test suite.
Another important benefit of running tests on the CI server is making sure the system is not dependent in any way on the developer’s local machine since the test suite runs in a neutral, isolated environment.
Test Coverage
Ensuring no tests have failed isn’t enough — an empty test suite passes CI since no tests have failed (0 tests = 0 failures). That means a commit that deletes the entire test suite would still pass CI, which is obviously not a healthy situation for a project to be in. We need to make sure our test suite actually covers our source code.
Test coverage measures what percentage of the source code is run by the test suite. The output of a test coverage report is pretty detailed — you can see each line of source code, colored green if it was run during the test suite or red if it wasn’t. Some tools also show the number of times each line ran during the tests.
The report allows you to easily identify code paths that weren’t exercised by the tests.
Coverage reports can also be integrated into the CI process to get coverage insights on pull requests — has the test coverage increased or decreased between the two branches? In which files exactly has it increased or decreased?
We’re using codecov.io to achieve that but there are many similar services that can be used for that purpose.
On one of our pull requests, for example, the codecov status check showed that coverage had increased by 9.72% compared to the branch the pull request was opened against, which is generally a good sign.
If the PR introduces new code without proper tests, the coverage percentage drops, a fact that can be used to automatically flag such pull requests and block merges until coverage reaches a threshold the team has agreed on.
Enforce Pull Request Approval & Pay Attention To Test Code
Most code reviews concentrate on the source code itself. I believe the test suite’s code is no less important than the actual source code. Like any other code base, if your test suite’s code isn’t being regularly reviewed, it will eventually become unmaintainable and a burden on your team rather than an enabler for rapid development.
When reviewing a pull request, try going over the test suite’s code first. Look for places to improve on the following aspects:
- Do the tests actually validate and verify the behavior they claim to test?
- Are they comprehensive? Do they cover enough cases?
- Is it clear from the test code/description what it is trying to validate?
- Is the test suite DRY? Can we reuse existing functionality or extract test functionality into a shared helper?
Approval by at least one team member should be enforced on pull requests to make sure both the added functionality and the accompanying tests are held to a high standard.
Pick Inputs Wisely
First, I would like to explain what I consider as inputs. Our code’s behavior is affected by two factors: direct input values and state. For example, consider a function that serves vodka to users based on their age.
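A minimal sketch of such a function might look like this, with a hypothetical in-memory users store standing in for the real database:

```clojure
(ns vodka.core
  (:import (java.time LocalDate Period)))

;; Hypothetical in-memory "database" standing in for the real one (the state).
(def users (atom {}))

(defn fetch-user [user-id]
  (get @users user-id))

(defn age-in-years [birth-date]
  (.getYears (Period/between birth-date (LocalDate/now))))

;; user-id is the direct input value; the user record behind it is state.
(defn serve-vodka? [user-id]
  (let [user (fetch-user user-id)]
    (boolean (and user
                  (:birth-date user)
                  (>= (age-in-years (:birth-date user)) 21)))))
```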
The user-id here is a direct input value, while the user record in the database is the state. We use fixtures to set up the state in which our tests run, and arguments to pass direct input values to our test code.
When we think of our test inputs we have to take both state and direct input values into consideration.
Since testing every possible input combination is both impossible and counterproductive, picking the right inputs can become the difference between an efficient test suite and a useless one.
But how do we pick the right inputs?
There are two very clear rules and a third, somewhat more obscure one:
- Pick values to cover all code paths:
a. a fixture of a user with a birth date more than 21 years ago + that user’s id
b. a fixture of a user with a birth date less than 21 years ago + that user’s id
- Pick values to challenge your code:
a. a fixture of a user with a birth date of exactly 21 years ago + that user’s id
b. pass nil as the user_id argument
c. pass a user_id that doesn’t have a matching record in the database
d. set up a fixture of a user without a birth date — relevant if that field isn’t mandatory — and pass its user_id as an argument
- Pick values to cover more user stories:
“Cheating” on tests is pretty easy — you can follow the two rules above and reach 100% covered code with all tests passing, but that doesn’t necessarily mean the test suite is good enough. Try to think about other states the system might be in and which inputs might be given to transition it. The feature (product) spec is your best candidate for ideas about which test cases should be added to the test suite. By covering more user stories we reduce the chances of bugs when releasing the feature.
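Putting the first two rules into practice for the vodka example, the test inputs might be sketched like this, using clojure.test with the in-memory users store from the earlier sketch as the fixture:

```clojure
(ns vodka.core-test
  (:require [clojure.test :refer [deftest is use-fixtures]]
            [vodka.core :as vodka])
  (:import (java.time LocalDate)))

(defn born-years-ago [n] {:birth-date (.minusYears (LocalDate/now) n)})

;; Fixture: reset the in-memory "database" to a known state before each test.
(use-fixtures :each
  (fn [run-test]
    (reset! vodka/users {1 (born-years-ago 30)   ; older than 21
                         2 (born-years-ago 15)   ; younger than 21
                         3 (born-years-ago 21)   ; exactly 21 -- boundary case
                         4 {:birth-date nil}})   ; no birth date on record
    (run-test)))

(deftest covers-all-code-paths
  (is (true?  (vodka/serve-vodka? 1)))
  (is (false? (vodka/serve-vodka? 2))))

(deftest challenges-the-code
  (is (true?  (vodka/serve-vodka? 3)))     ; boundary: turned 21 today
  (is (false? (vodka/serve-vodka? nil)))   ; nil user_id
  (is (false? (vodka/serve-vodka? 999)))   ; no matching record in the database
  (is (false? (vodka/serve-vodka? 4))))    ; user without a birth date
```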
As the developer assigned to the task, you know the system, its interactions and the spec you’re working from better than anyone else. Use that knowledge to make sure your test suite is comprehensive and covers a reasonable amount of cases.
Don’t Neglect Side Effects
A function has two distinct roles:
- It returns a value
- It might have one or more side effects
We have to make sure we test for both.
As an example, let’s take a REST endpoint for user registration. A typical request/response pair would look like:
Request: POST /register {"email": "my@email.com", "password": "secret"}
Response: 201 CREATED {"id": 1, "email": "my@email.com"}
The server is expected to:
1. Create a new user record on the database with the user’s email and a randomly generated confirmation token
2. Send an email to my@email.com with a link allowing the user to confirm their email with the confirmation token
3. Return a 201 status code with the created user record. Obviously, the response should not include the confirmation token.
This is a classic case in which we have an input (the HTTP request), an output (the HTTP response) and two side effects: database changes and a confirmation email. The easiest test would be verifying a simple request <-> response flow but it leaves a large gap for bugs. We have to make sure we cover all of the endpoint’s responsibilities, including the side effects, to have a comprehensive test suite:
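The sketch below uses hypothetical in-memory stand-ins for the handler, the database and the mailer, since the point here is the shape of the assertions rather than any particular web stack:

```clojure
(ns registration-test
  (:require [clojure.test :refer [deftest is]]))

;; Hypothetical in-memory stand-ins for the real database and mailer.
(def users (atom {}))        ; "database": email -> user record
(def sent-emails (atom []))  ; records every confirmation email "sent"

(defn send-confirmation! [email token]
  (swap! sent-emails conj {:to email :token token}))

(defn confirmation-sent? [email]
  (boolean (some #(= email (:to %)) @sent-emails)))

;; Toy /register handler: creates the user, sends the confirmation email,
;; and returns the created record without the confirmation token.
(defn app [{:keys [body-params]}]
  (let [email (:email body-params)
        token (str (java.util.UUID/randomUUID))
        user  {:id (inc (count @users)) :email email :confirmation-token token}]
    (swap! users assoc email user)
    (send-confirmation! email token)
    {:status 201 :body (dissoc user :confirmation-token)}))

(deftest register-happy-path
  (let [response (app {:request-method :post
                       :uri            "/register"
                       :body-params    {:email "my@email.com" :password "secret"}})
        body     (:body response)
        user     (get @users "my@email.com")]
    ;; the HTTP response
    (is (= 201 (:status response)))
    (is (= "my@email.com" (:email body)))
    (is (some? (:id body)))
    (is (nil? (:confirmation-token body)))      ; the token must not leak
    ;; side effect #1: database record with a confirmation token
    (is (some? (:confirmation-token user)))
    ;; side effect #2: confirmation email sent to the provided address
    (is (confirmation-sent? "my@email.com"))))
```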
Note: this is only a happy path test — tests for duplicate emails, wrong email format and other failure scenarios should exist but I wanted to keep this short.
When these tests pass, we can be pretty sure our registration endpoint works as expected. The tests will fail if any of the following occurs:
1. returned status code is not 201
2. we don’t return the created user and its id in the response body
3. we do return the confirmation token in the response body (security breach)
4. we don’t create a database record with the provided email
5. we don’t generate a confirmation token for the created user
6. we don’t send a confirmation email to the provided email.
The confirmation email is actually the only gap here: the mailer behind confirmation-sent? is mocked to avoid network calls in our test suite, and we haven’t verified the content of the email.
Use Mocks/Stubs Carefully
Mocks are parts of the system we fake solely for test purposes. We use mocks because sometimes we can’t run the whole system on our machine, or because interacting with some parts is very time consuming — something we can’t afford in tests.
Mocks should be used very carefully and generally should be avoided where possible. We must remember that since they are fake, mocks take us further away from the system as it runs on our production servers. The gap mocks create between our test environment and our actual runtime environment allows bugs to leak from our test suite into our staging/production environments.
Let’s take an example of an application that uses redis to keep an AuthenticationToken=>UserID map. When the user logs in, we issue a token and save it in redis with the user’s id as the value. When the user performs a request with a token, we can fetch the matching user id from redis for authentication:
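Sketched in Clojure, with myapp.redis and myapp.db as assumed thin wrappers around the redis client and the database layer:

```clojure
(ns user-authentication
  (:require [myapp.redis :as redis]   ; assumed wrapper around the redis client
            [myapp.db :as db]))       ; assumed database access layer

(defn authenticate [token]
  (if-let [user-id (redis/get token)]   ; token -> user-id mapping kept in redis
    (db/fetch-user-by-id user-id)       ; fetch the matching user record by id
    :unauthorized))
```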
If the user id is stored (in redis) under the provided token key, we fetch it from the database and return it, otherwise the token is unauthorized.
Let’s assume that for test purposes we decided to mock redis as a simple key value storage:
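Such a mock could be as simple as an atom-backed map exposing the same get/set shape as the redis wrapper:

```clojure
;; Test-only stand-in for redis: a plain map kept in an atom.
(def fake-redis (atom {}))

(defn mock-set [k v] (swap! fake-redis assoc k v))
(defn mock-get [k]   (get @fake-redis k))
```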
And finally to the test code itself:
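In the same test namespace, the test could swap the real wrapper functions for the mock with with-redefs (the user fixture is abbreviated here as a redef of the db lookup):

```clojure
(deftest authenticate-test
  (with-redefs [redis/get mock-get
                redis/set mock-set
                ;; user fixture, abbreviated: user id 1 exists in the database
                db/fetch-user-by-id (fn [id] (when (= 1 id) {:id 1 :email "my@email.com"}))]
    (redis/set "authorized-token" 1)   ; the mock stores the integer 1 as-is
    (is (= {:id 1 :email "my@email.com"} (authenticate "authorized-token")))
    (is (= :unauthorized (authenticate "missing-token")))))
```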
The tests pass, but when deploying this code to staging/production all requests will be unauthorized and errors will be flying all over the place.
Chances are slim that you could trace the reason for this massive failure just by going over the code. There is a subtle difference between the mock redis and the real one:
redis only stores plain values as binary strings, so after (redis/set "authorized-token" 1), fetching the value with (redis/get "authorized-token") yields the string "1" and not the integer 1.
Since our mock didn’t take that into account, the code fails when trying to fetch the user by its id from the database (in user_authentication.clj).
The important point here is that by introducing the redis mock into our test suite, we created a gap between our test runtime environment and our production environment. That gap allowed a bug to slip through our test suite onto our servers. If we had used a real redis in the test environment as well, the tests would have failed and the bug would never have reached staging or production.
When it makes sense, prefer running real dependencies in the test environment rather than mocking them.
Mock As Accurately As Possible
There are cases in which we have no choice but to use mocks. In those cases, make sure the mock is as tight as possible around the real code flow.
Let’s take as an example a function that’s supposed to return a file’s metadata from an S3 bucket:
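A sketch, with myapp.s3 as an assumed wrapper around the S3 client:

```clojure
(ns file-metadata
  (:require [myapp.s3 :as s3]))   ; assumed wrapper around the S3 client

(defn get-file-metadata
  "Issues a GetObject request and returns the object's metadata."
  [bucket filename]
  (:metadata (s3/get-object :bucket-name bucket :key filename)))
```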
The implementation is very simple — we issue a GetObject request to S3 with the given bucket and filename path, and if it doesn’t raise an exception we’re all good.
When testing this method, we will likely mock the S3 GetObject request to avoid network calls:
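A sketch of such a test, with a made-up metadata map:

```clojure
(ns file-metadata-test
  (:require [clojure.test :refer [deftest is]]
            [file-metadata :refer [get-file-metadata]]
            [myapp.s3 :as s3]))

(def example-metadata {:content-type "image/png" :content-length 1024})

(deftest get-file-metadata-test
  ;; Permissive mock: returns the metadata regardless of the arguments it receives.
  (with-redefs [s3/get-object (fn [& _] {:metadata example-metadata})]
    (is (= example-metadata (get-file-metadata "my-bucket" "logo.png")))))
```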
We set up example metadata, mock the s3/get-object call and make sure we get the metadata back from the method. All tests are green.
A few days later a commit is made to change the original method:
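Say the change looks something like this (a hypothetical edit):

```clojure
(defn get-file-metadata
  [bucket filename]
  (:metadata (s3/get-object :bucket-name "bucket" :key filename)))
```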
We commit, CI goes through the PR and marks all tests as passing. Because we trust our test suite we deploy to production where things start to break.
The commit surrounded the bucket variable with double quotes (probably an innocent mistake), which made the function constantly return nil. The tests passed because the mock we made is too permissive — it doesn’t even check the arguments it receives and returns the metadata blindly. A well-crafted test (and mock) would have alerted us that something was wrong with our change:
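For instance, a mock that only answers when it receives the exact arguments we expect:

```clojure
(deftest get-file-metadata-test
  ;; Tight mock: only returns the metadata when called with the expected arguments.
  (with-redefs [s3/get-object (fn [& {:keys [bucket-name key]}]
                                (when (and (= "my-bucket" bucket-name)
                                           (= "logo.png" key))
                                  {:metadata example-metadata}))]
    (is (= example-metadata (get-file-metadata "my-bucket" "logo.png")))))
```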
The new mock will only return the metadata if the values we pass to get-file-metadata are passed into the :bucket-name and :key of s3/get-object respectively. As a result, this test would have failed the PR, alerting us that we’re no longer passing the values to the method as expected.
When using mocks, make sure they’re as tight as possible around the expected behavior.
Fix Bugs By First Creating A Failing Test
Bugs are inevitable. No matter how experienced, disciplined and professional your team members are, and even if your system has the most comprehensive test suite, bugs will find their way into your code.
When you test your code, you produce example data sets (AKA initial state) and run your code on them. The majority of bugs occur because our code encountered a state we hadn’t considered in our test suite. Thus, a bug is an opportunity to expand our test suite to cover cases we haven’t taken into consideration. The way I like to approach bug fixes is the following:
- I reproduce the state that caused the bug to appear, and assert the expected result
- The test suite should fail (otherwise we wouldn’t have had a bug in the first place)
- I change the code to make the test pass
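To make this concrete with the earlier vodka sketch, suppose a hypothetical bug report: serve-vodka? blows up for users whose birth date was imported as an ISO date string rather than a date object. The fix would start with a test that reproduces exactly that state and states the expected result:

```clojure
;; Reproduces the reported state: a birth date stored as a string.
;; This test fails against the current sketch (which only handles LocalDate values);
;; only then do we change serve-vodka? to make it pass.
(deftest bug-string-birth-date
  (reset! vodka/users {7 {:birth-date "1990-01-01"}})  ; the state that triggered the bug
  (is (true? (vodka/serve-vodka? 7))))                 ; an adult user should still be served
```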
Following this procedure for bug fixing has some amazing advantages:
Firstly, we can be pretty sure we actually fixed the bug. It might sound obvious, but I am sure you are well familiar with bug-fix deployments that haven’t actually fixed anything. The case here is different — we “proved” there was a bug by demonstrating a state for which our system doesn’t produce the desired result, and once that same test passes we have equally concrete proof that it now does.
Secondly, we expanded our test suite to include more edge (unpredicted) cases. The chances this bug will reoccur somewhere in the future are slim.
Conclusion
Making changes to large software systems is not an easy experience for a developer. The fear of breaking things might become paralyzing and reduce development velocity by an order of magnitude.
A healthy development process is one that makes you trust your test suite: a process that allows you to add new features as well as change existing ones with as much confidence as possible.
Following the above guidelines is definitely a step in the right direction. It increases the team’s trust in the test suite, which lets team members apply changes in a safer, faster and more confident manner.