By philwinder | September 26, 2016
The testing of microservices is inherently more difficult than testing monoliths due to the distributed nature of the code under test. But distributed applications are worth pursuing because by definition they are decoupled and scalable.
With planning, the result is a pipeline that automatically ensures quality. The automated assurance of quality becomes increasingly important in larger projects, because no one person wants to or is able to ensure the quality of the application as a whole.
This article provides some guidelines that I have developed whilst working on a range of software with microservice architectures. I attempt to align the concepts with best practice (references at the end of this article), but some of the terminology is my own.
Testing scope

The scope of a test should be limited. This reduces the amount of code under test, which reduces the chance of a developer not providing full coverage. It also shrinks the boundary of the code under test, meaning fewer dependencies and fewer issues caused by external sources. Large boundaries are affected by proportionally large numbers of changes, any one of which could break the test.
Mike Cohn introduced the concept of a test pyramid, which suggests that the number of tests performed should be inversely proportional to the scope of the test. This is for two reasons. We want to ensure high test coverage of the code, so it is vital to have many well-focused unit tests that do not change often (terminology introduced shortly). And we want few user-level tests, because they tend to be slow (to set up and to run) and more brittle (a change to an API could completely break an entire workflow).
This is easier to imagine if we consider when developers use unit tests and when they use user-level tests. Unit tests are used to verify new functionality and to ensure that previous functionality remains intact. User-level tests are used to ensure that the user is able to follow a workflow and that requirements are met. Unit tests rarely change, because large parts of the codebase will remain static. But user tests may change often, as they are coupled to large amounts of functional code (attempt to reduce this!). Hence, it is safer and wiser to have a large number of unit tests and as few user tests as possible.
Below is a list of testing scopes, starting with the smallest, most focused tests and ending with the largest, all-encompassing tests.
Unit

Unit tests are technology focused. Their goal is to verify functionality, quickly. They are an artifact of TDD, but in research-driven development they are usually written to complement code. Ensuring that tests are small and focused (in the same way code should be cohesive) allows for easy refactoring. Unit tests act as a developer safety net, ensuring that functionality has not been inadvertently altered by the developer's changes.
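As a minimal sketch, a well-focused unit test exercises one behaviour at a time. The `cart_total` function and its tests below are hypothetical examples, not from any real codebase:

```python
import unittest


def cart_total(prices, discount=0.0):
    """Sum item prices, applying an optional fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return round(sum(prices) * (1.0 - discount), 2)


class CartTotalTest(unittest.TestCase):
    # One behaviour per test: easy to name, easy to diagnose on failure.
    def test_sums_prices(self):
        self.assertEqual(cart_total([1.50, 2.50]), 4.00)

    def test_applies_discount(self):
        self.assertEqual(cart_total([10.00], discount=0.1), 9.00)

    def test_rejects_invalid_discount(self):
        with self.assertRaises(ValueError):
            cart_total([1.00], discount=1.5)
```

Because each test pins down a single behaviour, a refactor that breaks one of them points directly at the regression.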
Component

Test the service without external dependencies. For example, test stateful services without the need for an external DB; use a mocked or simulated version. Items to test:
- Wiring of internal objects
- Inter-class communication and dependencies
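To illustrate, here is a sketch of a component test using Python's `unittest.mock`. The `UserService` class is hypothetical; its repository dependency is replaced by a mock, so no real database is needed:

```python
from unittest import mock


# Hypothetical service under test: it depends on a repository
# abstraction rather than on a concrete database driver.
class UserService:
    def __init__(self, repository):
        self._repository = repository

    def display_name(self, user_id):
        user = self._repository.find(user_id)
        return user["name"].title() if user else "<unknown>"


# Component test: exercise the service's wiring with the repository
# mocked out, so no external DB is required.
repo = mock.Mock()
repo.find.return_value = {"name": "ada lovelace"}
service = UserService(repo)

assert service.display_name(42) == "Ada Lovelace"
repo.find.assert_called_once_with(42)  # verify the dependency interaction
```

The final assertion checks inter-class communication: the service called its dependency exactly once, with the right argument.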
Container (or equivalent binary)
Test the service as a container, with its external dependencies. Only test tasks specific to the service; there is no need to start the entire application. For example, test stateful services with a real DB container, via their external API. Items to test:
- Failure state: What happens when dependencies are not available (e.g. no db)?
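The failure-state check can be sketched as follows; the helper names and the PostgreSQL-style default port are assumptions for illustration. The test points the health check at a port with nothing listening and verifies that the service degrades gracefully instead of crashing:

```python
import socket


def db_is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to the database endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def health_status(db_host="127.0.0.1", db_port=5432):
    """Report 'ok' when the database is reachable, 'degraded' otherwise,
    rather than letting the service crash outright."""
    return "ok" if db_is_reachable(db_host, db_port) else "degraded"


# Failure-state test: port 1 on localhost has nothing listening, so the
# check must report degradation rather than raise.
assert health_status(db_host="127.0.0.1", db_port=1) == "degraded"
```

In a real container test the same probe would run against the service's public health endpoint after its DB container has been deliberately stopped.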
Application or end-to-end tests
Verify that the application works as a whole, on the intended platform. Items to test:
- Does it start?
- Does it not crash?
- Are the deployment scripts working?
- Is the deployment platform responsive?
- Non-functional tests, like performance
- Any public APIs
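As a runnable sketch of an end-to-end smoke test ("does it start, does it respond?"), the snippet below stands up a trivial HTTP server in place of a deployed application; in a real pipeline the URL would point at the freshly deployed testing environment instead:

```python
import http.server
import threading
import urllib.request


# Stand-in for a deployed application: a minimal server with a health
# endpoint, so the smoke test below is self-contained and runnable.
class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok" if self.path == "/health" else b"not found"
        self.send_response(200 if self.path == "/health" else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def smoke_test(url, timeout=2.0):
    """Return (status, body) for a GET against the deployed service."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read()


server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Smoke test: did it start, and does it respond?
status, body = smoke_test(f"http://127.0.0.1:{server.server_port}/health")
server.shutdown()

assert status == 200
assert body == b"ok"
```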
User

From the point of view of a user, is the application functional? Items to test:
- UI testing
  - Link checking
  - Click-through testing
  - Use-case testing
- Workflows related to user interaction. E.g. create account, add to cart, checkout cart, order, delete account.
- Requirements testing
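A workflow test can be sketched against an in-memory stand-in for the application. The `FakeShop` API below is entirely hypothetical; a real user test would drive the UI or the public API instead:

```python
# In-memory stand-in for the application's public API, so the workflow
# below is runnable without a deployment.
class FakeShop:
    def __init__(self):
        self.accounts, self.carts, self.orders = set(), {}, []

    def create_account(self, user):
        self.accounts.add(user)
        self.carts[user] = []

    def add_to_cart(self, user, item):
        self.carts[user].append(item)

    def checkout(self, user):
        self.orders.append((user, tuple(self.carts[user])))
        self.carts[user] = []  # checkout empties the cart

    def delete_account(self, user):
        self.accounts.discard(user)
        self.carts.pop(user, None)


# Workflow test: the full journey from sign-up to account deletion.
shop = FakeShop()
shop.create_account("alice")
shop.add_to_cart("alice", "book")
shop.checkout("alice")
shop.delete_account("alice")

assert shop.orders == [("alice", ("book",))]
assert "alice" not in shop.accounts
```

Note how the whole workflow is one test: a change to any step's API breaks it, which is exactly why such tests are brittle and should be few.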
How many tests?
As a rule of thumb, I use line coverage to dictate the number of unit tests. Most people advise keeping tests as clean and cohesive as possible, but in my experience, life's (or deadlines') too short.
In practice, I've found there can never be enough tests. Even if there are duplicate or unnecessary tests, the minuscule storage, mental burden and CPU cycles that they consume do not warrant spending time hunting them down and removing them until they start causing problems (i.e. they don't work or they are flaky).
That is unless you expect that your unit tests are going to be used as documentation; like in a public library project, for example. In that case, it would be wise to minimise the number of unit tests to help users navigate the code.
I’d normally ask engineers to make a judgement call on whether spending time improving tests is worth their effort. I would consider the overall confidence in the application; am I confident enough to say that if I did a release, I’d be happy that everything would work as intended? If not, then consider improving the number/quality of tests.
Environments

Different environments test different scopes. The following table highlights the most important environments. One of the most underrated is staging: if tested services are automatically deployed to staging, it gives engineers a playground to verify their code before promotion to production. Allowing engineers to perform "smoke" tests in these environments themselves promotes ownership of the service (DevOps).
| Environment | Purpose | Test scope |
|---|---|---|
| Build | Building code and containers | Unit, Component |
| Testing | Transient infrastructure for testing containers and applications | Container, Application, User |
| Staging | Permanent infrastructure for snapshot deployment | – |
| Production | Static production deployment | – |
Pipelines

A pipeline provides the continuous integration and delivery of code into production. It automates the acceptance of new code, which reduces the testing burden and promotes ownership of new code.
All events are initially triggered by the source control repository. The following is a table of repository actions that will produce a result.
| Action | Prerequisites | Result |
|---|---|---|
| Commit in pull request (PR) | – | Build branch, deploy to testing |
| Commit/merge in master | PR branch passing tests. Code reviewed. | Build snapshot, deploy to testing |
| Master commit passed testing | – | Deploy to staging |
| Source control release | Master passing tests. Manual smoke tests complete. | Build tagged version, deploy to testing, deploy to production |
We can invert that table. Environments will be triggered when:
| Environment | Triggered when |
|---|---|
| Build | New commit in branch/master/tag |
| Testing | Successful build of branch/snapshot/tag |
| Staging | Successful testing of snapshot |
| Production | Source control release |
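Read as code, the trigger table amounts to a simple mapping. The sketch below is purely illustrative and not tied to any particular CI system; the event names are my own:

```python
# Map each environment to the pipeline event that triggers it,
# mirroring the trigger table above. Event names are illustrative.
TRIGGERS = {
    "build": "commit",               # new commit in branch/master/tag
    "testing": "build_succeeded",    # successful build of branch/snapshot/tag
    "staging": "testing_passed",     # successful testing of a snapshot
    "production": "release_tagged",  # source control release
}


def environments_for(event):
    """Return the environments that a pipeline event should trigger."""
    return [env for env, trigger in TRIGGERS.items() if trigger == event]


assert environments_for("testing_passed") == ["staging"]
assert environments_for("unrelated_event") == []
```

Encoding the triggers as data rather than ad hoc scripts makes the promotion rules easy to audit and to change.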
Preferably, all steps will be integrated with source control. E.g. you can’t merge the PR until the branch has been built and successfully deployed to testing.
Generally, you can proceed to the following step when:
| Step | Proceed when |
|---|---|
| Production | After manual QA acceptance |
| Staging | After successful testing on the main branch |
| Testing | After successful build |
| Build | After successful unit and component tests and a new PR |
References

There are a number of very good recent books.
Sam Newman's "Building Microservices" has a thorough section on testing.
Eberhard Wolff’s “Microservices” covers roughly the same content.
Finally, Adrian Mouat’s “Using Docker” provides a “walkthrough” style example using Jenkins (yuck!).