Refining Automated Testing: Balancing Speed and Reliability in Modern Test Suites
Traditionally, automated tests are classified as unit tests, integration tests, and end-to-end tests. This classification is based on the scope of a test, though the distinction between the different types is not always clear. Unit tests have a narrow scope and usually exercise a single method or class. Integration tests validate the interaction between different components. End-to-end tests often exercise complete user flows on a platform or web application, involving several disparate systems.
As a codebase grows, slow and flaky tests start to hurt developer productivity. It is therefore instructive to examine test suites along another dimension: speed and determinism.
Sources of Slowness and Non-Determinism
We know through intuition and experience that end-to-end and integration tests are slower and flakier than unit tests, but why is that the case? Let’s consider the environment a test runs in.
Single Thread or Process
When a test runs in a single process, the code under test runs in that same process. This rules out starting servers or databases in separate processes and connecting to them from the test. Tests that depend on servers or databases must use mocks or fakes instead.
These tests don’t make any blocking inter-process I/O calls, which removes a major source of slowness and non-determinism.
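As a minimal sketch of this idea, the example below replaces a database with an in-memory fake so that the whole test stays inside one process. The names `InMemoryUserStore`, `GreetingService`, and `run_suite` are hypothetical and exist only for illustration; a real suite would fake whatever interface its own code depends on.

```python
import unittest


class InMemoryUserStore:
    """Hypothetical fake standing in for a real database client."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


class GreetingService:
    """Hypothetical code under test; it depends only on the store's interface."""

    def __init__(self, store):
        self._store = store

    def greet(self, user_id):
        name = self._store.get(user_id)
        return f"Hello, {name}!" if name is not None else "Hello, stranger!"


class GreetingServiceTest(unittest.TestCase):
    def setUp(self):
        # No server, no socket, no second process: just objects in memory.
        self.store = InMemoryUserStore()
        self.service = GreetingService(self.store)

    def test_known_user(self):
        self.store.save(1, "Ada")
        self.assertEqual(self.service.greet(1), "Hello, Ada!")

    def test_unknown_user(self):
        self.assertEqual(self.service.greet(2), "Hello, stranger!")


def run_suite():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GreetingServiceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the fake is an ordinary dictionary, nothing in these tests waits on another process, which is what makes them fast and repeatable.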
Single Machine
Some tests run across multiple processes, spinning up databases and servers in processes separate from the test code and making blocking calls to them. They may even make network calls, but only within the same machine.
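A single-machine test of this kind might look like the following sketch. It starts a tiny HTTP server in a child process, waits for it to report its port, and makes one blocking call over the loopback interface. The server script, the `/ping` path, and the `ping_local_server` helper are all assumptions made for illustration, standing in for a real local dependency.

```python
import subprocess
import sys
import urllib.request

# Hypothetical stand-in for a real dependency: a tiny HTTP server that binds
# to an OS-assigned port on the loopback interface and prints that port.
SERVER_SCRIPT = r"""
import http.server

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
print(server.server_address[1], flush=True)
server.serve_forever()
"""


def ping_local_server(timeout_seconds=5.0):
    """Start the server in a child process, call it once, then stop it."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SERVER_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Blocks until the child has bound its socket and reported the port.
        port = int(proc.stdout.readline())
        # Bypass any proxy settings so the call stays on this machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = f"http://127.0.0.1:{port}/ping"
        with opener.open(url, timeout=timeout_seconds) as response:
            return response.status, response.read()
    finally:
        # Always clean up the child, even if the request failed.
        proc.terminate()
        proc.wait(timeout=timeout_seconds)
        proc.stdout.close()
```

Even in this small example the test has to wait for another process to start, read from a pipe, and open a socket, each of which can be delayed by factors outside the test's control.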
The test code now depends on other processes running reliably, which is not guaranteed. It is also at the mercy of the operating system scheduler and other factors whenever it makes API calls. Although this introduces some slowness and flakiness compared with single-process tests, restricting a test to a single machine still prevents it from making remote calls to other machines, which are the biggest source of non-determinism. Which brings us to…
Multiple Machines
These tests run with effectively no constraints. With cloud environments now the norm for SaaS applications, test suites can spin up several cloud resources and run full system tests across multiple virtual machines. Because such a test has many dependencies, a single failing component can cause the entire test to fail.
Designing a Test Suite
A good test suite provides several benefits:
- Maintainability – Well-tested code is more maintainable, allowing developers to add new features, fix bugs, and refactor code without fear of inadvertently breaking unrelated code.
- Documentation – Given how easily documentation about a service or feature goes out of date, well-written tests are often the best way to understand code behavior.
- Clean APIs – Black-box tests ensure that the code under test exposes the right interfaces.
- Coverage – Extensive test coverage gives confidence in the release process to engineering and non-engineering stakeholders including sales and go-to-market teams.
For a test suite to be effective and reliable while making developers more productive, it must minimize slowness and non-determinism. The boundary between unit tests, integration tests, and end-to-end tests can also be fuzzy. Thus, when designing a test suite for a system, it can be helpful to instead think of the tests in terms of the resources they use.
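One way to make this resource-based view concrete is to label each test with the environment it needs and let the runner choose how far up to go. The sketch below uses Python's `unittest`; the tier names, the `tier` decorator, and the `build_suite` helper are hypothetical and not part of any testing framework.

```python
import unittest

# Hypothetical tiers, ordered from the most to the least constrained environment.
TIERS = ("single_process", "single_machine", "multi_machine")


def tier(name):
    """Mark a test method with the environment it needs."""
    if name not in TIERS:
        raise ValueError(f"unknown tier: {name!r}")

    def mark(test_func):
        test_func.tier = name
        return test_func

    return mark


def build_suite(test_case_class, max_tier):
    """Collect only the tests whose tier is at or below max_tier."""
    limit = TIERS.index(max_tier)
    suite = unittest.TestSuite()
    for method_name in unittest.defaultTestLoader.getTestCaseNames(test_case_class):
        method = getattr(test_case_class, method_name)
        # Unmarked tests are assumed to need nothing beyond their own process.
        declared = getattr(method, "tier", "single_process")
        if TIERS.index(declared) <= limit:
            suite.addTest(test_case_class(method_name))
    return suite


class ExampleTests(unittest.TestCase):
    """Placeholder bodies; real tests would exercise real code."""

    @tier("single_process")
    def test_pure_logic(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    @tier("single_machine")
    def test_with_local_dependency(self):
        self.assertTrue(True)  # would talk to a locally started dependency

    @tier("multi_machine")
    def test_full_flow(self):
        self.assertTrue(True)  # would exercise remote machines
```

With labels like these, a developer's local run could call `build_suite(ExampleTests, "single_process")` for quick feedback, while a slower pipeline stage could pass `"multi_machine"` to run everything.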
Mike Cohn’s test pyramid gives a great starting point for thinking about how to structure the different classes of tests. Here we also use it to draw the analogy between scope-based tests and environment-based tests.
Conclusion
- Most of your tests should be fast, reliable single-process tests that target a narrow section of code.
- Some of your tests should be single-machine tests, bringing up different local dependencies and testing how different components interact with each other.
- Few of your tests should run across remote machines, exercising the end-to-end flow of your application.
This structure tends to strike the right balance among extensive coverage, high speed, and low flakiness, making for an effective test suite.