On Slow Test Suites and CI Servers
November 21, 2017
A few people on Twitter were talking about developers who run their entire test suite only on their continuous integration (CI) server. The idea was that this was a sign of low quality tests, low quality code, or an otherwise bad process. In the past, I felt this way, and "breaking the build" (by checking in code that had failing tests) was viewed as bad. I don't believe this any longer, and now feel it's a critical ability for a high-functioning team to have. Not that it isn't important to identify low quality tests or code, but what exactly is the value of having a fast test suite?
The Value of Fast Tests
The benefit of a speedy test suite is feedback: the quicker you get feedback about any problems, the easier it is to address them and get your code shipped. But this feedback, and indeed the tests themselves, are not results. They are artifacts and tools designed to help us deliver results. It's important to remember that when discussing the virtues of techniques like this.
When developing a feature, you can get feedback quickly by running only the tests relevant to what you are changing. Presumably, you are making a small change, and can evaluate that change by running only a few tests, possibly even just one. The way I like to work is to have one or more acceptance tests capture the overall feature I'm building, and use unit tests to drive edge cases, as outlined in my article for InfoQ.
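As a sketch of how that can look (the feature, file names, and fixture here are hypothetical, assuming Rails' default minitest setup), one system test captures the feature while unit tests for edge cases accumulate alongside it:

```ruby
# test/system/purchase_product_test.rb
# One acceptance test capturing the happy path of the feature being built.
require "application_system_test_case"

class PurchaseProductTest < ApplicationSystemTestCase
  test "a customer can purchase a product" do
    visit product_path(products(:widget))
    click_on "Buy Now"
    assert_text "Thanks for your order!"
  end
end
```

While iterating, you'd run just this file with `bin/rails test test/system/purchase_product_test.rb`, or a single unit test, and get feedback in seconds without touching the rest of the suite.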
While it's nice to be able to run a test suite in a few seconds, it's more important that the tests have adequate coverage for what I'm doing and meet my personal standard for quality, as well as those of the team and project. This means, among other things, that there is value in writing a unit test in Ruby on Rails that loads data into the database. This is the Rails way, like it or not, and there is negative value in deviating from these conventions in a Rails project. Such deviation must be weighed against the benefits, as well as other solutions to the problem at hand (which is not fast tests, but fast feedback).
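For example, a conventional Rails unit test happily loads data into the database via fixtures. A hypothetical sketch (the `Order` model, its `complete!` method, and the fixture are invented for illustration):

```ruby
# test/models/order_test.rb
require "test_helper"

class OrderTest < ActiveSupport::TestCase
  # The Rails way: orders(:pending) is loaded from a fixture into a
  # real database, so the test exercises real queries and validations.
  test "completing an order records a completion time" do
    order = orders(:pending)
    order.complete!
    assert_not_nil order.reload.completed_at
  end
end
```

A version that stubs out ActiveRecord would run faster, but it would also be the only test in the project that works that way, and that inconsistency has a cost.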
Again, these are all intermediate artifacts, not results. While my goal is never to be "Rails-like at all costs", it's also never to have "fast tests". In fact, I don't even treat "adequate test coverage" as an explicit goal. My goal is to deliver results, and while adequate test coverage is often a means to do that quickly, I try hard not to lose sight of the results I need to deliver. I stress this because it means you must take a holistic view of what you are doing and strongly avoid getting lost in the local minima of intermediate metrics like test speed.
It is still nice to run the entire test suite after making changes, however, because doing so finds regressions caused by new features. Of less benefit is requiring that this happen on a developer's laptop. A CI server can usually run the suite much more quickly, either by being a more powerful computer or by parallelizing the build (or both). If I, as a developer, can get faster feedback on my change by pushing a branch to GitHub and letting 10 parallel processes run my app's test suite, what's wrong with that? Nothing.
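That parallelization isn't magic, either. A naive sketch of the bucketing a CI service might do (the environment variable names are made up; real services balance buckets using historical timing data):

```ruby
# Split the suite's test files across N parallel workers; each CI
# process runs only its own bucket.
workers = Integer(ENV.fetch("WORKER_COUNT", "10"))
index   = Integer(ENV.fetch("WORKER_INDEX", "0"))

files = Dir["test/**/*_test.rb"].sort
mine  = files.select.with_index { |_, i| i % workers == index }

exec("bin/rails", "test", *mine) unless mine.empty?
```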
In addition to speed, running tests on a CI server (as opposed to a developer laptop) creates an added avenue for feedback: sharing your build with others. When you hit a snag around a failing test you can't quite figure out, help is on its way simply by sharing a URL. When your tests are trapped inside your laptop, you've put friction between you and getting help: screen-sharing, synchronous communication, and environment-specific problems on someone else's laptop.
Remember: these are tools & techniques, not results. There's no harm or shame in using what you have available to help do your job.
So what about the notion that a slow test suite is a sign of poor code quality?
Metrics for Code Quality
If one were to list out some metrics of code quality, "speed of test suite on this year's MacBook Pro" would likely not be one of them. In fact, "speed of test suite" is so subjective that it's hard to consider it seriously as any sort of metric for code quality.
Setting aside that code quality (again) is not a result, not a real goal itself, but just a tool that sometimes helps deliver results, there is actually a fair bit of research around more objective measures of code quality. As I discussed a few years ago in "What is 'better' code?", we can understand code quality by analyzing its complexity, cohesion, fan-in/fan-out (see also here), or even its size.
All of these are objectively measurable and provide stronger signals about code quality than the speed of a test suite (assuming we could draw a strong link between code quality and results at all, which is difficult to do and requires more than anecdotal evidence).
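To illustrate how mechanical those measures can be, here is a crude sketch of my own (not from the research cited above) that approximates cyclomatic complexity by counting branch nodes in a Ruby file, using the AST parser built into Ruby 2.6+:

```ruby
# Crude cyclomatic-complexity estimate: count branching constructs in
# a Ruby source file. Real tools like flog or RuboCop's Metrics cops
# are far more sophisticated; the point is only that "complexity" is
# objectively countable in a way "test suite speed on my laptop" is not.
BRANCH_TYPES = %i[IF UNLESS WHILE UNTIL CASE WHEN AND OR RESCUE].freeze

def branch_count(node)
  return 0 unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  here = BRANCH_TYPES.include?(node.type) ? 1 : 0
  here + node.children.sum { |child| branch_count(child) }
end

ast = RubyVM::AbstractSyntaxTree.parse(File.read(ARGV.fetch(0)))
puts "approximate complexity: #{1 + branch_count(ast)}"
```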
A slow test suite still feels kinda bad, though, and there are practical concerns with using a CI server to make a test suite faster. If you are practicing Continuous Delivery, it means that you cannot ship code without waiting for your test suite, and so your ability to deliver results is always constrained by it. Where I work, we have an app that needs 20-30 minutes for its test suite to run, even with heavy parallelization. This means that, in the best case, any serious bug in this app will exist for 20-30 minutes while its fix works through the suite.
But even this is a different problem than code quality or test suite speed. If we have an application where we cannot get feedback about its correctness (or ship it to production) without a long wait, a fast test suite isn't the only solution to that problem.
Address the Problem, not the Symptom
Making a test suite faster feels good, because it's tractable. We try to convince ourselves that techniques like mocking, null databases, or headless browsers are all that stand between us and faster feedback. But consider the design and architecture of your application. Breaking up a large application into several smaller ones is much more difficult, but is often more sustainable.
This problem is easily seen in the small when executing a subset of tests locally. Ideally, you can run only those tests relevant to your change, since they are closest to the code being changed. If your app is organized by "type of module", this can be difficult. A Rails application puts all models in `app/models`, all controllers in `app/controllers`, etc. Running all the tests for the code around a certain feature, say purchasing a product, is difficult. An application organized by function, rather than structure, would make this easier, as sketched below.
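A hypothetical sketch of the difference (these paths are illustrative, not from any real app):

```
# Organized by structure (the Rails default): purchasing code is scattered
app/models/product.rb
app/models/order.rb
app/controllers/purchases_controller.rb
test/models/order_test.rb
test/controllers/purchases_controller_test.rb

# Organized by function: everything about purchasing lives together, and
# `bin/rails test test/purchasing` would run exactly the relevant tests
app/purchasing/product.rb
app/purchasing/order.rb
app/purchasing/purchases_controller.rb
test/purchasing/order_test.rb
test/purchasing/purchases_controller_test.rb
```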
If you were to list possible solutions to this problem, "making the test suite faster" would not be high on the list. True, running all tests is one way to check the tests relevant to your feature, but it's not the only solution, and not even the best one. Your application's architecture, and what it does to your ability to get feedback while developing it, is a lot more important than test speed. (It's tradeoffs all the way down, because your hand-crafted, locally-sourced Sinatra application might be better organized for running tests, but will send you into endless code reviews around maintaining the conventions you had to invent to get there. Tradeoffs.)
And, again, to belabor the point: none of this is a result. No user benefited from the particular location of code in a project, the use of dependency injection over mocks, or a deployment pipeline. However, users very much benefit from functionality being delivered quickly and/or working properly. There is a difference between these things, despite how related they are.
Take care in how you judge a system, tool, or technique. Often what appears to be "papering over the problem" is, in fact, solving a more relevant problem, or addressing something more directly connected to… delivering results.