On Slow Test Suites and CI Servers

November 21, 2017

A few people on Twitter were talking about developers who run their entire test suite only on their continuous integration (CI) server. The idea was that this was a sign of low quality tests, low quality code, or an otherwise bad process. In the past, I felt this way, and ā€œbreaking the buildā€ (by checking in code that had failing tests) was viewed as bad. I donā€™t believe this any longer, and now feel like itā€™s a critical ability for a high-functioning team to have. Not that it isnā€™t important to identify low quality tests or code, but what exactly is the value of having a fast test suite?

The Value of Fast Tests

The benefit of a speedy test suite is feedbackā€”the quicker you get feedback about any problems, the easier it is to address them and get your code shipped. But this feedback, and indeed the tests themselves, are not results. They are artifacts and tools designed to help us deliver results. Itā€™s important to remember that when discussing the virtues of techniques like this.

When developing a feature, you can get feedback quickly by running only the tests relevant to what you are changing. Presumably, you are making a small change, and can evaluate that change by running only a few tests, possibly even just one. The way I like to work is to have one or more acceptance tests capture the overall feature Iā€™m building, and use unit tests to drive edge cases, as outlined in my article for InfoQ.
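As a concrete illustration, assuming the default Minitest-based runner in a recent Rails app (the file paths here are hypothetical), running only the relevant tests might look like this:

    # Run one test file
    bin/rails test test/models/purchase_test.rb

    # Run a single test, by line number
    bin/rails test test/models/purchase_test.rb:27

    # Run the acceptance tests for the feature
    bin/rails test test/system/purchasing_test.rb

Each of these gives feedback in seconds, without running the rest of the suite.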

While itā€™s nice to be able to run a test suite in a few seconds, itā€™s more important that the tests have adequate coverage for what Iā€™m doing and meet my personal standard for quality, as well as those of the team and project. This means, among other things, that there is value in writing a unit test in Ruby on Rails that loads data into the database. This is the Rails way, like it or not, and there is negative value in deviating from these conventions in a Rails project. Such deviation must be weighed against the benefits, as well as other solutions to the problem at hand (which is not fast tests, but fast feedback).
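For example, a minimal sketch of such a test, using hypothetical Product and Purchase models, might look like this:

    require "test_helper"

    # A Rails-conventional unit test: it writes real rows to the
    # test database rather than mocking the persistence layer.
    class PurchaseTest < ActiveSupport::TestCase
      test "cannot purchase an out-of-stock product" do
        product = Product.create!(name: "Widget", inventory_count: 0)

        purchase = Purchase.new(product: product)

        assert_not purchase.valid?
      end
    end

Itā€™s slower than a test against mocked persistence, but it works the way every other test in a conventional Rails app works, and thatā€™s worth something.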

Again, these are all intermediate artifacts, not results. While my goal is never to be ā€œRails-like at all costsā€, itā€™s also never to have ā€œfast testsā€. In fact, I donā€™t even treat ā€œadequate test coverageā€ as an explicit goal. My goal is to deliver results, and while adequate test coverage is often a means to do that quickly, I try hard not to lose sight of the results I need to deliver. I stress this because it means you must take a holistic view of what you are doing and strongly avoid getting lost in the local minima of intermediate metrics like test speed.

It is nice to run the entire test suite after making changes, however. A benefit to doing so is finding regressions caused by new features. Of less benefit is requiring that this happen on a developerā€™s laptop. A CI server can usually run the suite much more quickly, either by being a more powerful computer or by parallelizing the build (or both). If I, as a developer, can get faster feedback on my change by pushing a branch to GitHub and letting 10 parallel processes run my appā€™s test suite, whatā€™s wrong with that? Nothing.
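To make that concrete: a minimal sketch of splitting a suite across nodes, assuming CircleCI-style CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL environment variables (other CI services expose equivalents under other names):

    # run_my_slice.rb - each parallel CI node runs its share of the suite
    files      = Dir["test/**/*_test.rb"].sort
    node_index = Integer(ENV.fetch("CIRCLE_NODE_INDEX", "0"))
    node_total = Integer(ENV.fetch("CIRCLE_NODE_TOTAL", "1"))

    # Round-robin the files across nodes; real services balance
    # by historical timing data instead
    my_files = files.select.with_index { |_, i| i % node_total == node_index }

    exec("bin/rails", "test", *my_files)

Ten nodes each running a tenth of the suite can easily beat one laptop running all of it.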

In addition to speed, running tests on a CI server (as opposed to a developer laptop) creates an added avenue for feedback: sharing your build with others. When you hit a snag around a failing test you canā€™t quite figure out, help is on its way simply by sharing a URL. When your tests are trapped inside your laptop, youā€™ve put friction between you and getting help: screen-sharing, synchronous communication, and environment-specific problems on someone elseā€™s laptop.

Remember: these are tools & techniques, not results. Thereā€™s no harm or shame in using what you have available to help do your job.

So what about the notion that a slow test suite is a sign of poor code quality?

Metrics for Code Quality

If one were to list out some metrics of code quality, ā€œspeed of test suite on this yearā€™s MacBook Proā€ would likely not be one of them. In fact, ā€œspeed of test suiteā€ is so subjective that itā€™s hard to consider it seriously as any sort of metric for code quality.

Setting aside that code quality (again) is not a result, not a real goal itself, but just a tool that sometimes helps deliver results, there is actually a fair bit of research around more objective measures of code quality. As I discussed a few years ago in ā€œWhat is ā€˜betterā€™ code?ā€, we can understand code quality by analyzing its complexity, cohesion, fan-in/fan-out (see also here), or even its size.
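If you want those signals for a Ruby codebase, off-the-shelf tools will compute them; a sketch, with the specific tools as illustrations rather than endorsements:

    # flog scores each method by ABC (assignment/branch/call) complexity
    gem install flog
    flog app/models

    # RuboCop's Metrics department flags cyclomatic complexity,
    # method length, class length, and the like
    rubocop --only Metrics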

All of these are objectively measurable and provide stronger signals about code quality than speed of a test suite (assuming that we could draw a strong conclusion between code quality and results, which is difficult to do and requires more than anecdotal evidence).

A slow test suite still feels kinda bad, though, and there are practical concerns with using a CI server to make a test suite faster. If you are practicing Continuous Delivery, it means that you cannot ship code without waiting for your test suite, and so your ability to deliver results is always constrained by it. Where I work, we have an app that needs 20-30 minutes for its test suite to run, even when using heavy parallelization. This means that, in the best case, any serious bug in this app lives in production for 20-30 minutes while its fix runs through the suite.

But even this is a different problem than code quality or test suite speed. If we have an application where we cannot get feedback about its correctness (or ship it to production) without a long wait, a fast test suite isnā€™t the only solution to that problem.

Address the Problem, not the Symptom

Making a test suite faster feels good, because itā€™s tractable. We try to convince ourselves that techniques like mocking, null databases, or headless browsers are all that stand between us and faster feedback. But consider the design and architecture of your application. Breaking up a large application into several smaller ones is much more difficult, but is often more sustainable.

This problem is easily seen in the small when executing a subset of tests locally. Ideally, you can run only those tests relevant to your change, since they are closest to the code being changed. If your app is organized by ā€œtype of moduleā€, this can be difficult. A Rails application puts all models in app/models, all controllers in app/controllers, etc. To run all tests for code around a certain feature, say purchasing a product, is difficult. An application organized by function, rather than structure, would make this easier.
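A sketch of the difference, using a hypothetical purchasing feature:

    # Conventional Rails: one feature's code and tests scattered by type
    app/models/purchase.rb
    app/controllers/purchases_controller.rb
    test/models/purchase_test.rb
    test/controllers/purchases_controller_test.rb

    # Hypothetical feature-oriented layout: everything in one place,
    # so a single command runs every test relevant to the change
    app/purchasing/
    test/purchasing/
    bin/rails test test/purchasing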

If you were to list possible solutions to this problem, ā€œmaking the test suite fasterā€ would not be high on the list. True, running all tests is a way to check the tests relevant to your feature, but itā€™s not the only solution, and not even the best one. Your applicationā€™s architectureā€”and what it does to your ability to get feedback while developing itā€”is a lot more important than test speed (itā€™s tradeoffs all the way down, because your hand-crafted, locally-sourced Sinatra application might be better organized around running tests, but will send you into endless code reviews around maintaining the conventions you had to invent to get there. Tradeoffs).

And, again, to belabor the point: none of this is a result. No user benefited from the particular location of code in a project, the use of dependency injection over mocks, or a deployment pipeline. However, they very much benefit from functionality being delivered quickly and/or working properly. There is a difference between these things, despite how related they are.

Take care in how you judge a system, tool, or technique. Often what appears to be ā€œpapering over the problemā€ is, in fact, solving a more relevant problem, or addressing something more directly connected toā€¦delivering results.