At Blend, merging changes to our most active codebase requires passing a suite of end-to-end headed tests. For almost a year, many of these tests were flaky (they didn’t pass 100% of the time), so every engineer working on that codebase experienced frequent merge build failures and had to “retry the merge” many times per PR. These tests are unpleasant to debug because they run slowly against a browser, require the backend to be running in a certain configuration, and behave slightly differently when run locally vs. in our CI environment.
Our first attempt to improve the reliability of the test suite was to prioritize flaky test fixing as an item in each pod’s roadmap (a pod is a cross-functional team consisting of engineers, a PM, and a designer). For instance, if a pod owned five flaky tests at the time of planning, we would estimate and allocate time in their roadmap for the quarter to fix the five tests. This required a considerable amount of negotiation to prioritize test suite work against other priorities. Even after the roadmap was in place, fixing tests would typically take a back seat to more urgent product work over the course of the quarter. The test suite remained flaky.
Our second approach was inspired by one way in which governments solve tragedies of the commons. A tragedy of the commons is a situation where a resource is shared by a group of parties, and it decays because no one is individually incentivized to maintain it. One approach to solving a tragedy of the commons is to divide the shared resource and privatize ownership of each part, requiring each party to pay a tax proportional to its contribution to the problem.
We applied this idea to the test suite. Before each sprint, we calculated each pod’s “tax”: the number of engineers it owed, proportional to the number of flaky tests it owned. Those engineers would be required to join the “Technical Health Pod” for the following sprint, which was devoted to fixing flaky tests. This gave each pod “skin in the game”: an incentive to get ahead of its flaky test debt to avoid the disruption of losing engineers.
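As a rough illustration of the per-sprint tax calculation, here is a minimal Python sketch. The exact formula isn't spelled out here, so the rate (one engineer per five flaky tests), the round-up rule, and the pod names are all assumptions made for the example.

```python
import math

# Assumed rate, not a real policy value: one engineer owed per this many flaky tests.
FLAKY_TESTS_PER_ENGINEER = 5


def engineers_owed(flaky_tests: int,
                   tests_per_engineer: int = FLAKY_TESTS_PER_ENGINEER) -> int:
    """Return a pod's tax: engineers owed, proportional to its flaky tests.

    Rounding up is an assumption; it means even one flaky test costs one engineer.
    """
    if flaky_tests < 0 or tests_per_engineer <= 0:
        raise ValueError("flaky_tests must be >= 0 and tests_per_engineer > 0")
    return math.ceil(flaky_tests / tests_per_engineer)


def sprint_tax(flaky_tests_by_pod: dict[str, int]) -> dict[str, int]:
    """Compute the tax for every pod ahead of a sprint."""
    return {pod: engineers_owed(count) for pod, count in flaky_tests_by_pod.items()}


if __name__ == "__main__":
    # Hypothetical pods and flaky-test counts.
    counts = {"pod-a": 0, "pod-b": 3, "pod-c": 7}
    for pod, owed in sprint_tax(counts).items():
        print(f"{pod}: owes {owed} engineer(s) to the Technical Health Pod")
```

Under these assumptions, a pod with no flaky tests owes nothing, which is the state every pod has an incentive to reach before the calculation runs.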
It worked extremely well, and quickly. Within a month, the flaky tests were almost completely eliminated, all without any micromanagement. No one has ever been pulled into the Technical Health Pod. We’ve expanded the tax to cover other types of tech debt in shared resources with similar results.
Roadmapping the work to maintain shared resources isn’t necessarily effective, especially in a culture of distributed ownership and high autonomy like Blend’s. Instead of micromanaging each pod’s task list, try creating an incentive such as a tax to drive behavior that better serves the greater good in a systematic, continuous fashion.