Managing Customer Releases with Feature Flags instead of Branches
How Blend delivers features to customers on different schedules using feature flags.
At Blend, we offer a white-label consumer lending platform that streamlines the otherwise manual, paper-based, and generally painful borrowing process. One challenge inherent in our business model and industry is serving a diverse set of lenders with a single product — from small independents to the largest banks — each with different levels of comfort in accepting changes to the product. Some lenders want the latest functionality as soon as it’s available, while others prefer to test every user-facing change in our beta environment for a month or more before allowing it to be promoted to production.
As we started signing more customers, supporting different cadences of feature release became more urgent. We wanted to meet the needs of our growing customer base while continuing to build and deploy changes iteratively. We considered two approaches to this challenge:
One approach was to deploy a separate instance of our core service for each customer, maintaining separate branches as needed to control which functionality was present. This would keep the code on any given branch cleaner and would not require any new tools or frameworks. On the other hand, it would make debugging more difficult, since different customers would be on different versions with as much as a month of skew among them, and it would mean managing a linearly growing set of instances, each with a nontrivial setup time. Finally, it would require us to maintain a large number of branches in production, making continuous delivery much more complicated.
The alternative was to deploy a single version of code for all customers but control functionality differences using feature flags. Deploying a single version would keep debugging and code deployment simpler since the team would only have to know about and understand a single recent version of code. It would also make it quick and easy to revert changes that cause problems. The downside is that it would make the code more complex and branchy (each feature flag introduces at least one conditional), and it would require new tools to manage flag state and scheduling. Finally, this approach would make it more difficult to fully customize anything for a given customer, which can be very useful in the short term but is not as scalable in the long run.
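To make that trade-off concrete, a flag-guarded code path might look roughly like the sketch below; the isFlagEnabled helper and the coBorrowerInvites flag name are made up for illustration rather than taken from our actual code.

```typescript
// Hypothetical flag lookup: is this flag on for this customer?
// The helper and the flag name are illustrative, not our actual API.
function isFlagEnabled(flagName: string, customerId: string): boolean {
  // In practice this would consult a flag store; hard-coded for the sketch.
  const enabledCustomers: Record<string, Set<string>> = {
    coBorrowerInvites: new Set(["lender-a", "lender-b"]),
  };
  return enabledCustomers[flagName]?.has(customerId) ?? false;
}

function applicationSections(customerId: string): string[] {
  const sections = ["borrower-info", "income", "assets"];
  // Every feature flag adds at least one conditional like this to the shared code path.
  if (isFlagEnabled("coBorrowerInvites", customerId)) {
    sections.push("co-borrower-invites");
  }
  return sections;
}
```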
Burned by the one-branch-per-customer approach
I experienced the “branch per customer” approach first-hand in prior work. Because most customers were not willing to use cloud-based services at the time, we typically hosted the application on-premises. That structure let us deploy a different version to each customer, and because we didn’t upgrade every customer at the same time, it also made upgrades harder: every live branch had to have the latest changes merged in. We shipped a new version about once a month, and I remember the pain of debugging, having to figure out how the code had worked a month earlier on whichever version that customer happened to be running.
Because of this experience with on-prem hosting, we’ve always been adamant about hosting Blend in the cloud and delivering upgrades continuously, despite the reservations of many of our early prospects. Among a multitude of other benefits, this made it possible for us to consider the feature flagging approach. This approach seemed like a better solution overall, so we went with it.
Today we have almost 200 feature flags in production. We’ve scaled our ability to manage them with our “Configuration Center” UI, which allows flags to be controlled for cohorts of customers and automatically scheduled for promotion.
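The data model behind a tool like this could be sketched roughly as follows; the cohort names, types, and resolveFlag helper are illustrative assumptions, not the actual Configuration Center schema.

```typescript
// Illustrative model of cohort-based flag state with scheduled promotion.
// Names and structure are assumptions, not the real Configuration Center.
type Cohort = "beta" | "fast-follow" | "conservative";

interface FlagConfig {
  name: string;
  enabledCohorts: Set<Cohort>;
  // A promotion flips the flag on for an additional cohort at a future date.
  scheduledPromotions: { cohort: Cohort; enableAt: Date }[];
}

function resolveFlag(flag: FlagConfig, cohort: Cohort, now: Date = new Date()): boolean {
  if (flag.enabledCohorts.has(cohort)) {
    return true;
  }
  return flag.scheduledPromotions.some(
    (p) => p.cohort === cohort && p.enableAt.getTime() <= now.getTime()
  );
}

// Example: on for the beta cohort today, scheduled to reach conservative customers later.
const documentUploadV2: FlagConfig = {
  name: "documentUploadV2",
  enabledCohorts: new Set<Cohort>(["beta"]),
  scheduledPromotions: [{ cohort: "conservative", enableAt: new Date("2030-01-01") }],
};

resolveFlag(documentUploadV2, "beta");         // true
resolveFlag(documentUploadV2, "conservative"); // false until the scheduled date passes
```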
The feature flagging approach has proven to be the right decision. It has scaled past 100 customers so far and allowed us to continue upgrading our core service relatively frequently (~daily) and in a highly automated fashion. Engineers only have to understand a small, constant number of code versions at a given time.
While a few bugs have been caused by unanticipated, untested interactions between flags, this has not been a major issue. Still, it is simpler to deal with a smaller number of flag configuration sets, so we try to keep as many customers as possible on the same settings; to this end, we maintain a limited set of customer cohorts that share the same flag configuration.
A new form of tech debt
Three unanticipated classes of what we call “dead feature flags” have come about:
- On-everywhere flags: Flags that are enabled everywhere but linger in the code. This class tends to grow because pods do not always prioritize removing a flag immediately after it is fully rolled out.
- Off-everywhere flags: Flags that have been in the code for months but are still not enabled anywhere. These come about in several cases:
  - A feature is started but deprioritized.
  - A feature is worked on for a long time behind a single flag. This is not ideal because it means the change is not being released to production iteratively.
  - A feature is finished, but no customer wants to enable it. The flag and the code it controls are kept in hopes that customers will want the functionality at some point.
- Custom flags: Flags that are only ever enabled for one customer, or that are enabled for all but one customer. These accumulate because of the permanently unique needs of certain customers. In these cases, the flag needs to be converted to a permanent configuration setting (see the sketch below), or we need to work with the customer to remove the need for customization.
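To illustrate that last conversion, a custom flag can be re-expressed as an ordinary per-customer setting; the requiresWetSignature setting below is a hypothetical example, not our real settings schema.

```typescript
// A "custom flag" that is permanently on for exactly one customer is really
// per-customer configuration; modeling it that way keeps the flag list for
// genuinely temporary rollouts. The setting name and shape are hypothetical.
interface CustomerSettings {
  customerId: string;
  // Long-lived configuration, expected to differ between customers indefinitely.
  requiresWetSignature: boolean;
}

const customerSettings: CustomerSettings[] = [
  { customerId: "lender-a", requiresWetSignature: true },
  { customerId: "lender-b", requiresWetSignature: false },
];

function needsWetSignature(customerId: string): boolean {
  const settings = customerSettings.find((s) => s.customerId === customerId);
  return settings?.requiresWetSignature ?? false;
}
```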
Everyone benefits when unnecessary code is deleted, so we encourage pods to clean up after themselves in shared codebases. We’ve been able to do this effectively using the Technical Health Pod.