Project recommendations on how to organize branches.
This document discusses organizing branches in your remote/origin for feature
development and release management, not the use of local branches in Git or
queues or bookmarks in Mercurial.
This document is purely advisory. Phabricator works with a variety of branching
strategies, and diverging from the recommendations in this document
will not impact your ability to use it for code review and source management.
This document describes a branching strategy used by Facebook and Phabricator to
develop software. It scales well and removes the pain associated with most
branching strategies. This strategy is most applicable to web applications, and
may be less applicable to other types of software. The basics are:
- Never put feature branches in the remote/origin/trunk.
- Control access to new features with runtime configuration, not branching.
The next sections describe these points in more detail, explaining why you
should consider abandoning feature branches and how to build runtime access
controls for features.
Suppose you are developing a new feature, like a way for users to "poke" each
other. A traditional strategy is to create a branch for this feature in the
remote (say, "poke_branch"), develop the feature on the branch over some period
of time (say, a week or a month), and then merge the entire branch back into
master/default/trunk when the feature is complete.
This strategy has some drawbacks:
- You have to merge. Merging can be painful and error prone, especially if the feature takes a long time to develop. Reducing merge pain means spending time merging master into the branch regularly. As branches fall further out of sync, merge pain/risk tends to increase.
- This strategy generally aggregates risk into a single high-risk merge event at the end of development. It does this both explicitly (all the code lands at once) and more subtly: since commits on the branch aren't going live any time soon, it's easy to hold them to a lower bar of quality.
- When you have multiple feature branches, it's impossible to test interactions between the features until they are merged.
- You generally can't A/B test code in feature branches, can't roll it out to a small percentage of users, and can't easily turn it on for just employees since it's in a separate branch.
Of course, it also has some advantages:
- If the new feature replaces an older feature, the changes can delete the older feature outright, or at least transition from the old feature to the new feature fairly rapidly.
- The chance that this code will impact production before the merge is nearly zero (it normally requires substantial human error). This is the major reason to do it at all.
Instead, consider abandoning all use of feature branching. The advantages are:
- You don't have to do merges.
- Risk is generally spread out more evenly into a large number of very small risks created as each commit lands.
- You can test interactions between features in development easily.
- You can A/B test and do controlled rollouts easily.
But it has some tradeoffs:
- If a new feature replaces an older feature, both have to exist in the same codebase for a while. But even with feature branching, you generally have to do almost all this work anyway to avoid situations where you flip a switch and can't undo it.
- You need an effective way to control access to features so they don't launch before they're ready. Even with this, there is a small risk a feature may launch or leak because of a smaller human error than would be required with feature branching. However, for most applications, this isn't a big deal.
This second point is a must-have, but entirely tractable. The next section
describes how to build it, so you can stop doing feature branching and never
deal with the pain and risk of merging again.
Controlling Access to Features
Controlling access to features is straightforward: build some kind of runtime
configuration which defines which features are visible, based on the tier
(e.g., development, testing, or production) the code is deployed on, the
logged-in user, global configuration, random buckets, A/B test groups, or
whatever else.
Your code should end up looking something like this:
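As a minimal sketch in Python: the names `is_feature_launched`, `show_poke`, `render_profile`, and the `LAUNCHED_FEATURES` table are hypothetical and chosen only for illustration. A real system would load flag state from runtime configuration rather than a hard-coded dict.

```python
# Hypothetical flag table; a real system would load this from runtime
# configuration instead of hard-coding it.
LAUNCHED_FEATURES = {"poke": False}


def is_feature_launched(feature: str) -> bool:
    """Return True if the named feature is visible. Unknown features are off."""
    return LAUNCHED_FEATURES.get(feature, False)


def show_poke() -> str:
    return "<button>Poke</button>"


def render_profile() -> str:
    parts = ["<h1>Profile</h1>"]
    # The feature's code lives in trunk, but stays hidden until the flag is on.
    if is_feature_launched("poke"):
        parts.append(show_poke())
    return "\n".join(parts)
```

The important property is that the unfinished feature ships in the same codebase as everything else, and only the check decides whether users can reach it.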
Behind that is some code which knows about the 'poke' feature and can look up
configuration to determine if it should be visible or not. Facebook has a very
sophisticated system for this (called GateKeeper) which also integrates with A/B
tests, allows you to define complicated business rules, etc.
You don't need this in the beginning. Before GateKeeper, Facebook used a much
simpler system (called Sitevars) to handle this. Here are some resources
describing similar systems:
- There's a high-level overview of Facebook's system in this 2011 tech talk: Facebook Push Tech Talk.
- Flickr described their similar system in a 2009 blog post here: Flickr Feature Flags and Feature Flippers.
- Disqus described their similar system in a 2010 blog post here: Disqus Feature Switches.
- Forrst describes their similar system in a 2010 blog post here: Forrst Buckets.
- Martin Fowler discusses these systems in a 2010 blog post here: Martin Fowler's FeatureToggle.
- Phabricator just adds config options but defaults them to off. When developing, we turn them on locally. Once a feature is ready, we default it on. We have vastly less context to deal with than most projects, however, and sometimes get away with simply not linking new features in the UI until they mature (it's open source anyway so there's no value in hiding things).
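A simple system of this kind might be sketched as follows. This is an illustration under stated assumptions, not a description of GateKeeper, Sitevars, or any of the systems listed above: the `Context` and `FeatureRule` types, the `RULES` table, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical request context used to evaluate a launch check."""
    tier: str                  # e.g. "development", "testing", "production"
    user_id: int
    is_employee: bool = False


@dataclass(frozen=True)
class FeatureRule:
    """Hypothetical per-feature rule; everything defaults to off."""
    enabled_tiers: frozenset = frozenset()   # tiers where everyone sees it
    launch_to_employees: bool = False        # employees see it on any tier
    allowed_users: frozenset = frozenset()   # explicit per-user allowlist


RULES = {
    "poke": FeatureRule(
        enabled_tiers=frozenset({"development"}),
        launch_to_employees=True,
    ),
}


def is_feature_launched(feature: str, ctx: Context) -> bool:
    rule = RULES.get(feature)
    if rule is None:
        return False  # unknown features are never visible
    if ctx.tier in rule.enabled_tiers:
        return True
    if rule.launch_to_employees and ctx.is_employee:
        return True
    return ctx.user_id in rule.allowed_users
```

With these rules, 'poke' is visible to everyone on the development tier and to employees in production, and hidden from everyone else.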
When building this system there are a few things to avoid, mostly related to not
letting the complexity of this system grow too wildly:
- Facebook initially made it very easy to turn things on for everyone by accident in GateKeeper. Don't do this. The UI should make it obvious when you're turning something on or off, and default to off.
- Since GateKeeper is essentially a runtime business rules engine, it was heavily abused to effectively execute code living in a database. Avoid this through simpler design or a policy of sanity.
- Facebook allowed GateKeeper rules to depend on other GateKeeper rules (for example, 'new_profile_tour' is launched if 'new_profile' is launched) but did not perform cycle detection. A bug describing how to introduce a cycle and bring the site down then sat open for a very long time, until someone introduced a cycle and brought the site down. If you implement dependencies, implement cycle detection.
- Facebook implemented some very expensive GateKeeper conditions and was spending 100+ ms per page running complex rulesets to do launch checks for a number of months in 2009. Keep an eye on how expensive your checks are.
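If rules can depend on other rules, cycle detection can be a small depth-first search run before a rule change is saved. The sketch below assumes dependencies are available as a plain mapping from rule name to the names it depends on; `find_cycle` and that representation are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of rule names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(rule: str) -> Optional[List[str]]:
        color[rule] = GRAY
        path.append(rule)
        for dep in dependencies.get(rule, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[rule] = BLACK
        return None

    for rule in dependencies:
        if color.get(rule, WHITE) == WHITE:
            cycle = visit(rule)
            if cycle:
                return cycle
    return None
```

A control panel could refuse to save any change for which `find_cycle` returns a non-empty result, and show the returned path to the person making the change.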
That said, not all complexity is bad:
- Allowing features to have states like "3%" instead of just "on" or "off" allows you to roll out features gradually and watch for trouble (e.g., services collapsing from load).
- Building a control panel where you hit "Save" and all production servers immediately reflect the change allows you to quickly turn things off if there are problems.
- If you perform A/B testing, integrating A/B tests with feature rollouts is probably a natural fit.
- Allowing new features to be launched to all employees before they're launched to the world is a great way to get feedback and keep everyone in the loop.
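One common way to support states like "3%" is to hash each user into a stable bucket, so the same user always gets the same answer and raising the percentage only adds users. This is a sketch of that general technique, not of how any system named here works; `bucket_for` and `in_rollout` are hypothetical names.

```python
import hashlib


def bucket_for(feature: str, user_id: int, buckets: int = 100) -> int:
    """Map (feature, user) to a stable bucket in [0, buckets)."""
    # Including the feature name means different features get
    # different 3% slices of users.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(feature: str, user_id: int, percent: int) -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return bucket_for(feature, user_id) < percent
```

Because the bucket depends only on the feature name and user ID, no per-user state needs to be stored, and setting the percentage to 0 turns the feature off for everyone.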
Adopting runtime feature controls increases the risk of features leaking (or
even launching) before they're ready. This is generally a small risk which is
probably reasonable for most projects to accept, although it might be
unacceptable for some projects. There are some ways you can mitigate this risk:
- Essentially every launch/leak at Facebook was because someone turned on a feature by accident when they didn't mean to. The control panel made this very easy: features defaulted to "on", and if you tried to do something common like remove yourself from the test group for a feature you could easily launch it to the whole world. Design the UI defensively so that it is hard to turn features on to everyone and/or obvious when a feature is launching and this shouldn't be a problem.
- The rest leaked through CSS or JS changes that mentioned new features and were shipped to the client, either as part of static resource packaging or because the code was simply added to existing files. If this is a risk you're concerned about, consider integrating feature controls with static resource management.
In general, you can start with a very simple system and expand it as it makes
sense. Even a simple system can let you move away from feature branches.