If you're using version control, you should also use more than one branch

Tuesday, December 9, 2014 - 12:19

There are some things that seem so obvious that you don't expect to have to say them. For example, I know that people coming from a background of Subversion and other non-distributed VCSs tend to avoid branching unless they have to; but it takes only a moment to say "honestly, branching in git has a tiny impact, and merging back is, by comparison, almost always very good indeed."

But you never expect to have to explain to seasoned DVCS users, who know full well how to branch, why they shouldn't have a whole team working on just master. Right? I mean, branching in a DVCS is an amazing experience: once you start moving up and down branches, through fetched remote commits and the reflog, it's tough at first, but you level up to the point where switches start flipping on inside your head. Even if you decide that a specific standardized git workflow (like git flow) isn't quite the right fit for you, why would you then come up with a branchless one of your own?

The thing is, I've now heard of teams doing this twice, with two different rationales for why branches either shouldn't or needn't be used. I want to present those arguments below, along with my reasons for thinking that branching is too important to sacrifice to those concerns.

The "syntactic merging" argument (i.e. you shouldn't branch)

Git merging is stupid. That's not an insult; it's meant to be stupid, because it was built that way: "very repeatable and non-clever". That means that, for example, git does not take into account the language your file is written in, and does not try to, say, preserve the integrity of a Javascript object when merging. If git encounters anything it can't merge using very simple algorithms, it marks it as a conflict and leaves the decision to a human.

Between what git doesn't understand (and reports) and what it does understand (and merges correctly), there is in theory a gap: changes that git merges without complaint, but gets wrong. And this is the nub of the "syntactic merging" argument for not branching:

At the point of re-merging a branch, git does it syntactically, not semantically: so it's dangerous to merge [and branching should therefore never be attempted.]

You can see where this comes from: when you merge two branches (your own, into master-with-new-commits), you're relying on git to know which bits of which files should and shouldn't be touched. And because git has no concept of any programming language's native file structuring, it could get them wrong. In theory, it could get them very wrong. Of course, this theoretical problem is not unique to git, or to VCSs: some merge more cleverly than others, but we all know how stupid the smartest AIs can be.
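To make the "syntactic, not semantic" point concrete, here is a toy line-based three-way merge in Python. It is a sketch, not git's actual merge algorithm: the helper names `hunks` and `merge3` are hypothetical, it leans on the standard library's `difflib` for the diffs, and it conservatively treats edits that touch or overlap the same base lines as a conflict. In the example, one branch renames a function while another adds a call to its old name; the edits never overlap, so the merge succeeds without complaint.

```python
import difflib


def hunks(base, side):
    """Return (start, end, new_lines) edits that turn `base` into `side`."""
    matcher = difflib.SequenceMatcher(None, base, side, autojunk=False)
    return [
        (i1, i2, side[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge3(base, ours, theirs):
    """Naive line-based three-way merge: apply both sides' edits to `base`,
    raising ValueError if any two edits touch or overlap the same base lines."""
    our_edits, their_edits = hunks(base, ours), hunks(base, theirs)
    for a1, a2, _ in our_edits:
        for b1, b2, _ in their_edits:
            if a1 <= b2 and b1 <= a2:  # overlapping or adjacent: leave it to a human
                raise ValueError(f"conflict near base lines {a1}-{a2}")
    merged = list(base)
    # Apply edits bottom-up so the base indices of earlier edits stay valid.
    for i1, i2, new_lines in sorted(our_edits + their_edits, key=lambda e: e[0], reverse=True):
        merged[i1:i2] = new_lines
    return merged


base = [
    "def total(items):",
    "    return sum(items)",
    "",
    "def report(items):",
    "    print(len(items))",
]
ours = ["def total_price(items):"] + base[1:]   # branch 1 renames the function
theirs = base + ["    print(total(items))"]     # branch 2 calls it by its old name

merged = merge3(base, ours, theirs)              # no conflict is reported...
print("\n".join(merged))                         # ...but `total` no longer exists
```

Calling the merged file's `report()` would fail with a `NameError`, because `total` has been renamed out from under it: a clean but wrong merge of exactly the kind the argument worries about, and one that tests, rather than the merge tool, are best placed to catch.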

I have to say, I have never seen this happen. Or rather, I've seen git merge syntactically, for example starting a diff hunk midway through one Doxygen comment and ending it midway through another; but I've never seen the merge on its own have a drastic effect. When something's gone wrong, it's been a much more complex problem: often to do with team communication, and with large (even huge) changes being made elsewhere that the current work ought to have taken account of. That is, a social problem, not a coding one.

That's not to say it never happens, of course, and you should certainly take pains to make sure you're unlikely to be affected by it. For example:

With a large team of differing skills, have a senior developer make merges into master: someone who can check the diff by eye and walk the developer through its repercussions.
Proactively manage the potential for overlapping and conflicting work, and stop pretending that "agile" means "we don't need project managers".
Write tests to cover as much code/functionality as possible, using unit/behavioural testing frameworks respectively.

All of these will not only help you avoid conflicts during merging, but are also good practice that you ought to be considering on your project anyway. But even if you don't think these measures are enough, the "git merges are non-clever" argument is not sufficient reason to drop branches altogether. Why?

Because if you're working on master, and someone else is working on master, and you want to fetch their changes, you must either let git merge them, or rebase your own commits on top of them. What do you think most of your dev team will do, most of the time? Even if you don't merge, you do merge: you just don't know it.
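Here is a minimal sketch of why "you do merge: you just don't know it". It models commits as Python objects with parent links (the `Commit` class and `pull` function are hypothetical, not any git library's API) and a pull as fetch-then-merge, which is how `git pull` behaved by default when this was written; newer git versions may ask you to choose between merging and rebasing. If you have no local commits, the pull fast-forwards; if you and a colleague have both committed to master since you last pulled, a merge commit with two parents is unavoidable.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: commits compare and hash by identity, like distinct git commits
class Commit:
    message: str
    parents: tuple = ()


def ancestors(commit):
    """Every commit reachable from `commit` by following parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(current.parents)
    return seen


def pull(local, remote):
    """Toy fetch-then-merge of `remote` into `local`; returns the new local tip."""
    if remote in ancestors(local):
        return local            # already up to date
    if local in ancestors(remote):
        return remote           # fast-forward: no merge commit needed
    return Commit("Merge remote master into local master", (local, remote))


root = Commit("first release")
mine = Commit("my work on master", (root,))
theirs = Commit("colleague's work, pushed first", (root,))

print(pull(root, theirs) is theirs)           # nothing local yet: fast-forward
tip = pull(mine, theirs)
print(tip.message, len(tip.parents))          # both committed: a merge commit
```

Nobody on the team chose to merge here; the merge commit simply falls out of two people committing to the same branch.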

You can of course teach all your developers git rebase. I agree: developers should know how to rebase. But is rebasing really a simpler procedure, less likely to go wrong, than using branches? And so, even if you try to get people to rebase after fetching (and you can't police their personal machines), aren't you just bringing that senior developer back in to eyeball changes, only in another guise?
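Part of why rebasing isn't "simpler" is that it doesn't move commits: it rewrites them. A toy illustration, assuming a made-up id scheme in which each commit's id is a SHA-1 over its parent's id and its message (git's real commit ids also cover the tree, author, committer and timestamps, but likewise include the parent). Replaying the same three changes onto a new base gives three brand-new ids, so anyone who had already fetched the old commits now holds a history that has diverged from yours. The function `make_commits` and the starting ids `aaaaaaa` and `bbbbbbb` are hypothetical placeholders.

```python
import hashlib


def make_commits(messages, parent_id):
    """Chain one toy commit per message on top of `parent_id`; return their ids.

    Each id hashes the parent id as well as the message, so the same change
    made on a different parent gets a different id (barring hash collisions)."""
    ids = []
    for message in messages:
        parent_id = hashlib.sha1(f"parent {parent_id}\n\n{message}".encode()).hexdigest()
        ids.append(parent_id)
    return ids


local_work = ["start feature", "fix typo", "finish feature"]
before_rebase = make_commits(local_work, "aaaaaaa")  # placeholder: old upstream tip
after_rebase = make_commits(local_work, "bbbbbbb")   # placeholder: new upstream tip

for old, new, message in zip(before_rebase, after_rebase, local_work):
    print(f"{message:15} {old[:7]} -> {new[:7]}")
```

Rebasing unpublished work is harmless; rebasing work that teammates have already pulled is what forces everyone else to reconcile.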

The "many local masters" argument (i.e. you needn't branch)

Another team explained to me how, while they all worked on master, they were each working on their own local copies of master. Let me explain.

Only features which had been signed off by QA could reach their "blessed" repository. However, as a team, they had effectively decided to outlaw git push. Instead, senior developers (see, there's no avoiding having a senior developer somewhere!) were the only ones permitted to log in and pull into the "merge" repository. Everyone else could then pull those changes, and indeed was encouraged to pull very regularly from "merge".

But how do you get your changes to other team members, e.g. QA? Well, every machine (VMs, thankfully) was on a VPN, and everyone's VM had everyone else's public keys on it. And so QA could pull from your repository into their repository, and test your edits there.

In theory, this looked like the platonic ideal of a multi-primary, maximally distributed DVCS setup, with no origins and no artificial hierarchy to weigh down the team. In practice, it was difficult to work with, and grew more difficult the more any two developers' work overlapped.

The biggest difficulty was that you could never be completely sure, at the point of telling QA to pull your local master, exactly what else QA had on their local master. It might contain approved features that hadn't yet gone through the senior developer onto merge, and hence weren't available to you when you signed off. It might even contain rejected features that they hadn't thought to roll back before pulling your changes. You literally could not say with confidence "what works in my environment will work in QA's environment when they pull the changes".
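The QA problem can be sketched with plain Python sets, assuming (as a simplification) that each local master is just the set of changes in its history. Since merging one history into another leaves you with the union of both, whatever QA's master already held comes along for the ride. The feature names below are hypothetical.

```python
# Each "master" is modelled as the set of changes reachable from its tip.
merge_repo = {"F1"}                                  # blessed: QA-approved, merged work
dev_a = merge_repo | {"B1"}                          # what A tested before signing off
qa_d = merge_repo | {"F2-approved", "F3-rejected"}   # leftovers from QA's earlier pulls

qa_after_pull = qa_d | dev_a                         # a merge keeps both histories
untested_by_a = qa_after_pull - dev_a                # present in QA, never seen by A

print(sorted(untested_by_a))  # ['F2-approved', 'F3-rejected']
```

Nothing in A's environment contains `F2-approved` or `F3-rejected`, so "it works on my machine" says nothing about how B1 behaves alongside them on QA's.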

What does this signify? In essence, many local masters is still only one master. It's just that each developer carries an incomplete copy of that "ideal master" with holes in some places and extra cruft in others, right up to the point where they pull: at which time, their master tries to turn itself into the "ideal master" and lots of things can happen.

At any given point in time, each developer's computer is ignorant of changes going on elsewhere, which will suddenly be incorporated into their timeline when they pull. Who knows what problems those changes will cause? And because each feature's development has no start and end points in the codebase, everyone chases a moving target, with any new combinatorial bugs being their problem to fix. It's only through skill and luck together (not counting the occasional exceptional circumstances that people suddenly remember with an "oh, yeah, that one time") that everyone has managed to hit each target thus far.

Why you should (feel free to) branch

Here's a key tenet of working with a DVCS, especially as opposed to a centralized VCS:

commit often; pull/push regularly

What do I mean by that? Well, with a VCS, every time you committed, your commits were visible to other people. Even working on branches, you might have a co-developer working on the same branch as you: if your commits break the head of your shared branch, then that interrupts both your work and theirs until it's fixed. So you're loath to commit, and you store up changes until you have just enough that it would be a disaster to lose them all to a local disk failure.

With a DVCS, you can gleefully commit as granularly as your local work permits or requires: keep on stacking up commits. Then, when you're happy to make them public (and maybe after tidying them up with a rebase), you can pull any changes (on your branch) and then push those commits, and you're now safe.

However, if everyone's working on the same branch, committing often and pulling regularly very rapidly makes a mess of your own local history. Here's an example.

What happens when you do/don't branch

A few practical examples might help convince you. Let's say a team of three developers (A, B and C) follow a git branching model: git flow is the one I'm used to, but it needn't be so. Here's what a pseudo-log of commits on the master branch can easily look like:

31 May (A): bugfix B1 released
24 May (C): feature F2 released
11 May (B): feature F1 released
1 May (A): first release

That's the ideal scenario: obviously it can get more complicated, but it needn't. And if people are doing a lot of heavy merging, then sometimes merge commits can leak through. But with a well-disciplined team, they needn't. Also, this doesn't detail all the commits on other branches, which will be numerous: but then, as a historical record, it needn't do so. That's not its purpose.

Compare this to what happens when the team doesn't branch, or at any rate is discouraged from branching, meaning that:

Potentially, whenever you pull changes from master, you get a local merge commit
Because you need to pull changes from master regularly, those merges get interleaved with your ongoing work, meaning you can no longer easily rebase it.

Here's the pseudo-log now:

31 May (A): bugfix B1 behaves slightly differently after F2 and so more work is required; released
28 May (A): merged feature F2 into A's master
24 May (C): feature F2 finished and released
17 May (A): possible bug B1 spotted and work done on master
12 May (C): merged feature F1 into C's master
11 May (B): feature F1 released
5 May (C): first part of F2 committed
1 May (A): first release

Commits not in the previous log are new (to the public record, anyway; before, they could have been rebased away locally). As you can see, local merge commits are now generated. In addition, merging released features into local masters makes local rebasing prior to release impossible, or at any rate very difficult without forcing changes up to the origin (and hence disrupting the whole team, who would then have to fetch those changes).

Finally, compare this to what happens when you use a multi-distributed pull workflow, with a new QA team member D, meaning that:

D will be merging code from different developers who are at different points on their own masters
D might not remember to back out changes before QAing another developer.

Here's an idea of how that pseudo-log will now look:

10 June (D): bugfix B1 released
24 May (D): feature F2 finished and released
23 May (D): merge commit on pulling changes from C on feature F2 to QA them: approved.
17 May (D): merge commit on pulling changes from A on bugfix B1 to QA them: rejected (C's code breaks them).
16 May (D): merge commit on pulling changes from C on feature F2 to QA them; rejected.
12 May (C): merged feature F1 into local master
11 May (D): feature F1 released
10 May (D): merge commit on pulling changes from B to QA them; approved.
5 May (C): first part of F2 committed
1 May (A): first release

Again, compare these entries with the earlier logs to see what's new.

You might disagree with some of the details, but I hope you can see where the extra complexity in the logs is coming from. You'll also note that bugfix B1 was released late, because on 30 May A's hard drive died, and because he had no remote repository to push to safely (only D can pull into merge), he lost all his changes! At least he avoided that conflict with feature F2...

Summary: time travel is less useful than many alternative presents

When people try to sell the benefits of version control, they often present it as a way of being able to time-travel. I've used this metaphor myself, in my Basic Git talk to Torchbox. If you find yourself in a present you don't like (you deleted a file three years ago that you now really need), you can travel back to a past you prefer (four years ago) and at least work out what went wrong. But travelling in time, purely to observe, is of only limited value, and this way of seeing version control makes your use of it no smarter than incremental backups.

I would argue instead that version control's strength comes from branching. Branching is the way that you truly decouple one developer's work from another's, however temporarily. And maybe, yes, sometimes that decoupling means you have to pay in time and effort later, to re-couple their separated work and resolve the conflicts. But it's only really through branching that version control can be elevated above comparisons with quite trivial backup strategies.

Ultimately, the purpose of modern version control systems, especially distributed ones, is less as a method of time travel, and more as a way of managing multiple independent and often conflicting presents. If you looked through the slides of my talk mentioned above, then maybe I misled you about what a branching version control workflow permits you to do: forget Twelve Monkeys; forget even Back to the Future. Branched workflows let you harness the complexity of Primer: more safely than in the film itself, but no less powerfully.

And ultimately, if you don't branch, you're not version-controlling. You're just making very complicated backups, with an untidy log.

Blog category:

culture, paradigms

About me

I'm J-P Stacey, and I'm a freelance technical developer and software architect, working with Drupal, Javascript, Symfony, PHP and devops, with experience in project and process management and an emphasis on usability.