Git merge vs. Git rebase

As a rule, prefer a rebase over a merge, because it gives you a clean, linear history graph.

But you should NOT perform a rebase if somebody else is following the branch that you are about to rebase. This is because a rebase rewrites history: it replays your commits onto a new parent, which gives them new commit IDs, and this can wreak havoc on the local branches of anyone who pulls changes from your branches.
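To see what "rewrites history" means in practice, here is a minimal sketch that rebases a branch in a throwaway repository and compares the commit ID before and after. The branch names (main, topic), file names and identity are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main  # name the initial branch "main"
git config user.name "Dev1"            # hypothetical identity
git config user.email "dev1@example.com"

git commit -q --allow-empty -m "A"

# One commit on a topic branch...
git checkout -q -b topic
echo one > topic.txt
git add topic.txt
git commit -q -m "topic work"
before=$(git rev-parse topic)

# ...and one commit on main, so the two branches diverge.
git checkout -q main
echo two > main.txt
git add main.txt
git commit -q -m "main work"

# Replay the topic commit on top of main.
git checkout -q topic
git rebase -q main
after=$(git rev-parse topic)

echo "before rebase: $before"
echo "after rebase:  $after"
```

The two IDs differ even though the change itself is the same. Anyone who had already pulled the old commit would now hold a commit that no longer exists on your branch.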

Rebase is perfectly safe for branches that exist only in your local repository, because no one else has seen the commits on them.

To make an existing branch rebase automatically whenever you run 'git pull', you can use this Git command:

git config branch.<branch-name>.rebase true

Alternatively, you can open a project's .git/config file and add a 'rebase = yes' line to every branch section, like so:

[branch "Daas_1.1.1"]
    remote = origin
    merge = refs/heads/Daas_1.1.1
    rebase = yes

Editing the .git/config file may be preferable when you have a lot of branches that follow remote branches. This option only applies to branches that have the 'merge = ' attribute set, because those are the only branches that Git will try to perform a 'merge' on when you perform a 'git pull'.
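If you would rather not edit the file by hand, the same result can be scripted. This is only a sketch: it runs in a throwaway repository, and the two branch entries are hypothetical stand-ins for your real tracking branches. It writes 'true', which Git treats the same as 'yes' for a boolean setting.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repository
cd "$repo"
git init -q

# Hypothetical tracking entries, mirroring the .git/config excerpt above.
git config branch.Daas_1.1.1.remote origin
git config branch.Daas_1.1.1.merge refs/heads/Daas_1.1.1
git config branch.other.remote origin
git config branch.other.merge refs/heads/other

# For every branch section that has a 'merge' entry, add 'rebase = true'.
git config --get-regexp '^branch\..*\.merge$' |
while read -r key _; do
    name=${key#branch.}                # strip the leading "branch."
    name=${name%.merge}                # strip the trailing ".merge"
    git config "branch.$name.rebase" true
done

# Show the resulting settings.
git config --get-regexp '^branch\..*\.rebase$'
```

In a real repository you would keep only the loop and the final listing.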

To enable this attribute for any new branches that you may create in a repository, run this command from inside that repository:

git config branch.autosetuprebase always

And if you want this attribute to be set for every new branch in every Git repository on this machine, then add the --global flag (this actually sets the attribute in your $HOME/.gitconfig file):

git config --global branch.autosetuprebase always
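A quick way to convince yourself that the setting works is to try it in a scratch clone. The sketch below uses the per-repository form, so it does not touch your $HOME/.gitconfig. The repository paths, the branch names (main, topic) and the identity are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)

# Hypothetical upstream repository with a single commit on "main".
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "A"

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# Turn the option on, then create a new branch that tracks origin/main.
git config branch.autosetuprebase always
git branch -q --track topic origin/main

# Git has written 'rebase = true' into the new branch's section.
git config branch.topic.rebase
```

Branches that existed before the option was set (here, the clone's own main) are not changed; they still need the per-branch setting described earlier.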

What follows is an explanation of the difference between merge and rebase using ASCII art. I am not showing any Git commands in the diagrams to keep them clean. For a better, graphical representation, including the Git commands to perform the actions, use the PDF file here: https://github.com/downloads/stevenharman/git-workflows/git-workflow-with-notes.pdf


Let's start with this branch. It has 3 commits: A, B and C. This is what is visible to both developers, Dev1 and Dev2, since this is what is in the remote repository.

A ----> B ----> C

Dev1 starts working and performs a commit to the local repository. This is what Dev1's local repository looks like. Remember Dev1 hasn't pushed any commits to the remote repository yet.
                 
A ----> B ----> C ----> C`

Now Dev2 performs a commit to her repository.  This is what Dev2's local repository looks like. Remember Dev2 hasn't pushed any commits to the remote repository yet.

A ----> B ----> C ----> C``


If we hypothetically combine the two local repositories, this is what it'd look like. Even though neither of the two developers created any branches, Git treats any offshoot of a commit as a branch. So these are two branches, because they share a common parent (commit C) and yet their contents are different. Remember, this is a hypothetical combination of two local repositories, and Git on each machine knows nothing about the commits on the other machine.

                  ----> C`
                 /
A ----> B ----> C
                 \
                  --------> C``

At this point Dev2 decides to push her commit to the remote repository. So this is what the *remote* repository will look like, before and after the push:

Before:
A ----> B ----> C

After:
A ----> B ----> C ----> C``

Dev1 continues to work on his local repository and performs another commit. Here's his *local* repository. Remember, he hasn't performed any 'pull' operations since he started his work on commit C`.


A ----> B ----> C  ----> C` ----> C```

Now if he performs a 'fetch' operation, this is what he'd see in his local repository:

Note: By default, 'pull' == ('fetch' + 'merge')

                  ---> C` --------> C```
                 /
A ----> B ----> C --------> C``
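For readers who want to reproduce this state, here is a rough sketch that builds the scenario in temporary directories and asks Git how far the two lines of history have diverged. Everything named here is an assumption for the demo: the commit() helper, the directory names, the branch name main, and the labels C1, C2 and C3 standing in for C`, C`` and C```. Only commit C is created as the shared base, to keep it short.

```shell
set -e
work=$(mktemp -d)

# Hypothetical helper: commit a one-line file named after the label.
commit() {  # usage: commit <repo-dir> <label>
    echo "$2" > "$1/$2.txt"
    git -C "$1" add "$2.txt"
    git -C "$1" -c user.name=Dev -c user.email=dev@example.com \
        commit -q -m "$2"
}

# Shared remote repository holding the base commit C on "main".
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
commit "$work/seed" C
git clone -q --bare "$work/seed" "$work/remote.git"

# Each developer clones the remote.
git clone -q "$work/remote.git" "$work/dev1"
git clone -q "$work/remote.git" "$work/dev2"

commit "$work/dev1" C1                    # Dev1's first commit  (C`)
commit "$work/dev2" C2                    # Dev2's commit        (C``)
git -C "$work/dev2" push -q origin main   # Dev2 pushes first
commit "$work/dev1" C3                    # Dev1's second commit (C```)

# Dev1 fetches: origin/main moves, his own main does not.
git -C "$work/dev1" fetch -q origin

# First number: commits only on main. Second: commits only on origin/main.
git -C "$work/dev1" rev-list --left-right --count main...origin/main

# The same picture as the ASCII art, drawn by Git.
git -C "$work/dev1" log --graph --oneline --all
```

The count shows two commits that exist only locally and one that exists only on the remote, which is exactly the fork drawn above.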


And now Dev1 decides to perform a 'merge' operation. Git will try to merge the changes made on the two branches, and if it finds any merge conflicts, it will wait for the user to resolve those conflicts before it performs the commit. If no conflicts are found, Git performs the commit automatically.

D => The commit that represents a merge of two branches

                  ---> C` --------> C```
                 /                   \
A ----> B ----> C --------> C`` ------ D --->


This is called a merge bubble. The local branch can now be pushed to the remote repository, and this bubble will persist in the history forever.
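The merge path can be sketched with commands as follows. As before, this is an illustration under assumptions rather than a prescribed workflow: the commit() helper, directory names, branch name main and the labels C1, C2, C3 (for C`, C``, C```) are all invented for the demo.

```shell
set -e
work=$(mktemp -d)

# Hypothetical helper: commit a one-line file named after the label.
commit() {  # usage: commit <repo-dir> <label>
    echo "$2" > "$1/$2.txt"
    git -C "$1" add "$2.txt"
    git -C "$1" -c user.name=Dev -c user.email=dev@example.com \
        commit -q -m "$2"
}

# Shared remote with base commit C, plus one clone per developer.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
commit "$work/seed" C
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/dev1"
git clone -q "$work/remote.git" "$work/dev2"

commit "$work/dev1" C1                    # C`
commit "$work/dev2" C2                    # C``
git -C "$work/dev2" push -q origin main
commit "$work/dev1" C3                    # C```

# Dev1: fetch, then merge. The merge creates commit D with two parents.
git -C "$work/dev1" fetch -q origin
git -C "$work/dev1" -c user.name=Dev -c user.email=dev@example.com \
    merge -q --no-edit origin/main

# Number of merge commits now in Dev1's history.
git -C "$work/dev1" rev-list --merges --count main

# The push succeeds because the remote's C2 is an ancestor of D.
git -C "$work/dev1" push -q origin main
git -C "$work/dev1" log --graph --oneline main
```

The graph printed at the end shows the bubble: two lines leaving C and rejoining at the merge commit.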

Now let's assume that Dev1 decided to avoid the merge bubble. Here's what he would do. Let's start after the 'fetch' operation.

                  ---> C` --------> C```
                 /
A ----> B ----> C --------> C``


Dev1 performs a 'rebase' so that the parent of C` is changed from C to C``. (Strictly speaking, Git recreates C` and C``` as new commits on top of C``; the diagrams keep the same labels for simplicity.) Even in the case of a rebase, Git has to perform a merge of changes, to make sure that the two branches did not modify the same code in different ways. If there's a conflict, Git will prompt you to resolve it before it performs a commit.

                          ----> C` ----> C```
                         /
A ----> B ----> C ----> C``


Now the local branch can be pushed to the remote repository, and we end up with a nice, linear history. Do note that the C``` commit in this case has the same file contents as the D commit in the 'merge' case.

A ----> B ----> C ----> C`` ----> C` ----> C```
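The rebase path looks like this when written out as commands. It is a sketch under the same assumptions as the scenario itself: the commit() helper, directory names, branch name main and the labels C1, C2, C3 (for C`, C``, C```) are invented for the demo.

```shell
set -e
work=$(mktemp -d)

# Hypothetical helper: commit a one-line file named after the label.
commit() {  # usage: commit <repo-dir> <label>
    echo "$2" > "$1/$2.txt"
    git -C "$1" add "$2.txt"
    git -C "$1" -c user.name=Dev -c user.email=dev@example.com \
        commit -q -m "$2"
}

# Shared remote with base commit C, plus one clone per developer.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
commit "$work/seed" C
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/dev1"
git clone -q "$work/remote.git" "$work/dev2"

commit "$work/dev1" C1                    # C`
commit "$work/dev2" C2                    # C``
git -C "$work/dev2" push -q origin main
commit "$work/dev1" C3                    # C```

# Dev1: fetch, then replay his two commits on top of origin/main.
git -C "$work/dev1" fetch -q origin
git -C "$work/dev1" -c user.name=Dev -c user.email=dev@example.com \
    rebase -q origin/main

# Oldest-first list of commit subjects on Dev1's main.
git -C "$work/dev1" log --format=%s --reverse main

# The push is a plain fast-forward; no merge commit is involved.
git -C "$work/dev1" push -q origin main
```

Running 'git pull --rebase' would combine the fetch and rebase steps, which is what the branch.<branch-name>.rebase setting from earlier makes a plain 'git pull' do.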

2 comments:

  1. Surely the info about rebasing being harmful if you've pushed a branch isn't relevant in this situation? If you tried to push when there are remote changes, the push would fail, so you'd have to do a pull first, which would then rebase within your private repo?

    Also: "git config branch..rebase true" should be "git config branch.[BRANCH_NAME].rebase true" - think your brackets got HTMLised!
