Using shallow, bare, mirror git repos

I build sites atop Drupal's git repo. I want a central repo for my code, which will be used across several servers. Each server has several sites and I want to use git's hard linking support to save disk space locally.

[Diagram: the central repo, the per-server localcentral repos and the localtree working trees described below.]

The setup has the following parts:

  1. central repo: Full source repository, with my branches A, B, C, D and all of Drupal's history (which is immense).
  2. localcentral: Each server takes a local bare, shallow clone of the branches required on that server, e.g. A and B only. This saves disk space by only keeping the most recent commits from each of my project branches (the commands below use a depth of 20).
  3. localtree: Local working tree, a clone of one branch from the localcentral repo.

Set up the bare, shallow clones

# Make an initial, almost empty clone and move into it
git clone --bare --depth 1 central:path/to/repo localcentral
cd localcentral

# Now add the branches we need:
git config --add remote.origin.fetch '+refs/heads/branchA:refs/heads/branchA'
git config --add remote.origin.fetch '+refs/heads/branchB:refs/heads/branchB'

# Now fetch these branches to required depth
git fetch --depth 20

# While we fetched the branches with the above, we did not set these
# up as remote tracking branches. We need this so that we can update
# upstream with a git push. So fix that now by specifying what
# is "upstream" of each branch:
git branch -u refs/heads/branchA branchA
git branch -u refs/heads/branchB branchB

So far we've got the bare, shallow clone made. It's small and it has the branches we wanted in it. When we need to, we can update this repo from upstream by running git fetch --depth 20 from within the repo. Note: if you forget the depth limit, you'll end up with a non-shallow clone. There does not appear to be a way to put this in the config file (please let me know in the comments if there is).
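Since the depth can't be set in the config, one way to avoid forgetting it is a small wrapper. This is only a sketch: the function name fetch_shallow and the LOCALCENTRAL_DEPTH variable are made up for illustration, and git rev-parse --is-shallow-repository needs a reasonably recent git (2.15 or later).

```shell
# Hypothetical helper: always fetch with a fixed depth so the clone stays shallow.
# LOCALCENTRAL_DEPTH is an assumed variable name; it defaults to 20.
fetch_shallow() {
    repo=$1
    depth=${LOCALCENTRAL_DEPTH:-20}

    # Fetch the configured branches, truncating history at $depth commits.
    git -C "$repo" fetch --depth "$depth" || return 1

    # Warn if the repo has lost its shallow boundary. This also happens,
    # harmlessly, when upstream has fewer commits than the requested depth.
    if [ "$(git -C "$repo" rev-parse --is-shallow-repository)" != "true" ]; then
        echo "warning: $repo is no longer shallow" >&2
    fi
}
```

Usage would be something like fetch_shallow /path/to/localcentral, from a shell or a script that defines the function.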

It's important to understand the +refs/heads/branchA:refs/heads/branchA bits. The plus sign means "take whatever the remote head is for this branch, and make our branch match that", even if that is not a fast-forward. It's like a forced update. This is what we want for a tightly bound mirror set-up. Usually the second part of a fetch refspec puts the fetched heads in a separate remotes namespace, but here we have it overwriting our own branch tips. More on this, and why we didn't use git clone --mirror, later.
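To make the difference concrete, here is a small sketch that builds this kind of refspec and compares it with the one an ordinary clone uses. The helper name mirror_refspec is hypothetical, and branchC is just an example branch.

```shell
# Hypothetical helper: print the "overwrite my own branch tip" refspec for a branch.
mirror_refspec() {
    printf '+refs/heads/%s:refs/heads/%s\n' "$1" "$1"
}

# For comparison, an ordinary (non-bare) clone is configured with a refspec
# that keeps fetched branches apart, under refs/remotes/origin/:
#   +refs/heads/*:refs/remotes/origin/*

# Adding another branch to localcentral later would then look like this
# (run inside the localcentral repo):
#   git config --add remote.origin.fetch "$(mirror_refspec branchC)"
#   git fetch --depth 20
```

Running git config --get-all remote.origin.fetch inside localcentral lists the refspecs currently in effect.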

Check out a working tree

This is your run-of-the-mill stuff:

git clone /path/to/localcentral -b branchA my-branchA-project-dir

Note that we do not need to specify a depth here, for two reasons. First, there's no need because we're cloning from an already-shallow repo. Second, our cloned git files will be hard-linked, so they won't consume extra disk space anyway. (This assumes your localcentral repo and the cloned ones are on the same device. It's also worth checking on your git version: some versions skip the hard-link shortcut when the source repo is shallow.)
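One way to check whether the clone's object files really are hard-linked is to count the files under .git/objects that have more than one link. A minimal sketch, with a made-up function name:

```shell
# Hypothetical helper: count object files in a working tree's .git directory
# that have more than one hard link, i.e. that share storage with another repo.
count_hardlinked_objects() {
    find "$1/.git/objects" -type f -links +1 | wc -l | tr -d ' '
}

# Example (directory name taken from the clone command above):
#   count_hardlinked_objects my-branchA-project-dir
```

A result of 0 straight after cloning suggests git copied the objects instead of linking them.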

Chaining the push up through the repos

When a commit is pushed from the local tree up to the localcentral repo, it must not stop there; it has to go further up to the central repo. To do this we use a post-receive hook in the localcentral repo. Create the following script in localcentral/hooks/post-receive and make it executable (chmod +x):

#!/bin/bash
echo "Pushing changes to central repo..."
git push
echo "...done"

Now whenever a push is received by the localcentral repo, it will push it upstream, too.
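The plain git push in the hook relies on the push and upstream configuration of the bare repo. A more explicit variant, sketched below, reads the list of updated refs that git passes to a post-receive hook on standard input (one "old-sha new-sha refname" line per ref) and pushes exactly those branches. The function names are invented for this example, and this is an alternative sketch, not the hook used above.

```shell
#!/bin/sh
# Sketch of a post-receive hook that pushes only the branches just received.

# Print the branch refs named on stdin, skipping deletions
# (git reports a deleted ref with an all-zero new sha).
updated_branches() {
    while read -r old new ref; do
        case "$new" in
            *[!0]*) ;;          # contains a non-zero character: a real commit
            *) continue ;;      # all zeros: the ref was deleted, skip it
        esac
        case "$ref" in
            refs/heads/*) echo "$ref" ;;
        esac
    done
}

# Push each received branch to the same branch name on origin.
push_upstream() {
    updated_branches | while read -r ref; do
        echo "Pushing $ref to central repo..."
        git push origin "$ref:$ref" || echo "push of $ref failed" >&2
    done
}

# In the real hook file, the last line would simply be:
#   push_upstream
```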

But wait! What happens if the upstream repo gets an update that we don't have?

Central:   *-----*----A-----???
LocalCentral      \-----------B
Working Tree       \-----B---/

Well, now we'd be in a pickle. First, our hook would report a rejection because the push from localcentral → central would fail, with a helpful hint about "updates rejected because the remote contains work that you do not have". Good. But not good, because this was a post-receive hook, so by then our localcentral repo has already accepted the commit and is now contaminated with a commit that central rejected!

Also, because this is a bare repo that we set up to mirror certain branches, when we (or someone else) run fetch in the localcentral repo, this commit B will get squished and lost.

So to prevent this, we need to make the localcentral repo first bring itself in line with its upstream origin, central, before it even considers accepting a commit. There's a hook for that and unsurprisingly it's called pre-receive:

#!/bin/bash
echo "localcentral: fetching from upstream central before accepting push..."
git fetch --depth 20 || exit 1
echo "...done"

So now, when the working tree pushes something to localcentral, localcentral fetches from central (maintaining its shallowness), then decides whether to accept the push, then after accepting a push, it sends that commit up to central.
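Both hooks need to exist and be executable in every localcentral repo, so with several servers it may be worth scripting the installation. The sketch below writes the two hook scripts from this post into a repo's hooks directory; the function name install_hooks is hypothetical.

```shell
# Hypothetical helper: install the pre-receive and post-receive hooks
# shown in this post into the bare repo given as the first argument.
install_hooks() {
    repo=$1
    mkdir -p "$repo/hooks"

    # Runs before a push is accepted: sync with central first.
    cat > "$repo/hooks/pre-receive" <<'EOF'
#!/bin/bash
echo "localcentral: fetching from upstream central before accepting push..."
git fetch --depth 20 || exit 1
echo "...done"
EOF

    # Runs after a push is accepted: pass it on to central.
    cat > "$repo/hooks/post-receive" <<'EOF'
#!/bin/bash
echo "Pushing changes to central repo..."
git push
echo "...done"
EOF

    # git ignores hooks that are not executable.
    chmod +x "$repo/hooks/pre-receive" "$repo/hooks/post-receive"
}
```

Usage would be install_hooks /path/to/localcentral on each server.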

Upstream changes

So localcentral is supposed to be tightly synced (sunk?) to central. We don't want central to get ahead of the localcentral repos, so we need to regularly run git fetch --depth 20 in each of them. The only way I can think of to do this at the moment is with a cron job, although it might be possible to do it with a post-receive hook at central. Then just merge into the working tree as normal.
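A cron-driven sync could be as small as the sketch below. The function name, the script path and the 15-minute schedule in the comment are all assumptions for illustration.

```shell
# Hypothetical helper for a cron job: shallow-fetch every repo given as an
# argument and report failures. Succeeds only if every fetch worked.
sync_repos() {
    failed=0
    for repo in "$@"; do
        if ! git -C "$repo" fetch --quiet --depth 20; then
            echo "fetch failed in $repo" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Example crontab line, assuming the function is wrapped in a script at a
# path of your choosing:
#   */15 * * * * /usr/local/bin/sync-localcentral.sh
```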

It's like a mirror, but a mirror is a lot more dangerous

localcentral looks a lot like a mirror, but it's very definitely not. A mirrored repo is one that considers all of its refs to be OK to overwrite the origin's on a push. As we've only fetched a few refs (and not even the remote tags, because of the shallow clone), if it were configured as a mirror (remote.origin.mirror=true, which is what git clone --mirror sets up), a push from a localcentral repo would delete all the branches in the origin except the ones that were pushed(!)
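As far as I know, the setting that git clone --mirror writes is remote.origin.mirror = true, alongside a fetch refspec of +refs/*:refs/*. A quick guard against accidentally running this set-up on a repo configured that way might look like the following sketch (the function name is made up):

```shell
# Hypothetical check: succeed if the repo's origin remote is set up as a
# push mirror, in which case a plain "git push" would mirror ALL refs.
is_push_mirror() {
    [ "$(git -C "$1" config --bool --get remote.origin.mirror)" = "true" ]
}

# Example use before installing the hooks from this post:
#   if is_push_mirror /path/to/localcentral; then
#       echo "refusing: this repo would overwrite or delete refs on origin" >&2
#   fi
```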

That's how I do it. If anyone knows a better way, or any other pitfalls, please leave a comment.

 
