Any repository that's been worked on for a long enough time by enough developers tends to accumulate stale branches — fixes that were never deployed, features that were scrapped, ideas that never came to fruition, and so on.

It can be tempting to just nuke them all, but there's a chance that someone on the team will realise sooner or later that they really needed some old branch, for whatever reason that may be.

In this post we will look at how to combine git format-patch with a little bit of shell script to archive all branches in a way that's easy to restore, preserving both the diff and metadata associated with the commits.

#git format-patch

The git format-patch command is ideal as it creates a patch file for each commit with the diff and the metadata associated with the commit object like the author, date, and message.

Here's an example of what it looks like (with the diff omitted).

From e29bf4f0b9ea54b1b2072015cfd721ae119f3198 Mon Sep 17 00:00:00 2001
From: crdx <...>
Date: Thu, 25 Jan 2023 20:40:10 +0000
Subject: [PATCH] foo

---
 foo.md | 71 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 foo.md

This command is used indirectly in mailing list workflows to generate emails containing patches, which are sent to maintainers who apply them on the other end using git am. This is exactly how archived branches can be restored later.

cat 0001-foo.patch | git am -3
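
A branch archived as several patches can be restored the same way, since git am accepts multiple files and the numeric prefixes keep them in commit order. The following is a minimal sketch, not a definitive implementation: restore_branch is a hypothetical helper, and the main base branch is an assumption.

```shell
# Hypothetical helper: recreate a branch from a directory of patch files.
#   $1  directory holding the NNNN-*.patch files
#   $2  name for the new local branch
#   $3  commit to start from (assumed to be main if omitted)
restore_branch() {
    local patch_dir="$1" new_branch="$2" base="${3:-main}"

    # Replace the positional parameters with the patches; the glob sorts by
    # name, so 0001-..., 0002-... are applied in commit order.
    set -- "$patch_dir"/*.patch
    if [ ! -e "$1" ]; then
        echo "no patches found in $patch_dir" >&2
        return 1
    fi

    # Create the branch at the base commit, then apply every patch with a
    # three-way merge fallback.
    git checkout -b "$new_branch" "$base" && git am -3 "$@"
}
```

For example, restore_branch archive/foo restored-foo would replay the patches from a directory called archive/foo onto a new local branch. The restored commits keep their author, date, and message, but will generally get new commit hashes.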

#List the branches

Let's start by getting a list of all remote branch names, excluding the default branch and any leading whitespace.

git branch -r | grep -v '/main$' | sed 's/^[[:space:]]*//'

(Change main if your default branch is called master or something else; see note 1.)

You should visually check that this list contains only the branches that should be archived. The grep -v '/main$' filter also drops any branch whose name merely ends in /main (origin/feature/main, for example), and depending on your workflow the list will probably include in-progress work that does not need to be archived. If so, add more grep -v calls to the pipeline to strip those branches out.
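
As the exclusions pile up, a chain of grep -v calls can be folded into a small function. The sketch below is one possible approach: filter_branches is a hypothetical helper that reads git branch -r output on standard input, and the origin/wip/* pattern in the usage note is an assumed naming scheme. It compares the default branch by its full name, so a branch such as origin/feature/main survives.

```shell
# Hypothetical helper: filter `git branch -r` output read from stdin.
#   $1      full name of the default branch, e.g. origin/main
#   $2...   shell glob patterns for branches to leave out (assumed examples)
filter_branches() {
    local default_ref="$1" ref pattern skip
    shift

    while read -r ref; do              # read trims the leading whitespace
        case "$ref" in
            ''|*' -> '*) continue ;;   # blank line or origin/HEAD -> origin/main
        esac
        [ "$ref" = "$default_ref" ] && continue

        skip=0
        for pattern in "$@"; do
            # $pattern is deliberately unquoted so it is matched as a glob.
            case "$ref" in
                $pattern) skip=1; break ;;
            esac
        done
        [ "$skip" -eq 0 ] && printf '%s\n' "$ref"
    done
    return 0
}
```

It would be used as git branch -r | filter_branches origin/main 'origin/wip/*', with one extra argument per group of branches to keep out of the archive.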

#Archive them all

Next we'll run git format-patch against each of these remote branches, creating a .patch file for each commit on each branch. If a branch comprises multiple commits, then multiple patch files will be created, so ideally each branch's patches should be placed in a directory named after the branch.

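#!/bin/bash
set -euo pipefail

mkdir -p archive

# List remote branches, drop the default branch, and trim leading whitespace.
git branch -r | grep -v '/main$' | sed 's/^[[:space:]]*//' | while read -r REF; do
    # Strip the remote name: origin/feature/x becomes archive/feature/x.
    DIR="archive/$(echo "$REF" | cut -d/ -f2-)"
    mkdir -p "$DIR"
    # Write one patch per commit that is on the branch but not on main
    # (this assumes the local main is up to date).
    git format-patch "main..$REF" -o "$DIR"
done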

(If you're already using archive/ for something else, make sure to change it.)

The branch name with the remote prefix removed (e.g., origin/foo becomes foo) is used as the output directory to git format-patch. It's important to mkdir -p this directory first as branch names can contain forward slashes, and these will be mapped directly to the filesystem structure. This is how git stores branches within the .git folder already so it's safe to rely on that behaviour here, too.
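
The mapping from branch name to directory can be seen in isolation. This is a small sketch rather than part of the original script: archive_dir_for is a hypothetical helper, and it uses shell parameter expansion in place of echo and cut, which gives the same result for names of the form remote/branch.

```shell
# Hypothetical helper: print the archive directory for a remote branch.
#   $1  remote-tracking name such as origin/feature/login
#   $2  archive root (assumed to be "archive" if omitted)
archive_dir_for() {
    # ${1#*/} removes everything up to and including the first slash,
    # i.e. the remote name; any later slashes become subdirectories.
    printf '%s/%s\n' "${2:-archive}" "${1#*/}"
}
```

Running mkdir -p "$(archive_dir_for origin/feature/login)" would create archive/feature/login, including the intermediate feature directory.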

Now archive that directory and move it somewhere else for safekeeping, so that when a member of the team inevitably realises they needed something from an old branch they'll be able to access and re-apply the patches with git am.

tar czf archive.tgz archive/
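
Before any branches are deleted it may be worth checking that the tarball really contains every patch. A rough sketch, assuming the archive.tgz and archive/ names used above; verify_archive is a hypothetical helper that only compares file counts, not contents.

```shell
# Hypothetical helper: succeed only if the tarball lists as many .patch
# files as exist under the directory it was built from.
#   $1  path to the tarball, e.g. archive.tgz
#   $2  path to the source directory, e.g. archive
verify_archive() {
    local tarball="$1" dir="$2" in_tar on_disk

    # grep -c exits non-zero when the count is 0, hence the "|| true".
    in_tar=$(tar tzf "$tarball" | grep -c '\.patch$' || true)
    # tr strips the padding that some wc implementations add.
    on_disk=$(find "$dir" -type f -name '*.patch' | wc -l | tr -d ' ')

    [ "$in_tar" -eq "$on_disk" ]
}
```

A call such as verify_archive archive.tgz archive && echo ok gives a quick signal that nothing was missed when the tarball was created.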

#Delete them all?

Since there are probably active branches in the repository that should not be deleted, I recommend exercising caution before scripting the deletion of remote branches.

It might be safer to take the list and convert it into a sequence of calls to git push origin --delete $BRANCH that you can execute manually.
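
One way to produce that sequence is to print the commands rather than run them, so they can be reviewed and pasted one at a time. In this sketch print_delete_commands is a hypothetical helper, origin is the assumed remote name, and branch names are assumed not to contain single quotes.

```shell
# Hypothetical helper: turn a list of remote branches (one per line on
# stdin, e.g. origin/foo) into delete commands, without running them.
#   $1  remote name, e.g. origin
print_delete_commands() {
    local remote="$1" ref

    while read -r ref; do
        [ -n "$ref" ] || continue
        # Strip the "origin/" prefix, because git push expects the branch
        # name as it exists on the remote. The name is single-quoted so
        # that characters such as $ or ; are not interpreted when pasted.
        printf "git push %s --delete '%s'\n" "$remote" "${ref#"$remote"/}"
    done
}
```

Piping the reviewed branch list into print_delete_commands origin prints one git push line per branch, and nothing is deleted until a line is actually executed.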

That said, automating it would just mean adapting the loop above, so go ahead if you're sure.

#Clean up local copies

Each developer's clone will still hold remote-tracking references (such as origin/foo) for the deleted branches. These can be cleaned up with the following command, which tells git to delete any remote-tracking reference whose branch no longer exists on the remote. It will not delete local branches that were created from those remote branches.

git remote prune origin

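  1. Git 2.28 added the init.defaultBranch setting for choosing the initial branch name, and hosting services such as GitHub now create new repositories with main as the default.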