filter-repo: change the --replace-refs default to 'update-no-add'
I have been meaning to make this change for a couple of years.  In addition
to the replace-refs issues I knew about when I made filter-repo start
writing replace refs, namely:

  * Many refs might slow down git operations
  * replace refs are not pushed or pulled by default, making them a bit
    harder for users to take advantage of
  * libgit2 and jgit do not support replace refs
  * the various code hosting sites (Gerrit, GitHub, GitLab) do not
    support replace refs in their UIs

I have found that the downsides of having replace refs turned on by
default are bigger than I realized, and the benefits have turned out to
be much smaller than I imagined.  Some of the additional factors are:

  * commit-graphs disable themselves if there are _any_ replace refs (it
    wouldn't need to for the kind we write, but there is no more
    specialized logic in place)
  * Git itself still has buggy handling of replace refs (as per
    https://lore.kernel.org/git/CABPp-BEAbN05+hCtK=xhGg5uZFqbUvH9hMcCNMcBWp5JWLqzPw@mail.gmail.com/
    )
  * users continue to assume, in new and interesting ways, that replace
    refs do things they don't, contrary to any and all documentation.
    I've been surprised a few times by new powers that users dream up
    and assert replace refs have.
  * my big monorepo rewrite at ${FORMER_EMPLOYER}, once we finally got
    approval for it after years of talking about it, shrank the
    repository dramatically, as desired.  But despite years of pings
    about making sure we had a way to refer to new commits with old
    commit IDs as part of that conversion, and despite providing that
    way, it was used far less than I had expected.  I might even have
    gotten away entirely without providing such a mapping, though I'm
    happy that we had it, that I was able to provide it proactively to
    users, and that it eased their fears about the transition.

Now, all this said, I still think that being able to create replace refs
as a mechanism for users to refer to new commits using old commit IDs is
a great ability.  However, I now think it should be off by default,
which is achieved with the 'update-no-add' setting.

In addition to changing the default in the code, also update the
documentation and the built-in help, and add a few new testcases to
make sure everything is well tested.

Signed-off-by: Elijah Newren <[email protected]>
newren committed Jul 17, 2024
1 parent 9784cbc commit 0f1c9d8
Showing 3 changed files with 91 additions and 53 deletions.
89 changes: 50 additions & 39 deletions Documentation/git-filter-repo.txt
@@ -54,13 +54,16 @@ can be overridden, but they are all on by default):
 * pruning commits which become empty due to the above filters (also
 handles edge cases like pruning of merge commits which become
 degenerate and empty)
+* stripping of original history to avoid mixing old and new history
+* repacking the repository post-rewrite to shrink the repo for the
+user
+
+And additional facilities are available via a config option
+
 * creating replace-refs (see linkgit:git-replace[1]) for old commit
 hashes, which if manually pushed and fetched will allow users to
 continue to refer to new commits using (unabbreviated) old commit
 IDs
-* stripping of original history to avoid mixing old and new history
-* repacking the repository post-rewrite to shrink the repo for the
-user

 Also, it's worth noting that there is an important safety mechanism:

@@ -218,17 +221,26 @@ Filtering of names & emails (see also --name-callback and --email-callback)
 Parent rewriting
 ~~~~~~~~~~~~~~~~

---replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add}::
-Replace refs (see linkgit:git-replace[1]) are used to rewrite
-parents (unless turned off by the usual git mechanism); this
-flag specifies what do do with those refs afterward. Replace
-refs can either be deleted or updated to point at new commit
-hashes. Also, new replace refs can be added for each commit
-rewrite. With 'update-or-add', new replace refs are only
-added for commit rewrites that aren't used to update an
-existing replace ref. default is 'update-and-add' if
-$GIT_DIR/filter-repo/already_ran does not exist;
-'update-or-add' otherwise.
+--replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add, old-default}::
+How to handle replace refs (see git-replace(1)). Replace refs
+can be added during the history rewrite as a way to allow
+users to pass old commit IDs (from before git-filter-repo was
+run) to git commands and have git know how to translate those
+old commit IDs to the new (post-rewrite) commit IDs. Also,
+replace refs that existed before the rewrite can either be
+deleted or updated. The choices to pass to --replace-refs
+thus need to specify both what to do with existing refs and
+what to do with commit rewrites. Thus 'update-and-add' means
+to update existing replace refs, and for any commit rewrite
+(even if already pointed at by a replace ref) add a new
+refs/replace/ reference to map from the old commit ID to the
+new commit ID. The default is update-no-add, meaning update
+existing replace refs but do not add any new ones. There is
+also a special 'old-default' option for picking the default
+used in versions prior to git-filter-repo-2.45, namely
+'update-and-add' upon the first run of git-filter-repo in a
+repository and 'update-or-add' if running git-filter-repo
+again on a repository.

 --prune-empty {always, auto, never}::
 Whether to prune empty commits. 'auto' (the default) means
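Each mode name combines two independent choices: what happens to replace refs that existed before the run (delete or update), and whether new refs/replace/ entries are added for the commits this run rewrites. The Python below is a minimal, hypothetical model of those semantics as the new help text describes them, not git-filter-repo's implementation: replace refs and commit rewrites are plain dicts of commit IDs, and the function name `apply_replace_refs`, the reading of "used to update an existing replace ref", and the collision rule are all assumptions of the sketch.

```python
from typing import Dict, Set

# Hypothetical model of the --replace-refs modes; not git-filter-repo's code.
# 'old-default' is not listed: it resolves to a concrete mode before this step.
MODES = ('delete-no-add', 'delete-and-add', 'update-no-add',
         'update-or-add', 'update-and-add')


def apply_replace_refs(mode: str,
                       existing: Dict[str, str],
                       rewrites: Dict[str, str]) -> Dict[str, str]:
    """Return the replace refs left after a run (replaced ID -> replacement).

    existing: replace refs before the run (replaced ID -> replacement ID).
    rewrites: commits rewritten by the run (old ID -> new ID).
    """
    if mode not in MODES:
        raise ValueError(f"unknown --replace-refs mode: {mode!r}")
    existing_action, add_action = mode.split('-', 1)

    result: Dict[str, str] = {}
    used_for_update: Set[str] = set()
    if existing_action == 'update':
        # Keep every pre-existing replace ref, but point it at the rewritten
        # version of its target when that target was rewritten.
        for replaced, target in existing.items():
            if target in rewrites:
                used_for_update.add(target)
            result[replaced] = rewrites.get(target, target)
    # 'delete-*' modes simply start from an empty set of replace refs.

    for old, new in rewrites.items():
        if add_action == 'and-add':
            # Map every rewritten commit; on a name collision the new
            # mapping wins (an assumption of this sketch).
            result[old] = new
        elif add_action == 'or-add' and old not in used_for_update:
            # Only map rewrites that were not already used to update an
            # existing replace ref.
            result[old] = new
    return result
```

For example, with `existing = {'r1': 'a'}` and `rewrites = {'a': 'A', 'b': 'B'}`, the new default `'update-no-add'` returns `{'r1': 'A'}` (only the pre-existing ref is updated), `'update-or-add'` returns `{'r1': 'A', 'b': 'B'}`, and `'update-and-add'` additionally maps `'a'` to `'A'`.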
@@ -288,10 +300,10 @@ Generic callback code snippets
 Location to filter from/to
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

-NOTE: Specifying alternate source or target locations implies --partial
-except that the normal default for --replace-refs is used. However, unlike
-normal uses of --partial, this doesn't risk mixing old and new history
-since the old and new histories are in different repositories.
+NOTE: Specifying alternate source or target locations implies
+--partial. However, unlike normal uses of --partial, this doesn't
+risk mixing old and new history since the old and new histories are in
+different repositories.

 --source <source>::
 Git repository to read from
@@ -317,8 +329,7 @@ Miscellaneous options

 --partial::
 Do a partial history rewrite, resulting in the mixture of old and
-new history. This implies a default of update-no-add for
---replace-refs, disables rewriting refs/remotes/origin/* to
+new history. This disables rewriting refs/remotes/origin/* to
 refs/heads/*, disables removing of the 'origin' remote, disables
 removing unexported refs, disables expiring the reflog, and
 disables the automatic post-filter gc. Also, this modifies
@@ -578,25 +589,25 @@ history rewrite are roughly as follows:

 6. (Optional) Some additional considerations

-* filter-repo by default creates replace refs (see
-linkgit:git-replace[1]) for each rewritten commit ID, allowing
-you to use old (unabbreviated) commit hashes in the git command
-line to refer to the newly rewritten commits. If you want to use
-these replace refs, manually push them to the relevant clone URL
-and tell users to manually fetch them (e.g. by adjusting their
-fetch refspec, `git config --add remote.origin.fetch
-+refs/replace/*:refs/replace/*`). Sadly, replace refs are not
-yet widely understood; projects like jgit and libgit2 do not
-support them and existing repository managers (e.g. Gerrit,
-GitHub, GitLab) do not yet understand replace refs. Thus one
-can't use old commit hashes within the UI of these other systems.
-This may change in the future, but replace refs at least help
-users locally within the git command line interface. Also, be
-aware that commit-graphs are excessively cautious around replace
-refs and just turn off entirely if any are present, so after
-enough time has passed that old commit IDs become less relevant,
-users may want to locally delete the replace refs to regain the
-speedups from commit-graphs.
+* filter-repo has a --replace-refs option to allow creating replace
+refs (see linkgit:git-replace[1]) for each rewritten commit ID,
+allowing you to use old (unabbreviated) commit hashes in the git
+command line to refer to the newly rewritten commits. If you
+want to use these replace refs, manually push them to the
+relevant clone URL and tell users to manually fetch them (e.g. by
+adjusting their fetch refspec, `git config --add
+remote.origin.fetch +refs/replace/*:refs/replace/*`). Sadly,
+replace refs are not yet widely understood; projects like jgit
+and libgit2 do not support them and existing repository managers
+(e.g. Gerrit, GitHub, GitLab) do not yet understand replace refs.
+Thus one can't use old commit hashes within the UI of these other
+systems. This may change in the future, but replace refs at
+least help users locally within the git command line interface.
+Also, be aware that commit-graphs are excessively cautious around
+replace refs and just turn off entirely if any are present, so
+after enough time has passed that old commit IDs become less
+relevant, users may want to locally delete the replace refs to
+regain the speedups from commit-graphs.

 * If you have a central repo, you may want to prevent people
 from pushing old commit IDs, in order to avoid mixing old
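Step 6 leaves publishing and cleaning up replace refs to the user. As a hedged sketch, the Python below only assembles the git command lines involved and does not run them: pushing refs/replace/* from the rewritten repository, the fetch-refspec configuration quoted in the documentation, and a later local cleanup with `git replace -d`. The remote name `origin` default and the helper names are assumptions made for illustration.

```python
from typing import List

# Fetch refspec quoted in the git-filter-repo documentation.
FETCH_REFSPEC = '+refs/replace/*:refs/replace/*'


def push_replace_refs_cmd(remote: str = 'origin') -> List[str]:
    # Publish every local refs/replace/* ref under the same name on the remote.
    return ['git', 'push', remote, 'refs/replace/*:refs/replace/*']


def configure_fetch_cmd(remote: str = 'origin') -> List[str]:
    # Users add this fetch refspec so later fetches bring replace refs along.
    return ['git', 'config', '--add', f'remote.{remote}.fetch', FETCH_REFSPEC]


def delete_replace_refs_cmd(replaced_ids: List[str]) -> List[str]:
    # `git replace -d` deletes the replace refs for the given objects, e.g. to
    # let commit-graphs be used again once old commit IDs no longer matter.
    if not replaced_ids:
        raise ValueError('no replace refs to delete')
    return ['git', 'replace', '-d', *replaced_ids]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`; the IDs to delete could be gathered beforehand with `git replace -l`. Returning argv lists rather than shell strings avoids quoting problems with the `*` in the refspecs.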
37 changes: 24 additions & 13 deletions git-filter-repo
@@ -1896,17 +1896,27 @@ EXAMPLES
 parents.add_argument('--replace-refs', default=None,
 choices=['delete-no-add', 'delete-and-add',
 'update-no-add', 'update-or-add',
-'update-and-add'],
-help=_("Replace refs (see git-replace(1)) are used to rewrite "
-"parents (unless turned off by the usual git mechanism); this "
-"flag specifies what do do with those refs afterward. "
-"Replace refs can either be deleted or updated to point at new "
-"commit hashes. Also, new replace refs can be added for each "
-"commit rewrite. With 'update-or-add', new replace refs are "
-"only added for commit rewrites that aren't used to update an "
-"existing replace ref. default is 'update-and-add' if "
-"$GIT_DIR/filter-repo/already_ran does not exist; "
-"'update-or-add' otherwise."))
+'update-and-add', 'old-default'],
+help=_("How to handle replace refs (see git-replace(1)). Replace "
+"refs can be added during the history rewrite as a way to "
+"allow users to pass old commit IDs (from before "
+"git-filter-repo was run) to git commands and have git know "
+"how to translate those old commit IDs to the new "
+"(post-rewrite) commit IDs. Also, replace refs that existed "
+"before the rewrite can either be deleted or updated. The "
+"choices to pass to --replace-refs thus need to specify both "
+"what to do with existing refs and what to do with commit "
+"rewrites. Thus 'update-and-add' means to update existing "
+"replace refs, and for any commit rewrite (even if already "
+"pointed at by a replace ref) add a new refs/replace/ reference "
+"to map from the old commit ID to the new commit ID. The "
+"default is update-no-add, meaning update existing replace refs "
+"but do not add any new ones. There is also a special "
+"'old-default' option for picking the default used in versions "
+"prior to git-filter-repo-2.45, namely 'update-and-add' upon "
+"the first run of git-filter-repo in a repository and "
+"'update-or-add' if running git-filter-repo again on a "
+"repository."))
 parents.add_argument('--prune-empty', default='auto',
 choices=['always', 'auto', 'never'],
 help=_("Whether to prune empty commits. 'auto' (the default) means "
@@ -1988,8 +1998,7 @@ EXAMPLES
 "so be cautious about using this flag."))
 misc.add_argument('--partial', action='store_true',
 help=_("Do a partial history rewrite, resulting in the mixture of "
-"old and new history. This implies a default of "
-"update-no-add for --replace-refs, disables rewriting "
+"old and new history. This disables rewriting "
 "refs/remotes/origin/* to refs/heads/*, disables removing "
 "of the 'origin' remote, disables removing unexported refs, "
 "disables expiring the reflog, and disables the automatic "
@@ -2905,6 +2914,8 @@ class RepoFilter(object):

 # Default for --replace-refs
 if not self._args.replace_refs:
+  self._args.replace_refs = 'update-no-add'
+if self._args.replace_refs == 'old-default':
   self._args.replace_refs = ('update-or-add' if already_ran
                              else 'update-and-add')

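In other words, after this change an omitted --replace-refs means 'update-no-add', and only an explicit 'old-default' still consults whether filter-repo has run on the repository before. The Python below restates that RepoFilter logic as standalone functions; the names `resolve_replace_refs` and `has_already_ran` are made up for illustration, and the marker path is the one named in the old help text ($GIT_DIR/filter-repo/already_ran), so treat it as a sketch rather than the tool's actual code.

```python
import os
from typing import Optional


def has_already_ran(git_dir: str) -> bool:
    # Marker file named in the old help text: $GIT_DIR/filter-repo/already_ran
    return os.path.isfile(os.path.join(git_dir, 'filter-repo', 'already_ran'))


def resolve_replace_refs(requested: Optional[str], already_ran: bool) -> str:
    # No --replace-refs flag now means 'update-no-add'.
    if not requested:
        return 'update-no-add'
    # 'old-default' reproduces the behavior of versions before 2.45.
    if requested == 'old-default':
        return 'update-or-add' if already_ran else 'update-and-add'
    # Any explicit concrete mode is used as given.
    return requested
```

The test changes in t9390 exercise the same split: a plain run expects only the pre-existing replace ref to survive (updated), while `--replace-refs old-default` on a fresh copy expects the first-run 'update-and-add' result.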
18 changes: 17 additions & 1 deletion t/t9390-filter-repo.sh
@@ -545,6 +545,22 @@ test_expect_success FUNNYNAMES 'creation/deletion/updating of replace refs' '
 echo "$(git rev-parse master~1) refs/replace/$master_2" >>out &&
 sort -k 2 out >expect &&
 git show-ref | grep refs/replace/ >output &&
+test_cmp output expect &&
+rsync -a --delete ../replace_handling/ ./ &&
+git filter-repo --replace-refs old-default --path-rename numbers:counting &&
+echo "$(git rev-parse master) refs/replace/$master" >>out &&
+echo "$(git rev-parse master~1) refs/replace/$master_1" >>out &&
+echo "$(git rev-parse master~1) refs/replace/$master_2" >>out &&
+sort -k 2 out >expect &&
+git show-ref | grep refs/replace/ >output &&
+test_cmp output expect &&
+# Test the default
+rsync -a --delete ../replace_handling/ ./ &&
+git filter-repo --path-rename numbers:counting &&
+echo "$(git rev-parse master~1) refs/replace/$master_1" >expect &&
+git show-ref | grep refs/replace/ >output &&
 test_cmp output expect
 )
 '
@@ -1621,7 +1637,7 @@ test_expect_success 'handle funny characters' '
 file_sha=$(git rev-parse :0:señor) &&
 former_head_sha=$(git rev-parse HEAD) &&
-git filter-repo --to-subdirectory-filter títulos &&
+git filter-repo --replace-refs old-default --to-subdirectory-filter títulos &&
 cat <<-EOF >expect &&
 100644 $file_sha 0 "t\303\255tulos/se\303\261or"
