From 0f1c9d8ddd4b11e9cf92fb5284515738ac5f113e Mon Sep 17 00:00:00 2001
From: Elijah Newren
Date: Wed, 17 Jul 2024 11:35:26 -0700
Subject: [PATCH] filter-repo: change the --replace-refs default to 'update-no-add'

I have been meaning to make this change for a couple years. In addition
to the replace-refs issues I knew about when I made filter-repo start
writing replace refs, namely:
  * Many refs might slow down git operations
  * replace refs are not pushed or pulled by default, making them a bit
    harder for users to take advantage of
  * libgit2 and jgit do not support replace refs
  * the various Code Hosting sites (Gerrit, GitHub, GitLab) do not
    support replace refs in their UI
I have found that the negatives about having replace-refs turned on by
default have seemed to be bigger than I realized, and the positives for
it have turned out to be much less than I imagined. Some of the
additional factors are:
  * commit-graphs disable themselves if there are _any_ replace refs (it
    wouldn't need to for the kind we write, but there is no more
    specialized logic in place)
  * Git itself still has buggy handling of replace refs (as per
    https://lore.kernel.org/git/CABPp-BEAbN05+hCtK=xhGg5uZFqbUvH9hMcCNMcBWp5JWLqzPw@mail.gmail.com/
    )
  * users continue to assume replace refs do things they don't in new
    and interesting ways contrary to any and all documentation. I've
    been surprised a few times by new powers users can dream and assert
    they have.
  * my big monorepo rewrite at ${FORMER_EMPLOYER} shrunk the repository
    dramatically, as desired, when we finally got approval to do it
    after years of talking about it. But despite years of pings about
    making sure we had a way to refer to new commits with old commit IDs
    as part of that conversion, and providing that way, it was used
    dramatically less than I had expected.
    I might have even gotten entirely away without providing such a
    mapping, though I'm happy that we had it and I was able to
    proactively provide it to the users and comfort their fears about
    the transition.

Now, all this said, I still think that being able to create replace refs
as a mechanism for users to refer to new commits using old commit IDs is
a great ability. However, I now think it should be off by default, which
is achieved with the 'update-no-add' setting.

In addition to changing the default in the code, also update the
documentation and the built-in help, and add a few new testcases to make
sure everything is well tested.

Signed-off-by: Elijah Newren
---
 Documentation/git-filter-repo.txt | 89 +++++++++++++++++--------------
 git-filter-repo                   | 37 ++++++++-----
 t/t9390-filter-repo.sh            | 18 ++++++-
 3 files changed, 91 insertions(+), 53 deletions(-)

diff --git a/Documentation/git-filter-repo.txt b/Documentation/git-filter-repo.txt
index bb00203f..9a9af9ae 100644
--- a/Documentation/git-filter-repo.txt
+++ b/Documentation/git-filter-repo.txt
@@ -54,13 +54,16 @@ can be overridden, but they are all on by default):
   * pruning commits which become empty due to the above filters (also
     handles edge cases like pruning of merge commits which become
     degenerate and empty)
+  * stripping of original history to avoid mixing old and new history
+  * repacking the repository post-rewrite to shrink the repo for the
+    user
+
+And additional facilities are available via a config option
+
   * creating replace-refs (see linkgit:git-replace[1]) for old commit
     hashes, which if manually pushed and fetched will allow users to
     continue to refer to new commits using (unabbreviated) old commit
     IDs
-  * stripping of original history to avoid mixing old and new history
-  * repacking the repository post-rewrite to shrink the repo for the
-    user
 
 Also, it's worth noting that there is an important safety mechanism:
 
@@ -218,17 +221,26 @@ Filtering of names & emails (see also --name-callback and
--email-callback)
 Parent rewriting
 ~~~~~~~~~~~~~~~~
 
---replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add}::
-	Replace refs (see linkgit:git-replace[1]) are used to rewrite
-	parents (unless turned off by the usual git mechanism); this
-	flag specifies what do do with those refs afterward. Replace
-	refs can either be deleted or updated to point at new commit
-	hashes. Also, new replace refs can be added for each commit
-	rewrite. With 'update-or-add', new replace refs are only
-	added for commit rewrites that aren't used to update an
-	existing replace ref. default is 'update-and-add' if
-	$GIT_DIR/filter-repo/already_ran does not exist;
-	'update-or-add' otherwise.
+--replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add, old-default}::
+	How to handle replace refs (see git-replace(1)). Replace refs
+	can be added during the history rewrite as a way to allow
+	users to pass old commit IDs (from before git-filter-repo was
+	run) to git commands and have git know how to translate those
+	old commit IDs to the new (post-rewrite) commit IDs. Also,
+	replace refs that existed before the rewrite can either be
+	deleted or updated. The choices to pass to --replace-refs
+	thus need to specify both what to do with existing refs and
+	what to do with commit rewrites. Thus 'update-and-add' means
+	to update existing replace refs, and for any commit rewrite
+	(even if already pointed at by a replace ref) add a new
+	refs/replace/ reference to map from the old commit ID to the
+	new commit ID. The default is update-no-add, meaning update
+	existing replace refs but do not add any new ones. There is
+	also a special 'old-default' option for picking the default
+	used in versions prior to git-filter-repo-2.45, namely
+	'update-and-add' upon the first run of git-filter-repo in a
+	repository and 'update-or-add' if running git-filter-repo
+	again on a repository.
 
 --prune-empty {always, auto, never}::
 	Whether to prune empty commits. 'auto' (the default) means
@@ -288,10 +300,10 @@ Generic callback code snippets
 Location to filter from/to
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-NOTE: Specifying alternate source or target locations implies --partial
-except that the normal default for --replace-refs is used. However, unlike
-normal uses of --partial, this doesn't risk mixing old and new history
-since the old and new histories are in different repositories.
+NOTE: Specifying alternate source or target locations implies
+--partial. However, unlike normal uses of --partial, this doesn't
+risk mixing old and new history since the old and new histories are in
+different repositories.
 
 --source ::
 	Git repository to read from
@@ -317,8 +329,7 @@ Miscellaneous options
 
 --partial::
 	Do a partial history rewrite, resulting in the mixture of old and
-	new history. This implies a default of update-no-add for
-	--replace-refs, disables rewriting refs/remotes/origin/* to
+	new history. This disables rewriting refs/remotes/origin/* to
 	refs/heads/*, disables removing of the 'origin' remote, disables
 	removing unexported refs, disables expiring the reflog, and
 	disables the automatic post-filter gc. Also, this modifies
@@ -578,25 +589,25 @@ history rewrite are roughly as follows:
 
   6. (Optional) Some additional considerations
 
-     * filter-repo by default creates replace refs (see
-       linkgit:git-replace[1]) for each rewritten commit ID, allowing
-       you to use old (unabbreviated) commit hashes in the git command
-       line to refer to the newly rewritten commits. If you want to use
-       these replace refs, manually push them to the relevant clone URL
-       and tell users to manually fetch them (e.g. by adjusting their
-       fetch refspec, `git config --add remote.origin.fetch
-       +refs/replace/*:refs/replace/*`). Sadly, replace refs are not
-       yet widely understood; projects like jgit and libgit2 do not
-       support them and existing repository managers (e.g.
Gerrit,
-       GitHub, GitLab) do not yet understand replace refs. Thus one
-       can't use old commit hashes within the UI of these other systems.
-       This may change in the future, but replace refs at least help
-       users locally within the git command line interface. Also, be
-       aware that commit-graphs are excessively cautious around replace
-       refs and just turn off entirely if any are present, so after
-       enough time has passed that old commit IDs become less relevant,
-       users may want to locally delete the replace refs to regain the
-       speedups from commit-graphs.
+     * filter-repo has a --replace-refs option to allow creating replace
+       refs (see linkgit:git-replace[1]) for each rewritten commit ID,
+       allowing you to use old (unabbreviated) commit hashes in the git
+       command line to refer to the newly rewritten commits. If you
+       want to use these replace refs, manually push them to the
+       relevant clone URL and tell users to manually fetch them (e.g. by
+       adjusting their fetch refspec, `git config --add
+       remote.origin.fetch +refs/replace/*:refs/replace/*`). Sadly,
+       replace refs are not yet widely understood; projects like jgit
+       and libgit2 do not support them and existing repository managers
+       (e.g. Gerrit, GitHub, GitLab) do not yet understand replace refs.
+       Thus one can't use old commit hashes within the UI of these other
+       systems. This may change in the future, but replace refs at
+       least help users locally within the git command line interface.
+       Also, be aware that commit-graphs are excessively cautious around
+       replace refs and just turn off entirely if any are present, so
+       after enough time has passed that old commit IDs become less
+       relevant, users may want to locally delete the replace refs to
+       regain the speedups from commit-graphs.
 
      * If you have a central repo, you may want to prevent people
        from pushing old commit IDs, in order to avoid mixing old
diff --git a/git-filter-repo b/git-filter-repo
index 3dfe5569..eb517929 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -1896,17 +1896,27 @@ EXAMPLES
     parents.add_argument('--replace-refs', default=None,
                          choices=['delete-no-add', 'delete-and-add',
                                   'update-no-add', 'update-or-add',
-                                  'update-and-add'],
-        help=_("Replace refs (see git-replace(1)) are used to rewrite "
-               "parents (unless turned off by the usual git mechanism); this "
-               "flag specifies what do do with those refs afterward. "
-               "Replace refs can either be deleted or updated to point at new "
-               "commit hashes. Also, new replace refs can be added for each "
-               "commit rewrite. With 'update-or-add', new replace refs are "
-               "only added for commit rewrites that aren't used to update an "
-               "existing replace ref. default is 'update-and-add' if "
-               "$GIT_DIR/filter-repo/already_ran does not exist; "
-               "'update-or-add' otherwise."))
+                                  'update-and-add', 'old-default'],
+        help=_("How to handle replace refs (see git-replace(1)). Replace "
+               "refs can be added during the history rewrite as a way to "
+               "allow users to pass old commit IDs (from before "
+               "git-filter-repo was run) to git commands and have git know "
+               "how to translate those old commit IDs to the new "
+               "(post-rewrite) commit IDs. Also, replace refs that existed "
+               "before the rewrite can either be deleted or updated. The "
+               "choices to pass to --replace-refs thus need to specify both "
+               "what to do with existing refs and what to do with commit "
+               "rewrites. Thus 'update-and-add' means to update existing "
+               "replace refs, and for any commit rewrite (even if already "
+               "pointed at by a replace ref) add a new refs/replace/ reference "
+               "to map from the old commit ID to the new commit ID. The "
+               "default is update-no-add, meaning update existing replace refs "
+               "but do not add any new ones.
There is also a special "
+               "'old-default' option for picking the default used in versions "
+               "prior to git-filter-repo-2.45, namely 'update-and-add' upon "
+               "the first run of git-filter-repo in a repository and "
+               "'update-or-add' if running git-filter-repo again on a "
+               "repository."))
     parents.add_argument('--prune-empty', default='auto',
                          choices=['always', 'auto', 'never'],
         help=_("Whether to prune empty commits. 'auto' (the default) means "
@@ -1988,8 +1998,7 @@ EXAMPLES
                "so be cautious about using this flag."))
     misc.add_argument('--partial', action='store_true',
         help=_("Do a partial history rewrite, resulting in the mixture of "
-               "old and new history. This implies a default of "
-               "update-no-add for --replace-refs, disables rewriting "
+               "old and new history. This disables rewriting "
                "refs/remotes/origin/* to refs/heads/*, disables removing "
                "of the 'origin' remote, disables removing unexported refs, "
                "disables expiring the reflog, and disables the automatic "
@@ -2905,6 +2914,8 @@ class RepoFilter(object):
 
     # Default for --replace-refs
     if not self._args.replace_refs:
+      self._args.replace_refs = 'update-no-add'
+    if self._args.replace_refs == 'old-default':
       self._args.replace_refs = ('update-or-add' if already_ran
                                  else 'update-and-add')
 
diff --git a/t/t9390-filter-repo.sh b/t/t9390-filter-repo.sh
index 85f155a6..763517d8 100755
--- a/t/t9390-filter-repo.sh
+++ b/t/t9390-filter-repo.sh
@@ -545,6 +545,22 @@ test_expect_success FUNNYNAMES 'creation/deletion/updating of replace refs' '
 		echo "$(git rev-parse master~1) refs/replace/$master_2" >>out &&
 		sort -k 2 out >expect &&
 		git show-ref | grep refs/replace/ >output &&
+		test_cmp output expect &&
+
+		rsync -a --delete ../replace_handling/ ./ &&
+		git filter-repo --replace-refs old-default --path-rename numbers:counting &&
+		echo "$(git rev-parse master) refs/replace/$master" >>out &&
+		echo "$(git rev-parse master~1) refs/replace/$master_1" >>out &&
+		echo "$(git rev-parse master~1) refs/replace/$master_2"
>>out &&
+		sort -k 2 out >expect &&
+		git show-ref | grep refs/replace/ >output &&
+		test_cmp output expect &&
+
+		# Test the default
+		rsync -a --delete ../replace_handling/ ./ &&
+		git filter-repo --path-rename numbers:counting &&
+		echo "$(git rev-parse master~1) refs/replace/$master_1" >expect &&
+		git show-ref | grep refs/replace/ >output &&
 		test_cmp output expect
 	)
 '
@@ -1621,7 +1637,7 @@ test_expect_success 'handle funny characters' '
 		file_sha=$(git rev-parse :0:señor) &&
 		former_head_sha=$(git rev-parse HEAD) &&
-		git filter-repo --to-subdirectory-filter títulos &&
+		git filter-repo --replace-refs old-default --to-subdirectory-filter títulos &&
 		cat <<-EOF >expect &&
 		100644 $file_sha 0 "t\303\255tulos/se\303\261or"