[DONOTMERGE] [repo_update] Attempt at making the script re-entrant without syncing inbetween #128

MarijnS95 · 2020-05-29T14:07:36Z

repo_update has never been re-entrant without syncing prior to running, making SKIP_SYNC=TRUE largely useless. This is annoying while developing and shoving new patch applicators in here.
With more improvements to come, let's get this cleanup in first.

Tested with and without the rev-parse check, both apply cleanly. The former properly takes all commits as they are on my machine while the latter also fetches everything from the server appropriately.

ix5 · 2020-05-29T14:37:50Z

repo_update.sh

@@ -22,6 +22,9 @@ popd () {
 enter_aosp_dir() {
    LINK="$HTTP://android.googlesource.com/platform/${2:-$1}"
    pushd "$ANDROOT/$1"
+    # Make the script re-entrant when SKIP_SYNC=TRUE by ensuring
+    # the current tree is clean without any of our picks/reverts:
+    repo sync -l .


I absolutely hate this. We should check whether the commit has already been applied instead instead of force-resetting repos.
Someone could cherry-pick other changes before or after running this script.

When someone cherry-picks / reverts after this script has run, everything should be fine, or not?

To make the reentrant "smarter" introduce a dedicated function that is doing the git revert, if skipsync then only do the revert if the commit has not yet been reverted. Also apply_gerrit_cl_commit has to be changed, in case of skipsync and changeid is already available don't do anything.

I kindof agree. The intent is to have this behind a flag but optarg(s) is too much magic for me.

With many ideas in mind to prevent re-picking/reverting stuff, hence averting lots of local trouble, see the messages below I was writing while hitting roadblocks, which have all been resolved as of this last force-push.

🙈
I assumed reverts are not an issue but with reverts that touch the same lines (in audio) these simply generate conflicts. In any case, see [my attempt](https://github.com/sonyxperiadev/repo_update/commit/f04649f092ee8a4ca245faf97e2891675b586747) to ignore errors when the resulting diff is empty.
Cherry-picks on the other hand are an entirely different story. There's currently no way to tell it to proceed if it is already applied, leaving the tree in an unclean state. Doing a git reset HEAD seems like the best solution here, after making sure the issue is indeed The previous cherry-pick is now empty, possibly due to conflict resolution.. This doesn't mess with local changes as you can't cherry-pick with staged changes.

However, it gets worse: In the audio HAL for example we apply multiple commits on top, each changing similar lines to add platforms. Git doesn't understand what you want anymore and gives a nasty conflict if trying to pick the first commit right on top of a previous picking session. The best solution seems to be git cherry which is able to compare two trees for equivalent commits, but that syntax does not allow us to scan the range android-<current revision>..HEAD for a single, unrelated commit (at least not as far as I understand it).
However, we can always resort to using the tool that makes this thing tick: git patch-id.

[...]

I could explain a lot, but read the commit message and comments in the commits blow 😉

MarijnS95 · 2020-06-01T13:42:25Z

repo_update.sh

+    # Variant on is_patch_in_range
+    # Notable differences:


Let's discuss this one. We can safely use git -c diff.noprefix=true show in is_patch_in_range too, then the only difference between these functions is the -R. For that I have previously written an is_stable_hash_in_range() with the (identical) for-loops moved in.

I guess I would just have done something simple like git log --grep=git log --grep="This reverts commit <commit_id_to_be_reverted>\."

As you might have noticed I totally nerded out over using the tools git uses internally (cherry and patch-id) to see if something is applied already, when the safest might even be to just grep for that revert string and the commit title in case of picks. That immediately saves us from keeping patches up to date when context starts to diverge (and thus the result of patch-id).

Another option to alleviate the issue brought up above is to simply stash local changes, keep track of rev-parse HEAD, checkout on the manifest branch (eg repo sync -l ., apply patches, then rebase the local changes on top. Perhaps also filtered by code similar to what is written here now, just in case GH isn't able to apply it. Doesn't happen often, but it'll likely happen

In the end these changes are a lot of time spent for no good reason besides learning purposes. The only proper way is to sync and reset to the selected android tag, apply/revert everything, then leave it to the user whether they want to rebase their stuff and/or unstash. Adding that flag to the repo sync -l . might have just solved it all.

MarijnS95 · 2020-06-01T13:44:20Z

repo_update.sh

+    # TODO: repo info is slow, even if only fetching locally. We should
+    # perhaps just take the manifest branch once for a common repo like `.repo/manifests`?
+    MANIFEST_BRANCH=$(repo info -l "$path" | sed -n 's/^Manifest branch: //p')


Better ideas to retrieve the desired tag? repo doens't clean up .git/refs/remotes/m/ so those are moot.

repo_update.sh

MarijnS95 · 2020-06-01T13:53:38Z

With the new mechanism we are not totally re-entrant per-se as things may not match up after all, but this is rather uncommon. Only ipacf-mgr and fwb build fingerprint have mismatched context, making patch-id generate different patches.

Keep in mind too that dropped patches will not be detected and removed. That point is largely moot though:
The work I did here (unlike the previous revision) is mainly intended to be lenient towards developers maintaining their own patches on top. Those developers will have to repo sync themselves as well, and wouldn't have gotten rid of those patches anywayif they rebase their stuff on top (or worse, merging things back and forth). Anyone else syncing normally will get patches applied freshly on top of the synced tag.

repo_update.sh

stefanhh0 · 2020-06-02T04:53:38Z

You have done a more sophisticated approach to identify if a commit was already applied (same for revert), I would have done it simply with the following: git log --grep="Change-Id: I8296a8fb551097fabf72115d2cec0849671b91ea". That would also require adding the change id as 3rd mandatory parameter. Not sure if that would have any bigger drawbacks though?!

repo_update.sh

stefanhh0 · 2020-06-02T05:17:39Z

repo_update.sh

+    false
+}
+
+revert_if_not_reverted() {


In case of SKIP_SYNC=false the revert should just be done as before shouldn't it?

Same as above: No need to add extra ifs for cases that are implicitly covered by empty collections.

MarijnS95 · 2020-06-03T22:30:15Z

@stefanhh0 Thanks for the in-depth review! I learned quite a bit about writing bash scripts, as well as realizing (well, technically I already knew this during the implementation) that this method is over-engineered while not nearly watertight. Something as simple as grep on commit titles, hashes in reverts, and/or searches on Change-Id: should be able to do that at much lower cost.

On that note, Change-Id is already in the file and only needs to be uncommented+moved; since it uniquely identifies a gerrit commit we might as well see if we can resolve that to a patch url too. Otherwise it can be extracted from a commit as well.

In that case we'd look for (still constrained to the range manifest_tag..HEAD):

Revert hashes, they'll always be there;
Change-Id: they are likely in most if not all commits;
Commit title.

It's definitely worth to evaluate this method because the last cherry-pick for fwb (SystemUI: Implement burn-in protection for status-bar/nav-bar items) is dated and context doesn't match: patch-id doesn't match with what's applied, then the entire thing collapses on itself.

stefanhh0 · 2020-06-04T03:39:42Z

repo_update.sh

+}
+
+enter_aosp_dir() {
+    if [ -z "$1" ]; then


I am happy to see that you added this error, it would be even better if the other functions would have the same guard.

I'm indifferent on this one. Some functions call other code that errors out by itself, other places and in particular git show will take HEAD as default instead.

You anyway asked me to add set -u which catches most cases where an argument is not even passed: such a check is only helpful when "" is passed (eg. the expanded variables) but they themselves are either:

Not set globally, thus an error;

Perhaps empty, but a result from a git command that we expect to return an errorcode instead.

stefanhh0 · 2020-06-04T03:43:55Z

repo_update.sh

+    local _commit=$1
+    local _stable_hash
+    _stable_hash=$(git -c diff.noprefix=true show --no-commit-id "$_commit" | git patch-id --stable)
+    is_stable_patchid_in_range "${_stable_hash%% *}" "$STABLE_SHAS"


_stable_hash vs. STABLE_SHAS

With SHAS you refer to sha256? Anyway it would be good for consistency reasons to stick to _stable_sha and STABLE_SHAS or _stable_hash and STABLE_HASHES, I'd prefer the latter since it is more general and it is not important to now whether the hash is a SHA256 or something else.

stefanhh0 · 2020-06-04T03:54:11Z

I am happy to see that you applied some of my suggestions. I haven't dived too deep into it - meaning what is needed what not and how to achieve the required goal in the best manner. If git log --grep is not working or also carries some uncertainties with it, then the more sophisticated approach makes sense. I could dive into it more profoundly especially all the git commands to give a better review, however I have time limitations + I think you are doing a good job :-)

MarijnS95 · 2020-06-04T08:03:58Z

I haven't dived too deep into it

That sounds dangerous, you have already pointed out far too many issues and this is only scratching the surface?! 🙈

I do consider abandoning the "reentrancy" commits as they add quite a bit of code without working nearly perfect: and probably replace it with some greps instead, see how that works out for us.

stefanhh0

Change looks nice and sound to me from reading the code. However haven't done any functional testing myself on the changes.

repo_update is currently not reentrant: If a commit is picked once it cannot be picked a second time as that either results in no changes or a bunch of merge conflicts. Generate stable hashes with patch-id that are invariant to line number changes, white space and file ordering, and use those to compare with existing patches on top of the selected manifest tag. Skip the pick on matching hashes. Note that this intentionally does not check if the patch is found anywhere else in the tree: It might have been picked and reverted multiple times in AOSP. Furthermore, if the patch eventually ends up on a tag we want to be alerted too and remove it. This obviously implies that we should not be applying and reverting our own patches either. Signed-off-by: Marijn Suijten <[email protected]>

This variant operates on reverts instead of cherry picks. See the previous commit for an overview of the method and design choices. Reverts are compared by creating a patch that is the revert of the desired commit, of which the stable patch-id hash is used to compare against the commit range. Notable is the mnemonic prefix (a/ and b/) which are swapped in that revert as well, resulting in mismatched hashes. This is addressed by showing the patch instead with diff.noprefix=true. Signed-off-by: Marijn Suijten <[email protected]>

PhaseboyTA

Hop bet

ix5 reviewed May 29, 2020

View reviewed changes

pablomh approved these changes May 30, 2020

View reviewed changes

MarijnS95 force-pushed the cleanup branch from 836eba4 to b7b8634 Compare June 1, 2020 13:35

MarijnS95 commented Jun 1, 2020

View reviewed changes