The MPS project is hosted in Ravenbrook's Perforce repo, making it hard to sustain #98

Open
rptb1 opened this issue Jan 8, 2023 · 23 comments
Labels
adaptability: Affects the cost of developing the MPS to meet changing requirements.
critical: Will cause failure of the entire project
git-migration: Project migration from Ravenbrook internal Perforce infrastructure to public git repo
maintainability: Affects the cost of maintaining the MPS to meet current requirements.

Comments

Member

rptb1 commented Jan 8, 2023

The true home of the MPS and all its history is currently Ravenbrook's Perforce SCM repository, hosted on Ravenbrook's servers, and only accessible to Ravenbrook staff. The Git repositories of the MPS (such as the one at GitHub) are mirrors maintained by Ravenbrook using a combination of Perforce Git Fusion and a bit of automation.

Perforce is an excellent system, but the open source world now expects to collaborate using Git. And in any case, Perforce requires per-seat licensing that we can't extend to the world.

Git Fusion is an amazing piece of work that allows for two-way synchronisation between Perforce and Git, but it is no longer maintained by Perforce, and it was never designed to maintain a permanent two-way mirror. It still introduces friction in collaborating in Git, and has maintenance costs. Git Fusion is open source, but one day it will break and those costs will increase dramatically.

So, we plan to migrate the MPS to Git. Initially, we will also be using GitHub as the hosting platform.

The first step will involve switching the "home" of the MPS to Git. This means treating the Git repo as the "source of truth" and Perforce as the mirror. Work will primarily take place using Git (and via GitHub). We have to update our procedures and infrastructure to use Git. And we should, as far as possible, make all of it accessible and usable by anyone, not just Ravenbrook staff. Then we will truly have opened up the MPS to FOSS collaboration.

Git Fusion has created a complete mirror of the history of the MPS codeline, including changes going back to 1994. There is other material that hasn't made it into the Git repo (see #95 for some of it) but that can also be migrated. There is other very important material, such as the issues data (stored in Perforce jobs), that has no home in Git, but might be mirrored to GitHub issues. We must be very cautious about that dependency.

In the short term, we are likely to keep using Git Fusion to keep Perforce up-to-date. It may be possible to do this indefinitely with little overhead, but the future of the MPS should not depend on it. Eventually, the MPS in Perforce is likely to be divorced from Git and mothballed.

This issue exists for discussing and tracking the migration.

rptb1 added the "git-migration" label (Project migration from Ravenbrook internal Perforce infrastructure to public git repo) on Jan 8, 2023
Member Author

rptb1 commented Jan 10, 2023

"Fix and update the pull request merge procedure" (#97) contains work to ensure that Git and Perforce remain consistent while we're migrating. The pull request merge procedure (procedure/pull-request-merge.rst) can be simplified a great deal once Git becomes home. We will be able to use the GitHub merge button. See "Why not press the GitHub merge button?"

Member Author

rptb1 commented Jan 10, 2023

Git allows the destruction and rewriting of history, unlike Perforce. We want all history preserved as far as possible to support the investigation of bugs and the intentions of changes. We certainly don't want any history before now to be changed, even if Git practices differ.

I've written about this in the pull request procedure rationale but this stuff should probably be promoted to a policy document (i.e. a design-of-process).

GitHub offers some protection from Git anarchy. I've enabled branch protection rules and protected release tags for a start. These are exactly the sorts of things that would be informed by a policy.

@UNAA008 and I should review all the repo settings and documentation.

Member Author

rptb1 commented Jan 10, 2023

> I've written about this in #97 but this stuff should probably be promoted to a policy document (i.e. a design-of-process).

This policy should be clear about the separation of Git and GitHub. The MPS is migrating to Git, but we are using GitHub. As far as is feasible, we want our practices to function in Git, even where they are supported by automation from GitHub. We want to be able to collaborate via other forms of Git hosting, and be able to leave GitHub fairly easily.

That means we can't rely on GitHub's branch protection, merge buttons, etc. but of course we should make use of them for efficiency.

Member Author

rptb1 commented Jan 10, 2023

About GitHub dependency.

GitHub's issues, pull request, and other similar stuff are very useful and quick, but the data is not in the repository, and lives on GitHub. That means that someone who clones the repo doesn't get that information.

While the MPS lives in Perforce, we are able to reference Perforce data and project documentation that doesn't live in the MPS codeline, and be sure of its long term availability and integrity. But not so on GitHub, which doesn't belong to us, or anyone who might want to develop the MPS.

We could use the GitHub API to copy these kinds of records into the repo itself. For example, the details of a pull request might also become a document in the pull request's branch and end up as a permanent record. We'd have to investigate whether other people have done something similar, how hard it would be to maintain, etc.
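
As a rough illustration of the idea, the sketch below turns one issue record into a text document that could be committed alongside the code. It assumes the record is a dictionary shaped like the JSON that the GitHub REST API returns for an issue (the `number`, `title`, `state`, `user.login`, `created_at` and `body` fields). The function names and the `issue/NNNNN.rst` layout are hypothetical, not an agreed design, and fetching the records is left out.

```python
from typing import Any, Mapping


def issue_to_document(issue: Mapping[str, Any]) -> str:
    """Render one issue record as a plain-text document suitable for committing."""
    title = f"Issue #{issue['number']}: {issue['title']}"
    lines = [
        title,
        "=" * len(title),  # underline the title, reStructuredText style
        "",
        f":state: {issue['state']}",
        f":author: {issue['user']['login']}",
        f":created: {issue['created_at']}",
        "",
        # The body may be missing or null in the record; treat that as empty.
        (issue.get("body") or "").rstrip(),
        "",
    ]
    return "\n".join(lines)


def document_path(issue: Mapping[str, Any]) -> str:
    """Return a repository path for the record (hypothetical layout)."""
    return f"issue/{issue['number']:05d}.rst"
```

Keeping the rendering separate from the fetching means the same code could be fed from a GitHub export, a backup tool, or another hosting service.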

Contributor

UNAA008 commented Jan 10, 2023

We should look at what FOSSIL (https://fossil-scm.org) does about managing this sort of metadata.

Member Author

rptb1 commented Jan 11, 2023

> We could use the GitHub API to copy these kinds of records ... We'd have to investigate whether other people have done something similar, how hard it would be to maintain, etc.

See "Enabling or disabling GitHub Discussions for a repository", which then leads on to https://github.com/marketplace?category=backup-utilities which includes various GitHub metadata backup utilities.

The most promising is https://github.com/marketplace/actions/github-archive-action

GitHub has data that is not represented in git — like Issues and PRs. The purpose of this action is to capture that data in a portable, usable fashion. ... A clone of the repository will include the github-meta branch, making it easy to build tooling that uses that data.

Member Author

rptb1 commented Jan 11, 2023

Reviewing the GitHub docs, I note:

  1. Our licence isn't detected as BSD 2-clause
  2. I set the "Social preview" to the MPS logo from //info.ravenbrook.com/project/mps/art/2013-06-03/mps-logo/mps-logo-512-transparent.png, which again illustrates "Git Repo needs copies of MPS project and Ravenbrook documents" (#95)
  3. We need a regular periodic review of Teams and People, and indeed all other repo and org settings

Member Author

rptb1 commented Jan 11, 2023

> That means we can't rely on GitHub's branch protection, merge buttons, etc. but of course we should make use of them for efficiency.

In a discussion with @UNAA008 we came up with a good summary of this. Microsoft (who own GitHub) are past masters at vendor lock-in and we must be wary. The way to deal with it is to make sure we maintain alternative methods and ways to back out of GitHub with low cost.

A good example might be use of GitHub CI, as I'm experimenting with at the moment. As it happens, GitHub CI does not support FreeBSD, so we have a reason to maintain a relationship with Travis CI, but in any case it would be wise to maintain the link so that we can switch away from GitHub easily.

Member Author

rptb1 commented Jan 11, 2023

The MPS Manual is currently built by automation hosted at Ravenbrook. This automation:

  1. should be exposed in the Git repo
  2. could be triggered by GitHub workflows
  3. could publish to well-known sites such as Read the Docs (doesn't build at the moment)

Any such GitHub workflows should be simple calls to code that can be run from elsewhere than GitHub, to maintain GitHub independence.

Member Author

rptb1 commented Jan 12, 2023

The current code for gitpushbot assumes that there will be no changes in git, and so does not report conflicts.

The script tries to avoid hammering the remote servers by caching the last commit of the remote ref. The cache does not expire promptly: only 1/30 of the cache expires each day. This was fine when we didn't expect changes in Git, but no good while we're switching over.

Also, this script was written in the early days of GitHub. I doubt our "hammering" is significant to them now. (It might be to other servers though.)

We can set a per-remote cache expiry and have it short for GitHub, so that we notice things faster.
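
A minimal sketch of what a per-remote expiry could look like follows. It is not the gitpushbot code: the class name, the method names and the idea of keying entries by (remote, ref) are all assumptions made for illustration. The clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RemoteRefCache:
    """Remember the last commit seen for each (remote, ref), with per-remote expiry.

    Hypothetical sketch: names and structure are assumptions, not gitpushbot's.
    """

    def __init__(
        self,
        default_ttl: float,
        ttl_by_remote: Optional[Dict[str, float]] = None,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._default_ttl = default_ttl
        self._ttl_by_remote = dict(ttl_by_remote or {})
        self._clock = clock
        # (remote, ref) -> (commit hash, time the entry was stored)
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def put(self, remote: str, ref: str, commit: str) -> None:
        """Record the commit currently at ref on remote."""
        self._entries[(remote, ref)] = (commit, self._clock())

    def get(self, remote: str, ref: str) -> Optional[str]:
        """Return the cached commit, or None if absent or older than the remote's TTL."""
        entry = self._entries.get((remote, ref))
        if entry is None:
            return None
        commit, stored_at = entry
        ttl = self._ttl_by_remote.get(remote, self._default_ttl)
        if self._clock() - stored_at >= ttl:
            # Expired: forget it so the caller asks the remote again.
            del self._entries[(remote, ref)]
            return None
        return commit
```

With something like `RemoteRefCache(default_ttl=86400, ttl_by_remote={"github": 60})`, entries for the remote named `github` would be rechecked after a minute while other remotes are still queried at most daily. The numbers are placeholders.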

We should also consider making gitpushbot two-way, or having gitpullbot, or similar.

Member Author

rptb1 commented Jan 12, 2023

I propose these milestones:

  1. Reversing the polarity of the neutron flow. Instead of insisting that Perforce is the true master, and Git is a mirror, we make Git the master, and keep Perforce up-to-date with it. This will be possible only if the Git history is not edited by force pushes, squashes, etc. We are asking the Master to dematerialise his TARDIS from Perforce and rematerialise it in Git. [Created https://github.com/Ravenbrook/mps/milestone/1 ]
  2. The Perforce Divorce. We declare that the Perforce master is no longer kept up to date with Git, and prevent any further changes. It becomes an archive of events up to a certain date. [Created https://github.com/Ravenbrook/mps/milestone/2 ]

rptb1 added the "critical" label (Will cause failure of the entire project) on Jan 15, 2023
Member Author

rptb1 commented Jan 16, 2023

Linking for reference: "Challenges for MPS development", an e-mail from @gareth-rees and in particular section 1.1 "Move from Perforce to Git as primary repository".

rptb1 added this to the Perforce Divorce milestone on Jan 16, 2023
Member Author

rptb1 commented Jan 16, 2023

> We certainly don't want any history before now to be changed, even if Git practices differ.

There is some history before now (2023-01) that has been changed. The branches branch/2020-08-29/page-sparering-elim and branch/2020-08-31/walk have diverged. Their hashes in GitHub no longer match the hashes produced by Git Fusion for those branches, which are immutable.
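
One way to spot this kind of mismatch across all branches is to compare the ref listings of two repositories. The sketch below assumes the listings are text in the format printed by `git ls-remote` (a hash, whitespace, then the ref name on each line). The function names are hypothetical. Note that a differing tip hash only shows that the two sides disagree; telling a rewritten branch from one that has merely gained commits needs an ancestry check, for example with `git merge-base --is-ancestor`.

```python
from typing import Dict, List


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Parse `git ls-remote` style output into a mapping of ref name to hash."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, ref = line.split(None, 1)
        refs[ref.strip()] = commit
    return refs


def mismatched_refs(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return refs present on both sides whose hashes differ, in sorted order."""
    return sorted(
        ref
        for ref, commit in expected.items()
        if ref in actual and actual[ref] != commit
    )
```

Here `expected` would come from the Git Fusion side and `actual` from GitHub; refs that exist on only one side are ignored by this sketch and would need separate reporting.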

Member Author

rptb1 commented Jan 17, 2023

We must revise the current MPS page at https://www.ravenbrook.com/project/mps/

We should probably revise it soon to make it clear what's happening.

We should update it at milestone Perforce Polarity to direct people to GitHub.

Member Author

rptb1 commented Jan 26, 2023

The headings and justification of our Perforce "jobspec" are described in rule.defect but apply to GitHub issues. We should review them and adapt them into a GitHub issue template (also linked to those rules).

Member Author

rptb1 commented Jan 28, 2023

We should ensure that every task branch with any value is transferred to a GitHub pull request along with any issue report that it refers to.

Member Author

rptb1 commented Feb 2, 2023

> The MPS Manual is currently built by automation hosted at Ravenbrook.

This will be fixed by #141 but at some point before Perforce Divorce we should redirect the old manual URLs.

And in general, we should plan whether we want to redirect other Ravenbrook URLs.

Member Author

rptb1 commented Feb 2, 2023

> We must be very cautious about [GitHub] dependency.

One way to deal with this might be through process rules (used during review) that cause us to consider, for each change, what happens if GitHub or other dependencies (Travis, Read the Docs) go away. For an example of me doing this spontaneously, see #141 (comment). (@UNAA008 reminded me of this recently.)

Member Author

rptb1 commented Feb 3, 2023

We must search the manual for dependencies on www.ravenbrook.com, like this one

> Download the latest MPS Kit release from `<https://www.ravenbrook.com/project/mps/release/>`_.

Actually, we must search the entire tree.
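
A search like this is easy to script. The following sketch walks a checkout and reports every line mentioning the host name; the function name is hypothetical, and it assumes files can be read as text (undecodable bytes are replaced rather than treated as errors).

```python
import os
from typing import Iterator, Tuple


def find_references(
    root: str, needle: str = "www.ravenbrook.com"
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each line under root containing needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata and visit directories in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, 1):
                        if needle in line:
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, broken link): skip it.
                continue
```

Running `for hit in find_references("."): print(*hit, sep=":")` from the top of a checkout gives output similar in shape to a recursive grep, and the same function could back a review check later.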

thejayps changed the title from "The MPS is hosted in Ravenbrook's Perforce repo, making it hard to collaborate" to "The MPS project is hosted in Ravenbrook's Perforce repo, making it hard to sustain" on Feb 13, 2023
Member Author

rptb1 commented Feb 22, 2023

We should review and capture actions from "Re: Meeting with Pute 2019-10-15, 2019-10-22" which was largely about migration of MPS to Git. Also "Meeting with Pute 2019-12-10 (was Re: Meeting with Pute 2019-10-15, 2019-10-22)".

rptb1 added the "maintainability" (Affects the cost of maintaining the MPS to meet current requirements.) and "adaptability" (Affects the cost of developing the MPS to meet changing requirements.) labels on Feb 23, 2023
Contributor

UNAA008 commented Mar 6, 2023

Perforce provides a simple way to automatically imprint version information in a file using keyword expansion.
This provides a quick way of checking whether a document being used (such as a procedure document) is current.
We should consider how to adapt to the loss of this information, for example by making more use of document history sections or providing another form of automatic ID generation.
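
To make the currency check concrete, here is a small sketch that reads the revision out of an expanded keyword. It assumes the expanded form is `$Id: <depot path>#<revision> $` and that the depot path contains no spaces; the function name and the depot path used in the usage note are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches an expanded keyword such as "$Id: //depot/path/file#3 $".
# Assumption: the depot path contains no whitespace.
_ID_KEYWORD = re.compile(r"\$Id: (?P<path>//\S+)#(?P<rev>\d+) \$")


def parse_id_keyword(text: str) -> Optional[Tuple[str, int]]:
    """Return (depot path, revision) from the first expanded $Id$ keyword, if any."""
    match = _ID_KEYWORD.search(text)
    if match is None:
        return None
    return match.group("path"), int(match.group("rev"))
```

For a line such as `.. $Id: //depot/example/procedure.rst#7 $`, this gives the path and the revision 7, which can be compared with the latest revision known to the server. Whatever replaces keyword expansion after the migration would need to support a comparison of this kind.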

Member Author

rptb1 commented Mar 6, 2023

> Perforce provides a simple way to automatically imprint version information in a file using keyword expansion.

See https://stackoverflow.com/a/1796675 (and the rest of that discussion) for something related in Git.

> This provides a quick way of checking whether a document being used (such as a procedure document) is current.

The Git thing above does not provide anything quick or convenient. As usual, Git is dominated and constrained by its central technical idea, not by meeting SCM requirements.

Member Author

rptb1 commented Oct 20, 2023

Git does not offer an equivalent to the venerable RCS Keywords offered by Perforce. This makes it hard for us to automatically comply with rule.generic.ident for source code. We need to:

  1. define the requirement for source code tagging
  2. identify alternative solutions
  3. adjust the rule
