Add script to migrate existing build results to Pulp #3509

Open: wants to merge 1 commit into main

FrostyX (Member) commented Nov 11, 2024

Fix #3503

@FrostyX FrostyX added the pulp label Nov 11, 2024
backend/run/copr-change-storage: three review annotations, all marked Fixed
@FrostyX FrostyX force-pushed the migrate-data-to-pulp branch 3 times, most recently from a7d33d1 to 92c67e6 on November 13, 2024 17:05
@FrostyX FrostyX marked this pull request as ready for review November 13, 2024 17:06
FrostyX (Member, Author) commented Nov 13, 2024

I successfully used the script to migrate data from the Copr STG instance to the Pulp STG instance.
https://copr.stg.fedoraproject.org/coprs/frostyx/pulp-migration/

It works for CoprDirs as well:

docker run -it quay.io/fedora/fedora:40 bash
dnf install 'dnf-command(copr)'
dnf copr enable copr.stg.fedoraproject.org/frostyx/pulp-migration:custom:bar
dnf install hello

This successfully installs the package from Pulp.

sys.exit(1)

if args.delete:
    print("Data removal is not supported yet")
Member commented:

Since we have a pre-initialized logger, I think we should avoid using print().
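A minimal sketch of the reviewer's suggestion, assuming the option is spelled `--delete` (inferred from `args.delete`) and using a stand-in logger named `copr-change-storage`; the real script's logger setup is not shown in this thread. The error goes through `log.error()` and the caller turns the return value into the exit code.

```python
import argparse
import logging

# Stand-in for the script's pre-initialized logger (assumption: the real setup differs).
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("copr-change-storage")


def main(argv=None):
    """Parse arguments and refuse --delete, reporting through the logger."""
    parser = argparse.ArgumentParser(prog="copr-change-storage")
    # Assumed flag name, inferred from `args.delete` in the diff.
    parser.add_argument("--delete", action="store_true",
                        help="remove data from the old storage (not supported yet)")
    args = parser.parse_args(argv)

    if args.delete:
        log.error("Data removal is not supported yet")
        return 1
    return 0

# From the script's entry point: sys.exit(main())
```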

Member Author replied:

No problem, updated.


for builddir in os.listdir(chrootdir):
    resultdir = os.path.join(chrootdir, builddir)
    if not os.path.isdir(resultdir):
Member commented:

I'm not sure this check is enough. Maybe upload_build_results is clever enough to handle such issues? But see how this check is done in the resultdir cleaner crawler.
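One possible stricter check, sketched under the assumption that build result directories are named `<zero-padded build ID>-<package>` (for example `00012345-hello`) and that anything else in a chroot directory, such as `repodata/`, should be skipped. The regex, the helper names, and the choice to skip directories without RPMs are illustrative, not the resultdir cleaner's actual logic.

```python
import os
import re

# Assumption: build result dirs look like "00012345-hello" (build ID, dash, package).
BUILDDIR_RE = re.compile(r"^(\d{8,})-.+$")


def build_id_from_dirname(name):
    """Return the build ID encoded in a directory name, or None if it isn't a build dir."""
    match = BUILDDIR_RE.match(name)
    return int(match.group(1)) if match else None


def iter_build_resultdirs(chrootdir):
    """Yield (build_id, resultdir) for entries that look like real build results."""
    for builddir in sorted(os.listdir(chrootdir)):
        resultdir = os.path.join(chrootdir, builddir)
        # Only real directories (not symlinks, not plain files) can hold results.
        if os.path.islink(resultdir) or not os.path.isdir(resultdir):
            continue
        build_id = build_id_from_dirname(builddir)
        if build_id is None:
            continue  # e.g. repodata/ or other non-build directories
        # Hypothetical policy: nothing to upload if the build produced no RPMs.
        if not any(name.endswith(".rpm") for name in os.listdir(resultdir)):
            continue
        yield build_id, resultdir
```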

Member Author replied:

You are right, that would cause problems. Updated.

log.error("Failed to publish a repository: %s", resultdir)
break

log.info("OK: %s", resultdir)
Member commented:

I suppose we cannot make this transactional (roll back if an error happens). But would it be possible to first analyze the situation and gather the tasks that need to be done, fail if a problem turns up, and start the processing only if there are no problems?

Also, I'm curious whether we need a project lock (against builds and other modifications).

Member Author replied:

> But would it be possible to first analyze the situation and gather the tasks that need to be done, fail if some problem happens, and only if no problems happen - start the processing?

So, I am not really sure how helpful this would be. Gathering tasks beforehand would probably avoid issues like the script failing when it tries to access a directory it doesn't have permission to read. But I suppose most failures will happen while actually uploading things to Pulp, due to networking issues or similar, and a precomputed list of tasks wouldn't help with those, IMHO.
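To illustrate the trade-off, here is a rough two-phase sketch: a planning pass that only reads the filesystem and collects problems, followed by an upload pass that runs only if planning found none. `UploadTask`, `plan()`, `migrate()` and the `upload` callable are hypothetical; `upload` stands in for the real Pulp upload and is assumed to return `True` on success. Planning catches unreadable or missing directories, but an upload that fails halfway still stops the run with part of the data already in Pulp.

```python
import os
from dataclasses import dataclass


@dataclass
class UploadTask:
    """Hypothetical unit of work: one build's result directory and its RPMs."""
    resultdir: str
    rpms: list


def plan(resultdirs):
    """Phase 1: read-only inspection. Collect tasks and problems, upload nothing."""
    tasks, problems = [], []
    for resultdir in resultdirs:
        try:
            rpms = sorted(f for f in os.listdir(resultdir) if f.endswith(".rpm"))
        except OSError as ex:  # missing directory, permission denied, ...
            problems.append(f"{resultdir}: {ex}")
            continue
        tasks.append(UploadTask(resultdir, rpms))
    return tasks, problems


def migrate(resultdirs, upload):
    """Phase 2 runs only if phase 1 found no problems; stops at the first failed upload."""
    tasks, problems = plan(resultdirs)
    if problems:
        return False, problems
    for task in tasks:
        if not upload(task):
            return False, [f"upload failed: {task.resultdir}"]
    return True, []
```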

I would probably just remember, or pre-calculate, the number of RPM files we are uploading, and after everything is done, query Pulp to check that it has the same number. Or we could compare RPM names if we wanted to be more precise. If they don't match, we can either retry several times, or just log it and manually review all failures.
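The verification idea could look roughly like this. It assumes a hypothetical `fetch_remote_names()` callable that queries Pulp and returns the set of RPM file names it holds, and a hypothetical `reupload()` callable; the actual Pulp queries are not shown. Comparing `len()` of the two sets would be the cheaper count-only check, while comparing names also tells you which RPMs are missing.

```python
import os


def local_rpm_names(resultdirs):
    """Pre-calculate the RPM file names we are about to upload."""
    names = set()
    for resultdir in resultdirs:
        names.update(f for f in os.listdir(resultdir) if f.endswith(".rpm"))
    return names


def verify_and_retry(expected, fetch_remote_names, reupload, retries=3):
    """Return the RPM names still missing from Pulp after up to `retries` re-uploads.

    fetch_remote_names() and reupload(missing) are hypothetical callables
    wrapping the real Pulp queries and uploads.
    """
    missing = expected - fetch_remote_names()
    attempts = 0
    while missing and attempts < retries:
        reupload(missing)
        missing = expected - fetch_remote_names()
        attempts += 1
    # Anything left here would be logged for manual review.
    return missing
```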

> Also, I'm curious whether we need a project lock (for building and other modification).

Dumping a lockfile in this script would be easy, but changing our build-related code, action code, cron jobs, etc. to respect the lock sounds like a bigger problem.
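For comparison, the easy half (creating the lock from the migration script) might look like this; the lock path is a parameter and the helper names are hypothetical. `O_CREAT | O_EXCL` makes creation atomic, so a second migration of the same project fails with `FileExistsError`. The hard half is `is_locked()`: every builder, action handler, and cron job would have to call it.

```python
import os
from contextlib import contextmanager


@contextmanager
def migration_lock(lockpath):
    """Atomically create `lockpath` for the duration of the block; remove it afterwards."""
    # O_EXCL: os.open raises FileExistsError if the lock file already exists.
    fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        with os.fdopen(fd, "w") as lockfile:
            lockfile.write(str(os.getpid()))  # record who holds the lock
        yield
    finally:
        os.remove(lockpath)


def is_locked(lockpath):
    """The check every other component would need to respect (the hard part)."""
    return os.path.exists(lockpath)
```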

If such a locking feature would be generally useful, then sure. But if its only purpose is the Pulp migration, I hope we can figure out something easier.

For the initial migrations of test users, I think we would be fine with "please don't submit new builds until the migration is finished". The mass migration of everything will be done in batches, so maybe we can just put an ugly hack into our build/action scheduler that temporarily hides all jobs belonging to the batch currently being migrated.
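A sketch of that scheduler hack, assuming a simplified, hypothetical `Job` record and a set of `(owner, project)` pairs for the batch currently being migrated; how the scheduler would learn the current batch is left open. Hidden jobs stay queued and are picked up again once their project drops out of the set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified view of a pending build or action."""
    job_id: int
    owner: str
    project: str


# Hypothetical: projects in the batch currently being migrated.
MIGRATING_BATCH = {("frostyx", "pulp-migration")}


def schedulable(jobs, migrating=MIGRATING_BATCH):
    """Return only jobs whose project is not in the batch being migrated."""
    return [job for job in jobs if (job.owner, job.project) not in migrating]
```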

Successfully merging this pull request may close these issues.

Script to migrate existing project (all of its data) to Pulp
2 participants