updaterepo.sh: support storing Singularity images unpacked #46

Open
wants to merge 114 commits into
base: main

natefoo (Member) commented Mar 19, 2024

Storing images unpacked has several benefits, namely reduced space usage, better caching, and faster startup time.

Because we would no longer be storing the packed image, we can no longer rely on rsync to determine what needs to be updated. Instead, we get the local mod times of every entry in /all and the remote mod times from the rsync server at depot.galaxyproject.org, and compare them (as strings) ourselves.
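As a rough illustration of that comparison (not the script's actual code), the sketch below takes two name-to-mod-time mappings and returns the images that are missing locally or whose mod-time string differs. The function name, the dict inputs, and the timestamp format are assumptions made for the example; the only requirement is that both sides are formatted identically.

```python
def images_to_sync(local, remote):
    """Return remote image names that are new or changed.

    `local` and `remote` map image name -> mod time string. The strings are
    only compared for equality, so ANY difference (even an older remote
    time) marks the image for download.
    """
    return sorted(
        name for name, mtime in remote.items()
        if local.get(name) != mtime
    )


# Hypothetical listings; the timestamp format is an assumption.
local = {"centos:8.3.2011": "2021/02/10 00:00:00"}
remote = {
    "centos:8.3.2011": "2021/02/10 00:00:00",
    "newimage:1.0": "2024/03/19 00:00:00",
}
print(images_to_sync(local, remote))  # ['newimage:1.0']
```

Comparing strings for equality keeps the check simple, but it is sensitive to formatting: a time zone or precision mismatch between the two listings would make every image look changed.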

This change also reverses the direction of the symlinks. Currently, all images are stored in /all with symlinks in directories corresponding to the first 2 characters in the package filename, e.g.:

nate@foo:/cvmfs/singularity.galaxyproject.org$ ls -l c/e/centos:8.3.2011 all/centos:8.3.2011
lrwxrwxrwx 1 cvmfs cvmfs       25 Feb 10  2021 c/e/centos:8.3.2011 -> ../../all/centos:8.3.2011
-rw-r--r-- 1 cvmfs cvmfs 71241728 Feb 10  2021 all/centos:8.3.2011

In the sandbox layout, we instead (would) have:

nate@foo:/cvmfs/singularity.galaxyproject.org$ ls -l c/e/centos:8.3.2011 all/centos:8.3.2011
drwxr-xr-x 1 cvmfs cvmfs     4096 Mar 19  2024 c/e/centos:8.3.2011
lrwxrwxrwx 1 cvmfs cvmfs       25 Feb 10  2021 all/centos:8.3.2011 -> ../c/e/centos:8.3.2011

The mod times of the upstream .sif images will be preserved only on the symlinks in /all, and those symlinks are what is used to determine what needs to be mirrored, as described above. Packages whose names begin with bioconductor- or mulled-v2- get their own top-level directory (and are then hashed underneath it by the rest of the image name) so that the CVMFS catalogs can remain reasonably sized.
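The layout rules above can be sketched as a small path-mapping function. This is an illustration only: the function names are hypothetical, and it assumes the leaf directory keeps the full image name (prefix included) inside the namespace directory, which the description does not spell out. The bioconductor and mulled-v2 image names used below are made up.

```python
# Prefixes that get their own top-level directory (from the description).
NAMESPACES = ("bioconductor", "mulled-v2")


def sandbox_dir(image):
    """Return the repo-relative directory for an unpacked image.

    Assumes image names (and the part after a namespace prefix) are at
    least two characters long.
    """
    for ns in NAMESPACES:
        prefix = ns + "-"
        if image.startswith(prefix):
            rest = image[len(prefix):]
            # Hash by the first two characters AFTER the prefix.
            # ASSUMPTION: the leaf keeps the full image name.
            return "/".join((ns, rest[0], rest[1], image))
    # Regular packages: hash by the first two characters of the name.
    return "/".join((image[0], image[1], image))


def all_symlink_target(image):
    """Return the relative target of the symlink stored at all/<image>."""
    return "../" + sandbox_dir(image)


print(sandbox_dir("centos:8.3.2011"))         # c/e/centos:8.3.2011
print(all_symlink_target("centos:8.3.2011"))  # ../c/e/centos:8.3.2011
print(sandbox_dir("bioconductor-deseq2:1.0"))
```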

This should be the appropriate .cvmfsdirtab, which gives each image its own catalog, plus one catalog for the root and all of the symlinks in /all:

# all regular packages
/?/?/*
# packages with their own namespace
/bioconductor/?/?/*
/mulled-v2/?/?/*
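To sanity-check which directories those patterns would select, here is a minimal sketch that approximates the matching by comparing paths component by component, so that a wildcard never crosses a `/`. It is an approximation written for illustration; CVMFS does the real matching, and its exact semantics may differ. The function name and the bioconductor image name are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .cvmfsdirtab above (comments omitted).
DIRTAB = ["/?/?/*", "/bioconductor/?/?/*", "/mulled-v2/?/?/*"]


def gets_own_catalog(path, patterns=DIRTAB):
    """Return True if `path` matches a dirtab pattern, component-wise."""
    parts = path.strip("/").split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        # Depth must match exactly; each component is matched on its own.
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


print(gets_own_catalog("/c/e/centos:8.3.2011"))  # True
print(gets_own_catalog("/all/centos:8.3.2011"))  # False
print(gets_own_catalog("/bioconductor/d/e/bioconductor-deseq2:1.0"))  # True
```

Under this approximation, entries in /all never match, which is consistent with them staying in the root catalog.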

In testing, I verified that when the script is run (as updaterepo.sh -u -r) against an /all directory containing both symlinks to unpacked directories and packed image files, only new or changed images are downloaded. This means we can store new images unpacked while we work to convert the old ones.
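One way a mixed /all directory can be handled uniformly is to read the mod time of each entry itself, without following symlinks, so that packed files and symlinks to unpacked directories are treated the same. The sketch below shows that idea; it is an assumption about the approach, not the script's actual code, and the function name and timestamp format are hypothetical.

```python
import os
import time


def local_mod_times(all_dir):
    """Map each entry name in `all_dir` to its own mod time string (UTC).

    follow_symlinks=False reads the time stored on the symlink itself,
    not on the directory it points to.
    """
    result = {}
    with os.scandir(all_dir) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            result[entry.name] = time.strftime(
                "%Y/%m/%d %H:%M:%S", time.gmtime(st.st_mtime)
            )
    return result
```

The resulting mapping can then be compared against a listing from the remote server, provided both sides use the same timestamp format and time zone.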

xref: galaxyproject/galaxy#16433

natefoo and others added 30 commits November 3, 2022 11:59
GA4GH TRS endpoints expect paths ending in nothing, no trailing slash. They additionally have the structure

/blah/1
/blah/1/GALAXY/descriptor

Which obviously doesn't work since we aren't responding to an exact URL but trying to map it to a file. So internally we will call the files 1.json and descriptor.json, and then have nginx strip that out so it just all works
GTN:GA4GH TRS endpoint support rewrite
add my.galaxy.training links

natefoo (Member, Author) commented Mar 19, 2024

I would add: if anyone has ~30 TB free on a server somewhere, it would be a good idea to set up a mirror of https://depot.galaxyproject.org/singularity, since this conversion will remove our CVMFS "backup" of the upstream image files.

bgruening (Member) commented

@natefoo /data/db has space. You should have access. Should we use that?

natefoo (Member, Author) commented Mar 19, 2024

If you'd like, sure, I can make a copy there.

natefoo (Member, Author) commented Mar 19, 2024

It isn't mounted on cvmfs-stratum0.galaxyproject.eu. Do you want to mount it there, or should I make a copy from somewhere else?
