updaterepo.sh: support storing Singularity images unpacked #46
base: main
GA4GH TRS endpoints expect paths that end with nothing: no extension and no trailing slash. They are also nested, e.g. `/blah/1` and `/blah/1/GALAXY/descriptor`. That doesn't work for us, because we aren't responding to exact URLs but mapping them to files, and `/blah/1` can't be both a file and a directory. So internally we call the files `1.json` and `descriptor.json`, and have nginx strip the `.json` so it all just works.
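A minimal Python sketch of the naming scheme. It assumes the on-disk file is simply the URL path plus `.json`, and the helper names `trs_url_to_file` and `file_to_trs_url` are hypothetical, not part of this change:

```python
import posixpath

SUFFIX = ".json"  # the extension nginx hides from TRS clients


def trs_url_to_file(url_path: str) -> str:
    """Map a TRS URL path (no trailing slash, no extension) to its file on disk.

    "/blah/1" -> "/blah/1.json" and
    "/blah/1/GALAXY/descriptor" -> "/blah/1/GALAXY/descriptor.json".
    The suffix lets "/blah/1.json" (a file) sit next to "/blah/1/" (a directory).
    """
    if url_path.endswith("/"):
        raise ValueError(f"TRS paths have no trailing slash: {url_path}")
    return url_path + SUFFIX


def file_to_trs_url(file_path: str) -> str:
    """Inverse mapping: drop the .json suffix to get the public URL path."""
    root, ext = posixpath.splitext(file_path)
    if ext != SUFFIX:
        raise ValueError(f"not a TRS JSON file: {file_path}")
    return root
```

On the nginx side, one way to get the same effect is `try_files $uri.json =404;`, which serves `/blah/1.json` for a request to `/blah/1`; the configuration actually used here may differ.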
GTN:GA4GH TRS endpoint support rewrite
add my.galaxy.training links
I would add: if anyone has ~30 TB free on a server somewhere, it would be a good idea to set up a mirror of https://depot.galaxyproject.org/singularity, since this conversion will remove our CVMFS "backup" of the upstream image files.
@natefoo /data/db has space. You should have access. Should we use that?
If you'd like, sure, I can make a copy there.
It isn't mounted on cvmfs-stratum0.galaxyproject.eu. Do you want to mount it there, or make a copy from somewhere else?
There are benefits to storing images unpacked, namely: space, caching, startup time.
Because we would no longer be storing the packed images, we can't rely on rsync to determine what needs to be updated anymore. Instead, we get the local mod times of every entry in `/all` and the remote mod times from the rsync server at depot.galaxyproject.org, and compare them (as strings) ourselves.

This change also reverses the direction of the symlinks. Currently, all images are stored in `/all`, with symlinks in directories named for the first 2 characters of the package filename. In the sandbox layout, we would instead have the unpacked image directories in those 2-character directories, with the symlinks in `/all`.
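To make the reversal concrete, here is a minimal Python sketch of where the real entry and its symlink live for one image in each layout. It assumes paths relative to the repository root, relative symlink targets, and top-level directory names `bioconductor` and `mulled-v2` for the special-cased prefixes described further down; the current layout uses only the first 2 characters. These directory names, the helpers `shard_dir` and `layout`, and the example image names are hypothetical, not taken from `updaterepo.sh`.

```python
import posixpath

# Prefixes that get their own top-level directory in the sandbox layout (from
# the PR text). Using the prefix minus its trailing "-" as the directory name
# is an assumption for illustration.
SPECIAL_PREFIXES = ("bioconductor-", "mulled-v2-")


def shard_dir(name: str, sandbox: bool = True) -> str:
    """Hashed directory, relative to the repo root, for an image name."""
    if sandbox:
        for prefix in SPECIAL_PREFIXES:
            if name.startswith(prefix):
                rest = name[len(prefix):]
                # Hash by the rest of the name, e.g. "bioconductor/li".
                return posixpath.join(prefix.rstrip("-"), rest[:2])
    # Otherwise: first 2 characters of the package filename, e.g. "sa".
    return name[:2]


def layout(name: str, sandbox: bool) -> dict[str, str]:
    """Where the real entry and its symlink live for one image."""
    shard = shard_dir(name, sandbox)
    if sandbox:
        real = posixpath.join(shard, name)  # unpacked sandbox directory
        link = posixpath.join("all", name)  # symlink in /all (carries the mtime)
    else:
        real = posixpath.join("all", name)  # packed image file in /all
        link = posixpath.join(shard, name)  # symlink in the 2-character dir
    # Relative target, so the link resolves wherever the repo is mounted.
    target = posixpath.relpath(real, posixpath.dirname(link))
    return {"real": real, "link": link, "target": target}
```

For an illustrative name like `samtools:1.0--0`, the current layout links `sa/samtools:1.0--0` to `../all/samtools:1.0--0`, while the sandbox layout links `all/samtools:1.0--0` to `../sa/samtools:1.0--0`.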
Mod times of the upstream image `.sif`s will be preserved on the `/all` symlinks only, which are used to determine what needs to be mirrored as described above. Packages beginning with `bioconductor-*` and `mulled-v2-*` get their own top-level directory (and are then hashed by the rest of the image name underneath it) so that the CVMFS catalogs can remain reasonably sized.

The `.cvmfsdirtab` for this layout should give each image its own catalog, plus one for the root and all the symlinks in `/all`.

In testing, I was able to verify that when run (as `updaterepo.sh -u -r`) against an `/all` directory containing both symlinks to unpacked dirs and image files, only new or changed images were downloaded. This means we can store new images unpacked while we work to convert the old ones.

xref: galaxyproject/galaxy#16433
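The mod-time check, including an `/all` directory that mixes symlinks to unpacked dirs with not-yet-converted image files, might look roughly like the Python sketch below. It assumes the remote listing comes from `rsync --list-only` (lines of permissions, size, date, time, name) and that both sides are formatted in the same local time zone; the helper names are hypothetical and this is not the actual `updaterepo.sh` logic.

```python
import os
import time
from pathlib import Path

# Assumed format of the date/time columns in `rsync --list-only` output,
# e.g. "-rw-r--r--  1,234 2023/07/10 12:34:56 name".
MTIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def format_mtime(epoch: float) -> str:
    """Format a timestamp like the rsync listing (local time zone)."""
    return time.strftime(MTIME_FORMAT, time.localtime(epoch))


def local_mtimes(all_dir: Path) -> dict[str, str]:
    """Mod time string of every entry in /all.

    follow_symlinks=False reads a symlink's own mtime rather than its target's,
    so symlinks to unpacked dirs and packed image files are handled alike.
    """
    with os.scandir(all_dir) as entries:
        return {
            e.name: format_mtime(e.stat(follow_symlinks=False).st_mtime)
            for e in entries
        }


def remote_mtimes(listing_lines: list[str]) -> dict[str, str]:
    """Parse regular-file entries (permissions starting with "-") from a listing."""
    result = {}
    for line in listing_lines:
        fields = line.split(maxsplit=4)
        if len(fields) == 5 and fields[0].startswith("-"):
            _perms, _size, date, clock, name = fields
            result[name] = f"{date} {clock}"
    return result


def images_to_fetch(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """New images, or images whose mod time string differs, need downloading."""
    return sorted(name for name, mtime in remote.items() if local.get(name) != mtime)


def set_link_mtime(path: Path, epoch: float) -> None:
    """Copy the upstream .sif mod time onto the /all entry itself, not its target."""
    os.utime(path, (epoch, epoch), follow_symlinks=False)
```

Comparing formatted strings rather than parsing them back into timestamps keeps the check a plain equality test, so any change in the upstream mod time (forward or backward) triggers a re-download.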