[EXP] provide signature file loading function via HTTP #2256
This PR adds support for loading signatures directly from HTTP URLs with GET, i.e. the normal way of getting files from a Web server. See discussion in #2257.
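As a rough illustration of what loading over HTTP GET involves, here is a minimal standard-library sketch. The names is_http_url and fetch_bytes are hypothetical helpers, not functions from this PR or from sourmash. The sketch assumes that anything with an http:// or https:// scheme should be fetched over the network and that everything else is a local path.

```python
import urllib.parse
import urllib.request


def is_http_url(location: str) -> bool:
    """Return True if `location` looks like an http:// or https:// URL (hypothetical helper)."""
    scheme = urllib.parse.urlparse(location).scheme.lower()
    return scheme in ("http", "https")


def fetch_bytes(location: str, timeout: float = 30.0) -> bytes:
    """Return raw bytes from an HTTP(S) URL or a local path (hypothetical helper)."""
    if is_http_url(location):
        # urlopen issues a plain GET when no request body is supplied
        with urllib.request.urlopen(location, timeout=timeout) as response:
            return response.read()
    # otherwise treat the location as a file on local disk
    with open(location, "rb") as fp:
        return fp.read()
```

Keeping the URL check separate from the fetch makes it easy to route a location to either the network or the filesystem without touching the downstream parsing.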
NEXT STEPS: Per #2257, I should look into implementing this more generically using fsspec.
So, for example, this supports:
Loading JSON sig/sig.gz files directly from Web sites
If you make raw JSON signature files available via an Apache download link, you can Do Things With Them:
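The following is not the PR's code, only a minimal sketch of how both .sig and .sig.gz content might be handled once the payload has been fetched as bytes. decode_signature_bytes is a hypothetical name. It checks the gzip magic number instead of trusting the file extension, since a Web server may not preserve it, and it only parses the JSON without validating the sourmash signature structure.

```python
import gzip
import json

# first two bytes of every gzip stream
GZIP_MAGIC = b"\x1f\x8b"


def decode_signature_bytes(data: bytes):
    """Parse signature JSON from raw bytes, gunzipping first if needed (hypothetical helper)."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return json.loads(data.decode("utf-8"))
```

The bytes could come from urllib.request.urlopen(url).read() or from a local file opened in binary mode; the parsing step is the same either way.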
You can build manifests with HTTP URLs in them, too
For example,
yields
and then you can do things like
In turn, this allows you to use the full machinery of picklists etc. on non-local signatures.
With the caveat that you might end up asking to download 13 TB of signature files if you make a mistake...
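Because a manifest lists each signature's location alongside its metadata, selection can happen on the manifest before anything is downloaded. The sketch below assumes a manifest CSV with leading '#' comment lines and columns named internal_location, md5, ksize, and name. read_manifest_rows and select_locations are hypothetical helpers, not sourmash's picklist implementation, and the max_rows cap is just one possible guard against an accidental 13 TB download.

```python
import csv
import io


def read_manifest_rows(text: str) -> list:
    """Parse manifest CSV text into dicts, skipping leading '#' comment lines."""
    body = text
    while body.startswith("#"):
        # drop the first line (the comment) and keep the rest
        _, _, body = body.partition("\n")
    return list(csv.DictReader(io.StringIO(body)))


def select_locations(rows, md5_picklist=None, ksize=None, name_keyword=None, max_rows=1000):
    """Return locations of rows matching every given filter (hypothetical helper).

    Raises ValueError if more than `max_rows` rows match, so that a too-broad
    selection fails before any download starts.
    """
    selected = []
    for row in rows:
        if md5_picklist is not None and row["md5"] not in md5_picklist:
            continue
        if ksize is not None and int(row["ksize"]) != ksize:
            continue
        if name_keyword is not None and name_keyword not in row["name"]:
            continue
        selected.append(row["internal_location"])
    if len(selected) > max_rows:
        raise ValueError(
            f"{len(selected)} signatures selected; refusing to fetch more than {max_rows}"
        )
    return selected
```

Filtering on the manifest first means only the selected URLs ever need a GET, which is what makes picklist-style selection on remote signatures cheap.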
More on standalone manifests
So, for example, if you have a manifest containing a bunch of signatures, you can use --include to get just the signatures containing a keyword in the name, or at a particular ksize, and build a local database from just those:
Summary thoughts
Anyway, this is a useful mixture of oh-so-wrong and so-very-much-right...
This is probably most useful for things like genomes where the individual signatures are quite small; we could distribute just a manifest CSV file to support certain kinds of things. It's also a good reason to support better encoding formats than JSON per #1262.
NOTE: this may conflict with #1644, which also takes HTTP URLs.
Alternative/additional implementation thoughts
Instead of a custom loader fn for signatures only, we could generically support grabbing files and turning them into file handles for signature, pathlist, and manifest loading.
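One way to picture the generic approach: a single opener turns a local path or an HTTP(S) URL into a text file handle, and separate loaders for signatures, pathlists, and manifests consume handles without caring where the bytes came from. All names here are hypothetical, and reading the whole payload into memory is a simplification that a streaming approach (or fsspec) would avoid.

```python
import csv
import gzip
import io
import json
import urllib.parse
import urllib.request


def open_location(location, timeout=30.0):
    """Hypothetical generic opener: local path or http(s) URL -> text handle.

    The whole payload is read into memory and gunzipped if it starts with
    the gzip magic number, so callers never need to know where it came from.
    """
    if urllib.parse.urlparse(location).scheme.lower() in ("http", "https"):
        with urllib.request.urlopen(location, timeout=timeout) as response:
            data = response.read()
    else:
        with open(location, "rb") as fp:
            data = fp.read()
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return io.StringIO(data.decode("utf-8"))


# Each loader takes a file handle, not a path, so it works unchanged
# for local files and downloaded ones.
def load_signature_json(fp):
    return json.load(fp)


def load_pathlist(fp):
    # one location per non-blank line
    return [line.strip() for line in fp if line.strip()]


def load_manifest(fp):
    # skip leading '#' comment lines, then parse the CSV body
    text = fp.read()
    while text.startswith("#"):
        _, _, text = text.partition("\n")
    return list(csv.DictReader(io.StringIO(text)))
```

Keeping the loaders path-agnostic is what lets one fetching layer serve all three formats; swapping the opener for an fsspec-based one would not require touching them.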