can we turn picklists + collections into manifests? #3048
Comments
sig check seems to work for this! Build a bunch of individual .sig.zip files:
create a picklist for some of them in
then run
and
shows just five sketches in the standalone manifest
@AnneliektH maybe worth giving it a try ;)
Ok so this would work perfectly, but I think I did something wrong when creating all the signatures initially. Should I just rebuild all signatures?
that's up to you 😆 - depends on what's easiest. we have you can also ignore all of that and use the md5sum in the picklist, instead of the name/ident, which should also work.
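To make the md5sum suggestion concrete, here is a minimal sketch of what selecting manifest rows by md5 amounts to. It is not sourmash's implementation (in practice `sourmash sig check` does this work), and the column names, md5 values, and file names below are made-up assumptions for illustration only.

```python
import csv
import io

# Hypothetical, simplified manifest: real manifests carry more columns.
MANIFEST_CSV = """internal_location,md5,name
sigs/a.sig.zip,aaa111,genome A
sigs/b.sig.zip,bbb222,genome B
sigs/c.sig.zip,ccc333,genome C
"""

# Hypothetical picklist with a single md5 column.
PICKLIST_CSV = """md5
aaa111
ccc333
"""


def filter_rows_by_md5(manifest_text, picklist_text, column="md5"):
    """Keep only the manifest rows whose md5 appears in the picklist."""
    picklist = csv.DictReader(io.StringIO(picklist_text))
    wanted = {row[column] for row in picklist}
    manifest = csv.DictReader(io.StringIO(manifest_text))
    return [row for row in manifest if row["md5"] in wanted]


selected = filter_rows_by_md5(MANIFEST_CSV, PICKLIST_CSV)
print([row["name"] for row in selected])
```

Matching on md5 sidesteps any mismatch between the names or identifiers stored in the signatures and those written in the picklist, which is why it can work without rebuilding the signatures.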
Ok so it's almost working. The Snakefile makes the manifests (in /group/ctbrowngrp2/scratch/annie/2023-swine-sra/sourmash/manifests/MAGs), with one manifest for each genome clustering threshold. When running mgmanysearch, I get an error, which I think has to do with the file paths of the queries (aka the manifests). Within the manifest, the internal location is correctly stated as "sig_files/signatures_concat/MAGs2.zip". It seems to prepend the manifest location to the query location, which isn't where the files are.
Spent time figuring this all out (see #3053 for a demonstration), and I think that the
(once #3054 is merged, solution will be to use
…` and `sig collect` (#3054)

This PR updates `sig collect` and `sig check` so that they can produce standalone manifests that work properly with default sourmash loading behavior. The default behavior produces broken manifests in some situations and is not changed, but will be deprecated in v5.

## Details

Currently, `sig collect` and `sig check` default to producing standalone manifests with internal path locations relative to the current working directory. This conflicts with the default `StandaloneManifest` behavior implemented in `save_load.py` that loads path locations relative to the manifest location. As a result, whenever the manifest was in a subdirectory, the standalone manifests output by `sig check` and `sig collect` were broken. The only way to make good manifests in this situation was to use `sig collect --abspath`, but `sig check` didn't support `--abspath`, and using absolute paths is brittle in situations where you want to distribute manifests.

This PR adds `--relpath` to both `sig check` and `sig collect`, and adds `--abspath` to `sig check`. It also demonstrates the bad behavior in tests and annotates the tests appropriately.

See #3008 (comment) for more detailed discussion of why I think `--relpath` is the right behavior for the future.

- [x] adds `--abspath` and `--relpath` to `sig check`, to properly support relative paths;
- [x] adds `--relpath` to `sig collect`, to properly support relative paths;
- [x] documents this behavior properly for creating standalone manifests;
- [ ] create issue to change default `sig check` and `sig collect` behavior for v4, and disable cwd behavior.

Techie TODO:

- [x] explicitly test `relpath` and `abspath` behavior in `sig check`;
- [x] explicitly test `relpath` behavior in `sig collect`
- [x] write some tests for `sig check` and `sig collect` to explore the relative path loading issue, with all three combinations of relpath: mf in cwd, sigs in subdir; mf in subdir, sigs in cwd; mf in subdir, sigs in subdir.

Related issues:

* Addresses #3008
* Addresses issues in #3048 by updating `sig check` to support `--relpath`;
* Fixes #3053 - `--relpath` again

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
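The path mismatch described in the commit message can be illustrated with plain path arithmetic. This is a sketch only and does not call sourmash: `resolve_location`, `rewrite_relative_to_manifest`, and the manifest file name `manifests/MAGs/example.csv` are hypothetical, while the internal location is the one quoted earlier in this thread.

```python
import posixpath


def resolve_location(internal_location, manifest_path, relative_to):
    """Return the path a loader would open for one manifest row.

    relative_to="cwd" mimics locations written relative to the working
    directory; relative_to="manifest" mimics loading relative to the
    directory that contains the manifest.
    """
    if posixpath.isabs(internal_location):
        return internal_location
    base = posixpath.dirname(manifest_path) if relative_to == "manifest" else ""
    return posixpath.normpath(posixpath.join(base, internal_location))


def rewrite_relative_to_manifest(internal_location, manifest_path):
    """Re-express a cwd-relative location relative to the manifest directory."""
    return posixpath.relpath(internal_location, start=posixpath.dirname(manifest_path))


manifest = "manifests/MAGs/example.csv"  # hypothetical manifest in a subdirectory
location = "sig_files/signatures_concat/MAGs2.zip"  # written relative to cwd

# Where the file actually is, versus where a manifest-relative loader looks.
actual = resolve_location(location, manifest, "cwd")
looked_up = resolve_location(location, manifest, "manifest")

# Rewriting the location relative to the manifest makes the two agree.
fixed_location = rewrite_relative_to_manifest(location, manifest)
fixed_lookup = resolve_location(fixed_location, manifest, "manifest")

print(actual)
print(looked_up)
print(fixed_location)
print(fixed_lookup)
```

The second printed path shows the manifest directory prepended to the location, which matches the symptom reported above; the rewritten location resolves back to the real file while staying relative, so the manifest remains portable as long as it moves together with the signatures.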
along the theme of cool ways to subset collections with manifests, @AnneliektH is converting `fastgather` results into manifests here. This is presumably because `mgmanysearch` doesn't support picklists (for which fastgather output could then be used), but does support standalone manifests.

so the question du jour is: do we have a standard way to go from picklists + collections => standalone manifest?

I think `sourmash sig check` might do it: docs. I will check and then recommend it to annie if so :).

it looks like `sourmash sig collect` does not, however: docs. That was my first guess.

related: