-
Notifications
You must be signed in to change notification settings - Fork 80
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
MRG: add label output & input options to
compare
and plot
, for be…
…tter customization (#2598) Adds `sourmash compare --labels-to` and `sourmash plot --labels-from` to support better label customization. Fixes #2452 Fixes #2915 ## `sourmash compare --labels-to` This command will generate a 'labels-to' file. Running: ``` sourmash compare tests/test-data/demo/*.sig -o compare-demo \ --labels-to compare-demo-labels.csv ``` will produce a file that looks like this: file `compare-demo-labels.csv`: ```csv order,md5,label,name,filename,signature_file 1,60f7e23c24a8d94791cc7a8680c493f9,SRR2060939_1.fastq.gz,,SRR2060939_1.fastq.gz,../tests/test-data/demo/SRR2060939_1.sig 2,4e94e60265e04f0763142e20b52c0da1,SRR2060939_2.fastq.gz,,SRR2060939_2.fastq.gz,../tests/test-data/demo/SRR2060939_2.sig 3,f71e78178af9e45e6f1d87a0c53c465c,SRR2241509_1.fastq.gz,,SRR2241509_1.fastq.gz,../tests/test-data/demo/SRR2241509_1.sig 4,6d6e87e1154e95b279e5e7db414bc37b,SRR2255622_1.fastq.gz,,SRR2255622_1.fastq.gz,../tests/test-data/demo/SRR2255622_1.sig 5,0107d767a345eff67ecdaed2ee5cd7ba,SRR453566_1.fastq.gz,,SRR453566_1.fastq.gz,../tests/test-data/demo/SRR453566_1.sig 6,f0c834bc306651d2b9321fb21d3e8d8f,SRR453569_1.fastq.gz,,SRR453569_1.fastq.gz,../tests/test-data/demo/SRR453569_1.sig 7,b59473c94ff2889eca5d7165936e64b3,SRR453570_1.fastq.gz,,SRR453570_1.fastq.gz,../tests/test-data/demo/SRR453570_1.sig ``` The `label` column in this file can be edited to suit the user's needs; the index column is `order`, and all other columns can be ignored or deleted or updated without consequence. ## `sourmash plot --labels-from` This command will load labels from a file. Running: ``` sourmash plot --labels-from compare-demo-new-labels.csv compare-demo ``` uses the `label` column from the CSV as labels, in the order specified by the `order` column (interpreted as integers and sorted from lowest to highest). All other columns are ignored. ## Example in a Jupyter Notebook Some example code for updating the labels is available here: https://github.com/sourmash-bio/sourmash/blob/compare_labels/doc/plotting-compare.ipynb ## TODO - [x] add test for `args.labeltext and args.labels_from` check - [x] check the notebook update ## Future: - [ ] Consider switching to `LinearIndex` in the signature loading code, as that would let us maintain the location in the code without the current machinations. Also worth thinking about enabling lazy loading, which some future `Index`-code based modification might support. - [ ] consider if and how to validate --labels-from CSV file... --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- Loading branch information
1 parent
a128ee3
commit c6831fd
Showing
6 changed files
with
194 additions
and
20 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
sort_order,md5,label,name,filename,signature_file | ||
4,8a619747693c045afde376263841806b,genome-s10+s11-CHANGED,genome-s10+s11,-,/Users/t/dev/sourmash/tests/test-data/genome-s10+s11.sig | ||
3,ff511252a80bb9a7dbb0acf62626e123,genome-s12-CHANGED,genome-s12,genome-s12.fa.gz,/Users/t/dev/sourmash/tests/test-data/genome-s12.fa.gz.sig | ||
2,1437d8eae64bad9bdc8d13e1daa0a43e,genome-s11-CHANGED,genome-s11,genome-s11.fa.gz,/Users/t/dev/sourmash/tests/test-data/genome-s11.fa.gz.sig | ||
1,4cb3290263eba24548f5bef38bcaefc9,genome-s10-CHANGED,genome-s10,genome-s10.fa.gz,/Users/t/dev/sourmash/tests/test-data/genome-s10.fa.gz.sig |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters