[MRG] refactor ZipFileLinearCollection and SaveSignatures_ZipFile to use `ZipStorage` (#1598)

* port docstrings and MultiIndex improvements over from #1619

* remove unused import

* add tests for MultiIndex manifests, now that they work

* comment and parent cleanup

* Squashed commit of the following:

commit 3268907
Author: C. Titus Brown <[email protected]>
Date:   Fri Jul 2 05:38:56 2021 -0700

    remove leftover merge code

commit 3b53de9
Merge: 0f7dc81 21f5e63
Author: C. Titus Brown <[email protected]>
Date:   Fri Jul 2 05:35:54 2021 -0700

    Merge branch 'latest' of https://github.com/sourmash-bio/sourmash into add/zipfile_use_storage

commit 0f7dc81
Author: C. Titus Brown <[email protected]>
Date:   Fri Jun 25 10:45:34 2021 -0700

    fix error message

commit fc0c6fe
Merge: 65646fb a5a52b1
Author: C. Titus Brown <[email protected]>
Date:   Fri Jun 25 10:17:33 2021 -0700

    Merge branch 'latest' of https://github.com/sourmash-bio/sourmash into add/zipfile_use_storage

commit 65646fb
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 24 11:35:55 2021 -0700

    fix merge

commit dec537a
Merge: c039fd6 9dbd8b5
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 24 11:31:47 2021 -0700

    Merge branch 'latest' of https://github.com/sourmash-bio/sourmash into add/zipfile_use_storage

commit c039fd6
Merge: 89fad20 8cc96cd
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 17:48:50 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/zipfile_use_storage

commit 8cc96cd
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 15:43:19 2021 -0700

    fix tests for a CLEAN test-data/prot/ directory

commit 5e49336
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 10:32:49 2021 -0700

    update docstring

commit 2438d90
Merge: 873592d 0ff54e7
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 10:34:23 2021 -0700

    Merge branch 'latest' of https://github.com/sourmash-bio/sourmash into add/picklist_zf_manifests

commit 89fad20
Merge: c6a8ad7 0ff54e7
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 10:33:13 2021 -0700

    Merge branch 'latest' of https://github.com/sourmash-bio/sourmash into add/zipfile_use_storage

commit c6a8ad7
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 10:32:49 2021 -0700

    update docstring

commit bd753d2
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 09:04:56 2021 -0700

    fix a few more things

commit 86ac7ad
Merge: 41438a6 873592d
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 09:03:35 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/zipfile_use_storage

commit 873592d
Merge: b6d5547 1992de9
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 08:48:31 2021 -0700

    Merge branch 'latest' of https://github.com/sourmash-bio/sourmash into add/picklist_zf_manifests

commit b6d5547
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 08:47:02 2021 -0700

    add test for multiple selects

commit 8ebac0d
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 08:37:25 2021 -0700

    remove print statements

commit 701878b
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 08:36:45 2021 -0700

    update test files to have manifest, update tests

commit faad6ee
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 08:17:40 2021 -0700

    don't test manifest content

commit 44aba07
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 08:08:58 2021 -0700

    more refactor zipfile select

commit 61ce0f2
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 07:37:53 2021 -0700

    refactor zipfile select

commit 5879ff2
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 07:37:28 2021 -0700

    check compatibility in MinHash.intersection_and_union

commit e1c44a6
Merge: 6c1f9da d473199
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 07:14:38 2021 -0700

    Merge branch 'latest' of http://github.com/sourmash-bio/sourmash into add/picklist_zf_manifests

commit 6c1f9da
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 07:12:58 2021 -0700

    more manifest testing for zipfiles

commit 1b2cf73
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 07:04:47 2021 -0700

    add use_manifest fixture, refactor manifest loading

commit 38ec792
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 06:50:13 2021 -0700

    add sig manifest tests for other file types

commit 6905d40
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 06:42:51 2021 -0700

    update sig manifest to error when manifests cannot be generated

commit fa47667
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 06:19:51 2021 -0700

    rename signatures_with_internal to _signatures_with_internal

commit 096b141
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 05:46:32 2021 -0700

    add manifests to default zip collection output

commit 99199ee
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 05:32:52 2021 -0700

    move manifest stuff to manifest.py

commit 0adee52
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 23 05:28:54 2021 -0700

    remove print

commit 83e387e
Merge: fe83b68 9bb6a9b
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 18:15:48 2021 -0700

    Merge branch 'add/picklist_manifests_sbt' into add/picklist_zf_manifests

commit fe83b68
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 18:07:23 2021 -0700

    revert collection to multiindex

commit 60a6eec
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 14:08:34 2021 -0700

    change LoadedCollection back over to MultiIndex; remove LazyMultiIndex

commit 9bb6a9b
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:53:40 2021 -0700

    fix header writing

commit 7486871
Merge: 4221fc9 c3f1a3d
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:53:03 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/picklist_manifests_sbt

commit 41438a6
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:52:32 2021 -0700

    fix header writing

commit 4026855
Merge: 6b18439 c3f1a3d
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:48:33 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/zipfile_use_storage

commit c3f1a3d
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:42:38 2021 -0700

    reverse order of adding to seen set

commit 71b81ed
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:41:33 2021 -0700

    add docstring

commit ed5fb7a
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:40:43 2021 -0700

    rename matches_siginfo to matches_manifest_row

commit 2756e7d
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 12:28:37 2021 -0700

    add save/load test

commit ba2e53c
Merge: c243b0e c04f137
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 11:12:07 2021 -0700

    Merge branch 'latest' of github.com:dib-lab/sourmash into add/picklist_zf_manifests

commit c243b0e
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 11:10:20 2021 -0700

    add manifest tests

commit e301645
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 10:07:35 2021 -0700

    add a test for sig manifest

commit e315c90
Merge: d95813e 0814bcc
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 22 09:58:29 2021 -0700

    Merge branch 'latest' of github.com:dib-lab/sourmash into add/picklist_zf_manifests

commit d95813e
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 19 13:52:50 2021 -0700

    add manifest versions

commit 4221fc9
Merge: d4a9a2e 31018df
Author: C. Titus Brown <[email protected]>
Date:   Fri Jun 18 05:47:28 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/picklist_manifests_sbt

commit 6b18439
Merge: 9ff0eab 31018df
Author: C. Titus Brown <[email protected]>
Date:   Fri Jun 18 05:46:39 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/zipfile_use_storage

commit 31018df
Merge: 9e46ff8 74de59a
Author: C. Titus Brown <[email protected]>
Date:   Fri Jun 18 05:44:38 2021 -0700

    Merge branch 'latest' of github.com:dib-lab/sourmash into add/picklist_zf_manifests

commit 9ff0eab
Merge: 9c530a5 9e46ff8
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 14:02:08 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/zipfile_use_storage

commit d4a9a2e
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 14:01:48 2021 -0700

    fix test for manifests

commit 2da0085
Merge: a7e153a 9e46ff8
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 14:01:27 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/picklist_manifests_sbt

commit 9e46ff8
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 13:55:21 2021 -0700

    cleanup of comments etc.

commit e1e367a
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 13:49:21 2021 -0700

    remove @ctb comments

commit 5cad5ff
Merge: 54ea3f9 8812142
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 12:22:46 2021 -0700

    Merge branch 'add/picklist_selectors' into add/picklist_zf_manifests

commit 8812142
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 12:20:45 2021 -0700

    further attempt to fix test

commit 54ea3f9
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 12:17:27 2021 -0700

    only match picklist at end of 'select'

commit 122d043
Merge: f697ec4 de6f3c4
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 11:40:39 2021 -0700

    Merge branch 'add/picklist_selectors' into add/picklist_zf_manifests

commit de6f3c4
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 11:38:35 2021 -0700

    remove order dependence from test

commit f697ec4
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 09:31:18 2021 -0700

    fix coltypes

commit 7937292
Merge: bba101c 4d156e9
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 09:24:38 2021 -0700

    Merge branch 'add/picklist_selectors' into add/picklist_zf_manifests

commit 4d156e9
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 09:13:52 2021 -0700

    add docs

commit ab286cf
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 08:50:57 2021 -0700

    remove debugging print

commit c965648
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 08:44:32 2021 -0700

    add a test for using prefetch CSV as picklist

commit ca6ea4f
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 08:34:08 2021 -0700

    add picklist test that checks indexing-and-then-search == index

commit bba101c
Merge: 39abe57 ba5c8bc
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 08:13:05 2021 -0700

    Merge branch 'add/picklist_selectors' into add/picklist_zf_manifests

commit ba5c8bc
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 07:47:13 2021 -0700

    block multiple picklists on SBTs and LCAs, for now

commit a074127
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 07:38:45 2021 -0700

    add picklists to lca index

commit a0335a3
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 07:32:10 2021 -0700

    add picklists to sourmash compare

commit c0e5781
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 07:25:59 2021 -0700

    add picklists to prefetch

commit 7a30b20
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 07:12:17 2021 -0700

    add picklists and tests for search, gather, index

commit ced72d2
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 06:25:01 2021 -0700

    add picklist args throughout, eek.

commit 984a557
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 06:14:23 2021 -0700

    fix space

commit fddf141
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 06:14:09 2021 -0700

    move picklist reporting into sourmash_args

commit b3c6bb9
Author: C. Titus Brown <[email protected]>
Date:   Thu Jun 17 06:09:40 2021 -0700

    move picklist.py from sourmash.sig into sourmash

commit 21ce4b7
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 15:41:51 2021 -0700

    fix tests for new SignaturePicklist

commit b8f4bb8
Merge: 8e5fb8d b787b75
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 15:40:10 2021 -0700

    Merge branch 'latest' of github.com:dib-lab/sourmash into add/picklist_selectors

commit 8e5fb8d
Merge: 5ac4671 04c209c
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 12:05:31 2021 -0700

    Merge branch 'add/picklist' into add/picklist_selectors

commit 04c209c
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 11:27:21 2021 -0700

    remove comment

commit 14b87d4
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 11:18:32 2021 -0700

    trap errors and be nice to users

commit 4f8e20c
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 11:16:49 2021 -0700

    cover untested code with tests

commit 8f65f22
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 10:17:50 2021 -0700

    test with --md5 selector

commit 9d60e32
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 10:15:06 2021 -0700

    documentation

commit 3d23d87
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 09:39:59 2021 -0700

    add --picklist-require-all &c

commit 14a88a7
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 09:34:43 2021 -0700

    verify output

commit 207a813
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 09:31:22 2021 -0700

    more picklist tests

commit 9b50748
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 09:19:40 2021 -0700

    fix tests :)

commit aaa4548
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 09:18:08 2021 -0700

    update comments, constructor, etc.

commit a7e153a
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 07:22:39 2021 -0700

    fix tests

commit 9c530a5
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 07:12:46 2021 -0700

    add comment about Storage encapsulation

commit 48fd900
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 07:07:17 2021 -0700

    all tests pass, w00t

commit d6a48c1
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 07:01:34 2021 -0700

    refactor ZipFileLinearIndex to use ZipStorage underneath

commit 5a185bb
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 06:45:42 2021 -0700

    change internal zipfile writing to use ZipStorage

commit c356842
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 06:12:24 2021 -0700

    done, I think?

commit 75dc079
Merge: 1dd8170 39abe57
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 05:25:38 2021 -0700

    Merge branch 'add/picklist_zf_manifests' into add/picklist_manifests_sbt

commit 39abe57
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 05:23:42 2021 -0700

    CSV output function

commit 1dd8170
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 05:20:35 2021 -0700

    add manifests to SBTs

commit 72d8497
Author: C. Titus Brown <[email protected]>
Date:   Wed Jun 16 04:23:48 2021 -0700

    move manifest stuff into manifest class

commit a4057e6
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 10:33:09 2021 -0700

    create LazyMultiIndex

commit 730a717
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 09:40:45 2021 -0700

    more cleanup and docs

commit 230c793
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 09:28:09 2021 -0700

    cleanup and simplification of ZipFile stuff

commit 8a8c3b2
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 08:54:23 2021 -0700

    shift signature metadata matching from manifests over to picklist

commit ab0fc0e
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 08:45:06 2021 -0700

    misc cleanup

commit c3b6fc0
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 08:36:08 2021 -0700

    more cleanup

commit 509eb45
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 08:23:41 2021 -0700

    remove MultiIndex

commit af5eb86
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 08:09:53 2021 -0700

    fix test names for new LoadedCollection

commit c6cb1af
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 08:08:46 2021 -0700

    fix all the tests

commit 915f847
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 07:48:35 2021 -0700

    cleanup/simplification of LoadedCollection

commit be9ef77
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 07:41:48 2021 -0700

    create LoadedCollection to replace MultiIndex non-lazy loading

commit 3c0c9cf
Author: C. Titus Brown <[email protected]>
Date:   Tue Jun 15 07:17:08 2021 -0700

    try making manifests obligatory for MultiIndex

commit 23c1531
Merge: 67a9be1 5ac4671
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 13:35:20 2021 -0700

    Merge branch 'add/picklist_selectors' into add/picklist_zf_manifests

commit 5ac4671
Merge: a88b66d 031522c
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 13:35:04 2021 -0700

    Merge branch 'add/picklist' into add/picklist_selectors

commit 031522c
Merge: 3c05f95 ff75ec0
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 13:34:37 2021 -0700

    Merge branch 'latest' of github.com:dib-lab/sourmash into add/picklist

commit 67a9be1
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 11:46:07 2021 -0700

    more comment

commit 1d7e0cf
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 11:41:22 2021 -0700

    update comment about picklist.found

commit 2f2269b
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 11:33:56 2021 -0700

    work through manifests for MultiIndex

commit cb8e28d
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 06:34:34 2021 -0700

    get started adding manifests to MultiIndex

commit 01d33fc
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 06:21:05 2021 -0700

    provide 'select' more generically on manifests

commit 17b9576
Author: C. Titus Brown <[email protected]>
Date:   Mon Jun 14 06:12:39 2021 -0700

    build out a manifest class a bit

commit b2547f3
Author: C. Titus Brown <[email protected]>
Date:   Sun Jun 13 20:20:28 2021 -0700

    add missing manifest CLI file

commit 14a5ee1
Author: C. Titus Brown <[email protected]>
Date:   Sun Jun 13 08:59:00 2021 -0700

    hacky but functional manifest support

commit 6593a42
Author: C. Titus Brown <[email protected]>
Date:   Sun Jun 13 08:37:05 2021 -0700

    try out manifests

commit e205e64
Author: C. Titus Brown <[email protected]>
Date:   Sun Jun 13 07:39:47 2021 -0700

    special case md5 prefixes, for prefetch

commit b57b2b3
Author: C. Titus Brown <[email protected]>
Date:   Sun Jun 13 07:23:46 2021 -0700

    support special picklist interactions with zipfile collections

commit a88b66d
Author: C. Titus Brown <[email protected]>
Date:   Sun Jun 13 06:32:38 2021 -0700

    factor out picklist checks to 'passes_all_picklists' fn

commit 54407a3
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 11:16:02 2021 -0700

    test 'Index.find' on picklists for SBTs and LCAs

commit 03cc61b
Merge: de6fc06 3c05f95
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:48:59 2021 -0700

    Merge branch 'add/picklist' into add/picklist_selectors

commit 3c05f95
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:48:15 2021 -0700

    split column_type out of SignaturePicklist a bit

commit 1bdf88e
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:28:23 2021 -0700

    split pickfile out a little bit

commit de6fc06
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:48:30 2021 -0700

    picklist tests for .signatures() methods on Index classes

commit def1933
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:48:15 2021 -0700

    split column_type out of SignaturePicklist a bit

commit a817843
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:28:23 2021 -0700

    split pickfile out a little bit

commit b1fc982
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 10:01:56 2021 -0700

    add picklists to selectors

commit 74f31f5
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 09:21:58 2021 -0700

    track found etc

commit 505b04f
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 09:11:24 2021 -0700

    basic tests for picklist functionality

commit 3ecfb48
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 08:55:28 2021 -0700

    integrate picklists into sourmash sig extract

commit bb794ec
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 08:38:24 2021 -0700

    initial picklist implementation

commit 3a583a9
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 07:36:48 2021 -0700

    clean up sourmash.sig submodule

commit 66b0599
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 07:23:13 2021 -0700

    cleanup flakes errors

commit 0997834
Author: C. Titus Brown <[email protected]>
Date:   Sat Jun 12 07:17:44 2021 -0700

    various cleanups of sourmash_args

* fix test by raising proper error

* update comments

* add relevant tests

* fix tests and new, exciting bug

* add tests for hand-created zipfile with, and without, manifests

* remove outdated comment
ctb authored Oct 26, 2021
1 parent 5accf22 commit a1da79e
Showing 6 changed files with 235 additions and 63 deletions.
57 changes: 32 additions & 25 deletions src/sourmash/index.py
@@ -37,7 +37,7 @@ class MultiIndex - in-memory storage and selection of signatures from multiple
 import sourmash
 from abc import abstractmethod, ABC
 from collections import namedtuple, Counter
-import zipfile
+import csv
 from io import TextIOWrapper

 from .search import make_jaccard_search_query, make_gather_query
@@ -510,9 +510,9 @@ class ZipFileLinearIndex(Index):
     """
     is_database = True

-    def __init__(self, zf, *, selection_dict=None,
+    def __init__(self, storage, *, selection_dict=None,
                  traverse_yield_all=False, manifest=None, use_manifest=True):
-        self.zf = zf
+        self.storage = storage
         self.selection_dict = selection_dict
         self.traverse_yield_all = traverse_yield_all
         self.use_manifest = use_manifest
@@ -535,17 +535,17 @@ def __init__(self, zf, *, selection_dict=None,
     def _load_manifest(self):
         "Load a manifest if one exists"
         try:
-            zi = self.zf.getinfo('SOURMASH-MANIFEST.csv')
+            manifest_data = self.storage.load('SOURMASH-MANIFEST.csv')
         except KeyError:
             self.manifest = None
         else:
-            debug_literal(f'found manifest when loading {self.zf.filename}')
+            debug_literal(f'found manifest on load for {self.storage.path}')

-            with self.zf.open(zi, 'r') as mfp:
-                # wrap as text, since ZipFile.open only supports 'r' mode.
-                mfp = TextIOWrapper(mfp, 'utf-8')
-                # load manifest!
-                self.manifest = CollectionManifest.load_from_csv(mfp)
+            # load manifest!
+            from io import StringIO
+            manifest_data = manifest_data.decode('utf-8')
+            manifest_fp = StringIO(manifest_data)
+            self.manifest = CollectionManifest.load_from_csv(manifest_fp)

     def __bool__(self):
         "Are there any matching signatures in this zipfile? Avoid calling len."
@@ -564,7 +564,7 @@ def __len__(self):

     @property
     def location(self):
-        return self.zf.filename
+        return self.storage.path

     def insert(self, signature):
         raise NotImplementedError
@@ -575,23 +575,28 @@ def save(self, path):
     @classmethod
     def load(cls, location, traverse_yield_all=False, use_manifest=True):
         "Class method to load a zipfile."
-        zf = zipfile.ZipFile(location, 'r')
-        return cls(zf, traverse_yield_all=traverse_yield_all,
+        from .sbt_storage import ZipStorage
+        storage = ZipStorage(location)
+        return cls(storage, traverse_yield_all=traverse_yield_all,
                    use_manifest=use_manifest)

     def _signatures_with_internal(self):
         """Return an iterator of tuples (ss, location, internal_location).

         Note: does not limit signatures to subsets.
         """
-        for zipinfo in self.zf.infolist():
+        zf = self.storage.zipfile
+
+        # list all the files, without using the Storage interface; currently,
+        # 'Storage' does not provide a way to list all the files, so :shrug:.
+        for zipinfo in zf.infolist():
             # should we load this file? if it ends in .sig OR we are forcing:
             if zipinfo.filename.endswith('.sig') or \
                zipinfo.filename.endswith('.sig.gz') or \
                self.traverse_yield_all:
-                fp = self.zf.open(zipinfo)
+                fp = zf.open(zipinfo)
                 for ss in load_signatures(fp):
-                    yield ss, self.zf.filename, zipinfo.filename
+                    yield ss, zf.filename, zipinfo.filename

     def signatures(self):
         "Load all signatures in the zip file."
@@ -603,30 +608,32 @@ def signatures(self):

             # yield all signatures found in manifest
             for filename in manifest.locations():
-                zi = self.zf.getinfo(filename)
-                fp = self.zf.open(zi)
-                for ss in load_signatures(fp):
+                data = self.storage.load(filename)
+                for ss in load_signatures(data):
                     # in case multiple signatures are in the file, check
                     # to make sure we want to return each one.
                     if ss in manifest:
                         yield ss

         # no manifest! iterate.
         else:
-            for zipinfo in self.zf.infolist():
+            storage = self.storage
+            # if no manifest here, break Storage class encapsulation
+            # and go for all the files. (This is necessary to support
+            # ad-hoc zipfiles that have no manifests.)
+            for zipinfo in storage.zipfile.infolist():
                 # should we load this file? if it ends in .sig OR force:
                 if zipinfo.filename.endswith('.sig') or \
                    zipinfo.filename.endswith('.sig.gz') or \
                    self.traverse_yield_all:
-                    fp = self.zf.open(zipinfo)
-
                     if selection_dict:
                         select = lambda x: select_signature(x,
                                                             **selection_dict)
                     else:
                         select = lambda x: True

-                    for ss in load_signatures(fp):
+                    data = self.storage.load(zipinfo.filename)
+                    for ss in load_signatures(data):
                         if select(ss):
                             yield ss

@@ -639,7 +646,7 @@ def select(self, **kwargs):

         if manifest is not None:
             manifest = manifest.select_to_manifest(**kwargs)
-            return ZipFileLinearIndex(self.zf,
+            return ZipFileLinearIndex(self.storage,
                                       selection_dict=None,
                                       traverse_yield_all=traverse_yield_all,
                                       manifest=manifest,
@@ -659,7 +666,7 @@ def select(self, **kwargs):
                     d[k] = v
                 kwargs = d

-            return ZipFileLinearIndex(self.zf,
+            return ZipFileLinearIndex(self.storage,
                                       selection_dict=kwargs,
                                       traverse_yield_all=traverse_yield_all,
                                       manifest=None,
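
The `_load_manifest` hunk in this file stops wrapping an open zip member in `TextIOWrapper`. It now takes the bytes returned by the storage layer, decodes them, and hands a `StringIO` to the CSV loader. A minimal sketch of that pattern with only the standard library follows. `read_member`, `load_rows` and `make_zip` are hypothetical helpers, the two-column CSV is made-up sample data, and none of this is sourmash's own API.

```python
import csv
import zipfile
from io import BytesIO, StringIO


def read_member(zf, name):
    """Return the raw bytes of one zip member (stand-in for a storage 'load')."""
    return zf.read(name)  # ZipFile.read raises KeyError if the member is absent


def load_rows(zf, name="SOURMASH-MANIFEST.csv"):
    """Decode a CSV member and parse it; return None if the member is missing."""
    try:
        data = read_member(zf, name)
    except KeyError:
        return None
    # bytes -> str -> file-like object, mirroring the decode/StringIO steps
    return list(csv.DictReader(StringIO(data.decode("utf-8"))))


def make_zip(members):
    """Build an in-memory zip from {name: bytes} and reopen it for reading."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, mode="w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return zipfile.ZipFile(buf, mode="r")


with_manifest = make_zip({"SOURMASH-MANIFEST.csv": b"md5,ksize\nabc123,31\n"})
without_manifest = make_zip({"signatures/abc123.sig.gz": b""})

rows = load_rows(with_manifest)        # parsed manifest rows
missing = load_rows(without_manifest)  # no manifest in this archive
```

Returning `None` for a missing manifest plays the role of `self.manifest = None` in the hunk, which is what sends `signatures()` down the "no manifest! iterate." branch.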
2 changes: 1 addition & 1 deletion src/sourmash/manifest.py
@@ -63,7 +63,7 @@ def load_from_csv(cls, fp):
         row = None

         # do row type conversion
-        introws = ('num', 'scaled', 'with_abundance', 'ksize', 'n_hashes')
+        introws = ('num', 'scaled', 'ksize', 'n_hashes')
         boolrows = ('with_abundance',)

         for row in r:
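
This one-line change drops `with_abundance` from `introws`, so the column is listed only in `boolrows`. A CSV reader returns every field as a string, so each column should go through exactly one conversion. The sketch below shows what such a step might look like. It assumes booleans are stored as `0`/`1` in the CSV, which the hunk does not show, and `convert_row` is a hypothetical helper rather than the function in `manifest.py`.

```python
INTROWS = ('num', 'scaled', 'ksize', 'n_hashes')
BOOLROWS = ('with_abundance',)


def convert_row(row):
    """Return a copy of a CSV row with typed values for the known columns."""
    out = dict(row)
    for key in INTROWS:
        out[key] = int(out[key])
    for key in BOOLROWS:
        # go through int first: bool('0') would be True, since '0' is non-empty
        out[key] = bool(int(out[key]))
    return out


raw = {
    'num': '0', 'scaled': '1000', 'ksize': '31', 'n_hashes': '5238',
    'with_abundance': '0', 'name': 'example',
}
converted = convert_row(raw)
```

Columns that appear in neither tuple, such as `name` here, are left as strings.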
11 changes: 8 additions & 3 deletions src/sourmash/sbt_storage.py
@@ -114,7 +114,7 @@ def __init__(self, path):
             self.zipfile = zipfile.ZipFile(path, 'r')
             self.bufferzip = zipfile.ZipFile(BytesIO(), mode="w")

-        self.subdir = None
+        self.subdir = ""
         subdirs = [f for f in self.zipfile.namelist() if f.endswith("/")]
         if len(subdirs) == 1:
             self.subdir = subdirs[0]
@@ -204,7 +204,10 @@ def load(self, path):
         try:
             return self._load_from_zf(self.zipfile, path)
         except KeyError:
-            return self._load_from_zf(self.bufferzip, path)
+            if self.bufferzip:
+                return self._load_from_zf(self.bufferzip, path)
+            else:
+                raise FileNotFoundError(path)

     def init_args(self):
         return {'path': self.path}
@@ -270,7 +273,9 @@ def flush(self, *, keep_closed=False):
                         # Since there is no duplicated data, we can
                         # reopen self.zipfile in append mode and write the new data
                         self.zipfile.close()
-                        if not keep_closed:
+                        if keep_closed:
+                            raise Exception("unexpected error")
+                        else:
                             zf = zipfile.ZipFile(self.path, mode='a',
                                                  compression=zipfile.ZIP_STORED)
                             for item in new_data:
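
The `load` hunk in this file makes the lookup order explicit: try the on-disk archive first, fall back to the in-memory buffer if there is one, and otherwise raise `FileNotFoundError`. A minimal standard-library sketch of that order follows. `load_with_fallback` and `make_zip` are hypothetical names used for illustration, and this is not the `ZipStorage` implementation.

```python
import zipfile
from io import BytesIO


def make_zip(members):
    """Build an in-memory zip from {name: bytes} and reopen it for reading."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, mode="w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return zipfile.ZipFile(buf, mode="r")


def load_with_fallback(main_zf, buffer_zf, name):
    """Read 'name' from the main archive, else from the buffer archive."""
    try:
        return main_zf.read(name)
    except KeyError:
        if buffer_zf is not None:
            # may itself raise KeyError if the buffer lacks the member
            return buffer_zf.read(name)
        raise FileNotFoundError(name)


main = make_zip({"a.sig": b"main"})
extra = make_zip({"b.sig": b"buffered"})

found_main = load_with_fallback(main, extra, "a.sig")
found_buffer = load_with_fallback(main, extra, "b.sig")

try:
    load_with_fallback(main, None, "b.sig")
    missing_error = None
except FileNotFoundError as exc:
    missing_error = exc
```

A caller that catches only `KeyError` will not see the new `FileNotFoundError`; the `open()` method added in `sourmash_args.py` in this commit catches both.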
71 changes: 39 additions & 32 deletions src/sourmash/sourmash_args.py
@@ -6,7 +6,6 @@
 from enum import Enum
 import traceback
 import gzip
-import zipfile
 from io import StringIO

 import screed
@@ -745,7 +744,7 @@ def add(self, ss):


 class SaveSignatures_SigFile(_BaseSaveSignaturesToLocation):
-    "Save signatures within a directory, using md5sum names."
+    "Save signatures to a .sig JSON file."
     def __init__(self, location):
         super().__init__(location)
         self.keep = []
@@ -785,7 +784,7 @@ class SaveSignatures_ZipFile(_BaseSaveSignaturesToLocation):
     "Save compressed signatures in an uncompressed Zip file."
     def __init__(self, location):
         super().__init__(location)
-        self.zf = None
+        self.storage = None

     def __repr__(self):
         return f"SaveSignatures_ZipFile('{self.location}')"
@@ -799,53 +798,61 @@ def close(self):
         manifest.write_to_csv(manifest_fp, write_header=True)
         manifest_data = manifest_fp.getvalue().encode("utf-8")

-        # compress the manifest --
-        self.zf.writestr(manifest_name, manifest_data,
-                         compress_type=zipfile.ZIP_DEFLATED)
+        self.storage.save(manifest_name, manifest_data, overwrite=True,
+                          compress=True)
+        self.storage.flush()
+        self.storage.close()
+
+    def open(self):
+        from .sbt_storage import ZipStorage

-        # set permissions:
-        zi = self.zf.getinfo(manifest_name)
-        zi.external_attr = 0o444 << 16 # give a+r access
+        do_create = True
+        if os.path.exists(self.location):
+            do_create = False

-        self.zf.close()
+        storage = ZipStorage(self.location)
+        if not storage.subdir:
+            storage.subdir = 'signatures'

-    def open(self):
-        self.zf = zipfile.ZipFile(self.location, 'w', zipfile.ZIP_STORED)
-        self.manifest_rows = []
+        # now, try to load manifest
+        try:
+            manifest_data = storage.load('SOURMASH-MANIFEST.csv')
+        except (FileNotFoundError, KeyError):
+            # if file already exists must have manifest...
+            if not do_create:
+                raise ValueError(f"Cannot add to existing zipfile '{self.location}' without a manifest")
+            self.manifest_rows = []
+        else:
+            # success! decode manifest_data, create manifest rows => append.
+            manifest_data = manifest_data.decode('utf-8')
+            manifest_fp = StringIO(manifest_data)
+            manifest = CollectionManifest.load_from_csv(manifest_fp)
+            self.manifest_rows = list(manifest._select())
+
+        self.storage = storage

     def _exists(self, name):
         try:
-            self.zf.getinfo(name)
+            self.storage.load(name)
             return True
         except KeyError:
             return False

     def add(self, ss):
-        if not self.zf:
+        if not self.storage:
             raise ValueError("this output is not open")

         super().add(ss)

+        buf = sigmod.save_signatures([ss], compression=1)
         md5 = ss.md5sum()
-        outname = f"signatures/{md5}.sig.gz"

-        # don't overwrite even if duplicate md5sum.
-        if self._exists(outname):
-            i = 0
-            while 1:
-                outname = os.path.join(self.location, f"{md5}_{i}.sig.gz")
-                if not self._exists(outname):
-                    break
-                i += 1
-
-        json_str = sourmash.save_signatures([ss], compression=1)
-        self.zf.writestr(outname, json_str)
-
-        # set permissions:
-        zi = self.zf.getinfo(outname)
-        zi.external_attr = 0o444 << 16 # give a+r access
+        storage = self.storage
+        path = f'{storage.subdir}/{md5}.sig.gz'
+        location = storage.save(path, buf)
+
         # update manifest
-        row = CollectionManifest.make_manifest_row(ss, outname,
+        row = CollectionManifest.make_manifest_row(ss, location,
                                                    include_signature=False)
         self.manifest_rows.append(row)
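
The rewritten `open()` distinguishes three cases. If the zip file does not exist yet, it starts with no manifest rows. If it exists and contains a manifest, it reuses those rows so that new signatures are appended. If it exists without a manifest, it refuses to continue. The sketch below reproduces that decision with the standard library only. `existing_manifest_rows` is a hypothetical name, the CSV layout is illustrative, and the real method goes through `ZipStorage` and `CollectionManifest` rather than `zipfile` and `csv` directly.

```python
import csv
import os
import tempfile
import zipfile
from io import StringIO

MANIFEST_NAME = "SOURMASH-MANIFEST.csv"


def existing_manifest_rows(path):
    """Return the rows to start from when opening 'path' for output."""
    if not os.path.exists(path):
        return []  # new file: nothing to carry over
    with zipfile.ZipFile(path, "r") as zf:
        try:
            data = zf.read(MANIFEST_NAME)
        except KeyError:
            # an existing archive must already have a manifest
            raise ValueError(
                f"Cannot add to existing zipfile '{path}' without a manifest")
    return list(csv.DictReader(StringIO(data.decode("utf-8"))))


with tempfile.TemporaryDirectory() as tmp:
    new_path = os.path.join(tmp, "new.zip")    # never created
    good_path = os.path.join(tmp, "good.zip")  # has a manifest
    bad_path = os.path.join(tmp, "bad.zip")    # has no manifest

    with zipfile.ZipFile(good_path, "w") as zf:
        zf.writestr(MANIFEST_NAME, "md5,ksize\nabc123,31\n")
    with zipfile.ZipFile(bad_path, "w") as zf:
        zf.writestr("notes.txt", "no manifest here")

    rows_new = existing_manifest_rows(new_path)
    rows_good = existing_manifest_rows(good_path)
    try:
        existing_manifest_rows(bad_path)
        refused = False
    except ValueError:
        refused = True
```

Refusing the third case keeps the manifest written at `close()` complete: without the old rows, the new manifest would list only the signatures added in this session.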

2 changes: 1 addition & 1 deletion tests/test_index.py
@@ -869,7 +869,7 @@ def test_zipfile_API_signatures_traverse_yield_all(use_manifest):
     assert len(zipidx) == 8

     # confirm that there are 12 files in there total, incl build.sh and dirs
-    zf = zipidx.zf
+    zf = zipidx.storage.zipfile
     allfiles = [ zi.filename for zi in zf.infolist() ]
     print(allfiles)
     assert len(allfiles) == 13