MRG: update documentation for RocksDB `index` and internal/external storage, + miscellaneous improvements #416
@luizirber @bluegenes your reviews (you can skim the PR description ;)) would be much appreciated. Don't merge yet: we need to get #408 and #390 in first, and then I plan to cut a new release, v0.9.7, as soon as this is merged.
did you change
typo!
…water into update_docs
(fixed)
lgtm
Tackling #409 and documenting RocksDB `index` internal/external storage per #390.

Items from #409:
- `index` command to TOC
- `multisearch` threshold documentation
- `manysketch` fromfile formatting discussion
- `manysketch` multithreading discussion

This PR also updates the docs to:
- `manysearch` column presence/absence as adjusted in MRG: add pretty printing option to manysearch command #408

Fixes #409.
Here are some of the bigger/more confusing changes, for reviewers to evaluate and help improve ;) --
multisearch threshold discussion:
manysearch output discussion
Internal vs external storage of sketches in a RocksDB index
(The below applies to v0.9.7 and later of the plugin; for v0.9.6 and
before, only external storage was implemented.)
RocksDB indexes support containment queries (a la the branchwater application), as well as `gather`-style mixture decomposition (see Irber et al., 2022). For this plugin, the `manysearch` command supports a RocksDB index for the database for containment queries, and `fastmultigather` can use a RocksDB index for the database of genomes.
RocksDB indexes contain references to the sketches used to construct the index. If `--internal-storage` is set (which is the default), a copy of the sketches is stored within the RocksDB database directory; if `--no-internal-storage` is provided, then the references point to the original source sketches used to construct the database, wherever they reside on your disk.
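As a rough illustration of the two storage modes, here is a minimal Python sketch that only assembles the command line for building an index. The two storage flags come from the text above; the `sourmash scripts index` invocation, the `-o` output option, and the file names are assumptions made for illustration, so check them against `--help` for your installed version.

```python
import shlex


def build_index_command(sketches, output_db, internal_storage=True):
    """Return an argv list for building a RocksDB index.

    Assumption: the plugin is invoked as `sourmash scripts index` and
    `-o` names the output database directory. Only the two storage
    flags are taken from the documentation text.
    """
    if internal_storage:
        storage_flag = "--internal-storage"  # default: sketches copied into the database
    else:
        storage_flag = "--no-internal-storage"  # references point at the source sketches
    return ["sourmash", "scripts", "index", str(sketches),
            "-o", str(output_db), storage_flag]


# Hypothetical file names, used only to show the resulting command line.
print(shlex.join(build_index_command("sketches.zip", "db.rocksdb")))
print(shlex.join(build_index_command("sketches.zip", "db.rocksdb",
                                     internal_storage=False)))
```

The function returns a list rather than running anything, so the result can be inspected or passed to `subprocess.run` once the arguments have been checked.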
The sketches are not used by `manysearch`, but are used by `fastmultigather`: with v0.9.6 and later, you'll get an error if you run `fastmultigather` against a RocksDB index where the sketches cannot be loaded.
What this means is therefore a bit complicated, but boils down to
the following two approaches:
1. Use internal storage (the default). This will consume more disk space, but your RocksDB database will always be usable for both `manysearch` and `fastmultigather`, as well as the branchwater app.
2. If you do not want a copy of the sketches stored in the database directory, then specify `--no-internal-storage` and provide a stable absolute path to the source sketches. This will again support both `manysearch` and `fastmultigather`, as well as the branchwater app. If the source sketches later become unavailable, `fastmultigather` will stop working (although `manysearch` and the branchwater app should be fine).