Commit
[MRG] provide an initial plugin architecture for sourmash that supports new signature saving & loading mechanisms (#2428)

Implement support for `load_from` and `save_to` plugins via `importlib.metadata` entry points. This supports a few of the plugins suggested in #1353.

I am nominating this as an experimental feature that is not under semantic versioning/not public yet.

Documentation page [here, in dev_plugins.html](https://sourmash--2428.org.readthedocs.build/en/2428/dev_plugins.html).

A template repo for new plugins is at https://github.com/sourmash-bio/sourmash_plugin_template.

## Implementation/this PR

This PR refactors the `_load_database` loading and `SaveSignaturesToLocation` saving code to build a prioritized list of functions to try in order, and then adds hooks in via the new `sourmash.plugins` module that insert additional loading/saving functions into that list.

This PR also moves the current saving/loading functions out of `sourmash.sourmash_args` into the `sourmash.save_load` submodule, and simplifies the code a bit.

## Example plugins:

- read JSON sigs and manifests from URLs: https://github.com/sourmash-bio/sourmash_plugin_load_urls
- read and write signatures in Apache Avro: https://github.com/sourmash-bio/sourmash_plugin_avro - use extension `.avrosig` to write.

Specific TODOs:

- [x] provide a minimal "getting started" template repo
- [x] add tests for multiple plugins & priorities
- [ ] maybe try writing CSV export/import as a plugin? #1098

For later:

- think about other kinds of plugins - new CLI entry points, picklist classes, tax loading, tax structure, ??.
- work on getting avro support into rust over in luizirber/2021-02-11-sourmash-binary-format#1
Showing 12 changed files with 992 additions and 508 deletions.
@@ -0,0 +1,75 @@
# sourmash plugins via Python entry points

As of version 4.7.0, sourmash has experimental support for Python
plugins to load and save signatures in different ways (e.g. file
formats, RPC servers, or databases). This support is provided via
the "entry points" mechanism supplied by
[`importlib.metadata`](https://docs.python.org/3/library/importlib.metadata.html)
and documented in the
[setuptools entry points guide](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).
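
As a rough illustration of the entry-point mechanism, the sketch below lists whatever is registered under a given group using only the standard library. The helper name `list_entry_points` is hypothetical and not part of sourmash. The fallback branch is an assumption about how one might handle Python 3.8/3.9, where `entry_points()` accepts no arguments and returns a dict keyed by group; sourmash itself falls back to the `importlib_metadata` backport instead.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return sorted (name, value) pairs registered under `group`.

    Hypothetical helper for illustration; not part of sourmash.
    """
    try:
        # Python 3.10+: select entry points by group directly.
        selected = entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and
        # returns a dict mapping group name -> entry points.
        selected = entry_points().get(group, [])
    # 'name' is the key from pyproject.toml; 'value' is "module:object".
    return sorted((ep.name, ep.value) for ep in selected)
```

Calling `list_entry_points("sourmash.load_from")` in an environment with plugins installed should return one pair per registered loader; with no plugins installed it returns an empty list.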

```{note}
The plugin API is _not_ yet finalized and is not yet subject to semantic
versioning! Please subscribe to
[sourmash#1353](https://github.com/sourmash-bio/sourmash/issues/1353)
if you want to keep up to date on plugin support.
```

You can define entry points in the `pyproject.toml` file
like so:

```
[project.entry-points."sourmash.load_from"]
a_reader = "module_name:load_sketches"
[project.entry-points."sourmash.save_to"]
a_writer = "module_name:SaveSignatures_WriteFile"
```

Here, `module_name` should be the name of the module to import.
`load_sketches` should be a function that takes a location along with
arbitrary keyword arguments and returns an `Index` object
(e.g. `LinearIndex` for a collection of in-memory
signatures). `SaveSignatures_WriteFile` should be a class that
subclasses `BaseSave_SignaturesToLocation` and implements its own
mechanism for saving signatures. See the `sourmash.save_load` module
for saving and loading code already used in sourmash.

If the function or class has a `priority` attribute, it is used to
determine the order in which the plugins are called; otherwise a
default priority of 99 is used.

The `name` attribute of the plugin (`a_reader` and `a_writer` in
`pyproject.toml`, above) is only used in debugging output.
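
The `priority` lookup described above can be sketched in plain Python. Everything here is a stand-in: `load_from_url`, `load_generic`, `SaveToNowhere`, and `order_plugins` are hypothetical names, the loaders return strings rather than `Index` objects, and the saver does not subclass anything from sourmash. The sketch also assumes that lower priority numbers are tried first, which this page does not state explicitly.

```python
DEFAULT_PRIORITY = 99  # same default value as in sourmash.plugins


def load_from_url(location, *args, **kwargs):
    """Hypothetical loader; a real one would return an Index object."""
    return f"loaded {location} via URL loader"


# Attach a priority to the function object itself.
load_from_url.priority = 5


def load_generic(location, *args, **kwargs):
    """Hypothetical loader with no explicit priority."""
    return f"loaded {location} via generic loader"


class SaveToNowhere:
    """Hypothetical saver class; priority is a plain class attribute."""
    priority = 20


def order_plugins(plugins):
    """Return (priority, name, plugin) tuples, lowest priority first."""
    ranked = [(getattr(p, "priority", DEFAULT_PRIORITY), p.__name__, p)
              for p in plugins]
    # Sort on (priority, name) only, so the callables are never compared.
    ranked.sort(key=lambda item: item[:2])
    return ranked


ordered = order_plugins([load_generic, SaveToNowhere, load_from_url])
```

Using `getattr` with a default means a plugin author only has to set `priority` when the default ordering is not good enough.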

## Templates and examples

If you want to create your own plugin, you can start with the
[sourmash_plugin_template](https://github.com/sourmash-bio/sourmash_plugin_template) repo.

Some (early stage) plugins are also available as examples:

* [sourmash-bio/sourmash_plugin_load_urls](https://github.com/sourmash-bio/sourmash_plugin_load_urls) - load signatures and CSV manifests via [fsspec](https://filesystem-spec.readthedocs.io/).
* [sourmash-bio/sourmash_plugin_avro](https://github.com/sourmash-bio/sourmash_plugin_avro) - use [Apache Avro](https://avro.apache.org/) as a serialization format.

## Debugging plugins

`sourmash sig cat <input sig> -o <output sig>` is a simple way to
invoke a `save_to` plugin. Use `-d` to turn on debugging output.

`sourmash sig describe <input location>` is a simple way to invoke
a `load_from` plugin. Use `-d` to turn on debugging output.

## Semantic versioning and listing sourmash as a dependency

Plugins should list sourmash as an installation dependency.

Once plugins are officially supported by sourmash, the plugin API will
be under [semantic versioning constraints](https://semver.org/). That
means that you should constrain plugins to depend on sourmash only up
to the next major version, e.g. sourmash v5.

Specifically, we suggest placing something like:
```
dependencies = ['sourmash>=4.8.0,<5']
```
in your `pyproject.toml` file.
@@ -0,0 +1,66 @@
"""
Support for plugins to sourmash via importlib.metadata entrypoints.
Plugin entry point names:
* 'sourmash.load_from' - Index class loading.
* 'sourmash.save_to' - Signature saving.
* 'sourmash.picklist_filters' - extended Picklist functionality.
CTB TODO:
* consider using something other than 'name' for loader fn name. Maybe __doc__?
* try implement picklist plugin?
"""

DEFAULT_LOAD_FROM_PRIORITY = 99
DEFAULT_SAVE_TO_PRIORITY = 99

from .logging import debug_literal

# cover for older versions of Python that don't support selection on load
# (the 'group=' below).
from importlib.metadata import entry_points

# load 'load_from' entry points. NOTE: this executes on import of this module.
try:
    _plugin_load_from = entry_points(group='sourmash.load_from')
except TypeError:
    from importlib_metadata import entry_points
    _plugin_load_from = entry_points(group='sourmash.load_from')

# load 'save_to' entry points as well.
_plugin_save_to = entry_points(group='sourmash.save_to')


def get_load_from_functions():
    "Load the 'load_from' plugins and yield tuples (priority, name, fn)."
    debug_literal(f"load_from plugins: {_plugin_load_from}")

    # Load each plugin,
    for plugin in _plugin_load_from:
        loader_fn = plugin.load()

        # get 'priority' if it is available
        priority = getattr(loader_fn, 'priority', DEFAULT_LOAD_FROM_PRIORITY)

        # retrieve name (which is specified by plugin?)
        name = plugin.name
        debug_literal(f"plugins.load_from_functions: got '{name}', priority={priority}")
        yield priority, name, loader_fn


def get_save_to_functions():
    "Load the 'save_to' plugins and yield tuples (priority, fn)."
    debug_literal(f"save_to plugins: {_plugin_save_to}")

    # Load each plugin,
    for plugin in _plugin_save_to:
        save_cls = plugin.load()

        # get 'priority' if it is available
        priority = getattr(save_cls, 'priority', DEFAULT_SAVE_TO_PRIORITY)

        # retrieve name (which is specified by plugin?)
        name = plugin.name
        debug_literal(f"plugins.save_to_functions: got '{name}', priority={priority}")
        yield priority, save_cls
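
The commit message describes building a prioritized list of functions to try in order. A minimal sketch of how a caller might consume `(priority, name, fn)` tuples like those yielded by `get_load_from_functions()` follows. It is not the actual `_load_database` code: `fake_plugin_loaders` and `load_with_plugins` are hypothetical names, the loaders return strings rather than `Index` objects, and the conventions that lower priorities go first, that a loader declines by returning `None`, and that exceptions are skipped are all assumptions.

```python
def fake_plugin_loaders():
    """Stand-in for get_load_from_functions(): yields (priority, name, fn)."""
    def load_avro(location, **kwargs):
        # Decline anything that does not look like an Avro signature file.
        if not location.endswith(".avrosig"):
            return None
        return f"avro:{location}"

    def load_anything(location, **kwargs):
        return f"generic:{location}"

    yield 99, "generic", load_anything
    yield 10, "avro", load_avro


def load_with_plugins(location, loaders, **kwargs):
    """Try loaders in ascending priority order; return the first result."""
    # Sort on (priority, name) so function objects are never compared.
    for priority, name, fn in sorted(loaders, key=lambda t: (t[0], t[1])):
        try:
            result = fn(location, **kwargs)
        except Exception:
            # Assumption: a failing loader should not stop the search.
            continue
        if result is not None:
            return name, result
    raise ValueError(f"no loader could handle {location!r}")
```

With these stand-ins, a `.avrosig` location is handled by the `avro` loader because its priority of 10 sorts ahead of the default 99, while any other location falls through to the `generic` loader.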