[MRG] provide an initial plugin architecture for sourmash that supports new signature saving & loading mechanisms (#2428)

Implement support for `load_from` and `save_to` plugins via
`importlib.metadata` entry points.

This supports a few of the plugins suggested in
#1353

I am nominating this as an experimental feature; it is not yet public
and is not under semantic versioning.

Documentation page [here, in
dev_plugins.html](https://sourmash--2428.org.readthedocs.build/en/2428/dev_plugins.html).

A template repo for new plugins is at
https://github.com/sourmash-bio/sourmash_plugin_template.

## Implementation/this PR

This PR refactors the `_load_database` loading and
`SaveSignaturesToLocation` saving code to build a prioritized list of
functions to try in order, and then adds hooks via the new
`sourmash.plugins` module that insert additional loading/saving
functions into that list.
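The prioritized list can be pictured roughly as follows. This is a minimal sketch, not sourmash's actual code: it assumes lower priority values are tried first and that a loader signals "not my format" by returning `None` or raising `ValueError`. Every loader name here is hypothetical; only the default priority of 99 and the `(priority, name, fn)` tuple shape come from `sourmash/plugins.py`.

```python
# Minimal sketch of a prioritized loader list (hypothetical names throughout).
DEFAULT_LOAD_FROM_PRIORITY = 99  # same default as in sourmash.plugins


def load_json_sigs(location, **kwargs):
    # Hypothetical built-in loader that only claims '.sig' files.
    if location.endswith(".sig"):
        return f"JSON index for {location}"
    return None  # Assumption: None means "not my format, try the next one".


def load_zip(location, **kwargs):
    # Hypothetical built-in loader that reports failure by raising ValueError.
    if not location.endswith(".zip"):
        raise ValueError(f"{location} is not a zip file")
    return f"zip index for {location}"


def avro_plugin_loader(location, **kwargs):
    # Stand-in for a function registered under 'sourmash.load_from'.
    # It has no 'priority' attribute, so it gets the default of 99.
    if location.endswith(".avrosig"):
        return f"avro index for {location}"
    return None


BUILTIN_LOADERS = [(1, "json", load_json_sigs), (2, "zip", load_zip)]


def plugin_loaders(plugins=(("avro", avro_plugin_loader),)):
    # Mirrors get_load_from_functions(): yield (priority, name, fn).
    for name, fn in plugins:
        yield getattr(fn, "priority", DEFAULT_LOAD_FROM_PRIORITY), name, fn


def load_database(location, **kwargs):
    # Sort on priority alone (stable); a plain tuple sort could end up
    # comparing function objects when priority and name both tie.
    candidates = sorted([*BUILTIN_LOADERS, *plugin_loaders()], key=lambda t: t[0])
    for _priority, name, fn in candidates:
        try:
            db = fn(location, **kwargs)
        except ValueError:
            continue  # Assumption: ValueError means "try the next loader".
        if db is not None:
            return name, db
    raise ValueError(f"cannot load sourmash index from {location!r}")


print(load_database("sigs.avrosig"))  # ('avro', 'avro index for sigs.avrosig')
```

Because the plugin loader is simply appended to the same list, it is only reached when no higher-priority built-in loader claims the location.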

This PR also moves the current saving/loading functions out of
`sourmash.sourmash_args` into the `sourmash.save_load` submodule, and
simplifies the code a bit.

## Example plugins:

- read JSON sigs and manifests from URLs:
https://github.com/sourmash-bio/sourmash_plugin_load_urls
- read and write signatures in Apache Avro:
https://github.com/sourmash-bio/sourmash_plugin_avro - use extension
`.avrosig` to write.

Specific TODOs:
- [x] provide a minimal "getting started" template repo
- [x] add tests for multiple plugins & priorities
- [ ] maybe try writing CSV export/import as a plugin?
#1098

For later:
- think about other kinds of plugins - new CLI entry points, picklist
classes, tax loading, tax structure, ??.
- work on getting avro support into rust over in
luizirber/2021-02-11-sourmash-binary-format#1
ctb authored Jan 7, 2023
1 parent 079a2ba commit 14d79c9
Showing 12 changed files with 992 additions and 508 deletions.
75 changes: 75 additions & 0 deletions doc/dev_plugins.md
@@ -0,0 +1,75 @@
# sourmash plugins via Python entry points

As of version 4.7.0, sourmash has experimental support for Python
plugins to load and save signatures in different ways (e.g. file
formats, RPC servers, databases, etc.). This support is provided via
the "entry points" mechanism supplied by
[`importlib.metadata`](https://docs.python.org/3/library/importlib.metadata.html)
and documented in the setuptools
[entry points guide](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).

```{note}
The plugin API is _not_ finalized, and it is not yet subject to semantic
versioning! Please subscribe to
[sourmash#1353](https://github.com/sourmash-bio/sourmash/issues/1353)
if you want to keep up to date on plugin support.
```

You can define entry points in the `pyproject.toml` file
like so:

```toml
[project.entry-points."sourmash.load_from"]
a_reader = "module_name:load_sketches"
[project.entry-points."sourmash.save_to"]
a_writer = "module_name:SaveSignatures_WriteFile"
```

Here, `module_name` should be the name of the module to import.
`load_sketches` should be a function that takes a location along with
arbitrary keyword arguments and returns an `Index` object
(e.g. `LinearIndex` for a collection of in-memory
signatures). `SaveSignatures_WriteFile` should be a class that
subclasses `BaseSave_SignaturesToLocation` and implements its own
mechanism for saving signatures. See the `sourmash.save_load` module
for saving and loading code already used in sourmash.
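The `module_name:load_sketches` value uses the standard entry point `module:attribute` syntax, and `sourmash.plugins` calls `.load()` on each entry point to import the module and fetch the attribute. A minimal sketch of that resolution step with the standard library's `EntryPoint` class; `json:loads` is a stand-in for a real plugin function so the example runs without any plugin installed.

```python
import json
from importlib.metadata import EntryPoint

# Hypothetical entry point shaped like the pyproject.toml line
#   a_reader = "module_name:load_sketches"
# but pointing at a stdlib function so the example runs anywhere.
ep = EntryPoint(name="a_reader", value="json:loads", group="sourmash.load_from")

# load() imports the module before the colon ("json") and returns the
# attribute after it ("loads"); sourmash.plugins does the same via plugin.load().
loader_fn = ep.load()

print(ep.name, ep.group, loader_fn is json.loads)  # a_reader sourmash.load_from True
```

If `load()` raises `ImportError` or `AttributeError` here, the `module_name` or function name in `pyproject.toml` is likely wrong.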

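If the function or class has a `priority` attribute, it is used to
determine the order in which the plugins are called; plugins without
one get a default priority of 99 (`DEFAULT_LOAD_FROM_PRIORITY` and
`DEFAULT_SAVE_TO_PRIORITY` in `sourmash.plugins`).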
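On the saving side, the `priority` attribute decides which class gets the first chance to claim an output location. A minimal sketch, assuming lower values are checked first and that each save class has a classmethod that decides whether it can handle a location; the name `matches` and all classes other than `SaveSignatures_WriteFile` are hypothetical stand-ins, not sourmash's built-in savers.

```python
DEFAULT_SAVE_TO_PRIORITY = 99  # same default as in sourmash.plugins


class _BaseSaver:
    # Hypothetical stand-in for a saver base class; stores the location only.
    def __init__(self, location):
        self.location = location


class SaveToZip(_BaseSaver):
    priority = 10

    @classmethod
    def matches(cls, location):
        # Assumed hook: can this class write to `location`?
        return location.endswith(".zip")


class SaveToDirectory(_BaseSaver):
    priority = 50  # catch-all: accepts any location

    @classmethod
    def matches(cls, location):
        return True


class SaveSignatures_WriteFile(_BaseSaver):
    # Stand-in for a class registered under 'sourmash.save_to'.
    priority = 5

    @classmethod
    def matches(cls, location):
        return location.endswith(".avrosig")


class LatePlugin(_BaseSaver):
    # Same format as above but no 'priority' attribute: gets the default.
    @classmethod
    def matches(cls, location):
        return location.endswith(".avrosig")


def save_to_location(location, plugin_classes=(SaveSignatures_WriteFile,)):
    candidates = [(SaveToZip.priority, SaveToZip),
                  (SaveToDirectory.priority, SaveToDirectory)]
    # Mirrors get_save_to_functions(): read 'priority' with a fallback.
    candidates += [(getattr(cls, "priority", DEFAULT_SAVE_TO_PRIORITY), cls)
                   for cls in plugin_classes]
    # Assumption: lower priority values are checked first; first match wins.
    for _priority, cls in sorted(candidates, key=lambda t: t[0]):
        if cls.matches(location):
            return cls(location)
    raise ValueError(f"no saver for {location!r}")
```

With `LatePlugin`, the default of 99 puts it behind the catch-all, so `save_to_location("x.avrosig", (LatePlugin,))` returns a `SaveToDirectory`. In this sketch, that is why a plugin that must win sets an explicit `priority`.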

The `name` attribute of the plugin (`a_reader` and `a_writer` in
`pyproject.toml`, above) is only used in debugging.

## Templates and examples

If you want to create your own plugin, you can start with the
[sourmash_plugin_template](https://github.com/sourmash-bio/sourmash_plugin_template) repo.

Some early-stage plugins are also available as examples:

* [sourmash-bio/sourmash_plugin_load_urls](https://github.com/sourmash-bio/sourmash_plugin_load_urls) - load signatures and CSV manifests via [fsspec](https://filesystem-spec.readthedocs.io/).
* [sourmash-bio/sourmash_plugin_avro](https://github.com/sourmash-bio/sourmash_plugin_avro) - use [Apache Avro](https://avro.apache.org/) as a serialization format.

## Debugging plugins

`sourmash sig cat <input sig> -o <output sig>` is a simple way to
invoke a `save_to` plugin. Use `-d` to turn on debugging output.

`sourmash sig describe <input location>` is a simple way to invoke
a `load_from` plugin. Use `-d` to turn on debugging output.
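If the `-d` output shows no plugin being tried, it can help to confirm that the entry points are registered in the active environment at all. A minimal standard-library sketch; the helper name `list_sourmash_plugins` is hypothetical. Note that it falls back to the Python 3.8/3.9 dict-style `entry_points()` result, whereas `sourmash.plugins` falls back to the `importlib_metadata` backport.

```python
from importlib.metadata import entry_points


def list_sourmash_plugins(group):
    """Return sorted (name, value) pairs for entry points registered in `group`."""
    try:
        eps = entry_points(group=group)  # selection API, Python 3.10+
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and returns a dict.
        eps = entry_points().get(group, [])
    return sorted((ep.name, ep.value) for ep in eps)


for group in ("sourmash.load_from", "sourmash.save_to"):
    print(group, list_sourmash_plugins(group))
```

An empty list for a group usually means the plugin package is not installed in the environment that runs `sourmash`, or its `pyproject.toml` uses a different group name.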

## Semantic versioning and listing sourmash as a dependency

Plugins should probably list sourmash as a dependency for installation.

Once plugins are officially supported by sourmash, the plugin API will
be under [semantic versioning constraints](https://semver.org/). That
means that you should constrain plugins to depend on sourmash only up
to, but not including, the next major version (e.g. sourmash v5).

Specifically, we suggest placing something like:
```toml
dependencies = ['sourmash>=4.8.0,<5']
```
in the `[project]` section of your `pyproject.toml` file.
8 changes: 7 additions & 1 deletion doc/developer.md
@@ -1,3 +1,7 @@
```{contents} Contents
:depth: 3
```

# Developer information

## Development environment
@@ -280,7 +284,7 @@ Some installation issues can be solved by simply removing the intermediate build
make clean
```

## Contents
## Additional developer-focused documents

```{toctree}
:maxdepth: 2
@@ -289,4 +293,6 @@ release
requirements
storage
release-notes/releases
dev_plugins
```

4 changes: 4 additions & 0 deletions src/sourmash/cli/sig/cat.py
@@ -31,6 +31,10 @@ def subparser(subparsers):
        '-q', '--quiet', action='store_true',
        help='suppress non-error output'
    )
    subparser.add_argument(
        '-d', '--debug', action='store_true',
        help='provide debugging output'
    )
    subparser.add_argument(
        '-o', '--output', metavar='FILE', default='-',
        help='output signature to this file (default stdout)'
5 changes: 5 additions & 0 deletions src/sourmash/exceptions.py
@@ -25,6 +25,11 @@ def __init__(self):
        SourmashError.__init__(self, "This index format is not supported in this version of sourmash")


class IndexNotLoaded(SourmashError):
    def __init__(self, msg):
        SourmashError.__init__(self, f"Cannot load sourmash index: {str(msg)}")


def _make_error(error_name, base=SourmashError, code=None):
    class Exc(base):
        pass
66 changes: 66 additions & 0 deletions src/sourmash/plugins.py
@@ -0,0 +1,66 @@
"""
Support for plugins to sourmash via importlib.metadata entrypoints.
Plugin entry point names:
* 'sourmash.load_from' - Index class loading.
* 'sourmash.save_to' - Signature saving.
* 'sourmash.picklist_filters' - extended Picklist functionality.
CTB TODO:
* consider using something other than 'name' for loader fn name. Maybe __doc__?
* try implement picklist plugin?
"""

DEFAULT_LOAD_FROM_PRIORITY = 99
DEFAULT_SAVE_TO_PRIORITY = 99

from .logging import debug_literal

# cover for older versions of Python that don't support selection on load
# (the 'group=' below).
from importlib.metadata import entry_points

# load 'load_from' entry points. NOTE: this executes on import of this module.
try:
_plugin_load_from = entry_points(group='sourmash.load_from')
except TypeError:
from importlib_metadata import entry_points
_plugin_load_from = entry_points(group='sourmash.load_from')

# load 'save_to' entry points as well.
_plugin_save_to = entry_points(group='sourmash.save_to')


def get_load_from_functions():
"Load the 'load_from' plugins and yield tuples (priority, name, fn)."
debug_literal(f"load_from plugins: {_plugin_load_from}")

# Load each plugin,
for plugin in _plugin_load_from:
loader_fn = plugin.load()

# get 'priority' if it is available
priority = getattr(loader_fn, 'priority', DEFAULT_LOAD_FROM_PRIORITY)

# retrieve name (which is specified by plugin?)
name = plugin.name
debug_literal(f"plugins.load_from_functions: got '{name}', priority={priority}")
yield priority, name, loader_fn


def get_save_to_functions():
"Load the 'save_to' plugins and yield tuples (priority, fn)."
debug_literal(f"save_to plugins: {_plugin_save_to}")

# Load each plugin,
for plugin in _plugin_save_to:
save_cls = plugin.load()

# get 'priority' if it is available
priority = getattr(save_cls, 'priority', DEFAULT_SAVE_TO_PRIORITY)

# retrieve name (which is specified by plugin?)
name = plugin.name
debug_literal(f"plugins.save_to_functions: got '{name}', priority={priority}")
yield priority, save_cls