Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pluggy for plugin management #146

Merged
merged 21 commits into from
Feb 1, 2023
Merged

Conversation

abkfenris
Copy link
Member

@abkfenris abkfenris commented Jan 15, 2023

Pluggy is the core of py.test's ability to be extended, and it also is the plugin manager for Tox, Datasette, Jupyter-FPS, Napari, and Conda, among others.

In Xpublish a set of hooks is defined that plugins can implement, and a pluggy.PluginManager (as xpublish.Rest.pm) proxies requests to the plugins that have implemented the hooks and aggregates the results.

Hooks define a set of possible kwargs that can be passed to downstream implementations, but not all implementations need to implement all of them (which makes it easy to add new kwargs without disrupting existing implementations). Hooks can also be defined as only returning the first response, or wrapping other hooks (dataset middleware?).

So far I've defined a handful of hooks in xpublish.plugin.hooks:PluginSpec:

  • app_router()
  • dataset_router()
  • get_datasets()
  • get_dataset(dataset_id: str)
  • register_hookspec() - Which allows plugins to register new hook types

get_datasets and get_dataset allow plugins to provide datasets without loading them on launch, or overriding Rest._get_dataset_fn or Rest.setup_datasets().

I've kept the kwargs relatively minimal right now on the hooks as it's easier to expand the kwargs later, than it is to reduce them.

Other potential hooks include middleware, tabular or datatree datasets, a more end user focused UI.

I've additionally refactored the single dataset usage into it's own class xpublish.SingleDatasetRest to simplify some of the conditional logic which the accessor uses.

Example dataset provider plugin:

class TestDatasetPlugin(xpublish.Plugin):
    name = "test-dataset-provider"

    @xpublish.hookimpl
    def get_datasets(self):
        return ["ww3"]

    @xpublish.hookimpl
    def get_dataset(self, dataset_id: str):
        if dataset_id == "ww3":
            return xr.open_dataset("../restful-grids/datasets/ww3_72_east_coast_2022041112.nc")
        
        return None

Pluggy references:

Replaces #145

benbovy and others added 14 commits August 6, 2021 23:09
Builds on top of @benbovy 's work in building router factories in xpublish-community#89 to build a plugin system.

The plugin system uses entry points, which are most commonly used for console or GUI scripts. The entry_point group is `xpublish.plugin` Right now plugins can provide dataset specific and general (app) routes, with default prefixes and tags for both.

Xpublish will by default load plugins via the entry point. Additionally, plugins can also be loaded directly via the init, as well as being disabled, or configured. The existing dataset router pattern also still works, so that folks aren't forced into using plugins

Entry point reference:
- https://setuptools.pypa.io/en/latest/userguide/entry_point.html
- https://packaging.python.org/en/latest/specifications/entry-points/
- https://amir.rachum.com/amp/blog/2017/07/28/python-entry-points.html
# Conflicts:
#	.github/workflows/main.yaml
#	.pre-commit-config.yaml
#	setup.py
Another variation on xpublish-community#140 with a few of the ideas from the discussion there and xpublish-community#139.

Plugin routers are now nested under a parent `Plugin` class which now acts as a way to combine multiple related pieces of functionality together (say db management routes and a CLI). This allows new plugin functionality to be added in other plugins or Xpublish related libraries without requiring the parent `Plugin` class to define everything.

Plugins are loaded from the `xpublish.plugin` entrypoint group. Plugins can be manually configured via the `plugins` argument to `xpublish.Rest`. The specifics of plugin loading can be changed by overriding the `.setup_plugins()` method.

Some other `xpublish.Rest` functionality has been refactored out into separate methods to allow easier overriding for instance making a `SingleDatasetRest` class that will allow simplifying `xpublish.Rest`.

The `ds.rest` accessor has been move out into it's own file.
Pluggy is the core of py.test's ability to be extended, and it also is the plugin manager for Tox, Datasette, Jupyter-FPS, and Conda, among others.

In Xpublish a set of hooks is defined that plugins can implement, and a `pluggy.PluginManager` (as `xpublish.Rest.pm`) proxies requests to the plugins that have implemented the hooks and aggregates the results. Hooks define a set of possible kwargs that can be passed to downstream implementations, but not all implementations need to implement all of them (which makes it easy to add new kwargs without disrupting existing implementations). Hooks can also be defined as only returning the first response, or wrapping other hooks (dataset middleware?).

So far I've defined a handful of hooks in `xpublish.plugin.hooks:PluginSpec`:
- app_router()
- dataset_router()
- get_datasets()
- get_dataset(dataset_id: str)
- register_hookspec() - Which allows plugins to register new hook types

`get_datasets` and `get_dataset` allow plugins to provide datasets without loading them on launch, or overriding  `Rest._get_dataset_fn` or `Rest.setup_datasets()`.

I've kept the kwargs relatively minimal right now on the hooks as it's easier to expand the kwargs later, than it is to reduce them.

I've additionally refactored the single dataset usage into it's own class `xpublish.SingleDatasetRest` to simplify some of the conditional logic which the accessor uses.

Pluggy references:
- https://pluggy.readthedocs.io/en/stable/
- https://docs.pytest.org/en/latest/how-to/writing_plugins.html
- https://docs.datasette.io/en/latest/writing_plugins.html
- https://docs.conda.io/projects/conda/en/latest/dev-guide/plugins/index.html#
This was referenced Jan 15, 2023
abkfenris added a commit to xpublish-community/xpublish-edr that referenced this pull request Jan 16, 2023
abkfenris added a commit to xpublish-community/xpublish-opendap that referenced this pull request Jan 16, 2023
@abkfenris
Copy link
Member Author

PRs for the routers that I've made to work as pluggy based plugins:

@abkfenris abkfenris marked this pull request as ready for review January 19, 2023 13:15
@abkfenris
Copy link
Member Author

@jhamman @benbovy any chance you could a look at this plugin system built on pluggy?

@abkfenris
Copy link
Member Author

abkfenris commented Jan 22, 2023

Other hooks that might be worth adding to Xpublish itself:

Hooks to delegate to plugins:

@abkfenris
Copy link
Member Author

After some pondering (powered by Future Island and a long commute in the dark), it seems like setting dependencies at the Plugin level is probably the wrong move.

It makes it harder to override (say get_dataset if an app_router uses dataset_routers from other plugins, which is what I'm pondering on doing for forecasts) in some cases, without affecting how those dependencies are used in every case of a plugin. I believe dependencies set up this way also aren't usable from non routers.

When I get a chance, I'll explore passing in dependencies as arguments to app_router and dataset_router methods. I think that by passing them in as arguments, they may also be usable to the other plugin hook methods.

@abkfenris
Copy link
Member Author

In a conversation with @mpiannucci I got to pondering how Dask fits into the plugin equation.

As far as I understand, xarray finds any dask client that's been initialized. If that's the case, it might be possible to add a new general Plugin.setup() hook, and an xpublish-dask plugin could create a client during setup. Then dataset loading plugins can implicitly use the Dask cluster.

If xarray can't take advantage of a dask client that's was initialized in another plugin due to namespace issues, then we could use a more explicit Plugin.dask_client() hook. That can be called and saved on the Rest instance, and provided to any get_dataset() hooks, for a signature of get_dataset(id: str, client: dask.Client | None). Dataset loading plugins then could explicitly use the client as needed (and do things like smartly choosing chunks).

I think this could help take care of #54 in that the specifics of a Dask cluster are determined by the specifics of the loaded plugin. Hence we could have xpublish-dask-local-cluster, xpublish-dask-coiled, and the like to help configure different clusters.

Refactored dependency injection into plugin routers so that dependencies can be overridden when routers are called, rather than when plugins are instantiated. This makes it so that routers can be reused and adapted by other plugins.
Comment on lines -276 to +294
# "dataset_id" parameter should be absent in all paths
assert len(openapi_schema['paths']['/']['get']['parameters']) == 0

# test cached value
openapi_schema = airtemp_rest.app.openapi()
assert len(openapi_schema['paths']['/']['get']['parameters']) == 0
with pytest.raises(KeyError):
# "dataset_id" parameter should be absent in all paths
# parameters is no longer generated when plugins use passed in deps
# and get_dataset is replaced
assert len(openapi_schema['paths']['/']['get']['parameters']) == 0

with pytest.raises(KeyError):
# test cached value
# parameters is no longer generated when plugins use passed in deps
# and get_dataset is replaced
openapi_schema = airtemp_rest.app.openapi()
assert len(openapi_schema['paths']['/']['get']['parameters']) == 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change in dependency loading for built in routes (via plugins), get_dataset doesn't take an argument for single datasets, and thus the parameters key doesn't get added to the openapi spec.

It still would get added (and need to be overridden) for any non-plugin routers, so the modified spec generator is still needed for single datasets.

Copy link
Contributor

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @abkfenris!!! Apologies for such a delayed review.

A few questions below but they are mostly about just getting me caught up on what you (and @benbovy) have done here.

"""Plugin extension points"""

@hookspec
def app_router(self, deps: Dependencies) -> APIRouter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my learning, all these methods are optional right? I.e. a plugin could implement one or more of these hooks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, they are all optional. I'm thinking I might explore mix-ins for common plugin patterns, in addition to a whole lot of documentation around plugins and how the whole ecosystem can work.

@@ -0,0 +1,75 @@
from typing import List
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question about file naming. Do we need the /included/ part of the file path for the default plugins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wavering between /default/ and /included/ for the plugins that come with xpublish. I did want to keep them in the /plugins/ directory, but I wanted to differentiate them from the plugin machinery. I initially had /included_plugins/ or similar at the same level as /plugins/ but that seemed like overkill and more confusing.

I'm also thinking that as part of documenting things, I'll probably make a few demo plugins that won't be automatically loaded via entrypoints, but can show off the different parts of the functionality (for example an dataset plugin that provides the xarray tutorial datasets), so I was thinking those would end up in xpublish/plugins/demo/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit overkill at this point but I'm fine with it.

xpublish/plugins/manage.py Outdated Show resolved Hide resolved
xpublish/plugins/manage.py Outdated Show resolved Hide resolved
@abkfenris
Copy link
Member Author

Ok, @jhamman as soon as you are happy, this is ready for you to merge.

@jhamman jhamman merged commit b2972bc into xpublish-community:main Feb 1, 2023
@jhamman
Copy link
Contributor

jhamman commented Feb 1, 2023

Super excited by this @abkfenris!

@benbovy
Copy link
Contributor

benbovy commented Feb 1, 2023

Well done @abkfenris, this is great! Sorry I didn't take the chance to look at it but happy to see it merged. I'll open new issues if I have any comment.

@abkfenris
Copy link
Member Author

abkfenris commented Feb 1, 2023

Thanks @jhamman and @benbovy

@benbovy there is definitely more work to be done around plugins (see comment above), so there will be further iteration on them that I'd love feedback on.

I'm hoping to work on the docs soon to at least give a rough intro to plugins, and explain the vision behind them/where Xpublish is going, so I'd love some feedback on that.

@benbovy benbovy mentioned this pull request Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants