Entry point plugins #140

Closed

abkfenris
Member

Builds on top of @benbovy's work in building router factories in #89 to build a plugin system, to try to implement some of my thoughts in #139

The plugin system uses entry points, which are most commonly used for console or GUI scripts. The entry_point group is xpublish.plugin. Right now plugins can provide dataset specific and general (app) routes, with default prefixes and tags for both.

Xpublish will by default load plugins via the entry point. Additionally, plugins can also be loaded directly via the init, as well as being disabled, or configured. The existing dataset router pattern also still works, so that folks aren't forced into using plugins as the only way to extend functionality.
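
As a rough illustration of the discovery step described above, the sketch below looks up everything registered under the `xpublish.plugin` group with the standard library's `importlib.metadata`. The helper name `find_plugin_entry_points` is hypothetical and not taken from this PR. The branch on `select` is there because the return type of `entry_points()` differs between Python 3.8/3.9 and later versions.

```python
from importlib.metadata import entry_points


def find_plugin_entry_points(group="xpublish.plugin"):
    """Return {name: EntryPoint} for every entry point in ``group``.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons expose a selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

Calling `.load()` on one of the returned entry points imports the module it names and returns the registered object, so nothing is imported until a plugin is actually wanted.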

It runs against the existing test suite, but I haven't implemented any new tests or docs yet.

Entry point reference:

benbovy and others added 9 commits August 6, 2021 23:09
Builds on top of @benbovy 's work in building router factories in xpublish-community#89 to build a plugin system.

The plugin system uses entry points, which are most commonly used for console or GUI scripts. The entry_point group is `xpublish.plugin`. Right now plugins can provide dataset specific and general (app) routes, with default prefixes and tags for both.

Xpublish will by default load plugins via the entry point. Additionally, plugins can also be loaded directly via the init, as well as being disabled, or configured. The existing dataset router pattern also still works, so that folks aren't forced into using plugins.

Entry point reference:
- https://setuptools.pypa.io/en/latest/userguide/entry_point.html
- https://packaging.python.org/en/latest/specifications/entry-points/
- https://amir.rachum.com/amp/blog/2017/07/28/python-entry-points.html
# Conflicts:
#	.github/workflows/main.yaml
#	.pre-commit-config.yaml
#	setup.py
@abkfenris
Member Author

abkfenris commented Dec 11, 2022

xpublish-community/xpublish-edr#8 is an example of how a plugin can register itself and its routes via entry points.

@benbovy
Contributor

benbovy commented Dec 13, 2022

This is great @abkfenris!

I'm wondering whether we should keep or deprecate routers. They largely duplicate plugins, and it is pretty easy to create small plugins.

@abkfenris
Member Author

Thanks @benbovy!

Right now I'm leaning towards leaving routers in. While routers and plugins have a lot of overlap in my PR, I bet we will continue to grow the number of things that plugins can do, increasing their complexity.

With routers, a new user can go from zero to something really cool in very few lines of code, and without much additional understanding. I think that helps sell the story of why to use xpublish.

From there, building a plugin is a much smaller jump in understanding than getting someone to build a plugin in the first place ('What's this register_routes() method I need to define, and where does self.dataset_router come from?').

What do you think about renaming routers? We could use dataset_routers or similar to help standardize the API and clarify where the routers are acting.

Contributor

@jhamman jhamman left a comment

This looks really great. Just one minor comment that I could take or leave.

Comment on lines +131 to +134
if plugins is not None:
    self._plugins = plugins
else:
    self.load_plugins(exclude_plugins=exclude_plugin_names, plugin_configs=plugin_configs)
Contributor

I suggest we take the simpler approach of either using the provided plugins or loading all the entrypoints for now:

if plugins is None:
    for entry_point in entry_points()['xpublish.plugin']:
        self._plugins[entry_point.name] = entry_point.load()
else:
    self._plugins = plugins

Then you could probably remove much of the logic in load_plugins/find_plugins/etc.

Member Author

I started with a simpler approach, but then I tried to put myself in the position of users with a bunch of different use cases:

  • 'I'm happy to use all the defaults'
  • 'I built my own plugin! How do I add that!'
  • 'My org has said that I can't use X due to inane security reasons..., how can I disable just that and keep the rest'
  • 'I like most of what comes by default, but I need to pass a little configuration to X plugin'

I think the differences and reasons why each is useful would probably be made clear with some docs, tests, and examples.

rest = xpublish.Rest(
    datasets,
    routers=[
        (dap_router, {"tags": ["opendap"], "prefix": "/opendap"}),  # manually adding a router that isn't available as a plugin
    ],
    # plugins={}, # don't override plugins, we want to largely load what's available automatically
    plugin_configs={
        # "cf_edr": {
        #     "dataset_router_prefix": "/cf_edr_test_override"  # passing config info into a plugin loaded via entry points
        # }
    },
    extend_plugins={
        "cf_edr": CfEdrPlugin(dataset_router_prefix="/cf_edr_extended") # adding a new plugin, or overriding one that was loaded via entry points
    },
    exclude_plugin_names=["module_version"]  # don't share my dirty laundry
)

Also my naming of things isn't the clearest in the flurry of hacking.
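
To make the intended precedence between these arguments concrete, here is a minimal sketch of how the four use cases above could be resolved. It is not the PR's `load_plugins` code: `resolve_plugins` and `DemoPlugin` are hypothetical names, a plain dict stands in for the classes found via entry points, and letting `extend_plugins` win over everything else is an assumption.

```python
def resolve_plugins(discovered, plugin_configs=None, extend_plugins=None,
                    exclude_plugin_names=None):
    """Combine discovered plugin classes with user overrides.

    ``discovered`` maps names to plugin classes and stands in for
    whatever was loaded from entry points.
    """
    plugin_configs = plugin_configs or {}
    excluded = set(exclude_plugin_names or [])

    plugins = {}
    for name, plugin_cls in discovered.items():
        if name in excluded:
            continue  # disabled by the user
        # Instantiate with any per-plugin configuration.
        plugins[name] = plugin_cls(**plugin_configs.get(name, {}))

    # Explicit instances add to, or override, what was discovered.
    plugins.update(extend_plugins or {})
    return plugins


class DemoPlugin:
    """Hypothetical plugin with a single configurable prefix."""

    def __init__(self, dataset_router_prefix="/demo"):
        self.dataset_router_prefix = dataset_router_prefix


discovered = {"cf_edr": DemoPlugin, "module_version": DemoPlugin, "zarr": DemoPlugin}
plugins = resolve_plugins(
    discovered,
    plugin_configs={"zarr": {"dataset_router_prefix": "/zarr"}},
    extend_plugins={"cf_edr": DemoPlugin(dataset_router_prefix="/cf_edr_extended")},
    exclude_plugin_names=["module_version"],
)
```

With these inputs the result keeps `zarr` (configured), replaces `cf_edr` with the explicit instance, and drops `module_version`.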

Contributor

I also find all those use cases relevant and useful! However, the API is still a bit confusing to me. I'd suggest moving it out of Rest. Also, maybe accept both simple fastapi routers and plugins via a single argument (I agree it is nice to keep a simple way to define dataset API routes).

A rough sketch:

T_Router = Union[XpublishPluginFactory, tuple[fastapi.APIRouter, dict[str, Any]]]


def load_default_routers(
    exclude: Iterable[str] | None = None,
) -> dict[str, T_Router]:
    """return a dict of router plugins loaded from entrypoints."""
    return ...


class Rest:

    def __init__(
        self,
        datasets: Mapping[str, xr.Dataset],
        routers: Mapping[str, T_Router] | None = None,
    ):
        if routers is None:
            routers = load_default_routers()
        
        ...

#
# Usage examples
#

# use all routers but "module_version"
routers = load_default_routers(exclude=["module_version"])

# re-configure the "cf_edr" (entrypoint) router
routers["cf_edr"].configure(dataset_router_prefix="/cf_edr_test_override")

# or replace it completely
routers["cf_edr"] = CfEdrPlugin(dataset_router_prefix="/cf_edr_extended")

# add a simple router
routers["dap"] = (dap_router, {"tags": ["opendap"], "prefix": "/opendap"})

rest = Rest({...}, routers=routers)
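
One way the single `routers` argument in the sketch above could be consumed is to normalize both accepted shapes into `(router, kwargs)` pairs before including them in the app. The following is only an illustration under assumptions: `FakeRouter` and `FakePlugin` are dependency-free stand-ins for `fastapi.APIRouter` and the plugin factory, and `normalize_routers` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


class FakeRouter:
    """Stand-in for fastapi.APIRouter so the sketch has no dependencies."""


class FakePlugin:
    """Stand-in for a plugin factory that carries its own defaults."""

    def __init__(self, prefix: str = "", tags: Tuple[str, ...] = ()):
        self.prefix = prefix
        self.tags = list(tags)
        self.router = FakeRouter()

    def configure(self, **kwargs: Any) -> None:
        # Overwrite defaults in place, mirroring routers["cf_edr"].configure(...)
        for key, value in kwargs.items():
            setattr(self, key, value)


def normalize_routers(routers: Dict[str, Any]) -> List[Tuple[Any, Dict[str, Any]]]:
    """Turn a mixed mapping into uniform (router, include_kwargs) pairs."""
    normalized = []
    for item in routers.values():
        if isinstance(item, tuple):
            router, kwargs = item  # already a (router, kwargs) tuple
        else:
            router, kwargs = item.router, {"prefix": item.prefix, "tags": item.tags}
        normalized.append((router, kwargs))
    return normalized


routers = {"cf_edr": FakePlugin(prefix="/edr", tags=("edr",))}
routers["cf_edr"].configure(prefix="/cf_edr_test_override")
routers["dap"] = (FakeRouter(), {"tags": ["opendap"], "prefix": "/opendap"})
pairs = normalize_routers(routers)
```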

Contributor

> I can't use X due to inane security reasons

(idea for later: we could maybe add a specific tag for plugins so that they are automatically selected or excluded in production vs. staging vs. debug deployment when calling load_default_routers).

Member Author

Hmm, I think combining plugins and (dataset) routers might be a bit confusing, especially since plugins may eventually be more than just routers. How about something in between?

Routers and plugins as init kwargs, and some helpers for default plugins?

def load_default_plugins(
    exclude: Iterable[str] | None = None,
) -> dict[str, XpublishPluginFactory]:
    """return a dict of plugins loaded from entrypoints."""
    return ...


class Rest:

    def __init__(
        self,
        datasets: Mapping[str, xr.Dataset],
        dataset_routers: Iterable[tuple[fastapi.APIRouter, dict[str, Any]]] | None = None,
        plugins: Mapping[str, XpublishPluginFactory] | None = None
    ):
        if plugins is None:
            plugins = load_default_plugins()
        
        ...

#
# Usage examples
#

# use all plugins but "module_version"
plugins = load_default_plugins(exclude=["module_version"])

# re-configure the "cf_edr" (entrypoint) plugin
plugins["cf_edr"].configure(dataset_router_prefix="/cf_edr_test_override")

# or replace it completely
plugins["cf_edr"] = CfEdrPlugin(dataset_router_prefix="/cf_edr_extended")

# add a simple router
routers = [(dap_router, {"tags": ["opendap"], "prefix": "/opendap"})]

rest = Rest({...}, dataset_routers=routers, plugins=plugins)

Probably would wrap the helper in a method, so it's easier to override just that in a subclass.
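
A minimal sketch of that idea, with hypothetical method names and a hard-coded dict standing in for entry point discovery: because the default loading lives in its own method, a subclass can change just that step.

```python
class Rest:
    def __init__(self, datasets, plugins=None):
        self.datasets = datasets
        self.plugins = self.setup_plugins(plugins)

    def load_default_plugins(self):
        """Stand-in for entry point discovery (hard-coded for this sketch)."""
        return {"zarr": "zarr-plugin", "module_version": "version-plugin"}

    def setup_plugins(self, plugins=None):
        # Explicitly passed plugins win; otherwise fall back to the defaults.
        if plugins is None:
            plugins = self.load_default_plugins()
        return dict(plugins)


class NoVersionRest(Rest):
    """Subclass that only changes which defaults are loaded."""

    def load_default_plugins(self):
        plugins = super().load_default_plugins()
        plugins.pop("module_version", None)
        return plugins
```

`NoVersionRest({})` ends up with only the `zarr` entry, while passing `plugins=` explicitly bypasses the defaults in both classes.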

> I can't use X due to inane security reasons
>
> (idea for later: we could maybe add a specific tag for plugins so that they are automatically selected or excluded in production vs. staging vs. debug deployment when calling load_default_routers).

Yes, but at what point are we pulling the classic Flask move of reinventing Django?

Contributor

> Yes, but at what point are we pulling the classic Flask move of reinventing Django?

There's a long way to go before even approaching that point, but yeah, we probably don't want to go deep down that path. (I just put down some random thoughts for later.)

Contributor

> I think combining plugins and (dataset) routers might be a bit confusing, especially since plugins may eventually be more than just routers. How about something in between?

Hmm, I'm not sure what else the plugin base class and entry point added in this PR could be reused for. Do you have some examples in mind?

Member Author

Ah, so I was thinking that this was only the start of a plugin interface, not the end point.

Some ideas:

  • Dataset loaders
  • Django debug style toolbar
  • Dataset transforming middleware
  • CLI tools

Contributor

Hmm, wouldn't it be better to have separate base classes for each kind of extension instead of trying to put everything in a single plugin base class? Mixing everything via a single class / entry point seems a bit messy to me.

In fact, I'd rather see an Xpublish plugin at the level of a repository / package, which may provide all sorts of extensions (routers, middlewares, CLI tools, etc.)?

Similarly to an Xarray extension package that may provide Dataset/DataArray accessors, custom index subclasses, backend engines, etc.

Member Author

I was pondering this as I was trying to think of the entry point group, but I think as soon as we try to split different functionality up into different types of plugins, we'll immediately have use cases that require multiple types to work in conjunction.

I had another example in mind but have forgotten it. What I can think of at the moment is a database-backed dataset catalog. It may need to define its own top-level routes, override get_dataset and similar dependencies, and probably have CLI commands for managing the database structure. Those could possibly be three different plugins that get the same config, though.

What do you think about having a single top level Plugin class, and then nested specialty classes? (maybe use pydantic.BaseModel to help make it easier to set nested config)

class XpublishPlugin(BaseModel):
    app_router: Router | None
    dataset_router: Router | None
    dependency_overrides: ...
    cli: ...
    dataset_middleware: ...

class Extension(BaseModel):
    plugin: XpublishPlugin

class Router(Extension):
    default_prefix: str
    ...

I think in most cases we would see 1 plugin for 1 package, but there are definitely cases where a package might expose multiple plugins.

For example, the OGC EDR package that I've made might register one plugin that uses CF conventions by default, plus a variant that could be used if not all the datasets are CF compliant and require a bit more configuration.
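
As a dependency-free illustration of the "one top-level plugin, nested specialty classes" shape proposed above, here is a sketch that uses standard-library dataclasses instead of `pydantic.BaseModel`. The `Cli` class, the `name` field, and the `extensions()` helper are assumptions added for the example and are not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Router:
    """One specialised extension: a set of routes with defaults."""
    default_prefix: str = ""
    default_tags: List[str] = field(default_factory=list)


@dataclass
class Cli:
    """Another extension type: names of CLI commands (hypothetical)."""
    commands: List[str] = field(default_factory=list)


@dataclass
class XpublishPlugin:
    """Top-level plugin grouping optional, related extensions."""
    name: str
    app_router: Optional[Router] = None
    dataset_router: Optional[Router] = None
    cli: Optional[Cli] = None

    def extensions(self) -> List[str]:
        """Return the names of the extensions this plugin provides."""
        candidates = {
            "app_router": self.app_router,
            "dataset_router": self.dataset_router,
            "cli": self.cli,
        }
        return [key for key, value in candidates.items() if value is not None]


# The database-backed catalog case: top-level routes plus CLI commands,
# configured together through a single plugin object.
catalog = XpublishPlugin(
    name="db_catalog",
    app_router=Router(default_prefix="/catalog"),
    cli=Cli(commands=["migrate"]),
)
```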

@abkfenris
Member Author

I had gotten a start on a new draft with some of the things we discussed here, but then life happened and got in the way. Hopefully I can get it cleaned up and shareable this weekend.

abkfenris added a commit to abkfenris/xpublish that referenced this pull request Jan 8, 2023
Another variation on xpublish-community#140 with a few of the ideas from the discussion there and xpublish-community#139.

Plugin routers are now nested under a parent `Plugin` class which now acts as a way to combine multiple related pieces of functionality together (say db management routes and a CLI). This allows new plugin functionality to be added in other plugins or Xpublish related libraries without requiring the parent `Plugin` class to define everything.

Plugins are loaded from the `xpublish.plugin` entrypoint group. Plugins can be manually configured via the `plugins` argument to `xpublish.Rest`. The specifics of plugin loading can be changed by overriding the `.setup_plugins()` method.

Some other `xpublish.Rest` functionality has been refactored out into separate methods to allow easier overriding, for instance making a `SingleDatasetRest` class that will allow simplifying `xpublish.Rest`.

The `ds.rest` accessor has been moved out into its own file.
@abkfenris
Member Author

Continuing plugin exploration in #145

@abkfenris
Member Author

Using Pluggy for plugin management in #146 to reduce the amount of things we need to invent.

jhamman pushed a commit that referenced this pull request Feb 1, 2023
* add XpublishFactory base class

* use factories for base and zarr routers

* tests: drop py36 support, add py39

* Add entry point based plugins

Builds on top of @benbovy 's work in building router factories in #89 to build a plugin system.

The plugin system uses entry points, which are most commonly used for console or GUI scripts. The entry_point group is `xpublish.plugin`. Right now plugins can provide dataset specific and general (app) routes, with default prefixes and tags for both.

Xpublish will by default load plugins via the entry point. Additionally, plugins can also be loaded directly via the init, as well as being disabled, or configured. The existing dataset router pattern also still works, so that folks aren't forced into using plugins.

Entry point reference:
- https://setuptools.pypa.io/en/latest/userguide/entry_point.html
- https://packaging.python.org/en/latest/specifications/entry-points/
- https://amir.rachum.com/amp/blog/2017/07/28/python-entry-points.html

* Test plugin system against existing test suite

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean up unused imports

* Extendable plugins

Another variation on #140 with a few of the ideas from the discussion there and #139.

Plugin routers are now nested under a parent `Plugin` class which now acts as a way to combine multiple related pieces of functionality together (say db management routes and a CLI). This allows new plugin functionality to be added in other plugins or Xpublish related libraries without requiring the parent `Plugin` class to define everything.

Plugins are loaded from the `xpublish.plugin` entrypoint group. Plugins can be manually configured via the `plugins` argument to `xpublish.Rest`. The specifics of plugin loading can be changed by overriding the `.setup_plugins()` method.

Some other `xpublish.Rest` functionality has been refactored out into separate methods to allow easier overriding, for instance making a `SingleDatasetRest` class that will allow simplifying `xpublish.Rest`.

The `ds.rest` accessor has been moved out into its own file.

* Use `typing.Dict` for Python 3.8 compatibility

* More typing fixes for 3.8

* Refactor single dataset support into its own class

* Use pluggy for plugin management

Pluggy is the core of py.test's ability to be extended, and it also is the plugin manager for Tox, Datasette, Jupyter-FPS, and Conda, among others.

In Xpublish a set of hooks is defined that plugins can implement, and a `pluggy.PluginManager` (as `xpublish.Rest.pm`) proxies requests to the plugins that have implemented the hooks and aggregates the results. Hooks define a set of possible kwargs that can be passed to downstream implementations, but not all implementations need to implement all of them (which makes it easy to add new kwargs without disrupting existing implementations). Hooks can also be defined as only returning the first response, or wrapping other hooks (dataset middleware?).

So far I've defined a handful of hooks in `xpublish.plugin.hooks:PluginSpec`:
- app_router()
- dataset_router()
- get_datasets()
- get_dataset(dataset_id: str)
- register_hookspec() - Which allows plugins to register new hook types

`get_datasets` and `get_dataset` allow plugins to provide datasets without loading them on launch, or overriding `Rest._get_dataset_fn` or `Rest.setup_datasets()`.

I've kept the kwargs relatively minimal right now on the hooks as it's easier to expand the kwargs later, than it is to reduce them.

I've additionally refactored the single dataset usage into its own class `xpublish.SingleDatasetRest` to simplify some of the conditional logic which the accessor uses.
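
To illustrate the dispatch behaviour described above (aggregate results from every plugin that implements a hook, or stop at the first response), here is a deliberately tiny stand-in written with the standard library only. It is not pluggy and does not use pluggy's API; `ToyHookManager`, `MemoryDatasets`, and `VersionRoutes` are hypothetical names, and call ordering and many other details differ in the real library.

```python
class ToyHookManager:
    """Tiny stand-in for a plugin manager; this is not pluggy's API."""

    def __init__(self):
        self.plugins = []

    def register(self, plugin):
        self.plugins.append(plugin)

    def call(self, hook_name, firstresult=False, **kwargs):
        """Call ``hook_name`` on every plugin that implements it."""
        results = []
        for plugin in self.plugins:
            impl = getattr(plugin, hook_name, None)
            if impl is None:
                continue  # plugins only implement the hooks they need
            result = impl(**kwargs)
            if result is None:
                continue  # this plugin has nothing to contribute
            if firstresult:
                return result
            results.append(result)
        return None if firstresult else results


class MemoryDatasets:
    """Plugin providing datasets without Rest loading them at launch."""

    def __init__(self, datasets):
        self.datasets = datasets

    def get_datasets(self):
        return list(self.datasets)

    def get_dataset(self, dataset_id):
        return self.datasets.get(dataset_id)


class VersionRoutes:
    """Plugin that only implements the app_router hook."""

    def app_router(self):
        return "/versions"


pm = ToyHookManager()
pm.register(MemoryDatasets({"air": "air-dataset"}))
pm.register(MemoryDatasets({"sst": "sst-dataset"}))
pm.register(VersionRoutes())

all_ids = pm.call("get_datasets")  # one list of ids per providing plugin
sst = pm.call("get_dataset", firstresult=True, dataset_id="sst")
```

The second dataset lookup falls through the first provider, which returns `None`, and is answered by the second one; that is the "first response wins" behaviour the hooks rely on.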

Pluggy references:
- https://pluggy.readthedocs.io/en/stable/
- https://docs.pytest.org/en/latest/how-to/writing_plugins.html
- https://docs.datasette.io/en/latest/writing_plugins.html
- https://docs.conda.io/projects/conda/en/latest/dev-guide/plugins/index.html#

* Move included plugins to plugins.included

* Remove commented code, clarify a few methods

* Allow late registered plugins to add new hooks and routers

* Refactor dependency injection into plugin routers

Refactored dependency injection into plugin routers so that dependencies can be overridden when routers are called, rather than when plugins are instantiated. This makes it so that routers can be reused and adapted by other plugins.
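
A small sketch of the difference this makes, under assumptions: the `Dependencies` bundle, `InfoPlugin`, and the dict-of-callables "router" are hypothetical stand-ins rather than Xpublish's actual classes. The point is only that the router is built from whatever dependencies are passed at call time, so the same plugin instance can be reused with different ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dependencies:
    """Hypothetical bundle of callables a router needs."""
    dataset_ids: Callable[[], List[str]]
    dataset: Callable[[str], object]


class InfoPlugin:
    """Plugin holding no dependencies of its own."""

    def dataset_router(self, deps: Dependencies) -> Dict[str, Callable]:
        """Build routes against whichever dependencies are passed in."""

        def info(dataset_id: str):
            return {"id": dataset_id, "data": deps.dataset(dataset_id)}

        return {"/info": info}


store = {"air": [1, 2, 3]}

# Default dependencies: look datasets up by their exact id.
default = Dependencies(dataset_ids=lambda: list(store), dataset=store.__getitem__)
# Overridden dependencies: another plugin adapts the lookup (case-insensitive ids).
relaxed = Dependencies(dataset_ids=lambda: list(store), dataset=lambda key: store[key.lower()])

plugin = InfoPlugin()
strict_info = plugin.dataset_router(default)["/info"]
relaxed_info = plugin.dataset_router(relaxed)["/info"]
```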

* Clean up and tighten plugin typing

* Test plugins and plugin management

---------

Co-authored-by: Benoit Bovy
Co-authored-by: Joe Hamman
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@abkfenris
Member Author

Plugins are now implemented via #146!
