Non-Zarr Plugin project (seeking scope advice/recs) #220
Replies: 2 comments 3 replies
-
> Mainly thinking about file metadata, going to continue pondering how that might interact with kerchunk

Hmm, each of those formats has different native ways of encoding their metadata? In that case they probably should have different, but related paths.

To keep those within the same plugin, you could try importing each format's library while building the router and skip any route whose optional dependency raises an `ImportError`:

```python
# plugin.py
from fastapi import APIRouter, Depends
from xpublish import Dependencies, Plugin, hookimpl


class FileMetadata(Plugin):
    ...

    @hookimpl
    def dataset_router(self, deps: Dependencies):
        router = APIRouter(prefix=self.dataset_router_prefix, tags=self.dataset_router_tags)

        try:
            # import at registration time, so the route is only added
            # when the optional netCDF4 dependency is installed
            import netCDF4  # noqa: F401

            @router.get("/netcdf")
            def netcdf_metadata(dataset=Depends(deps.dataset)):
                ...
        except ImportError:
            pass

        ...
        return router
```

**Mini plugin system**

The more flexible way would be to make your own plugins using the same entry point system that Xpublish plugins use. I've experimented with that a bit in xpublish-edr to make it possible for others to provide different output formats, but a similar thing could be done for new routes.

```python
# plugin.py
import pkg_resources

from fastapi import APIRouter
from xpublish import Dependencies, Plugin, hookimpl


class FileMetadata(Plugin):
    ...

    @hookimpl
    def dataset_router(self, deps: Dependencies):
        router = APIRouter(prefix=self.dataset_router_prefix, tags=self.dataset_router_tags)

        # discover any installed 'sub-plugins' that registered routes for this group
        for entry_point in pkg_resources.iter_entry_points("xpublish_file_metadata"):
            try:
                route_fn = entry_point.load()
                route_fn(router, deps)
            except ImportError:
                pass

        ...
        return router
```

Then in a 'sub-plugin' (`xpublish_file_metadata_netcdf`):

```python
# netcdf_metadata.py
from fastapi import APIRouter, Depends
from xpublish import Dependencies


def netcdf_routes(router: APIRouter, deps: Dependencies):
    @router.get("/netcdf")
    def metadata(dataset=Depends(deps.dataset)):
        ...
```

```toml
# pyproject.toml
[project.entry-points.xpublish_file_metadata]
netcdf = "xpublish_file_metadata_netcdf.netcdf_metadata:netcdf_routes"
```
-
So I have somewhat changed the behavior I was aiming for. My new idea is the following:

- To hide attributes, one can instantiate the plugin class and pass the instance directly to xpublish: either pass a list of attribute names to hide regardless of which file format is being read, or provide a dictionary.
- A key difference is that the file-format-specific metadata-grabbing functions are NOT routers. Rather, they are plain functions whose signature is checked against a `typing.Protocol`. The routes are defined only on the plugin, and the underlying function is called based on the identified file format.
- One can `pip install` using the "extras" syntax, or install the dev group to get all of the optional dependencies. Poetry is still the packaging tool.
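Roughly, the shape I have in mind looks like the sketch below; the names (`MetadataGrabber`, `FileMetadataPlugin`, `hide`) and the suffix-based dispatch are placeholders for illustration, not the final API.

```python
# sketch only -- names below are placeholders, not the plugin's real API
from typing import Protocol

import xarray as xr
from fastapi import APIRouter, Depends
from xpublish import Dependencies, Plugin, Rest, hookimpl


class MetadataGrabber(Protocol):
    """Signature every format-specific metadata function has to match."""

    def __call__(self, path: str, hide: list[str]) -> dict: ...


def netcdf_metadata(path: str, hide: list[str]) -> dict:
    import netCDF4  # optional dependency, imported only when a .nc file shows up

    with netCDF4.Dataset(path) as nc:
        # global attributes straight from the file, minus anything hidden
        return {k: v for k, v in nc.__dict__.items() if k not in hide}


# format detection keys off the source file's suffix
GRABBERS: dict[str, MetadataGrabber] = {".nc": netcdf_metadata}


class FileMetadataPlugin(Plugin):
    name: str = "file-metadata"
    dataset_router_prefix: str = "/file-metadata"
    hide: list[str] = []  # attribute names to hide regardless of file format

    @hookimpl
    def dataset_router(self, deps: Dependencies):
        router = APIRouter(prefix=self.dataset_router_prefix)

        @router.get("/")
        def file_metadata(dataset: xr.Dataset = Depends(deps.dataset)):
            source = dataset.encoding.get("source", "")
            suffix = "." + source.rsplit(".", 1)[-1] if "." in source else ""
            grabber = GRABBERS.get(suffix)
            if grabber is None:
                return {"detail": f"no metadata reader registered for {suffix!r}"}
            return grabber(source, self.hide)

        return router


# the configured instance is handed directly to xpublish (here with only this plugin loaded)
rest = Rest(
    {"my-dataset": xr.Dataset()},
    plugins={"file-metadata": FileMetadataPlugin(hide=["secret_attr"])},
)
```

Because the routes are defined once on the plugin, supporting another format should only mean registering another grabber function, without touching the router itself.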
-
Hi all.
I am in the process of writing a simple dataset router plugin for NetCDF source files. Essentially, it would check whether the referenced `xarray.Dataset.encoding['source']` is a `.nc` file, and if it is, it would use `netCDF4` to provide file metadata that isn't necessarily available via xarray attributes (a rough sketch is below). The user could choose to hide certain attributes for security reasons if they pleased.

This was originally going to be paired with a `kerchunk`-powered dataset provider, where paths to NetCDF files can initialize a server and JSON can be used to customize chunking schemes for whatever the use case may be. That said, after exploring kerchunk further I realized it works just as well with other compressed formats (HDF5, GRIB, GeoTIFF). Additionally, each of those formats has a Python library for reading its file metadata as well.
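For concreteness, here is a rough sketch of the check I mean; the route path, response shape, and the hidden-attribute set are placeholders, not a settled API:

```python
from fastapi import APIRouter, Depends
from xarray import Dataset
from xpublish import Dependencies

HIDDEN_ATTRS = {"institution_contact"}  # placeholder: attributes the operator wants hidden


def netcdf_file_metadata(router: APIRouter, deps: Dependencies):
    @router.get("/netcdf-metadata")
    def metadata(dataset: Dataset = Depends(deps.dataset)):
        source = dataset.encoding.get("source", "")
        if not source.endswith(".nc"):
            return {"detail": "dataset was not opened from a NetCDF file"}

        import netCDF4  # only needed once we know the source really is a .nc file

        with netCDF4.Dataset(source) as nc:
            # global attributes straight from the file, minus anything hidden;
            # str() keeps the response JSON-serializable for odd attribute types
            attrs = {k: str(nc.getncattr(k)) for k in nc.ncattrs() if k not in HIDDEN_ATTRS}
            return {
                "attrs": attrs,
                "file_format": nc.file_format,  # e.g. NETCDF4 vs NETCDF3_CLASSIC
                "disk_format": nc.disk_format,  # e.g. HDF5
            }
```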
My question is about plugin scope: what would be a desirable grouping of these capabilities? Originally I was thinking the NetCDF provider and router plugins could be included in a single `xpublish-netcdf` package, but now that seems silly, since a catalog could point to multiple file types, and having `/netcdf`, `/grib`, `/tiff`, etc. paths in the documentation would be confusing given that only one would actually work for a given dataset.

I began thinking an `xpublish-file-metadata` plugin could handle the dataset router while being agnostic to file type, and a separate `xpublish-kerchunk` package could house the file-type-agnostic dataset provider functionality. I still like this idea, but a drawback is redundant dependencies. For example, the package would need the `grib`, `hdf5`, etc. libraries installed even if the user is only dealing with a single file type.

Any thoughts on organizing these functionalities?
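(One possible way around the redundant-dependency drawback would be optional extras, so users only install the readers they need. Package, extra, and dependency names below are placeholders, not a published layout:)

```toml
# pyproject.toml of a hypothetical xpublish-file-metadata package
[project]
name = "xpublish-file-metadata"
dependencies = ["xpublish"]

[project.optional-dependencies]
netcdf = ["netCDF4"]
hdf5 = ["h5py"]
grib = ["cfgrib"]
tiff = ["rioxarray"]
# `pip install xpublish-file-metadata[netcdf]` pulls in only the NetCDF reader
```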