This repository contains the specifications of the standard format defined by the bioimage.io community for the content (i.e., models, datasets and applications) on the bioimage.io website.

Each item of content is described by a YAML 1.2 file named `rdf.yaml` or `bioimageio.yaml`. This `rdf.yaml`/`bioimageio.yaml` file, along with the files referenced in it, can be downloaded from or uploaded to the bioimage.io website and may be produced or consumed by bioimage.io-compatible software (e.g., image analysis tools like ilastik). This repository defines the rules and format that bioimage.io-compatible resources must fulfill.
Note that the Python package PyYAML does not support YAML 1.2; we therefore use and recommend ruyaml. For the differences see https://ruamelyaml.readthedocs.io/en/latest/pyyaml.
Please also note that the best way to check whether your `rdf.yaml` file is bioimage.io-compliant is to call `bioimageio.core.validate` from the bioimageio.core Python package. The bioimageio.core Python package also provides the `bioimageio` command line interface (CLI) with the `validate` command:

```
bioimageio validate path/to/your/rdf.yaml
```
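For a programmatic format check without the CLI, here is a minimal sketch (assuming the `load_description_and_validate_format_only` helper exposed by bioimageio.spec, mentioned in the changelog below, returns a printable validation summary):

```python
from bioimageio.spec import load_description_and_validate_format_only

# hypothetical path; replace with your own rdf.yaml / bioimageio.yaml
summary = load_description_and_validate_format_only("path/to/your/rdf.yaml")
print(summary)  # lists format errors and warnings, if any
```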
All bioimage.io description formats are defined as Pydantic models.
| Type | Format Version | Documentation[^1] | Developer Documentation[^2] |
| --- | --- | --- | --- |
| model | 0.5, 0.4 | model 0.5, model 0.4 | ModelDescr_v0_5, ModelDescr_v0_4 |
| dataset | 0.3, 0.2 | dataset 0.3, dataset 0.2 | DatasetDescr_v0_3, DatasetDescr_v0_2 |
| notebook | 0.3, 0.2 | notebook 0.3, notebook 0.2 | NotebookDescr_v0_3, NotebookDescr_v0_2 |
| application | 0.3, 0.2 | application 0.3, application 0.2 | ApplicationDescr_v0_3, ApplicationDescr_v0_2 |
| generic | 0.3, 0.2 | - | GenericDescr_v0_3, GenericDescr_v0_2 |
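Since the description classes are plain Pydantic models, they can be inspected programmatically. A small sketch (assuming Pydantic v2, which provides `model_fields`, and the ModelDescr_v0_5 class listed above, importable as `bioimageio.spec.model.v0_5.ModelDescr`):

```python
from bioimageio.spec.model import v0_5

# list the fields of the model 0.5 description and whether they are required
for name, field in v0_5.ModelDescr.model_fields.items():
    print(f"{name}: required={field.is_required()}")
```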
Simplified descriptions are available as JSON Schema (generated with Pydantic):

| bioimageio.spec version | JSON Schema | documentation[^1] |
| --- | --- | --- |
| latest | bioimageio_schema_latest.json | latest documentation |
| 0.5 | bioimageio_schema_v0-5.json | 0.5 documentation |

Note: bioimageio_schema_v0-5.json and bioimageio_schema_latest.json are identical, but bioimageio_schema_latest.json will eventually refer to the future bioimageio_schema_v0-6.json.
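As a sketch of how the JSON Schema can be used with generic (non-bioimageio) tooling, assuming you have downloaded `bioimageio_schema_latest.json` locally and installed the third-party `jsonschema` package (ruyaml is used for parsing, as recommended above):

```python
import json
from pathlib import Path

import jsonschema        # third-party JSON Schema validator
from ruyaml import YAML  # YAML 1.2 parser

schema = json.loads(Path("bioimageio_schema_latest.json").read_text(encoding="utf-8"))
data = YAML(typ="safe").load(Path("path/to/your/rdf.yaml"))  # hypothetical path

# raises jsonschema.ValidationError if the description does not match the schema
jsonschema.validate(instance=data, schema=schema)
```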
A flattened view of the types used by the spec, including their value constraints, is also available. You can generate these docs locally by running:

```
PYTHONPATH=./scripts python -m interactive_docs
```
We provide some examples of using `rdf.yaml` files to describe models, applications, notebooks and datasets, as well as an example notebook that accesses these descriptions programmatically.
- Due to the limitations of storage services such as Zenodo, which does not support subfolders, it is recommended to place all other files at the same directory level as the `rdf.yaml` file and to avoid subdirectories.
- Use the bioimageio.core Python package to validate your `rdf.yaml` file.
- bioimageio.spec keeps evolving. Try to use and upgrade to the most current format version!
The bioimageio CLI has moved entirely to bioimageio.core.
bioimageio.spec can be installed with either conda or pip. We recommend installing bioimageio.core instead to get access to the Python programmatic features available in the BioImage.IO community:

```
conda install -c conda-forge bioimageio.core
```

or

```
pip install -U bioimageio.core
```

Still, for a lighter package or just testing, you can install the bioimageio.spec package alone:

```
conda install -c conda-forge bioimageio.spec
```

or

```
pip install -U bioimageio.spec
```
TODO: link to settings in dev docs
To keep the bioimageio.spec Python package version in sync with the (model) description format version, bioimageio.spec is versioned as MAJOR.MINOR.PATCH.LIB, where MAJOR.MINOR.PATCH corresponds to the latest model description format version implemented and LIB may be bumped for library changes that do not affect the format version. This scheme was introduced with bioimageio.spec 0.5.3.1.
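For example, here is a sketch of how the implemented format version can be derived from the installed library version (assuming bioimageio.spec exposes the usual `__version__` attribute):

```python
import bioimageio.spec

# e.g. "0.5.3.1" -> implemented format version "0.5.3", library revision "1"
*format_parts, lib_revision = bioimageio.spec.__version__.split(".")
print("format version:", ".".join(format_parts))
print("library revision:", lib_revision)
```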
- update conda environments (remove `cpuonly` from pytorch envs)
- fix URL validation (checking with actual http requests was erroneously skipped)
- fix loading tifffile in python 3.8 (pin tifffile)
- use default tensorflow environments for Keras H5 weights
- support loading and saving from/to zipfile.ZipFile objects
- fix bug when packaging with weights priority order (#638)
- add conda_env module providing helper to create recommended conda environments for model descriptions
- fix summary formatting
- improve logged origin for logged messages
- make the `model.v0_5.ModelDescr.training_data` field a `left_to_right` Union to avoid warnings
- the deprecated `version_number` is no longer appended to the `id`, but instead set as `version` if no `version` is specified
- expose `progressbar` to customize display of download progress
- expose `get_resource_package_content`
- prefer `rdf.yaml` over `bioimageio.yaml` (name the `bioimageio.yaml` file `rdf.yaml` when packaging, look for `rdf.yaml` first, etc.)
- enforce: (generic 0.3/model 0.5 spec) documentation source file encoding has to be UTF-8.
- bugfix: allow optional pre- and postprocessing to be missing in an RDF (before it required an empty dict).
- bugfix "reset known files if root changes" (#619)
note: the versioning scheme was changed because our previous post releases included changes beyond what a post release should entail (more than only changing docstrings, etc.).
This was motivated by the desire to keep the library version in sync with the (model) format version to avoid confusion.
To keep this relation, but avoid overloaded post releases, a library version number is now added as the 4th part: MAJOR.MINOR.PATCH.LIB_VERSION.
- add `load_model_description` and `load_dataset_description`
- add `ensure_description_is_model` and `ensure_description_is_dataset`
- expose `perform_io_checks` and `known_files` from `ValidationContext` to `load_description` and `load_description_and_validate_format_only` (see the sketch below)
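A minimal sketch of what the last item enables (assuming, as stated above, that `perform_io_checks` and `known_files` are accepted as keyword arguments by `load_description`; the paths, file name and hash below are hypothetical placeholders):

```python
from bioimageio.spec import load_description

# skip downloading and hashing of referenced files entirely
descr = load_description("path/to/your/rdf.yaml", perform_io_checks=False)

# or declare files whose SHA-256 is already known to avoid re-downloading them
descr = load_description(
    "path/to/your/rdf.yaml",
    known_files={"weights.pt": "<sha256 hex digest>"},  # hypothetical file name and hash
)
```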
- fix pinning of pydantic
- update resolving of bioimage.io resource IDs
- fix SHA-256 value when resolving an RDF version from the bioimage.io collection that is not the latest
- bump patch version during loading for model 0.5.x
- improve validation error formatting
- validate URLs first with a HEAD request; if that is forbidden, follow up with a streamed GET request, and if that is also forbidden, with a regular GET request
- `RelativePath.absolute()` is now a method (not a property), analogous to `pathlib.Path`
- remove collection description
- update SPDX license list
- update generic description to 0.3.1
- update model description to 0.5.3
- add timeout argument to all requests.get calls
- added more information to validation summary
- deprioritize `Path` objects in the `FileSource` union
- resolve backup DOIs
- fix resolving relative file paths given as strings
- allow to bypass download and hashing of known files
- avoid full download when validating urls
- resolve version (un)specific collection IDs, e.g. `load_description('affable-shark')`, `load_description('affable-shark/1')`
- fix model packaging with weights format priority
- new patch version model 0.5.2
- new patch version model 0.5.1
- don't fail if CI env var is a string
- fix `_internal.io_utils.identify_bioimageio_yaml_file()`
- new description formats: generic 0.3, application 0.3, collection 0.3, dataset 0.3, notebook 0.3 and model 0.5.
- various API changes, most important functions:
  - `bioimageio.spec.load_description` (replaces `load_raw_resource_description`, interface changed)
  - `bioimageio.spec.validate_format` (new)
  - `bioimageio.spec.dump_description` (replaces `serialize_raw_resource_description_to_dict`, interface changed)
  - `bioimageio.spec.update_format` (interface changed)
- switch from Marshmallow to Pydantic
- extended validation
- one joint, more precise JSON Schema
- small bugfixes
- better type hints
- improved tests
- add `axes` and `eps` to `scale_mean_var`
- add simple forward compatibility by treating future format versions as latest known (for the respective resource type)
- Make CLI output more readable
- find redirected URLs when checking for URL availability
- Improve error message for non-existing RDF file path given as string
- Improve documentation for model description's `documentation` field
- fix `enrich_partial_rdf_with_imjoy_plugin` (see #452)
- fix `rdf_update` of entries in `resolve_collection_entries()`
- pass root to the `enrich_partial_rdf` arg of `resolve_collection_entries()`
- keep `ResourceDescrption.root_path` as URI for remote resources. This fixes the collection description, as the collection entries are resolved after the collection description has been loaded.
- new bioimageio.spec.partner module adding validate-partner-collection command if optional 'lxml' dependency is available
- new env var `BIOIMAGEIO_CACHE_WARNINGS_LIMIT` (default: 3) to avoid spam from cache hit warnings
- more robust conversion of ImportableSourceFile for absolute paths to relative paths (don't fail on non-path source file)
- resolve symlinks when transforming absolute to relative paths during serialization; see #438
- fix loading of collection description with id (id used to be ignored)
- support loading bioimageio resources by their animal nickname (currently only models have nicknames).
- any field previously expecting a local relative path now also accepts an absolute path
- `load_raw_resource_description` returns a raw resource description which has no relative paths (any relative paths are converted to absolute paths)
- add command `commands.update_rdf()`/`update-rdf` (CLI)
- fix unresolved ImportableSourceFile
- fix collection description conversion for type field
- fix to shape validation for model description 0.4: output shape now needs to be bigger than halo
- moved objects from bioimageio.spec.shared.utils to bioimageio.spec.shared[.node_transformer]
- additional keys to validation summary: bioimageio_spec_version, status
- fixes to generic description:
  - ignore value of field `root_path` if present in yaml. This field is used internally and always present in RDF nodes.
- fixes to collection description:
  - RDFs specified directly in collection description are validated correctly even if their source field does not point to an RDF.
  - nesting of collection description allowed
- fixed missing field `icon` in generic description's raw node
- fixes to collection description:
  - RDFs specified directly in collection description are validated correctly
  - no nesting of collection description allowed for now
  - `links` is no longer an explicit collection entry field ("moved" to unknown)
- new model spec 0.3.5 and 0.4.1
- `load_raw_resource_description` no longer accepts the `update_to_current_format` kwarg (use `update_to_format` instead)
- `load_raw_resource_description` accepts the `update_to_format` kwarg
- Non-breaking changes
  - remove `version_number` in favor of using `version`
- Non-breaking changes
  - added `concatenable` flag to index, time and space input axes
- Non-breaking changes
  - added `DataDependentSize` for `outputs.i.size` to specify an output shape that is not known before inference is run
  - added optional `inputs.i.optional` field to indicate that a tensor may be `None`
  - made data type assumptions in `preprocessing` and `postprocessing` explicit by adding `'ensure_dtype'` operations per default
  - allow to specify multiple thresholds (along an `axis`) in a 'binarize' processing step
- Breaking changes that are fully auto-convertible
  - dropped `download_url`
  - dropped non-file attachments
  - `attachments.files` moved to `attachments.i.source`
- Non-breaking changes
  - added optional `parent` field
- all generic 0.3.0 changes (except that models already have the `parent` field), plus:
- Breaking changes that are partially auto-convertible
  - `inputs.i.axes` are now defined in more detail (same for `outputs.i.axes`)
  - `inputs.i.shape` moved per axis to `inputs.i.axes.size` (same for `outputs.i.shape`)
  - new pre-/postprocessing 'fixed_zero_mean_unit_variance' separated from 'zero_mean_unit_variance', where `mode=fixed` is no longer valid (for scalar values this is auto-convertible)
- Breaking changes that are fully auto-convertible
  - changes in `weights.pytorch_state_dict.architecture`
    - renamed `weights.pytorch_state_dict.architecture.source_file` to `...architecture.source`
  - changes in `weights.pytorch_state_dict.dependencies`
    - only conda environment allowed and specified by `weights.pytorch_state_dict.dependencies.source`
    - new optional field `weights.pytorch_state_dict.dependencies.sha256`
  - changes in `weights.tensorflow_model_bundle.dependencies`
    - same as changes in `weights.pytorch_state_dict.dependencies`
  - moved `test_inputs` to `inputs.i.test_tensor`
  - moved `test_outputs` to `outputs.i.test_tensor`
  - moved `sample_inputs` to `inputs.i.sample_tensor`
  - moved `sample_outputs` to `outputs.i.sample_tensor`
  - renamed `inputs.i.name` to `inputs.i.id`
  - renamed `outputs.i.name` to `outputs.i.id`
  - renamed `inputs.i.preprocessing.name` to `inputs.i.preprocessing.id`
  - renamed `outputs.i.postprocessing.name` to `outputs.i.postprocessing.id`
- Non-breaking changes:
  - new pre-/postprocessing: `id` = 'ensure_dtype' with kwarg `dtype`
- Breaking changes that are fully auto-convertible
  - `id` overwritten with value from `config.bioimageio.nickname` if available
- Non-breaking changes
  - `version_number` is a new, optional field indicating that an RDF is the nth published version with a given `id`
  - `id_emoji` is a new, optional field (set from `config.bioimageio.nickname_icon` if available)
  - `uploader` is a new, optional field with an `email` and an optional `name` subfield
- Non-breaking changes
  - make pre-/postprocessing kwargs `mode` and `axes` always optional for model description 0.3 and 0.4
- Non-breaking changes
  - `cite` field is now optional
- Breaking changes that are fully auto-convertible
  - name field may not include '/' or '\' (conversion removes these)
- Non-breaking changes
  - Implicit output shape can be expanded by inserting `null` into `shape:scale` and indicating the length of the new dimension D in the `offset` field. Keep in mind that `D=2*'offset'`.
- Breaking changes that are fully auto-convertible
  - `parent` field changed to hold a string that is a bioimage.io ID, a URL or a local relative path (and not subfields `uri` and `sha256`)
- Non-breaking changes
  - new optional field `training_data`
- Non-breaking changes
  - explicitly define and document dataset description (for now, a clone of generic description with type="dataset")
- Non-breaking changes
  - add optional field `download_url`
  - add optional field `dependencies` to all weight formats (not only pytorch_state_dict)
  - add optional `pytorch_version` to the pytorch_state_dict and torchscript weight formats
- Bug fixes:
  - in a `pytorch_state_dict` weight entry `architecture` is no longer optional
- Non-breaking changes
  - make `authors`, `cite`, `documentation` and `tags` optional
- Breaking changes that are fully auto-convertible
  - Simplifies collection description 0.2.1 by merging resource type fields together into a `collection` field, holding a list of all resources in the specified collection.
- Non-breaking changes
  - `rdf_source` new optional field
  - `id` new optional field
- First official release, extends generic description with fields `application`, `model`, `dataset`, `notebook` and (nested) `collection`, which hold lists linking to respective resources.
- Non-breaking changes
  - add optional `email` and `github_user` fields to entries in `authors`
  - add optional `maintainers` field (entries like in `authors`, but `github_user` is required and `name` is not)
- Breaking changes that are fully auto-convertible
  - moved field `dependencies` to `weights:pytorch_state_dict:dependencies`
- Non-breaking changes
  - `documentation` field accepts URLs as well
- Non-breaking changes
  - `documentation` field accepts URLs as well
- Breaking changes
  - model inputs and outputs may not use duplicated names
  - model field `sha256` is required if `pytorch_state_dict` weights are defined, and is now moved to the `pytorch_state_dict` entry as `architecture_sha256`
- Breaking changes that are fully auto-convertible
  - model fields language and framework are removed
  - model field `source` is renamed `architecture` and is moved together with `kwargs` to the `pytorch_state_dict` weights entry (if it exists, otherwise they are removed)
  - the weight format `pytorch_script` was renamed to `torchscript`
- Other changes
  - model inputs (like outputs) may be defined by `scale`ing and `offset`ing a `reference_tensor`
  - a `maintainers` field was added to the model description
  - the entries in the `authors` field may now additionally contain `email` or `github_user`
  - the summary returned by the `validate` command now also contains a list of warnings
  - an `update_format` command was added to aid with updating older RDFs by applying auto-conversion
- Non-breaking changes
  - Add optional parameter `eps` to `scale_range` postprocessing
- Breaking changes that are fully auto-convertible
  - `reference_input` for implicit output tensor shape was renamed to `reference_tensor`
- Breaking changes
  - The RDF file name in a package should be `rdf.yaml` for all RDFs (not `model.yaml`);
  - Change `authors` and `packaged_by` fields from List[str] to List[Author], with Author consisting of a dictionary `{name: '<Full name>', affiliation: '<Affiliation>', orcid: 'optional orcid id'}`;
  - Add a mandatory `type` field to comply with the generic description. The only valid value is 'model' for a model description;
  - Only allow `license` identifiers from the SPDX license list;
- Non-breaking changes
  - Add optional `version` field (default 0.1.0) to keep track of model changes;
  - Allow the values in the `attachments` list to be any values besides URI;
Footnotes

[^1]: JSON Schema based documentation generated with json-schema-for-humans.

[^2]: Part of the bioimageio.spec package documentation generated with pdoc.