Welcome to the EDAM Developers Guide. It contains best-practice guidelines for the technical processes of EDAM development: modifying EDAM files on GitHub, creating releases, deprecating concepts, etc.
If you're not sure how to do something please ask [email protected]. You'll need to subscribe to the list first.
Note
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119:
- "MUST", "REQUIRED" or "SHALL" mean that the guideline is an absolute requirement of the specification.
- "MUST NOT" or "SHALL NOT" mean that the guideline is an absolute prohibition of the specification.
- "SHOULD" or "RECOMMENDED" mean that there may exist valid reasons in particular circumstances to ignore a particular guideline, but the full implications must be understood and carefully weighed before doing so.
- "SHOULD NOT" or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when acting contrary to the guideline is acceptable or even useful, but the full implications should be understood and the case carefully weighed before doing so.
- "MAY" or "OPTIONAL" mean that the guideline is truly optional; you can choose to follow it or not.
- As much as you can, make atomic changes and commit them independently: this greatly improves traceability in the long term.
- Make trivial modifications using a text editor if possible, rather than Protege, so that the actual modification is not hidden in a haystack of Protege reformattings.
- Include an informative message when you commit a change and (ideally) add a description of your modifications to the changelog.
- Check and double-check your changes: errors can be hard to track and fix later.
When adding new concepts, you MUST specify the following:
Attribute | OWL attribute | Note |
---|---|---|
Concept URI | rdf:about | In the right namespace and with the latest numerical ID. |
Primary term | rdfs:label | See Editors Guide. |
Definition | oboInOwl:hasDefinition | See Editors Guide. |
Parent(s) | rdfs:subClassOf | Immediate parent(s) of the concept (normally one only). |
Version | created_in | Current EDAM dev version, e.g. 1.21. |
Type subset | oboInOwl:inSubset | One of `concrete` or `placeholder`, see Technical details. |
EDAM subset | oboInOwl:inSubset | Always `edam`. |
Branch subset | oboInOwl:inSubset | One of `topic`, `data`, `format` or `operation`. |
Next ID | <next_id> | Increment the current count by 1. |
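The required attributes above lend themselves to a quick automated check. The following is a minimal sketch, not part of the EDAM tooling: it assumes a concept is serialised as an `owl:Class` element in RDF/XML, that `created_in` lives in the `http://edamontology.org/` namespace, and the helper name `missing_required` and the sample concept are hypothetical. It only tests for presence, so it does not verify that all three `oboInOwl:inSubset` values are given.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI. The "edam" entry is an assumption of this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "edam": "http://edamontology.org/",
}

# Child elements that every new concept is expected to carry.
REQUIRED_CHILDREN = [
    "rdfs:label",
    "oboInOwl:hasDefinition",
    "rdfs:subClassOf",
    "edam:created_in",
    "oboInOwl:inSubset",
]


def missing_required(owl_class):
    """Return the required attributes absent from one owl:Class element."""
    missing = []
    if owl_class.get("{%s}about" % NS["rdf"]) is None:
        missing.append("rdf:about")
    for name in REQUIRED_CHILDREN:
        if owl_class.find(name, NS) is None:
            missing.append(name)
    return missing


# Hypothetical, deliberately incomplete concept used only for illustration.
SAMPLE = """<owl:Class xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    rdf:about="http://example.org/data_9999">
  <rdfs:label>Hypothetical concept</rdfs:label>
</owl:Class>"""

print(missing_required(ET.fromstring(SAMPLE)))
```

Running it on the sample lists every attribute that still needs to be added before the concept is committed.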
For Format additions you MUST also specify:
Attribute | OWL attribute | Note |
---|---|---|
Type of data | <is_format_of> | URI of EDAM Data concept. |
Specification | <documentation> | URL of formal (machine-readable) specification. See Editors Guide. |
Basic type | rdfs:subClassOf | One of Textual format, Binary format, etc. See Technical details. |
Type of data | rdfs:subClassOf | Some child of Format (by type of data). See Technical details. |
For Identifier additions you MUST also specify:
Attribute | OWL attribute | Note |
---|---|---|
Type of data | <is_identifier_of> | URI of EDAM Data concept. |
Basic type | rdfs:subClassOf | One of Accession or Name. See Technical details. |
Type of data | rdfs:subClassOf | Some child of Identifier (hybrid). See Technical details. |
When adding new concepts, you SHOULD specify the following:
Attribute | OWL attribute | Note |
---|---|---|
Exact synonym | oboInOwl:hasExactSynonym | See Technical details. |
Narrow synonym [1] | oboInOwl:hasNarrowSynonym | See Technical details. |
Broad synonym [1] | oboInOwl:hasBroadSynonym | See Technical details. |
Comment | rdfs:comment | See Editors Guide. |
Wikipedia | <documentation> | URL of Wikipedia page. |
Usage guideline | <notRecommendedForAnnotation> | Set to `true` for placeholder concepts. |
[1] narrowSynonym and broadSynonym MUST NOT be specified on EDAM Format concepts.
For Operation additions you MAY also specify:
Attribute | OWL attribute | Note |
---|---|---|
Top-level operation | rdfs:subClassOf | One of the Tier 1 operations (see technical docs), unless this is already adequately subsumed by the parent. |
For Format additions you SHOULD also specify:
Attribute | OWL attribute | Note |
---|---|---|
Documentation | <documentation> | URL of documentation about the format. |
Publication | <documentation> | DOI of publication about the format. |
File extension [1,2] | <file_extension> | File extension (without period character), one extension per <file_extension> annotation. Must contain lowercase alphanumeric characters only. |
Media type | <media_type> | Media type (MIME type), if available. |
Example | <example> | Link to example of the format, if available. |
Information standard | <information_standard> | Link to relevant information standard which the format supports. |
Ontology used | <ontology_used> | Link to an ontology used by this format (one link per <ontology_used> annotation). |
Governing organisation | <organisation> | Link to an organisation that formally governs the format, one link per <organisation> annotation. |
[1] File extension values MUST be in lowercase.
[2] If a file extension is specified, then this MUST also be given either as "exact synonyms" or as the concept "preferred label" (term). Exact synonyms MAY include variants both with and without full stop (period) character, e.g. both `.txt` and `txt`.
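The two file extension footnotes can be checked mechanically. This is a minimal sketch under stated assumptions: the function name `check_file_extension` is hypothetical, and the sketch leniently accepts the variant with a leading period for the preferred label as well as for exact synonyms.

```python
import re


def check_file_extension(extension, label, exact_synonyms):
    """Return a list of problems with one <file_extension> value."""
    problems = []
    # Lowercase alphanumeric characters only, so no leading period either.
    if not re.fullmatch(r"[a-z0-9]+", extension):
        problems.append("must contain lowercase alphanumeric characters only")
    # The extension must also be given as the preferred label or as an
    # exact synonym; the variant with a leading period is accepted too.
    names = {label, *exact_synonyms}
    if extension not in names and "." + extension not in names:
        problems.append("not given as preferred label or exact synonym")
    return problems


print(check_file_extension("txt", "Plain text format", [".txt", "txt"]))
print(check_file_extension(".TXT", "Plain text format", []))
```

An empty list means the value satisfies both footnotes.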
For Identifier additions you SHOULD also specify:
Attribute | OWL attribute | Note |
---|---|---|
Regexp | <regex> |
Regular expression pattern for identifier instances. |
Documentation | <documentation> |
URL of documentation about the identifier. |
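Before committing a `<regex>` value it can help to confirm that the pattern compiles and matches known identifier instances. The sketch below uses Python's `re` module, whose regular expression flavour may differ from the one intended for EDAM; the helper name, the pattern and the example identifiers are all hypothetical.

```python
import re


def unmatched_examples(pattern, examples):
    """Compile a <regex> value and return the examples it fails to match."""
    compiled = re.compile(pattern)  # raises re.error if the pattern is invalid
    return [example for example in examples if not compiled.fullmatch(example)]


# Made-up pattern: two capital letters followed by six digits.
print(unmatched_examples(r"[A-Z]{2}[0-9]{6}", ["AB123456", "ab12"]))
```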
The following rules maintain the integrity of the conceptual hierarchy and ensure a consistent level of conceptual granularity. See Technical details for definition of concrete and placeholder concepts.
All subontologies:
- leaf nodes MUST be concrete concepts
Topic:
- MUST have a path to root of 4 levels deep maximum
- MUST NOT have a path to root exceeding 5 levels deep
Operation:
- MUST ensure placeholders appear in Tiers 1 and 2 (usually) and 3 (rarely - in exceptional cases) only
- MUST NOT chain more than 3 placeholders
- MUST NOT chain more than 3 concrete operations
Data:
- MUST NOT chain more than 2 placeholders
- MUST NOT chain more than 2 concrete data concepts
- MUST ensure placeholders occur in Tier 1 (usually) and 2 (rarely) only
Identifier:
- MUST NOT chain more than 4 placeholders
- MUST NOT chain more than 2 concrete identifiers
- MUST be related (via is_identifier_of) to a Data concept, but MUST NOT duplicate this annotation if it's already stated on an ancestor concept.
- concrete identifiers MUST descend (via `subClassOf` relations) from:
  - Accession or Name and
  - Identifier (typed) (or its kids)

  but MUST NOT duplicate these relations if already stated on an ancestor concept.
- Additionally, concrete identifiers re-used for data objects of fundamentally different types (typically served from a single database) MUST descend from:
  - "Identifier (hybrid)" (http://edamontology.org/data_2109) may also be given.
Format:
- MUST NOT chain more than 4 placeholders
- MUST be related (via is_format_of) to a Data concept, but SHOULD NOT duplicate this annotation if it's already stated on an ancestor concept.
- MUST descend (via `subClassOf`) concrete formats from Textual format, Tabular format, Binary format, XML, HTML, JSON, RDF format or YAML, but you MUST NOT duplicate this ancestry in format variants. For example FASTA-like (text) is defined as a child of Textual format, but the kids of FASTA-like (text) format are not.
- MUST descend (via `subClassOf`) concrete formats from Format (by type of data) (or its kids), but again, you MUST NOT duplicate this ancestry in format variants. For example FASTA-like (text) is defined as a child of Sequence record format -> FASTA-like, but the kids of FASTA-like (text) format are not.
- MUST NOT add new placeholder concepts (kids of Format (by type of data)) unless there is a corresponding concrete data format descending from it.
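The chaining limits above can be expressed as a small check over one path from a leaf to the root. This sketch rests on an assumption: it reads "chain" as a run of consecutive concepts of the same type along a `subClassOf` path. The function names and the path representation (a list of `"concrete"` / `"placeholder"` strings, leaf first) are hypothetical, and the Topic depth and tier rules are not covered.

```python
def longest_run(path, kind):
    """Longest run of consecutive concepts of `kind` along a path."""
    best = run = 0
    for concept_kind in path:
        run = run + 1 if concept_kind == kind else 0
        best = max(best, run)
    return best


# (max chained placeholders, max chained concrete concepts) per branch,
# transcribed from the rules above; None means no limit is stated.
CHAIN_LIMITS = {
    "operation": (3, 3),
    "data": (2, 2),
    "identifier": (4, 2),
    "format": (4, None),
}


def chain_violations(branch, path):
    """Check one leaf-to-root path, given as a list of concept types."""
    max_placeholder, max_concrete = CHAIN_LIMITS[branch]
    problems = []
    if longest_run(path, "placeholder") > max_placeholder:
        problems.append("too many chained placeholders")
    if max_concrete is not None and longest_run(path, "concrete") > max_concrete:
        problems.append("too many chained concrete concepts")
    if path and path[0] != "concrete":
        problems.append("leaf node must be concrete")
    return problems


print(chain_violations("data", ["concrete", "concrete", "placeholder"]))
print(chain_violations("data", ["concrete"] * 3 + ["placeholder"]))
```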
If you add a concept which you expect to remain a leaf node, i.e. EDAM will not include finer-grained concepts, then, if other well-developed ontologies exist that serve this conceptual niche, you SHOULD annotate this junction (see todo).
When deprecating concepts, you MUST (unless otherwise stated) specify the following:
Attribute | OWL attribute | Note |
---|---|---|
EDAM version | obsolete_since | Current version e.g. 1.21 |
Subset | oboInOwl:inSubset | Set this to `obsolete` (pick the value) |
Deprecation flag | owl:deprecated | Type the value of `true` |
Replacement concept [1] | oboInOwl:replacedBy | The alternative 'replacement' concept to firmly use. Pick one. |
Replacement concept [1] | oboInOwl:consider | Replacement concept when less certain. Pick one. |
Old parent | oldParent | Specify the URI(s) of the erstwhile parent(s) of the now-deprecated concept (using one or more attributes as needed). |
Comment [2] | deprecation_comment | Optional comment as to why the concept is deprecated. |
New parent | rdfs:subClassOf | Set the parent concept to be ObsoleteClass |
[1] One of `replacedBy` or `consider` MUST be specified.
[2] `deprecation_comment` is OPTIONAL.
Also:
- MUST remove all other class annotations (subsets, comments, synonyms etc.) and axioms (including parent concepts), apart from `rdfs:definition` (which MUST be preserved) and `rdfs:comment` (which MAY be preserved).
- MUST refactor all references (e.g. `SubClassOf`) to the concept being deprecated from other concepts (you can see these using Protege)
- SHOULD preserve comments and synonyms, as new annotations either in the old parent(s), or the replacement(s) of the deprecated concept, as appropriate.
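The deprecation requirements can be summarised as a checklist in code. This is a rough sketch only: it assumes a concept has already been loaded into a dict mapping attribute names to lists of values, the function name is hypothetical, and `"ObsoleteClass"` stands in for whatever URI the parent actually has in the OWL file.

```python
def deprecation_problems(concept):
    """Check a deprecated concept given as {attribute: [values]}."""
    problems = []
    if not concept.get("obsolete_since"):
        problems.append("obsolete_since missing")
    # All other subsets are removed on deprecation, so only one remains.
    if concept.get("oboInOwl:inSubset") != ["obsolete"]:
        problems.append("inSubset must be exactly 'obsolete'")
    if concept.get("owl:deprecated") != ["true"]:
        problems.append("owl:deprecated must be true")
    if not (concept.get("oboInOwl:replacedBy") or concept.get("oboInOwl:consider")):
        problems.append("one of replacedBy or consider is required")
    if not concept.get("oldParent"):
        problems.append("oldParent missing")
    # Placeholder name for the obsolete parent class (assumption of this sketch).
    if concept.get("rdfs:subClassOf") != ["ObsoleteClass"]:
        problems.append("parent must be ObsoleteClass")
    return problems


# Hypothetical deprecated concept; URIs are made up for illustration.
example = {
    "obsolete_since": ["1.21"],
    "oboInOwl:inSubset": ["obsolete"],
    "owl:deprecated": ["true"],
    "oboInOwl:consider": ["http://example.org/data_0002"],
    "oldParent": ["http://example.org/data_0001"],
    "rdfs:subClassOf": ["ObsoleteClass"],
}
print(deprecation_problems(example))
```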
You MAY specify the following on concepts which are candidates for deprecation:
Attribute | OWL attribute | Note |
---|---|---|
Candidate for deprecation | is_deprecation_candidate | Set this to `true` |
Note
You can see all references to a concept in Protege in the "Class Usage" window; each reference will need updating in turn: in case of very many such references, this can be easier to do globally in a text editor rather than Protege.
Protege is a nice OWL editor, but it has its quirks, so it's recommended you first get a crash course from the EDAM Developers before using it. A commercial alternative is TopBraid Composer.
Important
When editing EDAM using Protege:
- URLs should be entered using the Protege IRI editor.
- general text is entered using the Protege "Constant" editor.
- subsets (`oboInOwl:inSubset` annotation): you must pick (don't type!) an appropriate value.
- when saving the file be sure to use File...Save as and select "RDF/XML" format.
Don't deviate from the above advice. The EDAM CI (and other) systems rely upon EDAM being saved in RDF/XML format, following the patterns specified.
Before committing changes, to ensure logical consistency of EDAM, please do the following within Protege:
- Click Reasoner->Hermit
- Click Reasoner->Start reasoner (it may take a few seconds)
- In the Entities tab, select the Class hierarchy (inferred) tab
- Select the nothing branch
If nothing (no classes) is shown under the nothing branch, then all is well. If one or more classes are shown, then there is a logical inconsistency which must be fixed. You might see lots of classes, but usually the problem is in one or a few classes.
Common problems include:
- classes assigned as a `subClass` of some deprecated concept
- end-point of relations are in the wrong branch, e.g. class has_topic some operation. These can easily occur if you use the Class expression editor in Protege to define such axioms: this is NOT EDAM namespace-aware, and in cases where a concept with the same primary label exists in both classes, can easily pick the wrong one.
The problems are easily fixed within Protege: ask on the mailing list if you're not sure how.
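The first common problem, a class left as a child of a deprecated concept, can also be spotted without a reasoner. The sketch below is a quick text-level check and not a substitute for running Hermit: it only looks at direct, named `rdfs:subClassOf` parents in an RDF/XML file, and the function name and sample URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def children_of_deprecated(root):
    """Return (child URI, parent URI) pairs where the parent is deprecated."""
    classes = root.findall("owl:Class", NS)
    deprecated = {
        c.get("{%s}about" % RDF)
        for c in classes
        if c.findtext("owl:deprecated", default="", namespaces=NS).strip() == "true"
    }
    hits = []
    for c in classes:
        uri = c.get("{%s}about" % RDF)
        if uri in deprecated:
            continue  # deprecated concepts themselves are not reported
        for parent in c.findall("rdfs:subClassOf", NS):
            target = parent.get("{%s}resource" % RDF)
            if target in deprecated:
                hits.append((uri, target))
    return hits


# Tiny made-up ontology: "child" still points at the deprecated "old".
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/old">
    <owl:deprecated>true</owl:deprecated>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/child">
    <rdfs:subClassOf rdf:resource="http://example.org/old"/>
  </owl:Class>
</rdf:RDF>"""

print(children_of_deprecated(ET.fromstring(SAMPLE)))
```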
Caution!
Do not be tempted to click Reasoner->Synchronise reasoner between changes: it tends to hang Protege. Instead, use Reasoner->Stop reasoner, then Reasoner->Start reasoner.
EDAM Developers can edit the main repository. The workflow is:
Get the "editing token"
- contact [email protected] and claim the "editing token" after first checking that it is not currently taken :)
- say briefly what you are doing, why, and about how long it will take
Update your local repo with the latest files from the GitHub main branch:
git pull
(or "Synch" from the Desktop client)
If you've not already done so, you will first need to clone the repo:
git clone https://github.com/edamontology/edamontology.git
(or "Clone" from the Desktop client)
Configure Git hooks by running the following from your `edamontology` git directory:
git config core.hooksPath .githooks
Git hooks are scripts defined in https://github.com/edamontology/edamontology/tree/main/.githooks. They currently detect and prevent (at pre-commit stage) commits of EDAM_dev.owl which are not in RDF/XML format.
Make and commit your local changes. You must be working with the "dev" version, `EDAM_dev.owl`.
- check your changes and that the OWL file looks good in Protege
- ensure the `next_id` attribute is updated
- ensure that `oboOther:date` is updated to the current GMT/BST before the commit
- add the edited file to the commit
git add <filepath>
Commit your local changes, including a concise but complete summary of the major changes:
git commit -m "commit message here"
Push your changes to GitHub (main branch):
git push origin
Release the editing token for the other developers:
- contact [email protected] and release the "editing token"
- summarise what you actually did and why
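Since the workflow depends on `EDAM_dev.owl` staying in RDF/XML, a quick local check before committing can save a rejected commit. The sketch below is not the project's Git hook, whose implementation is not described here; it simply assumes that an RDF/XML file parses as XML with an `rdf:RDF` root element, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def looks_like_rdfxml(data):
    """True if `data` (bytes or str) parses as XML with an rdf:RDF root."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False  # not XML at all, e.g. functional or Manchester syntax
    return root.tag == RDF_ROOT


rdfxml = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
owlxml = '<Ontology xmlns="http://www.w3.org/2002/07/owl#"/>'
print(looks_like_rdfxml(rdfxml), looks_like_rdfxml(owlxml))
```

For a real file, read it in binary mode (for example `open("EDAM_dev.owl", "rb").read()`) so that an XML declaration with an encoding is handled by the parser.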
Important
Please provide a meaningful report on changes so that we can easily generate the ChangeLog upon next release:
- in the Git commit message, including the GitHub issue number of any issues addressed (use `fix #xxx` syntax, see GitHub docs)
- directly in the changelog.md
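When assembling the changelog later, the issue numbers referenced with the `fix #xxx` syntax can be pulled out of commit messages automatically. The sketch below is a simplified approximation of GitHub's closing keywords (close, fix, resolve and their inflections) and does not cover every form GitHub accepts; the function name and the sample message are hypothetical.

```python
import re

# Closing keyword followed by whitespace and an issue number such as "#123".
CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def closed_issues(message):
    """Return the issue numbers a commit message claims to close."""
    return [int(number) for number in CLOSING.findall(message)]


print(closed_issues("Add new format concept, fix #123 and fixes #45"))
```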
We aim to follow a bi-monthly release cycle to this schedule:
- First Wed of every month
  - EDAM team Skype to discuss plans for this month. Announcement (to edam-announce) including a short summary of plans and an invitation for suggestions.
- Last Mon of every month
  - Announcement (to edam-announce) saying that the release is imminent, with an invitation for last-minute suggestions.
- Last Wed of every month
  - Complete the work for the release. Make the release. Ensure it works in BioPortal, OLS, AgroPortal and in bio.tools.
- Last Fri of every month
  - Announce the release, including a summary of changes.
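The schedule dates for a given month can be computed rather than looked up. This is an illustrative sketch using only the standard library; the function names and the dictionary keys are hypothetical labels for the four milestones listed above.

```python
import calendar
from datetime import date


def first_or_last_weekday(year, month, weekday, last=False):
    """First (or last) occurrence of a weekday in a month.

    `weekday` uses the calendar module constants, e.g. calendar.MONDAY.
    """
    days = [
        d for d in calendar.Calendar().itermonthdates(year, month)
        if d.month == month and d.weekday() == weekday
    ]
    return days[-1] if last else days[0]


def release_schedule(year, month):
    """Map each milestone of the schedule to its date."""
    return {
        "planning call": first_or_last_weekday(year, month, calendar.WEDNESDAY),
        "imminent-release announcement": first_or_last_weekday(
            year, month, calendar.MONDAY, last=True),
        "release": first_or_last_weekday(
            year, month, calendar.WEDNESDAY, last=True),
        "release announcement": first_or_last_weekday(
            year, month, calendar.FRIDAY, last=True),
    }


for milestone, day in release_schedule(2024, 1).items():
    print(milestone, day.isoformat())
```

Note that the rules are applied literally: in a month that ends on a Wednesday or Thursday, the last Friday falls before the last Wednesday.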
Note
Releases have been mostly quarterly, but a more regular cycle (bi-monthly or even monthly) remains the aspiration. Please help us move faster by getting involved.
Before creating a new release, please make sure you have the approval of the leader of EDAM-dev, and that the changelog.md and changelog-detailed.md files are up-to-date with the changes of the new release. See Editing the ChangeLog below. Once you're clear to go, do the following:
fix any known bugs in EDAM: at the very least, the EDAM build tests should pass.
update your local version of the repository:
git pull
(or "Synch" in desktop client)
assuming you are releasing version n+1, n being the current version:
- you initially have `EDAM_dev.owl` in the repository
- make sure to update `oboOther:date` in this file
- copy the file `EDAM_dev.owl` to `releases/EDAM.owl` and `releases/EDAM_n+1.owl`
cp EDAM_dev.owl releases/EDAM.owl
cp EDAM_dev.owl releases/EDAM_n+1.owl
git add releases/EDAM_n+1.owl
- modify the `doap:version` property to n+1 in `releases/EDAM_n+1.owl` and to n+2_dev in `EDAM_dev.owl`
commit and push your changes
git commit -a
(or "Commit to main" in the desktop client)
git push origin
(or "Synch" in the desktop client)
update the detailed changelog by running Bubastis to compare the release against the previous version.
update the changelog with a summary of the major changes.
create the release on GitHub (use the _draft a new release_ button of the _releases_ tab).
- from the main page of the EDAM repository, click `Releases`.
- click `Draft a new release`
- enter the version number e.g. `1.24` in the `Tag version` box
- enter a title e.g. `EDAM 1.24 release`
- check the `This is a pre-release` box if applicable
- paste an excerpt from `changelog.md` into
submit this new release to BioPortal. OLS will pull the file automatically from edamontology.org every night.
download the `EDAM.csv` file from BioPortal and copy this to https://github.com/edamontology/edamontology/tree/main/releases
create a tsv equivalent of `EDAM.csv` (e.g. by hacking in a text editor) and copy the resulting `EDAM.tsv` file to https://github.com/edamontology/edamontology/tree/main/releases
close GitHub issues labelled done - staged for release.
create the next milestone tag in GitHub, e.g. "1.25"
review any GitHub issues tagged for the release milestone which were not acted upon; remove the milestone and (if applicable) tag them with the next milestone tag
confirm everything is working in bio.tools by mailing bio.tools Lead Curator.
let the developers of the EDAM browser know a new release is available by posting here
Update the content of https://github.com/edamontology/edamontology.org/blob/main/page.html (add a line linking to the download of the latest release)
ensure http://edamontology.org is updated
announce the new release on Twitter and mailing lists ([email protected], [email protected]) including thanks and a summary of changes.
help applications that implement EDAM to update to the new version.
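The file-copying part of the release steps above can be scripted to avoid typos in version numbers. This is a sketch under assumptions, not an official release script: it assumes versions have the form `major.minor` (as in `1.24`) and that "n+1" means incrementing the minor number; the function names are hypothetical. It does not edit `doap:version` or `oboOther:date`, and it does not run any Git commands.

```python
import shutil
from pathlib import Path


def release_versions(current):
    """Given the current version n (e.g. "1.24"), return (n+1, n+2_dev)."""
    major, minor = current.split(".")
    release = f"{major}.{int(minor) + 1}"
    next_dev = f"{major}.{int(minor) + 2}_dev"
    return release, next_dev


def stage_release(repo, current):
    """Copy EDAM_dev.owl to the two release files and return their paths."""
    release, _ = release_versions(current)
    dev_file = Path(repo) / "EDAM_dev.owl"
    targets = [
        Path(repo) / "releases" / "EDAM.owl",
        Path(repo) / "releases" / f"EDAM_{release}.owl",
    ]
    for target in targets:
        shutil.copyfile(dev_file, target)
    return targets


print(release_versions("1.24"))
```

Calling `stage_release(".", "1.24")` from the repository root would create `releases/EDAM.owl` and `releases/EDAM_1.25.owl`; the versioned file still needs `git add`, and the version properties still need editing by hand.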
The ChangeLog includes:
- changelog - a summary of the major changes and what motivated them
- detailed changelog - fine-grained details obtained using Bubastis
The changelog should include:
- (as 1st paragraph) an "executive summary" suitable for consumption by technical managers, describing the motivation for major changes, including e.g. requests at recent hackathons, requests via GitHub, strategic directions etc.
- summary of changes distilled from the output of Bubastis (see below).
- summary of GitHub commit messages. Please ensure meaningful commit messages are provided on every commit.
- Some hacking of Bubastis output is needed to identify (at least):
  - number of new concepts
  - number of deprecations
  - summary of activity, i.e. in which branches was most work focused?
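As a cross-check on the figures distilled from Bubastis, the same headline numbers can be estimated directly from two RDF/XML releases. The sketch below does not parse Bubastis output; it is a separate, simplified comparison that assumes concepts are top-level `owl:Class` elements and that the branch can be read from the URI (e.g. `.../data_2109` belongs to `data`). Function names and sample URIs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"owl": "http://www.w3.org/2002/07/owl#"}


def class_status(xml_text):
    """Map each named class URI to True if it is marked owl:deprecated."""
    status = {}
    for c in ET.fromstring(xml_text).findall("owl:Class", NS):
        uri = c.get("{%s}about" % RDF)
        if uri is not None:
            flag = c.findtext("owl:deprecated", default="", namespaces=NS)
            status[uri] = flag.strip() == "true"
    return status


def summarise(old_xml, new_xml):
    """Count new concepts and new deprecations between two versions."""
    old, new = class_status(old_xml), class_status(new_xml)
    added = [u for u in new if u not in old]
    deprecated = [u for u in new if new[u] and u in old and not old[u]]
    # Branch name is the part of the last URI segment before the underscore.
    by_branch = Counter(u.rsplit("/", 1)[-1].split("_")[0] for u in added)
    return {
        "new concepts": len(added),
        "deprecations": len(deprecated),
        "new by branch": dict(by_branch),
    }


# Two tiny made-up versions of an ontology.
HEAD = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:owl="http://www.w3.org/2002/07/owl#">')
OLD = HEAD + '<owl:Class rdf:about="http://example.org/data_0001"/></rdf:RDF>'
NEW = HEAD + (
    '<owl:Class rdf:about="http://example.org/data_0001">'
    '<owl:deprecated>true</owl:deprecated></owl:Class>'
    '<owl:Class rdf:about="http://example.org/format_0002"/></rdf:RDF>'
)

print(summarise(OLD, NEW))
```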
Every modification on the ontology pushed to GitHub triggers an automated test in Travis CI. It checks:
- a few rules using the edamxpathvalidator tool.
- the consistency of the ontology by running the Hermit reasoner automatically.
The Travis-CI website shows you the current status here. The fact that the continuous integration task succeeds does not guarantee there are no remaining bugs, but a failure means that you must take action to correct the problem: either fix it, fix the `edamxpathvalidator` program, or ask the mailing list if you're unsure.
GitHub makes it possible for any developer to make modifications in a copy of EDAM and suggest these modifications are included in the original. Please note that we discourage using this mechanism for large modifications made using Protege, because merging OWL files which have been reformatted by Protege is notoriously unreliable (see "Best practices for edition" below).
The workflow is:
- Fork the edamontology repository in your own account.
- Make the modifications you want to suggest for inclusion in EDAM in this forked repository.
- Open a pull request for each modification you make.
Please make sure to:
- Keep your forked repository synchronized with the core repository, to avoid inconsistencies.
- Follow the "Best practices for edition" below.