Developers Guide

Welcome to the EDAM Developers Guide. It contains best-practice guidelines for the technical processes of EDAM development: modifying EDAM files on GitHub, creating releases, deprecating concepts, etc.

If you're not sure how to do something, please ask [email protected]. You'll need to subscribe to the list first.

Technical recipes

Note

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119:

  • "MUST", "REQUIRED" or "SHALL" mean that the guideline is an absolute requirement of the specification.
  • "MUST NOT" or "SHALL NOT" mean that the guideline is an absolute prohibition of the specification.
  • "SHOULD" or "RECOMMENDED" mean that there may exist valid reasons in particular circumstances to ignore a particular guideline, but the full implications must be understood and carefully weighed before doing so.
  • "SHOULD NOT" or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when acting contrary to the guideline is acceptable or even useful, but the full implications should be understood and the case carefully weighed before doing so.
  • "MAY" or "OPTIONAL" mean that the guideline is truly optional; you can choose to follow it or not.

General guidelines

  1. As far as possible, make atomic changes and commit them independently: this greatly improves traceability in the long term.
  2. Make trivial modifications using a text editor rather than Protege if possible, so that the actual modification is not hidden in a haystack of Protege reformatting.
  3. Include an informative message when you commit a change and (ideally) add a description of your modifications to the changelog.
  4. Check and double-check your changes: errors can be hard to track down and fix later.

Adding concepts

Mandatory attributes

When adding new concepts, you MUST specify the following:

Attribute | OWL attribute | Note
Concept URI | rdf:about | In the right namespace and with the latest numerical ID.
Primary term | rdfs:label | See Editors Guide.
Definition | oboInOwl:hasDefinition | See Editors Guide.
Parent(s) | rdfs:subClassOf | Immediate parent(s) of the concept (normally one only).
Version | created_in | Current EDAM dev version, e.g. 1.21.
Type subset | oboInOwl:inSubset | One of concrete or placeholder, see Technical details.
EDAM subset | oboInOwl:inSubset | Always edam.
Branch subset | oboInOwl:inSubset | One of topic, data, format or operation.
Next ID | <next_id> | Increment the current count by 1.
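The <next_id> bookkeeping in the last row can be scripted. This is a minimal sketch, assuming that EDAM_dev.owl holds the next free numeric ID in a single <next_id>N</next_id> element and that new concept URIs take the form http://edamontology.org/<branch>_N; the helper names are hypothetical and not part of the EDAM tooling.

```shell
#!/bin/sh
# Sketch only: assumes a single <next_id>N</next_id> element holding the next
# free ID, and concept URIs of the form http://edamontology.org/<branch>_N.
# Function names are hypothetical helpers, not part of the EDAM tooling.

# Print the current value of <next_id> in OWL file $1.
read_next_id() {
    sed -n 's|.*<next_id>\([0-9][0-9]*\)</next_id>.*|\1|p' "$1" | head -n 1
}

# Print the URI a new concept in branch $2 (topic, data, format, operation)
# would get from the current <next_id> of file $1.
new_concept_uri() {
    printf 'http://edamontology.org/%s_%s\n' "$2" "$(read_next_id "$1")"
}

# Increment <next_id> in file $1 by 1, once the new concept has been added.
# Writes through a temporary file rather than "sed -i", whose syntax differs
# between GNU and BSD sed.
bump_next_id() {
    current=$(read_next_id "$1")
    sed "s|<next_id>$current</next_id>|<next_id>$((current + 1))</next_id>|" \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage:
#   new_concept_uri EDAM_dev.owl operation
#   bump_next_id EDAM_dev.owl
```

As with any scripted change, open the result in Protege before committing.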

For Format additions you MUST also specify:

Attribute | OWL attribute | Note
Type of data | <is_format_of> | URI of EDAM Data concept.
Specification | <documentation> | URL of formal (machine-readable) specification. See Editors Guide.
Basic type | rdfs:subClassOf | One of Textual format, Binary format, etc. See Technical details.
Type of data | rdfs:subClassOf | Some child of Format (by type of data). See Technical details.

For Identifier additions you MUST also specify:

Attribute | OWL attribute | Note
Type of data | <is_identifier_of> | URI of EDAM Data concept.
Basic type | rdfs:subClassOf | One of Accession or Name. See Technical details.
Type of data | rdfs:subClassOf | Some child of Identifier (hybrid). See Technical details.

Optional attributes

When adding new concepts, you SHOULD specify the following:

Attribute | OWL attribute | Note
Exact synonym | oboInOwl:hasExactSynonym | See Technical details.
Narrow synonym [1] | oboInOwl:hasNarrowSynonym | See Technical details.
Broad synonym [1] | oboInOwl:hasBroadSynonym | See Technical details.
Comment | rdfs:comment | See Editors Guide.
Wikipedia | <documentation> | URL of Wikipedia page.
Usage guideline | <notRecommendedForAnnotation> | Set to true for placeholder concepts.

[1] narrowSynonym and broadSynonym MUST NOT be specified on EDAM Format concepts.

For Operation additions you MAY also specify:

Attribute | OWL attribute | Note
Top-level operation | rdfs:subClassOf | One of the Tier 1 operations (see technical docs), unless this is already adequately subsumed by the parent.

For Format additions you SHOULD also specify:

Attribute | OWL attribute | Note
Documentation | <documentation> | URL of documentation about the format.
Publication | <documentation> | DOI of publication about the format.
File extension [1,2] | <file_extension> | File extension without the period character; one extension per <file_extension> annotation. Must contain lowercase alphanumeric characters only.
Media type | <media_type> | Media type (MIME type), if available.
Example | <example> | Link to an example of the format, if available.
Information standard | <information_standard> | Link to a relevant information standard which the format supports.
Ontology used | <ontology_used> | Link to an ontology used by this format (one link per <ontology_used> annotation).
Governing organisation | <organisation> | Link to an organisation that formally governs the format (one link per <organisation> annotation).

[1] File extension values MUST be in lowercase.
[2] If a file extension is specified, then this MUST also be given either as an exact synonym or as the concept's preferred label (term). Exact synonyms MAY include variants both with and without the full stop (period) character, e.g. both .txt and txt.
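The file extension rules in the table and footnotes can be checked mechanically before a format concept is added. A minimal sketch with two hypothetical helper functions; it compares strings literally, so a label such as FASTA does not count as a match for the extension fasta.

```shell
#!/bin/sh
# Sketch: check a <file_extension> value against the rules above.
# Function names are hypothetical helpers.

# Succeeds if $1 is non-empty and contains only lowercase ASCII letters and
# digits (so no period, no uppercase). The characters are listed explicitly
# because [a-z] ranges can be locale-dependent in some shells.
is_valid_file_extension() {
    case $1 in
        '' | *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
    esac
    return 0
}

# Succeeds if extension $1 also appears as the preferred label ($2) or as an
# exact synonym ($3, $4, ...), with or without a leading period.
extension_has_synonym() {
    ext=$1
    shift
    for term in "$@"; do
        if [ "$term" = "$ext" ] || [ "$term" = ".$ext" ]; then
            return 0
        fi
    done
    return 1
}

# Usage:
#   is_valid_file_extension fasta && echo ok
#   extension_has_synonym txt "Textual format" .txt
```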

For Identifier additions you SHOULD also specify:

Attribute | OWL attribute | Note
Regexp | <regex> | Regular expression pattern for identifier instances.
Documentation | <documentation> | URL of documentation about the identifier.
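Before adding a <regex> annotation it is worth trying the pattern on a few known identifier instances. A minimal sketch, assuming the pattern is valid POSIX extended regular expression syntax as accepted by grep -E (patterns written for another regex dialect may need translating first); the helper name, the pattern and the identifiers in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Sketch: try a candidate <regex> value ($1) against example identifiers
# ($2, $3, ...). Assumes POSIX ERE syntax; the helper name is hypothetical.
# Returns non-zero if any identifier fails to match.
check_identifier_regex() {
    pattern=$1
    shift
    status=0
    for id in "$@"; do
        # Anchor the whole pattern so it must match the entire identifier.
        if printf '%s\n' "$id" | grep -Eq "^(${pattern})\$"; then
            printf 'match:    %s\n' "$id"
        else
            printf 'no match: %s\n' "$id"
            status=1
        fi
    done
    return "$status"
}

# Usage, with a made-up pattern and made-up identifiers:
#   check_identifier_regex '[[:upper:]]{2}[[:digit:]]{4}' AB1234 ab1234
```

Anchoring the pattern in the helper means a pattern that only matches part of an identifier is reported as a mismatch, which is usually what you want to catch.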

Hierarchy

The following rules maintain the integrity of the conceptual hierarchy and ensure a consistent level of conceptual granularity. See Technical details for definition of concrete and placeholder concepts.

  • All subontologies

    • leaf nodes MUST be concrete concepts
  • Topic:

    • SHOULD have a path to root of 4 levels deep maximum
    • MUST NOT have a path to root exceeding 5 levels deep
  • Operation:

    • MUST ensure placeholders appear in Tiers 1 and 2 (usually) and 3 (rarely - in exceptional cases) only
    • MUST NOT chain more than 3 placeholders
    • MUST NOT chain more than 3 concrete operations
  • Data:

    • MUST NOT chain more than 2 placeholders
    • MUST NOT chain more than 2 concrete data concepts
    • MUST ensure placeholders occur in Tier 1 (usually) and 2 (rarely) only
  • Identifier:

    • MUST NOT chain more than 4 placeholders
    • MUST NOT chain more than 2 concrete identifiers
    • MUST be related (via is_identifier_of) to a Data concept, but MUST NOT duplicate this annotation if it's already stated on an ancestor concept.
    • concrete identifiers MUST descend (via subClassOf relations) from Accession or Name, but MUST NOT duplicate these relations if already stated on an ancestor concept.

      Additionally, a concrete identifier re-used for data objects of fundamentally different types (typically served from a single database) MUST descend from Identifier (hybrid).

  • Format:

    • MUST NOT chain more than 4 placeholders
    • MUST be related (via is_format_of) to a Data concept, but SHOULD NOT duplicate this annotation if it's already stated on an ancestor concept.
    • MUST descend (via subClassOf) concrete formats from Textual format, Tabular format, Binary format, XML, HTML, JSON, RDF format or YAML, but you MUST NOT duplicate this ancestry in format variants. For example FASTA-like (text) is defined as a child of Textual format, but the kids of FASTA-like (text) format are not.
    • MUST descend (via subClassOf) concrete formats from Format (by type of data) (or its kids), but again, you MUST NOT duplicate this ancestry in format variants. For example FASTA-like (text) is defined as a child of Sequence record format -> FASTA-like, but the kids of FASTA-like (text) format are not.
    • MUST NOT add new placeholder concepts (kids of Format (by type of data)) unless there is a corresponding concrete data format descending from it.
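The depth and chaining limits above are easier to judge with the path length in hand. A minimal sketch, assuming the relevant part of the hierarchy has already been exported as a "child parent" edge list (the export step is not shown); the helper name and the counting convention (a root concept has depth 0, each subClassOf step adds 1) are assumptions, so align the convention with the limits before relying on the numbers.

```shell
#!/bin/sh
# Sketch: print the longest subClassOf path from concept $2 up to a root,
# reading edge list $1 with one "child parent" pair per line (a concept with
# several parents has several lines). A concept with no parent is a root at
# depth 0. Assumes the hierarchy is acyclic. Helper name is hypothetical.
max_depth() {
    awk -v target="$2" '
        NF >= 2 { parents[$1] = parents[$1] " " $2 }
        # Extra parameters act as local variables in awk, so recursion is safe.
        function depth(node,    list, n, i, d, best) {
            if (!(node in parents)) return 0
            best = 0
            n = split(parents[node], list, " ")
            for (i = 1; i <= n; i++) {
                d = depth(list[i])
                if (d > best) best = d
            }
            return best + 1
        }
        END { print depth(target) }
    ' "$1"
}

# Usage (hypothetical IDs):
#   printf '%s\n' 'topic_b topic_0003' 'topic_c topic_b' > edges.txt
#   max_depth edges.txt topic_c    # prints 2
```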

If you add a concept that you expect to remain a leaf node (i.e. EDAM will not include finer-grained concepts), and other well-developed ontologies exist that serve this conceptual niche, then you SHOULD annotate this junction (how to do so is still to be documented).

Deprecating concepts

When deprecating concepts, you MUST (unless otherwise stated) specify the following:

Attribute | OWL attribute | Note
EDAM version | obsolete_since | Current version, e.g. 1.21.
Subset | oboInOwl:inSubset | Set this to obsolete (pick the value).
Deprecation flag | owl:deprecated | Type the value true.
Replacement concept [1] | oboInOwl:replacedBy | The alternative "replacement" concept to firmly use. Pick one.
Replacement concept [1] | oboInOwl:consider | Replacement concept when less certain. Pick one.
Old parent | oldParent | Specify the URI(s) of the erstwhile parent(s) of the now-deprecated concept (using one or more attributes as needed).
Comment [2] | deprecation_comment | Optional comment as to why the concept is deprecated.
New parent | rdfs:subClassOf | Set the parent concept to be ObsoleteClass.

[1] One of replacedBy or consider MUST be specified.
[2] deprecation_comment is OPTIONAL.

Also:

  1. MUST remove all other class annotations (subsets, comments, synonyms etc.) and axioms (including parent concepts), apart from the definition (oboInOwl:hasDefinition), which MUST be preserved, and rdfs:comment, which MAY be preserved.
  2. MUST refactor all references (e.g. SubClassOf) to the concept being deprecated from other concepts (you can see these using Protege)
  3. SHOULD preserve comments and synonyms, as new annotations either in the old parent(s), or the replacement(s) of the deprecated concept, as appropriate.
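The mandatory deprecation annotations can be checked with grep before committing. A minimal sketch, assuming the RDF/XML of one deprecated class has been saved to its own file; the element spellings in the patterns are guesses based on the attribute names in the table, and the helper name is hypothetical. It only looks for required annotations and does not check that the other annotations were removed.

```shell
#!/bin/sh
# Sketch: report required deprecation annotations missing from file $1, which
# holds the RDF/XML of one deprecated class. Each rule is "label:pattern",
# split at the first colon; the patterns are assumptions about how
# EDAM_dev.owl spells these elements. Returns non-zero if anything is missing.
check_deprecation() {
    missing=0
    for rule in \
        'obsolete_since:<obsolete_since>' \
        'obsolete subset:<oboInOwl:inSubset[^>]*obsolete' \
        'deprecation flag:<owl:deprecated[^>]*>true<' \
        'replacedBy or consider:<oboInOwl:(replacedBy|consider)[ >/]' \
        'old parent:<oldParent' \
        'ObsoleteClass parent:<rdfs:subClassOf[^>]*ObsoleteClass'
    do
        label=${rule%%:*}     # text before the first colon
        pattern=${rule#*:}    # text after the first colon
        if ! grep -Eq "$pattern" "$1"; then
            printf 'missing: %s\n' "$label"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_deprecation deprecated_class.xml
```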

You MAY specify the following on concepts which are candidates for deprecation:

Attribute | OWL attribute | Note
Candidate for deprecation | is_deprecation_candidate | Set this to true.

Note

You can see all references to a concept in Protege in the "Class Usage" window; each reference will need updating in turn. If there are very many references, it can be easier to update them globally in a text editor than in Protege.
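In a text editor or shell, the same references can be found by searching for the concept's URI. A minimal sketch with a hypothetical helper; it matches the full URI followed by a closing quote, as in rdf:about="..." and rdf:resource="...", so the concept's own declaration is listed as well as the references to it.

```shell
#!/bin/sh
# Sketch: list line numbers and lines of OWL file $1 that mention concept $2
# (e.g. operation_0004, chosen only as an example). The trailing quote in the
# pattern stops operation_123 from also matching operation_1234.
# The helper name is hypothetical.
find_references() {
    grep -n "edamontology\.org/$2\"" "$1"
}

# Usage:
#   find_references EDAM_dev.owl operation_0004
```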

Use of Protege

Protege is a nice OWL editor, but it has its quirks, so we recommend that you first get a crash course from the EDAM developers before using it. A commercial alternative is TopBraid Composer.

Editing

Important

When editing EDAM using Protege:

  • URLs should be entered using the Protege IRI editor.
  • general text is entered using the Protege "Constant" editor.
  • subsets (oboInOwl:inSubset annotation): you must pick (don't type!) an appropriate value.
  • when saving the file be sure to use File...Save as and select "RDF/XML" format.

Don't deviate from the above advice. The EDAM CI (and other) systems rely upon EDAM being saved in RDF/XML format, following the patterns specified.
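A quick way to see whether a saved file is at least plausibly RDF/XML is to look for an XML declaration and an rdf:RDF element near the top. This is a rough sketch under that assumption, not the actual EDAM Git hook or CI code; Turtle or OWL/XML output from Protege should fail it. The helper name is hypothetical.

```shell
#!/bin/sh
# Rough sketch, not the actual EDAM hook: succeed only if file $1 starts with
# an XML declaration and has an rdf:RDF element within its first 50 lines.
looks_like_rdf_xml() {
    head -n 1 "$1" | grep -q '^<?xml' &&
        head -n 50 "$1" | grep -q '<rdf:RDF'
}

# Usage:
#   looks_like_rdf_xml EDAM_dev.owl || echo "EDAM_dev.owl is not RDF/XML" >&2
```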

Ensuring logical consistency

Before committing changes, to ensure logical consistency of EDAM, please do the following within Protege:

  1. Click Reasoner->Hermit
  2. Click Reasoner->Start reasoner (it may take a few seconds)
  3. In the Entities tab, select the Class hierarchy (inferred) tab
  4. Select the nothing branch

If no classes are shown under the nothing branch, then all is well. If one or more classes are shown, then there is a logical inconsistency which must be fixed. You might see lots of classes, but usually the problem lies in only one or a few of them.

Common problems include:

  • classes assigned as a subClass of some deprecated concept
  • end-points of relations are in the wrong branch, e.g. class has_topic some operation. This can easily happen if you use the Class expression editor in Protege to define such axioms: it is NOT aware of EDAM namespaces, so where concepts with the same primary label exist in two branches, it can easily pick the wrong one.

The problems are easily fixed within Protege: ask on the mailing list if you're not sure how.

Caution!

Do not be tempted to click Reasoner->Synchronise reasoner between changes: it tends to hang Protege. Instead, use Reasoner->Stop reasoner, then Reasoner->Start reasoner.

EDAM release process

Modifying the GitHub main repository

EDAM Developers can edit the main repository. The workflow is:

  1. Get the "editing token"

    • contact [email protected] and claim the "editing token" after first checking that it is not currently taken :)
    • say briefly what you are doing, why, and about how long it will take
  2. Update your local repo with the latest files from the GitHub main branch:

    git pull (or "Synch" from the Desktop client)

    If you've not already done so, you will first need to clone the repo:

    git clone https://github.com/edamontology/edamontology.git (or "Clone" from the Desktop client)

  3. Configure Git hooks by running the following from your edamontology git directory

    git config core.hooksPath .githooks

    Git hooks are scripts defined in https://github.com/edamontology/edamontology/tree/main/.githooks. They currently detect and prevent (at pre-commit stage) commits of EDAM_dev.owl which are not in RDF/XML format.

  4. Make and commit your local changes. You must be working with the "dev" version, EDAM_dev.owl.

    • check your changes and that the OWL file looks good in Protege

    • ensure the next_id attribute is updated

    • ensure that oboOther:date is updated to the current GMT/BST before the commit

    • add the edited file to the commit

      git add <filepath>

    • Commit your local changes, including a concise but complete summary of the major changes:

      git commit -m "commit message here"

  5. Push your changes to GitHub (main branch):

    git push origin

  6. Release the editing token for the other developers:

    • contact [email protected] and release the "editing token"
    • summarise what you actually did and why
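Updating oboOther:date before a commit can also be scripted. A minimal sketch, assuming the element appears once and holds a value of the form dd.mm.yyyy HH:MM GMT (check the existing value and copy its format if it differs); it uses UTC, and the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: set oboOther:date in file $1 to $2, or to the current UTC time if
# $2 is omitted. The "dd.mm.yyyy HH:MM GMT" format is an assumption.
set_obo_date() {
    if [ -n "$2" ]; then
        stamp=$2
    else
        stamp=$(date -u '+%d.%m.%Y %H:%M GMT')
    fi
    # Write through a temporary file for portability between GNU and BSD sed.
    sed "s|<oboOther:date>[^<]*</oboOther:date>|<oboOther:date>$stamp</oboOther:date>|" \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage, before "git add EDAM_dev.owl":
#   set_obo_date EDAM_dev.owl
```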

Important

Please provide a meaningful report on changes so that we can easily generate the ChangeLog upon the next release:

  • in the Git commit message, including the GitHub issue number of any issues addressed (use fix #xxx syntax, see GitHub docs)
  • directly in changelog.md

Creating a new official EDAM release

EDAM release schedule

We aim to follow a bi-monthly release cycle to this schedule:

  1. First Wed of every month
    • EDAM team Skype call to discuss plans for this month. Announcement (to edam-announce) including a short summary of plans and an invitation for suggestions.
  2. Last Mon of every month
    • Announcement (to edam-announce) saying that the release is imminent, with an invitation for last-minute suggestions.
  3. Last Wed of every month
    • Complete the work for the release. Make the release. Ensure it works in BioPortal, OLS, AgroPortal and bio.tools.
  4. Last Fri of every month
    • Announce the release, including a summary of changes.

Note

Releases have mostly been quarterly, but more frequent releases (bi-monthly or even monthly) remain the aspiration. Please help us move faster by getting involved.

Process

Before creating a new release, please make sure you have the approval of the leader of EDAM-dev, and that the changelog.md and changelog-detailed.md files are up to date with the changes in the new release. See Editing the ChangeLog below. Once you're clear to go, do the following:

  1. fix any known bugs in EDAM: at the very least, the EDAM build tests should pass, as indicated by the "build passing" status badge.
  2. update your local version of the repository:

    git pull (or "Synch" in desktop client)

  3. assuming you are releasing version n+1, n being the current version:

    • you initially have EDAM_dev.owl in the repository
    • make sure to update oboOther:date in this file
    • copy the file EDAM_dev.owl to releases/EDAM.owl and releases/EDAM_n+1.owl
      • cp EDAM_dev.owl releases/EDAM.owl
      • cp EDAM_dev.owl releases/EDAM_n+1.owl
      • git add releases/EDAM_n+1.owl
    • modify the doap:version property to n+1 in releases/EDAM_n+1.owl and to n+2_dev in EDAM_dev.owl
  4. commit and push your changes

    • git commit -a (or "Commit to main" in the desktop client)
    • git push origin (or "Synch" in the desktop client)
  5. update the detailed changelog by running Bubastis to compare the release against the previous version.

  6. update the changelog with a summary of the major changes.

  7. create the release on GitHub (use the "Draft a new release" button on the "Releases" tab).

    • from the main page of the EDAM repository, click Releases.
    • click Draft a new release
    • enter the version number e.g. 1.24 in the Tag version box
    • enter a title e.g. EDAM 1.24 release
    • check the This is a pre-release box if applicable
    • paste an excerpt from changelog.md into
  8. submit this new release to BioPortal. OLS will pull the file automatically from edamontology.org every night.

  9. download the EDAM.csv file from BioPortal and copy this to https://github.com/edamontology/edamontology/tree/main/releases

  10. create a TSV equivalent of EDAM.csv (e.g. by hacking in a text editor) and copy the resulting EDAM.tsv file to https://github.com/edamontology/edamontology/tree/main/releases

  11. close GitHub issues labelled "done - staged for release".

  12. create the next milestone tag in GitHub, e.g. "1.25"

  13. review any GitHub issues tagged for the release milestone which were not acted upon; remove the milestone and (if applicable) tag them with the next milestone tag

  14. confirm everything is working in bio.tools by mailing the bio.tools Lead Curator.

  15. let the developers of the EDAM browser know a new release is available by posting here

  16. Update the content of https://github.com/edamontology/edamontology.org/blob/main/page.html (add a line linking to the download of the latest release)

  17. ensure http://edamontology.org is updated

  18. announce the new release on Twitter and mailing lists ([email protected], [email protected]) including thanks and a summary of changes.

  19. help applications that implement EDAM to update to the new version.
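Step 3 of the process above can be sketched as a script, run from the repository root. It assumes that versions have the form major.minor and that the version string sits in a single doap:Version element (matched here as doap:version or doap:Version, since the guide spells it in lower case). One further assumption: releases/EDAM.owl is copied from the updated releases/EDAM_n+1.owl so that the two files are identical. The helper names are hypothetical, and the git commands of steps 3 and 4 are left to run by hand.

```shell
#!/bin/sh
# Sketch of step 3 for a current version n; helper names are hypothetical.

# Increment the minor part of a major.minor version: 1.25 -> 1.26, 1.9 -> 1.10.
bump_minor() {
    major=${1%%.*}
    minor=${1#*.}
    printf '%s.%s\n' "$major" "$((minor + 1))"
}

# Replace the text of the doap version element in file $1 with $2.
set_version() {
    sed -E "s#(<doap:[Vv]ersion>)[^<]*(</doap:[Vv]ersion>)#\1$2\2#" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# $1 = current release n. Creates releases/EDAM_n+1.owl and releases/EDAM.owl
# with version n+1, then sets EDAM_dev.owl to n+2_dev.
prepare_release() {
    release=$(bump_minor "$1")                 # n+1
    next_dev="$(bump_minor "$release")_dev"    # n+2_dev
    mkdir -p releases
    cp EDAM_dev.owl "releases/EDAM_$release.owl"
    set_version "releases/EDAM_$release.owl" "$release"
    # Assumption: EDAM.owl should be identical to the versioned release file.
    cp "releases/EDAM_$release.owl" releases/EDAM.owl
    set_version EDAM_dev.owl "$next_dev"
}

# Usage, from the repository root, if 1.25 is the current release:
#   prepare_release 1.25
#   git add releases/EDAM_1.26.owl
```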

Editing the ChangeLog

The ChangeLog includes:

  1. changelog - a summary of the major changes and what motivated them
  2. detailed changelog - fine-grained details obtained using Bubastis

The changelog should include:

  1. (as 1st paragraph) an "executive summary" suitable for consumption by technical managers, describing the motivation for major changes, including e.g. requests at recent hackathons, requests via GitHub, strategic directions etc.
  2. summary of changes distilled from the output of Bubastis (see below).
  3. summary of GitHub commit messages. Please ensure meaningful commit messages are provided on every commit.

Some hacking of Bubastis output is needed to identify (at least):

  • number of new concepts
  • number of deprecations
  • summary of activity, i.e. in which branches was most work focused?
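The counts of new concepts and deprecations can be cross-checked against the release file itself. A minimal sketch, assuming every class starts with an <owl:Class rdf:about="http://edamontology.org/..."> line and carries <created_in>V</created_in> or <obsolete_since>V</obsolete_since> on lines of their own; it does not read Bubastis output, and the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: count concepts created and deprecated in version $2 of OWL file $1,
# with a per-branch breakdown of the new concepts. Helper name is hypothetical.
version_stats() {
    awk -v v="$2" '
        /<owl:Class rdf:about="/ {
            uri = $0
            # Keep only the branch prefix (topic, data, format, operation).
            if (sub(/.*rdf:about="http:\/\/edamontology\.org\//, "", uri)) {
                sub(/_.*/, "", uri)
                branch = uri
            } else {
                branch = "other"
            }
        }
        index($0, "<created_in>" v "</created_in>") { created[branch]++; new_total++ }
        index($0, "<obsolete_since>" v "</obsolete_since>") { obsolete_total++ }
        END {
            printf "new concepts: %d\n", new_total
            printf "deprecations: %d\n", obsolete_total
            for (b in created) printf "  new in %s: %d\n", b, created[b]
        }
    ' "$1"
}

# Usage (the order of the per-branch lines is not fixed):
#   version_stats releases/EDAM_1.26.owl 1.26
```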

Continuous Integration

Every modification to the ontology pushed to GitHub triggers automated tests in Travis CI. These check:

  • a few rules, using the edamxpathvalidator tool
  • the consistency of the ontology, by running the Hermit reasoner automatically

The Travis-CI website shows you the current status here. The fact that the continuous integration task succeeds does not guarantee there are no remaining bugs, but a failure means that you must take action: either fix the problem, fix the edamxpathvalidator program, or ask on the mailing list if you're unsure.

Modifications in a GitHub fork

GitHub makes it possible for any developer to make modifications in a copy of EDAM and suggest these modifications are included in the original. Please note that we discourage using this mechanism for large modifications made using Protege, because merging OWL files which have been reformatted by Protege is notoriously unreliable (see "Best practices for edition" below).

The workflow is:

  • Fork the edamontology repository in your own account.
  • Make the modifications you want to suggest for inclusion in EDAM in this forked repository.
  • Open pull requests for each modification you make.

Please make sure to:

  • Keep your forked repository synchronized with the core repository, to avoid inconsistencies.
  • Follow the "Best practices for edition" below.