SEP 004 -- Add contextual role properties to SequenceAnnotation and Component #4
Comments
We need to change the title on this, because SO terms are merely one classifier we could encode using roles. This is not primarily about SO terms; it is about adding roles attributes to Component and SequenceAnnotation.
Hi Mike, yes, the title should be different. And then it would be good to get some
Below is the raw template. Please have a go at it and send it as a text
Greetings,
SEP 004 -- Adding Additional Semantics for SO Terms
Abstract
< insert short summary >
Table of Contents
1. Rationale
< insert motivation / rational, keep it brief >
2. Specification
< give detailed specification, can be split up into more than one section
2.1 optional sub-point
< sub-divide specification section if useful >
2.2 optional sub-point
< sub-divide specification section if useful >
3. Example or Use Case
< Describe brief examples or use cases >
4. Backwards Compatibility
< discuss compatibility issues and how to solve them; remove this section
5. Discussion
5.1 discussion point
< Summarize discussion, also represent dissenting opinions >
5.2 discussion point
6. Competing SEPs
< list competing SEP #'s >
References
< list references as follows >
< refer to these references in the text as, e.g. SBOL or 1 >
Copyright
This document has been placed in the public domain.
On Fri, Dec 11, 2015 at 7:11 PM, mikebissell [email protected]
Raik Grünberg
changed title and tried to flesh out the proposal as far as I remember it from the discussion during the meeting in Salt Lake City
Ok, thanks. Chris Macklin and I are now working on this, so we'll merge MB On Saturday, January 23, 2016, Raik Grünberg [email protected]
Great, please add Chris as an author and feel free to remove me as an author if your text has not much in common with what I suggested.
I rewrote 98% of this placeholder draft after consulting with Raik, Chris Macklin, and Darren Platt. I'll submit a pull request.
I've just discovered Mike's second pull request from 2 weeks ago: merged and updated the issue. It's just a cosmetic change of "freezer" to "inventory".
SBOL14 meeting sense: this is ready to move forward to a vote; Mike wants to add in a little more clarification first.
Just to combat vote inflation, are there any other SEPs pending for the 2.1 milestone?
This is the only SEP pending for 2.1; the other pending 2.1 change is the non-SEP. Otherwise, I would agree that it would be good to do a multiple vote like we did for SEP 1/2/5.
updated to Mike's pull request #14 following discussion on various channels (mostly libsbol-team mailing list). I applied a provisional fix of URIs suggested for http://sbols.org/v2/roleIntegration#override, removing the duplicate "/#" separator. Putting those directly into the SBOL v2 name space may be better and should still be discussed.
merged in Mike's corrections. See pull request #15.
This does not need to be worked out before the vote, but we do need to get this sorted before the specification is updated. Namely, what new validation rules will these roles have? Below are the ones for CD roles, and I think we may want to start with these and modify them to fit the new role locations: sbol-10507, 10508, 10509, 10510, 10511, and 10527. In addition, we will need rules for roleIntegration, something like: sbol-10810
We also need to modify sbol-10701 and sbol-10901 to allow roles and roleIntegration fields.
Closing in accordance with changes to SEP issue tracking rules detailed in SEP 001 bcbbcab#diff-44cec2aabf4c066f9a54ac4ef6634b9b |
SEP 004 -- Add contextual role properties to SequenceAnnotation and Component
Abstract
We propose adding optional `roles` fields to both `SequenceAnnotation` and
`Component`. `SequenceAnnotation` and `Component` will each contain a set of
zero or more `role` members, which we call "contextual roles." Contextual roles
will allow users to specify the actual role(s) of a subcomponent or subsequence
in the context of a design.
SBOL already permits roles on `ComponentDefinition` instances; however, the
declared roles that were originally imagined and encoded by the author of a
`ComponentDefinition` may not accurately reflect the roles of that
part in every biological and operational context. In practice, context often
takes precedence in the determination of a part's actual roles. For example, a
DNA sequence encoding a protein purification tag might, in a particular design,
be used merely as a benign spacer between two DNA elements.
1. Rationale
1.1 Background: Roles Per SBOL 2.0
In SBOL 2.0, a non-contextual `roles` field already belongs to
`ComponentDefinition`. This optional field classifies an entity's declared
function(s) at the time of design. Roles may specify biological function, in
which case the roles should come from Table 3 (Sequence Ontology) if possible;
however, SBOL does not strictly limit the application of roles to biological
ontologies. A `role` simply links the given `ComponentDefinition` to a term from
an external ontology.
1.2 Argument: Context Matters
Note, however, that in the context of a larger design, the actual purpose of
this `ComponentDefinition` may be rather different from that which was intended;
the original authors of a `ComponentDefinition` record cannot foresee all
possible applications of a biological part. In fact, outside of a host context,
a part in isolation most likely will fail to play its declared biological roles.
Moreover, genetic designers may want to annotate functions of a sequence that
are only locally relevant to their particular experiment or a specific
fabrication step. Consider, for example, sequences required in DNA assembly and
quality control, where the same DNA sequence could serve different purposes in
different locations of a design, or in different steps of an assembly workflow.
Importantly, when implementing a construction and testing service, or when
implementing genetic CAD software and the underlying genetic compilers, it
may become necessary to flag subcomponents in a design not only according to
their biological function, but also with non-biological classifications that
describe each subcomponent's logical role in each specific assembly.
During the automated planning of construction recipes, the intended biological
function of a physically isolated subcomponent is largely irrelevant, but it
may become necessary to classify logistical and compositional relationships in
order to manage the flow of templates, host strains, construction reagents,
and output products through a construction workflow. For example, a planning
algorithm might decide that one `Component` incorporating a PCR primer's
`ComponentDefinition` is to be implemented by ordering an oligo from a
particular vendor, while a second such `Component` is to be implemented by
retrieving an existing sample from inventory.
During computer aided design and compilation, certain algorithms may care about
the local, logical or compositional role of an entity while deliberately
ignoring its planned biological function in finished assemblies. Since modular
subcomponents are typically intended for reuse in multiple finished assemblies,
these subcomponents' eventual logical or biological roles may not even be
determined at the time of design or compilation; however, the genetic compiler
will still need to classify subcomponents' relationships to various intermediate
build products containing them.
In both scenarios, non-biological roles are important because a subcomponent has
no relevant biological activity in isolation, during design and construction.
Therefore, contextual roles are needed.
1.3 Terminology and Model
According to SBOL, a hierarchical affiliation of ComponentDefinitions may take
the following form. Please note how this SEP applies the position-relative
labels "parent," "child," and "sibling" in the context of the argument below.
(Additionally note that the UML notation in this class diagram employs a dialect
of UML 2.5. The UML class diagrams in SBOL use their own dialect. If this SEP
is approved, we assume that the SBOL editors will merge its contents into the
specification as needed, but not necessarily verbatim.)
1.4 Argument for Both Component.roles and SequenceAnnotation.roles
We propose to add contextual roles in two places: `Component` and
`SequenceAnnotation`.
Adding an optional `roles` field to `Component` will allow users to describe the
subcomponent's actual design purpose(s) with respect to the current assembly,
without reference to the specific details of its parent's sequence (if that
sequence is even known yet).
Adding an optional `roles` field to `SequenceAnnotation` will allow users to
describe the design purpose(s) of a subsequence and an optional subcomponent in
the context of a parent sequence, with respect to a specific location in that
sequence.
It is insufficient to put `roles` only on `SequenceAnnotation`; `roles` are
additionally needed on `Component`. There are two reasons. First, not all roles
pertain to a specific parent sequence or to a particular location on that
sequence. Second, SequenceAnnotations must refer to a sequence, but a parent
sequence is not always known at every moment in the lifetime of a design. For
instance, genetic compilers often compute the sequences of a parent assembly
from the sequences of its children in a bottom-up fashion. In this case, the
parent sequence would not be determined until the final compilation step, even
though the contextual roles of its subcomponents would have been determined
earlier in the build. Compilation of the parent ComponentDefinition's sequence
might even be deferred until later, when the host context is finally determined.
Between the time that the subcomponents were compiled and the moment of the
parent's compilation, subcomponents' contextual roles must be conveyed through
the compiler toolchain, and we propose that `Component.roles` is the
appropriate vehicle.
It is insufficient to put `roles` only on `Component`; `roles` are additionally
needed on `SequenceAnnotation`. There are, again, two reasons. First, when it is
possible to map a subcomponent with some contextual `role` (say, the promoter
http://identifiers.org/so/SO:0000167) onto a specific location in a parent
sequence, then users will want to do that. Only `SequenceAnnotation` gives us
that ability; `Component` does not refer to parent sequence details. Secondly,
users may choose not to model every aspect of a `ComponentDefinition`
hierarchically (using Components to link in sub-ComponentDefinitions), but our
users may still wish to annotate the genetic details of a ComponentDefinition's
sequence. For example, the user might tag one subsequence as an RBS, and another
sequence as a start codon, without making separate, heavyweight `Component` and
`ComponentDefinition` elements for the RBS and the start codon.
2. Specification
We specify the contents of `SequenceAnnotation.roles` and `Component.roles` in
analogy to `Participation.roles` (SBOL 7.9.4) and `ComponentDefinition.roles`
(SBOL 7.1). We additionally specify the optional fields
`SequenceAnnotation.roleIntegration` and `Component.roleIntegration`.
Integration rules specify how to resolve
potential conflicts between competing sets of roles.
2.1.0 Component.roles
The `Component.roles` property comprises an OPTIONAL set of zero or more
URIs describing the purpose or potential function of a given child
`ComponentDefinition` in the context of its parent `ComponentDefinition`.
If provided, these `role` URIs MUST identify terms from appropriate ontologies.
Roles are not restricted to describing biological function; they may annotate
Components' function in any domain for which an ontology exists.
It is RECOMMENDED that these `role` URIs identify terms that are compatible with
the `type` properties of both the parent and child ComponentDefinitions. For
example, a `role` of a `Component` which belongs to a `ComponentDefinition` of
type DNA and points to a child `ComponentDefinition` of type DNA might refer to
terms from the Sequence Ontology. A table of recommended ontology branches is
given in the SBOL specification.
2.1.1 Component.roleIntegration
The `Component.roleIntegration` property has a data type of URI. A `Component`
instance with zero `roles` MAY OPTIONALLY specify a `roleIntegration`. A
`Component` instance with one or more `roles` MUST specify a `roleIntegration`
from the table below.
A `roleIntegration` specifies the relationship between a `Component` instance's
own set of `roles` and the set of `roles` on the child `ComponentDefinition`.

| roleIntegration URI | Description |
| --- | --- |
| `http://sbols.org/v2#overrideRoles` | In the context of this `Component`, ignore any role(s) given for the child `ComponentDefinition`. Instead use only the set of zero or more roles given for this `Component`. |
| `http://sbols.org/v2#mergeRoles` | Use the union of the two sets: the set of zero or more roles given for this `Component` as well as the set of zero or more roles given for the child `ComponentDefinition`. |

If zero `Component.roles` are given and no `Component.roleIntegration` is
given, then `http://sbols.org/v2#mergeRoles` is assumed.
It is RECOMMENDED to specify a set of `Component.roles` only if the integrated
result set of roles would differ from the set of `roles` belonging to the child
`ComponentDefinition`.
2.2.0 SequenceAnnotation.roles
The `SequenceAnnotation.roles` property comprises an OPTIONAL set of zero or
more URIs describing the purpose or potential function of a given subsequence
(and, if given, a child `ComponentDefinition`) in the context of its parent
`ComponentDefinition`. If provided, these `role` URIs MUST
identify terms from appropriate ontologies. Roles are not restricted to
describing biological function; they may annotate Sequences' function in any
domain for which an ontology exists.
It is RECOMMENDED that these `role` URIs identify terms that are compatible with
the `type` properties of both the parent and child ComponentDefinitions (if
given). For example, a `role` of a `SequenceAnnotation` which belongs to a
`ComponentDefinition` of type DNA and incorporates a child
`ComponentDefinition` of type DNA might refer to terms from the Sequence
Ontology. A table of recommended ontology branches is given in the SBOL
specification.
2.2.1 SequenceAnnotation.roleIntegration
The `SequenceAnnotation.roleIntegration` property has a data type of URI. A
`SequenceAnnotation` instance with zero `roles` MAY OPTIONALLY specify a
`roleIntegration`. A `SequenceAnnotation` instance with one or more `roles` MUST
specify a `roleIntegration` from the table below.
Using `roleIntegration`, a `SequenceAnnotation` instance MAY specify the
relationship between its own set of roles and the set of roles computed for the
sibling `Component` (if given). The integrated result set of computed roles may
include `role` elements from the child `ComponentDefinition` if dictated by the
value of `Component.roleIntegration`. To determine the integrated role set for a
`SequenceAnnotation`, first compute the integrated set of roles for the sibling
`Component` (if given) according to the integration rules in section 2.1.1
above. If no sibling `Component` is given, then the sibling set is assumed to be
the empty set.

| roleIntegration URI | Description |
| --- | --- |
| `http://sbols.org/v2#overrideRoles` | In the context of this `SequenceAnnotation`, ignore the integrated set of roles computed for the optional sibling `Component`. Instead use only the set of zero or more `roles` given for this `SequenceAnnotation`. |
| `http://sbols.org/v2#mergeRoles` | Use the union of the two sets: the set of zero or more `roles` given for this `SequenceAnnotation` as well as the integrated set of zero or more roles computed for the optional sibling `Component`. |

If zero `SequenceAnnotation.roles` are given and no
`SequenceAnnotation.roleIntegration` is given, then
`http://sbols.org/v2#mergeRoles` is assumed.
It is RECOMMENDED to specify a set of `SequenceAnnotation.roles` only if
the integrated result set of roles would differ from the integrated set of roles
computed for the sibling Component.
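The two-stage integration described in sections 2.1.1 and 2.2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the function name `integrate_roles` and the `example.org` role URIs are hypothetical, and roles are modelled as plain sets of URI strings.

```python
from typing import FrozenSet, Iterable, Optional

# roleIntegration URIs as named in this SEP.
MERGE_ROLES = "http://sbols.org/v2#mergeRoles"
OVERRIDE_ROLES = "http://sbols.org/v2#overrideRoles"


def integrate_roles(
    own_roles: Iterable[str],
    underlying_roles: Iterable[str],
    role_integration: Optional[str] = None,
) -> FrozenSet[str]:
    """Hypothetical helper: combine an object's own roles with the roles of
    the object it refers to (child ComponentDefinition for a Component,
    sibling Component for a SequenceAnnotation)."""
    own = frozenset(own_roles)
    underlying = frozenset(underlying_roles)
    if role_integration is None:
        if own:
            # One or more roles MUST come with an explicit roleIntegration.
            raise ValueError("roleIntegration is required when roles are given")
        # Zero roles and no roleIntegration: mergeRoles is assumed.
        role_integration = MERGE_ROLES
    if role_integration == OVERRIDE_ROLES:
        return own
    if role_integration == MERGE_ROLES:
        return own | underlying
    raise ValueError(f"unknown roleIntegration: {role_integration}")


# Hypothetical role URIs, for illustration only.
TAG = "http://example.org/roles#purification_tag"
SPACER = "http://example.org/roles#spacer"

# Stage 1: Component roles against the child ComponentDefinition roles.
component_set = integrate_roles([SPACER], [TAG], OVERRIDE_ROLES)
# Stage 2: SequenceAnnotation roles against the sibling Component result.
# With no sibling Component, pass an empty set instead of component_set.
annotation_set = integrate_roles([], component_set)
print(sorted(annotation_set))
```

Modelling both stages with one function reflects that `mergeRoles` and `overrideRoles` behave the same way at either level; only the "underlying" set differs.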
2.3.0 XML Serialization Details
For both proposed contextual `roles` properties, a set of zero or more `<role>`
child elements, if declared, MUST be contained within the parent element
in the XML serialized form.
For both proposed `roleIntegration` rule properties, either zero or one
`<roleIntegration>` child element, if declared, MUST be contained within the
parent element in the XML serialized form.
2.3.1 XML Serialization Example
By providing a way to override roles, this data structure allows us to specify
that a child `ComponentDefinition` whose designer declared some role(s) actually
has different role(s) in the current context.
3. Examples
3.1 DNA Assembly and Cloning
A collection of DNA constructs for which DNA assembly is ordered from a
commercial provider may contain a short DNA sequence in different locations of
several constructs. In some constructs, this sequence may be included as a
forward sequencing primer binding site, in others as a reverse primer for the
amplification of a DNA segment during the DNA assembly process.
3.2 Protein Peptide Sequences
A given fusion protein design may contain a Hexahistidine tag which is commonly
used for protein purification. However, in a cell biology experiment, the His
tag may instead have been included as an antigen in order to localize the
protein with immunostaining. In both cases, the `SequenceAnnotation` should
(via `Component`) point to the same `ComponentDefinition` (His tag). As long as
the actual purpose is annotated in `SequenceAnnotation`, a third party could
automatically verify whether this immunotag is compatible with the antibodies
available in-house. Alternatively, a cell biologist may automatically remove all
protein purification tags from a given design but would want to keep any
sequence used for immunostaining.
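The tag-stripping use case can be illustrated with a small Python sketch. It assumes that each annotation's integrated role set has already been computed and that the immunostaining annotation overrides the declared purification role; the `Annotation` type, the function name, and the `example.org` role URIs are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple

# Hypothetical role URIs, for illustration only.
PURIFICATION_TAG = "http://example.org/roles#purification_tag"
IMMUNO_EPITOPE = "http://example.org/roles#immunostaining_epitope"


class Annotation(NamedTuple):
    name: str
    definition: str           # identifier of the shared ComponentDefinition
    roles: FrozenSet[str]     # integrated roles in the context of this design


def strip_purification_tags(annotations: List[Annotation]) -> List[Annotation]:
    """Keep every annotation whose contextual roles do not mark it as a
    purification tag."""
    return [a for a in annotations if PURIFICATION_TAG not in a.roles]


design = [
    # Same His tag definition, used for purification in one place...
    Annotation("n_term_tag", "his_tag", frozenset({PURIFICATION_TAG})),
    # ...and, with its declared role overridden, as an epitope in another.
    Annotation("c_term_tag", "his_tag", frozenset({IMMUNO_EPITOPE})),
]

kept = strip_purification_tags(design)
print([a.name for a in kept])
```

Both annotations point to the same definition, so only the contextual roles let the filter tell them apart.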
4. Backwards Compatibility
There are no obvious backwards compatibility issues. Contextual roles require
additional fields on both `Component` and `SequenceAnnotation`. These fields
can be safely ignored by software tools written for SBOL 2.0.
5. Discussion
5.1 History
This proposal was advanced by Mike Bissell from Amyris in order to accommodate
two types of annotations that proved necessary to implement the Amyris genetic
compilers and DNA assembly processes. It was heavily revised in response to
community input during SBOL Workshop 14 (March 2016, Boston), and again in
response to a second round of reviews by Bryan Bartley, Chris Myers, Raik
Gruenberg, and Chris Macklin (March 23-24, 2016).
5.2 Role Inheritance, a.k.a. Role Integration
It is possible for there to be conflicts between the roles assigned explicitly
to a `Component` and the roles already assigned to the underlying child
`ComponentDefinition` to which that `Component` refers. It is likewise possible
for there to be conflicts between the roles assigned explicitly to a
`SequenceAnnotation` and the roles already assigned to its optional
underlying sibling `Component` and/or to its sibling's underlying child
`ComponentDefinition`.
Moreover, we discussed how there may arise situations where, up at the level of
a `Component` instance, users may wish to explicitly invalidate or override
roles declared down at the level of the child `ComponentDefinition`. For
example, whoever originally designed a part may have had no idea what it really
might eventually be used for, and a user of that part might therefore need to
discard, supplement, and/or replace the originally declared roles where that
part is incorporated into a parent `ComponentDefinition`.
After discarding other options, we settled on introducing an explicit property,
`roleIntegration`, which specifies whether roles on `Component` or
`SequenceAnnotation` should (1) be added to (`mergeRoles`), or (2) completely
replace (`overrideRoles`), any roles declared on the underlying object(s) to
which they may refer.
As an alternative, it was first proposed to re-use `MapsTo.refinement` types.
Per SBOL these are:
- `useLocal` (akin to the proposed '`overrideRoles`' `roleIntegration`, meaning replace terms),
- `useRemote` (ignore any terms declared on this instance),
- `verifyIdentical` (see SBOL `MapsTo.refinement` for definition),
- `merge` (akin to the proposed '`mergeRoles`' `roleIntegration`).

Both the use of `verifyIdentical` and `useRemote` were challenged as unnecessary
complications. The effect of `useRemote` is more obviously achieved by not
declaring any new roles on `Component`. `verifyIdentical` was criticized for
being an altogether different beast: `verifyIdentical` would impose a constraint
implying a validation rule, as opposed to `mergeRoles` and `overrideRoles`,
which simply name functions which, given multiple sets of roles, return a single
set of roles.
For compatibility with SBOL 2.0 documents, a `roleIntegration` is not required
if no contextual roles are given. Because SBOL avoids specifying default
values, an explicit `roleIntegration` is required if contextual roles are given.
5.3 Errata
On August 11, 2016, MB deleted the following line from the second
`ComponentDefinition` above, at the end of the serialization example:
This SEP does not specify a `ComponentDefinition.roleIntegration` property.
The deleted line was the result of a copy/paste error.
On August 11, 2016, MB reordered two of the properties in the
`SequenceAnnotation` serialization example, so that the new properties appear
after the last property specified in SBOL 7.7.4.
On August 16, 2016, MB deleted the unintentionally duplicated word
'set' from the specification of Component.roleIntegration.
6. Competing SEPs
There are currently no competing or conflicting SEPs.
References
SBOL - http://sbolstandard.org/downloads/specification-data-model-2-0/
Copyright
This document has been placed in the public domain.