Usage of protein feature glyphs in combination with CDSs #167
Hi Shyam,
good question! The main use case we had in mind for the protein glyphs was
for them to sit on their own "protein line" above or below the DNA. I think in
many cases that should look best and is fairly intuitive.
Alternatively, I could suggest putting the protein glyphs on the top
line of the CDS box. Technically speaking, protein glyphs should never
occur on a DNA baseline (contrary to your examples), as they can only
describe things that are actually translated. So there should always be a
CDS box to decorate. I have never seen this done anywhere, though, so it
wouldn't score high in terms of intuitive pattern recognition.
Greetings
Raik
…On Thu, 2 Nov 2023 at 07:14, Shyam Bhakta ***@***.***> wrote:
How are protein-stemmed glyphs to be used in conjunction with CDS/CDS
domain glyphs?
Should CDS glyphs be required to represent the full start->stop codon open
reading frame?
The polypeptide-stemmed glyph indicates a feature that manifests in the
polypeptide form. Extended from this stem, the protein cleavage site glyph
might represent a TEV protease site and a protein stability element glyph
might represent a solubilization domain or(?) a degradation tag².
Polypeptides are encoded by CDSs (Coding DNA Sequences), a DNA feature¹.
A conflict naturally arises over how to indicate a protein-stemmed feature
within the CDS that encodes it. This ambiguity might be why these have
been some of the least-used glyphs.
[image: image]
<https://user-images.githubusercontent.com/5035245/279859749-f713ba27-accf-45b5-848d-afda0dbd9f0d.png>
Option 1a): require CDS/domain glyphs to cover the contiguous translation
unit, *i.e.*, from start to stop codon, AND³ 1b) allow superposition of any
protein-stem glyphs with the CDS glyph or any of its domains, as shown in
example (A).
Rationale: The SO term "CDS" is defined as "A contiguous sequence which
begins with, and includes, a start codon and ends with, and includes, a
stop codon." Furthermore, the CDS pentagon/arrow glyph (or a rectangle) has
historically been used to faithfully encompass start->stop codon translated
reading frame spans in biology and syn bio, well before SBOL's formalization
of it and to this day. It is evidently extremely important to denote the
translational unit with a single, uninterrupted glyph, save for domain
indicators that subdivide the glyph without breaking it. Surely we
cannot violate such basic definitions and norms. In fact, we sort of
already decided on the sanctity of the contiguous CDS glyph when we
deliberated on the 2A peptide glyph in Issue #78, where we chose
dashed lines that don't interrupt the CDS pentagon shape. I actually
brought up the present issue and arguments back then, in two comments
on #78.
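To make concrete what option 1a asks of the sequence behind a CDS glyph, here is a minimal Python sketch of the SO definition quoted above. It is only an illustration under stated assumptions: the standard genetic code, ATG as the only start codon (bacteria also use GTG/TTG), and an internal in-frame stop codon treated as disqualifying. `is_complete_orf` is a hypothetical helper, not part of any SBOL library.

```python
# Hypothetical helper (not an SBOL API): checks the SO "CDS" definition,
# i.e., a contiguous sequence starting with a start codon and ending with
# a stop codon. Assumptions: standard genetic code, ATG as the only start
# codon, and no in-frame stop codon before the final one.

START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(seq: str) -> bool:
    """Return True if seq spans a start codon through an in-frame stop codon."""
    seq = seq.upper()
    # Need at least a start and a stop codon, and a whole number of codons.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    # An internal in-frame stop would end translation early.
    return not any(c in STOP_CODONS for c in codons[1:-1])


if __name__ == "__main__":
    print(is_complete_orf("ATGGCTGCATAA"))  # True
    print(is_complete_orf("GCTGCA"))        # False: a coding fragment only
```

Under option 1a, every CDS glyph would stand for a sequence passing such a check; under option 2, a CDS/domain glyph could stand for a fragment like `GCTGCA` that does not.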
Option 2): Allow protein-stem glyphs to *substitute* domain glyphs as in
example (B), and thus allow CDS/domain glyphs to stand for protein-coding
segments of DNA *without* implying a full open reading frame, *i.e.*,
*without* implying that they begin/end with start/stop codons.
Rationale: Maybe someone thinks glyph superposition must be avoided and
that the CDS definition and norms should be revised rather than
respected. I think example (B) is misleading: the instinct to read the CDS
glyph as a translational unit makes the diagram suggest, wrongly, that the
CDS ends after the purple domain and that the stability element is a
feature that *follows* the CDS rather than being part of it. Also, the
cleavage site in the middle of (B) interrupts the interlocking domain
shape, which is aesthetically unpleasing.
¹ CDSs may be DNA features, perhaps rationalized as information-storage
parts, but CDSs truly only manifest their coding function in the RNA, since
that's what the ribosome/tRNA read. 🤔
² The stability-top in general is perhaps so rarely used because the +/–
direction of stabilization is hugely important to understanding the
function of the part and the reason it is used in a circuit.
Positive-stability domains are quite rare in syn bio; I can only imagine
enzymes being stabilized by, e.g., an MBP or GST tag. It's counterintuitive
that a shield glyph can represent negative stabilization when degradation
tags would be the predominant use of the glyph in syn bio. Furthermore, it
is pretty easy to misuse/misinterpret the protein cleavage site glyph as a
degradation tag, as the X top evokes degradation as well as cleavage. Not
to mention, technically, the proteasomal degradation process is a series of
many proteolytic cleavages. This matter is for a separate issue.
³ Option 1b need not *necessarily* be in conjunction with 1a. But then
either protein-stem glyphs would have to be deprecated, or such glyphs
could only be used in isolation, outside genes/CDSs, *e.g.*, in part
plasmids. There must be some implied SBOL rule that prevents glyph
composition from invalidating the usage of a glyph, as would happen when,
say, a deg tag part and the protein-stemmed glyph that represents it are
used to build a CDS in a gene: the glyph would become invalid in the
composition with other CDS domains, where the CDS domains would take
precedence. Hence the option to permit superposition of the glyphs.
--
___________________________________
Raik Grünberg
http://www.raiks.de
___________________________________
@graik Is this what you're suggesting with the glyphs on top of the CDS box? This would be my intuition if I were to represent all my features (DNA, RNA, protein) on the same diagram. In that case, would the glyphs necessarily sit on top of each peptide region (A), assuming a 1:1 relationship, or could they sit anywhere along the line (B), including between peptides?
I liked @shyambhakta's alternative A; if we implement it, I have some questions. A is how I show my designs using the current standard. B: to represent the internal components of a part, the simplest way would be to show the part as a component with lines inside separating the different subparts. This is also compatible with how it is represented in the data standard. C: incorporating @shyambhakta's option A ideas, I could represent them this way, but then how should an assembly scar be represented? D is a mixture of B and C, where you show only the sequence features relevant to the image, indicate that the part is a composite, and omit all other details.
Hi Gonzalo,
as a general comment, it looks to me as if we are running into the
quite common issue of non-practitioners trying to set standards without
looking at current community practices in literature and seminar
slides, however confusing they may be. A bit off-topic but, for example, as
a protein biochemist / engineer, I really dislike this "helix stem" for
designating protein features. I don't think any protein engineer would ever
use it or even recognize it as something related to proteins. It also
crowds the space unnecessarily and, besides, most protein features cover
a range of residues rather than single sites. And a word of caution: the
dimensions in your example are extremely off. Cloning scars and even RBS
regions are tiny compared to the average length of a CDS.
Fully on-topic: the most common protein annotations will not be tags or
protease "sites" but domains, that is, longer functional regions. IMO, the
best depiction for those is a "pill-box" shape. Whatever you come up with
here needs to start from that. Single-residue "site annotations" should be
a secondary concern. First you have to figure out how to draw two domain
regions into your protein, perhaps with a protease site in between and two
catalytic sites on top of one of them to make it more fun. I think you end
up with a situation where it is in fact cleaner, definitely more intuitive,
and not much more space-consuming to draw a separate protein line above the
CDS and start populating this line with the protein features. You could
then also have the line zoom in (be longer than the CDS symbol below, with
dashed connectors back, etc.). And then you could just leave out this pesky
helix stem :) the symbols can sit directly on this protein line.
Anyway, that would be my suggestion. Starting from that, one could later
see whether things could optionally be compressed back onto the top line of
the CDS as described in Felipe's image. My guess is that this could work
sometimes but, as soon as you start putting exon and cloning scar
annotations into the mix, I think it may quickly get confusing.
Just my two cents of course...
Raik
…On Sat, 4 Nov 2023 at 00:41, Gonzalo Vidal ***@***.***> wrote:
[image: CDS_parts_representation]
<https://user-images.githubusercontent.com/35148159/280417568-84461a81-b178-423b-b65d-f10709aeb986.png>
I liked @shyambhakta's <https://github.com/shyambhakta> alternative A; if
we implement it, I have some questions.
Should lines inside a TU be arrows, or can they be straight lines? In my
examples I use straight lines, as they take less space along the X axis.
A is how I show my designs using the current standard.
B: to represent the internal components of a part, the simplest way would be
to show the part as a component with lines inside separating the
different subparts. This is also compatible with how it is represented in
the data standard.
C: incorporating @shyambhakta's <https://github.com/shyambhakta> option A
ideas, I could represent them this way, but then how should an assembly
scar be represented?
D is a mixture of B and C, where you show only the sequence features
relevant to the image, indicate that the part is a composite, and omit all
other details.
PS: Here is a slide that I presented last week for our Bioengineering
lecture at KAUST:
[image: image.png]
This one emphasizes the whole central dogma "stack". RNA would
typically be left out. The domain pill-boxes could also be centered on top
of the protein line, as the CDS is on top of the DNA line.
Hope that helps somewhat.
Greetings
Raik
…On Sat, 4 Nov 2023 at 14:19, Raik Grünberg ***@***.***> wrote:
Hi Raik, could you please edit your post to re-upload the image? It is not visible to me. Also, do you have more examples of how practitioners describe protein features on DNA? That would be very useful for aligning with current practices.
Hi Gonzalo,
image re-attached...
I would guess that there are in fact not many examples of protein features
annotated on graphical depictions of DNA constructs. Protein features are
typically annotated separately on protein representations. If you want to
have both levels of detail in the same figure, you would by default do it
as I described, with DNA and protein displayed next to (atop) each other. In a
protein engineering-focused study, the DNA level is typically not shown at
all.
Greetings
Raik
…On Sat, Nov 4, 2023, 15:19 Gonzalo Vidal ***@***.***> wrote:
Hi @graik, still no image on GitHub or in the email thread.

In terms of the protein function glyphs, I think the X for cleavage is still intuitive, but, as I mentioned earlier, a shield for what is typically a degradation tag is bad; we eventually need a separate glyph for negative stability/degradation, and also a secretion/localization glyph and maybe an affinity-tag function glyph. But any novel functional glyphs are of lower priority, since they are slow to catch on and require familiarization. The very basics of the protein language, with pill boxes on a backbone, are open-ended and intuitive.

Superposing protein-stem glyphs with domains in (G) doesn't look too awful aesthetically, but it feels a bit weird to put glyphs where it seems text belongs. CDS/domains seem to "want" text, as in (E) and (H). But perhaps the glyphs could stand where abstraction is desired. Placing protein-stem glyphs on top of the CDS leaves space for text in the CDS, but I'm not sure I like the aesthetics of things dangling off. It can nicely provide abstraction, though, as the TEV site can be shown compactly between the two CDS domains in (H) without needing a separate domain as in (G). And I agree, cloning scars would not be used so prominently in practice.

@Gonza10V, when they are important, I show part boundaries/scars with just dots, as in (G) and (H); this way they don't steal the show. If you really want to use the scar glyph, I think the specs may require a white box around it, as in (E), so that it interrupts and stands out from the DNA backbone. At least that's how I've seen it in examples. Also, in your (B)–(D), CDS subdivisions have to be indicated with angled lines like below, not straight ones, I believe.
@shyambhakta Your F and E definitely look most natural to me. Curved linkers are not really something I have seen, but they don't look bad either. H and G look weird IMO... (I) also... so I guess the simple straight line is best. And I agree that the pillbox+text can be used for lots of things for now, before expanding into a whole list of custom symbols.
As I see in @graik's image, protein engineers show protein features on the protein, not on the CDS. But in SBOL we still don't have a protein language (#68) or an RNA language (#79). Indicating a part as a composite and then showing its components at the RNA or protein level requires developing those two languages. Developing the protein language alone would be enough to represent something similar to the example provided.
I am not sure G is the best template. It doesn't look intuitive to me (I would never guess there is any protein-related info there). Plus, if you scale things down to a normal size, things get crowded and difficult to read very quickly. A more general solution would be to clearly separate protein from DNA features by always putting each on its own line, above each other if you want. The same goes for RNA, IMO. Definition would be easy: I think this would be intuitive, visually appealing and still very compact.
When I have marked protease cleavage sites in the past, I have tended to use something like figure H, so that I can indicate the DNA location encoding the protease cleavage site.
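Whether a protein feature sits on the CDS glyph (as in H) or on a separate protein line with dashed connectors back to the CDS (as Raik suggests), the drawing has to know which nucleotides encode that feature. A minimal Python sketch of that mapping, assuming 1-based inclusive coordinates, residue 1 encoded by the start codon at `cds_start`, forward strand, and no introns. `residues_to_cds_span` is a hypothetical helper, and the TEV-site positions in the usage line are made-up illustration values.

```python
from typing import Tuple

# Hypothetical helper (not an SBOL API): maps a protein feature given in
# residue coordinates to the nucleotide span encoding it in the CDS.
# Assumptions: 1-based inclusive coordinates, residue 1 = start codon at
# cds_start, forward strand, no introns, three nucleotides per residue.


def residues_to_cds_span(cds_start: int, first_res: int, last_res: int) -> Tuple[int, int]:
    """Return the (start, end) nucleotide coordinates encoding residues first_res..last_res."""
    if first_res < 1 or last_res < first_res:
        raise ValueError("invalid residue range")
    nt_start = cds_start + 3 * (first_res - 1)  # first base of the first codon
    nt_end = cds_start + 3 * last_res - 1       # last base of the last codon
    return nt_start, nt_end


if __name__ == "__main__":
    # A 7-residue TEV site at residues 120-126 of a CDS starting at nt 501.
    print(residues_to_cds_span(501, 120, 126))  # (858, 878)
```

Keeping the protein feature in residue coordinates and deriving the DNA span only when needed lets the same annotation serve both layouts discussed in this thread.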