Usage of protein feature glyphs in combination with CDSs #167

Open
shyambhakta opened this issue Nov 2, 2023 · 13 comments

Comments

@shyambhakta

shyambhakta commented Nov 2, 2023

How are protein-stemmed glyphs to be used in conjunction with CDS/CDS domain glyphs?
Should CDS glyphs be required to represent the full start->stop codon open reading frame?

The polypeptide-stemmed glyph indicates a feature that manifests in the polypeptide form. Extended from this stem, the protein cleavage site glyph might represent a TEV protease site and a protein stability element glyph might represent a solubilization domain or(?) a degradation tag². Polypeptides are encoded by CDSs (Coding DNA Sequences), a DNA feature¹. The clash arises in how to indicate a protein-stemmed feature within the CDS that encodes it. The ambiguity might be why they have been some of the least used glyphs.

image

Option 1a): require CDS/domain glyphs to cover the contiguous translation unit, i.e. start to stop codons, AND³ 1b) allow superposition of any protein-stem glyphs with the CDS glyph or any of its domains, as shown in example (A).
Rationale: The SO term "CDS" is defined as "A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon." Furthermore, the CDS pentagon/arrow glyph (or a rectangle) has historically been used to faithfully encompass start–>stop codon translated reading frame spans in biology and syn bio well before SBOL's formalization of it and to this day. The translational unit is evidently of extreme importance to denote with a single glyph without interruptions, save for domain indicators that subdivide the glyph without breaking it. Surely we cannot violate such basic definitions and norms. In fact, we sort of already decided on the sanctity of the contiguous CDS glyph when we deliberated on the 2A peptide glyph in Issue 78, where we chose dashed lines that don't interrupt the CDS pentagon shape. I actually brought up the present issue and arguments back then: comment (#1) (#2)

Option 2): Allow protein-stem glyphs to substitute domain glyphs as in example (B), and thus allow CDS/domain glyphs to stand for protein-coding segments of DNA without implication of a full open reading frame, i.e., without the implication of beginning/ending with start/stop codons.
Rationale: Maybe someone thinks glyph superposition must be avoided and that the CDS definition and norms should be revised rather than respected. I think example (B) is misleading: the instinct to see the CDS glyph as a translational unit makes the diagram suggest that the CDS wrongly ends after the purple domain, and that the stability element is a feature that follows the CDS rather than being part of it. Also, the cleavage site in the middle of (B) interrupts the interlocking domain shape, which is aesthetically unpleasing.
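
To make the distinction between the two options concrete, here is a rough sketch of the SO definition quoted under Option 1 as a check on a DNA string. The function name `is_full_cds` is hypothetical. The standard stop codons, ATG as the only start codon, the length-divisible-by-3 requirement, and the rejection of internal in-frame stops are my assumptions layered on top of the SO wording, not part of it.

```python
# Sketch only: assumes the standard genetic code and ATG as the sole start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_full_cds(seq, start_codons=("ATG",)):
    """Return True if seq starts with a start codon and ends with a stop codon.

    Extra assumptions beyond the SO text: the length is a multiple of 3 and
    there is no in-frame stop codon before the final codon.
    """
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Split the sequence into in-frame codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in start_codons:
        return False
    if codons[-1] not in STOP_CODONS:
        return False
    # Reject premature in-frame stops between the first and last codon.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

Under Option 1a, only a span for which this kind of check passes would get the CDS glyph; under Option 2, a bare domain-coding segment with no start or stop codon could also be drawn with it.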

¹ CDSs may be DNA features, perhaps rationalized as information-storage parts, but CDSs truly only manifest their coding function in the RNA, since that's what the ribosome/tRNA read. 🤔

² The stability-top in general is perhaps so rarely used because the +/– direction of stabilization is hugely important to understanding the function of the part and the reason it is used in a circuit. Positive-stability domains are quite rare in syn bio; I can only imagine enzymes being stabilized by, e.g., an MBP or GST tag. It's counterintuitive that a shield glyph can represent negative stabilization when degradation tags would be the predominant use of the glyph in syn bio. Furthermore, it is pretty easy to misuse/misinterpret the protein cleavage site glyph as a degradation tag, as the X top evokes degradation as well as cleavage. Not to mention, technically, the proteasomal degradation process is a series of many proteolytic cleavages. This matter is for a separate issue.

³ Option 1b need not necessarily be in conjunction with 1a. But this would mean that either protein-stem glyphs would have to be deprecated or such glyphs could be used only in isolation, outside genes/CDSs, e.g., in part plasmids. There must be some implied SBOL rule that prevents glyph composition from invalidating the usage of a glyph, as would happen when, say, a deg tag part and the protein-stemmed glyph that represents it get used to build a CDS in a gene: the glyph would become invalid in the composition with other CDS domains, where the CDS domains would take precedence. Hence the option to permit superposition of the glyphs.

@graik

graik commented Nov 2, 2023 via email

@fxbuson

fxbuson commented Nov 2, 2023

@graik Would this be what you're suggesting with the glyphs on top of the CDS box? This would be my intuition if I were to represent all my features (DNA, RNA, protein) on the same diagram.

image

In that case, would the glyphs necessarily sit on top of each peptide region (A), assuming a 1:1 relationship, or could they sit anywhere along the line (B), including between peptides?

@Gonza10V Gonza10V self-assigned this Nov 3, 2023
@Gonza10V Gonza10V added the Draft label Nov 3, 2023
@Gonza10V Gonza10V added this to the SBOL Visual 3.1 milestone Nov 3, 2023
@Gonza10V
Contributor

Gonza10V commented Nov 3, 2023

CDS_parts_representation

I like @shyambhakta's alternative (A). If we implement that, I have some questions.
Should lines inside a TU be arrows, or can they be straight lines? In my examples I use a straight line, as it uses less space along the x-axis.

A is how I show my designs using the current standard.

B: To represent internal components of a part, the trivial way would be to show the part as a component with a line inside separating the different parts. This is also compatible with how it is represented in the data standard.

C: Including @shyambhakta's ideas from (A), I could represent them this way, but then how should an assembly scar be represented?

D is a mixture of B and C, where you represent only the sequence features relevant to the image, show that the part is a composite, and omit all other details.

@graik

graik commented Nov 4, 2023 via email

@graik

graik commented Nov 4, 2023 via email

@Gonza10V
Contributor

Gonza10V commented Nov 4, 2023

Hi Raik,

Could you please edit your post to re-upload the image? It is not visible to me. Also, do you have more examples of how practitioners describe features of a protein in DNA? That would be very useful for aligning with current practices.

@graik

graik commented Nov 4, 2023 via email

@shyambhakta
Author

shyambhakta commented Nov 4, 2023

Hi @graik, still no image on GitHub or the email thread.
I do like the concept of a peptide backbone that could mesh with a protein language.
What you're describing is perhaps something like (E) or (F), going off your 2019 comment and the published example in issue 68. I think these issues may need to be merged.
I'm not sure if switching the backbone shape is important. Curves like in (E) might evoke linkers / less-structured peptides and make it more intuitively proteinaceous, since straight might "feel" too rigid to be protein, but I can't say how long a curved backbone needs to be aesthetic like in (I). Then again, a long segment of straight protein backbone would also look weird. Maybe N- and -C termini can be marked if the backbone ends are exposed.
Anyhow, so long as there are pill boxes relatively close on even a straight peptide backbone (F), and perhaps if the ends aren't visible, it wouldn't be confused for a protein superposed on DNA, like I recall being shown in specs to show protein:promoter binding. I think it would be great to work toward formalizing a protein backbone for protein glyphs.

In terms of the protein function glyphs, I think the X for cleavage is still intuitive, but as I mentioned earlier, a shield for what is typically a degradation tag is bad; we eventually need a separate glyph for negative stability/degradation, and also a secretion/localization glyph and maybe an affinity tag function glyph. But any novel functional glyphs are of lower priority since they are slow to catch on, as they require familiarization. The very basics of the protein language, with pill boxes on a backbone, are open-ended and intuitive.

Superposing protein-stem glyphs in (G) with domains doesn't look too awful aesthetically, but it feels a bit weird to put glyphs where it seems text belongs. CDS/domains seem to "want" text like in (E) and (H). But perhaps the glyphs could stand where abstraction is desired. Protein-stem glyphs on top of the CDS leaves space for text in the CDS, but I'm not sure I like the aesthetics of things dangling off. It can nicely provide abstraction, though, as the TEV site can be shown compactly between the two CDS domains in (H) without needing a separate domain as in (G).

And I agree, cloning scars would not be practically used so prominently. @Gonza10V, I show part boundaries/scars when important with just dots like in (G) and (H). Doesn't steal from the show this way. If you really want to use the scar glyph, I think the specs may require a white box around like in (E), so that it interrupts and stands out from the DNA backbone. At least that's how I've seen it in examples. Also, in your (B)–(D), CDS subdivisions have to be indicated with angled lines like below, not straight, I believe.

image

@graik

graik commented Nov 4, 2023

Sorry, I kept using the Reply-to from GMail but forgot that this never works for images. Here it is:
image

@graik

graik commented Nov 4, 2023

@shyambhakta Your F and E definitely look most natural to me. Curved linkers are not really something I have seen but it doesn't look bad either. H and G look weird IMO... (I) also... so I guess the simple straight line is best. And I agree that the pillbox+text can for now be used for lots of things before expanding into a whole list of custom symbols.

@Gonza10V
Contributor

Gonza10V commented Nov 6, 2023

As I see in @graik's image, protein engineers show protein features on the protein and not on the CDS. But in SBOL we still don't have a protein language (#68) or an RNA language (#79). The solution of indicating a part as a composite and then showing its components in RNA or protein needs the development of the latter two. The development of the protein language would be enough to represent something similar to the example provided.
Now focusing on DNA, if we want to represent the details there, (G) from @shyambhakta's example is the most intuitive for me.

@graik

graik commented Nov 6, 2023

I am not sure G is the best template. Doesn't look intuitive to me (I would never guess there is any protein related info there). Plus, if you scale things down to a normal size, things get crowded and difficult to read very quickly. A more general solution would be to clearly separate protein from DNA features by always having each on its own line. Above each other if you want. Same goes for RNA, IMO. Definition would be easy:
(1) RNA features are to be displayed on a wavy line above the corresponding DNA feature / location.
(2) Protein features are to be displayed on a straight line above the corresponding DNA or RNA feature.
(3) If the RNA or protein line shows the complete molecule, this can be indicated by a line terminator such as ---| or ---o

I think this would be intuitive, visually appealing and still very compact.
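
Rule (2) relies on knowing which DNA location "corresponds" to a protein feature. A minimal sketch of that mapping, with hypothetical names (`ProteinFeature`, `dna_span`) and the assumptions that the CDS lies on the forward strand, has no introns, and that residue 1 is encoded by the first codon of the CDS:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ProteinFeature:
    """A feature on the protein line; residue indices are 1-based, inclusive."""
    name: str
    aa_start: int
    aa_end: int


def dna_span(cds_start: int, feature: ProteinFeature) -> Tuple[int, int]:
    """Map a residue range to a 0-based, half-open DNA interval.

    cds_start is the 0-based DNA position of the first base of the start codon.
    Assumes forward strand and an uninterrupted reading frame (no introns).
    """
    start = cds_start + 3 * (feature.aa_start - 1)
    end = cds_start + 3 * feature.aa_end
    return start, end


# Hypothetical example: a 7-residue cleavage site at residues 11-17
# of a CDS whose start codon begins at DNA position 100.
tev_site = ProteinFeature("TEV site", aa_start=11, aa_end=17)
span = dna_span(100, tev_site)
```

A renderer could then draw the protein-line glyph horizontally centered over `span`, which would keep the protein features visually separate from the DNA line while still tied to a DNA location.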

@jakebeal
Contributor

jakebeal commented Nov 6, 2023

When I've been marking protease cleavage sites in the past, I've tended to use something like the H figure, so that I can indicate the DNA location of the encoding for the protease cleavage site.

5 participants