Thank you for bringing this up. The current motif counting algorithm in TRGT allows some imperfections in the repeat sequence. This is why CAACAG is counted as two CAGs (one imperfect and one perfect motif copy). Unfortunately, assessing different repeats involves different rules (sometimes counting imperfections/interruptions and sometimes not), which makes it difficult to implement repeat segmentation that is compatible with all assessment strategies. Because of this, it is pretty common to post-process TRGT's repeat sizes (for example, by subtracting two motif counts for HTT alleles) to get the size estimates you are looking for. Does this sound reasonable?
Also, the upcoming version of TRGT should make motif counting much more flexible (and should eliminate the need for any adjustment of the HTT allele lengths). If you'd like, we can send you a binary of this development version by the end of the week (just send me an email).
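The post-processing mentioned above (subtracting two motif counts for HTT alleles) could look roughly like the sketch below. The function name `adjust_htt_cag` and the default offset are illustrative assumptions and not part of TRGT. The sketch also assumes that every allele carries the canonical CAACAG at the end of the CAG tract; an allele that has lost this interruption would be undercounted by a fixed subtraction.

```python
def adjust_htt_cag(mc_field, offset=2):
    """Return the CAG count per allele with a fixed offset subtracted.

    mc_field is the raw MC value from the VCF, e.g. "19_12,48_9".
    The offset of 2 stands for the CAACAG that is counted as two CAG copies.
    """
    adjusted = []
    for allele in mc_field.split(","):
        # The first number in each allele is the CAG count, the second the CCG count.
        cag_count = int(allele.split("_")[0])
        adjusted.append(max(cag_count - offset, 0))
    return adjusted


adjusted_counts = adjust_htt_cag("19_12,48_9")
print(adjusted_counts)  # [17, 46]
```

For the sample in this issue, the adjusted values match the lengths of the uninterrupted CAG runs that are visible in the allele sequences.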
Thank you for the quick response and the explanation. This is very interesting and important to consider as those interruptions or the loss of those interruptions can be clinically relevant. This is also true for other repeat expansions such as in FMR1.
We would be happy to try the development version as soon as it is ready. I will contact you via email.
Hello,
I am new to the analysis of trgt. We run the analysis pipeline through SMRT Link. Currently there is trgt v0.8 installed.
I am looking at the HTT locus of the vcf and found an issue. Or maybe a misunderstanding on my part.
The vcf looks like this:
GT:AL:ALLR:SD:MC:MS:AP:AM
1/2:93,171:86-109,161-201:144,134:19_12,48_9:0(0-57)_1(57-93),0(0-144)_1(144-171):0.978495,0.988304:0.14,0.15
The first allele has 19 CAG and 12 CCG according to the motif count "MC".
The second allele has 48 CAG and 9 CCG according to the motif count "MC".
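As a minimal sketch of how the MC reading above can be reproduced, the FORMAT keys and the sample values can be zipped together and the MC field split per allele. The helper names are hypothetical, and the motif order (CAG first, then CCG) is taken from the interpretation in this post.

```python
FORMAT = "GT:AL:ALLR:SD:MC:MS:AP:AM"
SAMPLE = ("1/2:93,171:86-109,161-201:144,134:19_12,48_9:"
          "0(0-57)_1(57-93),0(0-144)_1(144-171):0.978495,0.988304:0.14,0.15")
MOTIFS = ["CAG", "CCG"]  # assumed motif order for this HTT record


def parse_sample(format_str, sample_str):
    """Map each FORMAT key to its raw string value."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def motif_counts(mc_field, motifs):
    """Split MC into one {motif: count} dict per allele."""
    return [
        dict(zip(motifs, (int(n) for n in allele.split("_"))))
        for allele in mc_field.split(",")
    ]


fields = parse_sample(FORMAT, SAMPLE)
counts = motif_counts(fields["MC"], MOTIFS)
print(counts)  # [{'CAG': 19, 'CCG': 12}, {'CAG': 48, 'CCG': 9}]
```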
But in the sequences in the vcf I see this:
CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCGCCGCCG,
CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCG
The first allele has 17 CAG, 1 CAA, 1 CAG, 1 CCG, 1 CCA and 10 CCG.
The second allele has 46 CAG, 1 CAA, 1 CAG, 1 CCG, 1 CCA and 7 CCG.
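The manual count above can be checked by splitting each allele into triplets and run-length encoding them. This is only a sketch: it assumes the sequence is read in frame from its first base, and the two alleles are rebuilt from the composition listed above instead of being copied from the VCF.

```python
from itertools import groupby


def codon_runs(seq, size=3):
    """Split seq into consecutive triplets and collapse identical neighbours."""
    codons = [seq[i:i + size] for i in range(0, len(seq), size)]
    return [(codon, len(list(group))) for codon, group in groupby(codons)]


# Alleles rebuilt from the composition described in this post.
allele1 = "CAG" * 17 + "CAA" + "CAG" + "CCG" + "CCA" + "CCG" * 10
allele2 = "CAG" * 46 + "CAA" + "CAG" + "CCG" + "CCA" + "CCG" * 7

runs1 = codon_runs(allele1)
runs2 = codon_runs(allele2)
print(len(allele1), runs1)
print(len(allele2), runs2)
```

The lengths come out as 93 and 171 bp, which agrees with the AL field, and the CAA and CCA triplets appear as separate runs of length one.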
So the motif count treats the CAA interruption as a CAG and the CCA interruption as a CCG. I understand that the allele purity (AP) score reflects these interruptions, but in the example shown here (https://github.com/PacificBiosciences/trgt/blob/main/docs/vcf_files.md) an interruption is also visualised and reflected in the motif span (MS).
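To make the MS observation concrete, the field can be parsed into (motif index, start, end) segments. The layout used here, a motif index followed by a start-end range in base pairs, is inferred from the value in this record and is an assumption, as are the helper names.

```python
import re

# One segment looks like "0(0-57)": motif index, then start and end in bp.
SEGMENT = re.compile(r"(\d+)\((\d+)-(\d+)\)")


def parse_ms(ms_field):
    """Return, per allele, a list of (motif_index, start, end) tuples."""
    return [
        [(int(m), int(s), int(e)) for m, s, e in SEGMENT.findall(allele)]
        for allele in ms_field.split(",")
    ]


def span_counts(segments, motif_len=3):
    """Convert each segment length into a number of motif copies."""
    return [(motif, (end - start) // motif_len) for motif, start, end in segments]


ms = parse_ms("0(0-57)_1(57-93),0(0-144)_1(144-171)")
for segments in ms:
    print(segments, span_counts(segments))
```

For both alleles the segments are contiguous, and dividing their lengths by three gives 19 and 12, then 48 and 9. That is the same as MC, so the interruptions do not form separate segments in this record.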
The trvz motifs allele SVG does not show the interruption (it shows 19_12,48_9). However, the waterfall plot does show the interruption.
Is it normal that a repeat interruption is not counted, especially at the end of this motif where the structure is well known? Or has this changed in later versions of trgt, with the documentation based on the newer version?
Many thanks in advance!
Max