AL value is not within the range (ALLR) #48

Open
jakobht96 opened this issue Nov 21, 2024 · 6 comments
Comments

@jakobht96

Hi,

We are interested in RFC1 in human samples for the diagnosis of CANVAS. We have a patient with a clinical CANVAS diagnosis. After running the analysis, we get an AL value that is lower than the ALLR range. We have discussed how to interpret this. We first thought AL was an average, but we now understand that AL should perhaps be construed as an "educated guess"/consensus, where all the reads add to the evidence for the length. Is this understanding correct?

We can see that every read in this region is a bit funky and very heterogeneous.

The allele whose length is outside ALLR:
image

Excerpt from the VCF file:

GT:AL:ALLR:SD:MC:MS:AP:AM 1/2:2938,3534:2511-3181,3587-3944:29,6:0_594_0_0_0_0_0_0,40_611_46_0_2_0_9_10:1(0-2938),1(0-1174)_2(1174-1201)_1(1201-1500)_7(1500-1541)_1(1541-2545)_7(2545-2551)_2(2551-2595)_6(2595-2618)_1(2618-2690)_4(2690-2700)_2(2700-2828)_0(2828-2846)_7(2846-2856)_6(2856-2873)_1(2873-2888)_0(2888-2942)_0(2958-2971)_2(2971-2979)_1(2979-3077)_0(3077-3165)_1(3184-3514)_0(3514-3534):0.952189,0.879505:0.52,0.42

Please share some thoughts.

//Jakob

@pbsena
Contributor

pbsena commented Nov 21, 2024

Hello,

Thank you for sharing this very interesting RFC1 example. All the information shown in the VCF file refers to the consensus sequence, which is drafted from the multiple sequence alignment of the reads assigned to each allele. This is the top, higher-contrast sequence shown in trgt plot outputs. When reads are aligned to this consensus, the regions in individual reads are always colored according to the consensus segmentation of motifs, even if the reads are more heterogeneous and an individual read sequence does not resemble the consensus motifs.

I hope this helps with interpreting the output and how the plots are colored. We'd be happy to discuss these results further.
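
To make the consensus segmentation concrete, here is a minimal Python sketch that reads the MS value of the second allele from the excerpt above and lists the stretches of the consensus that no motif span covers. It assumes the MS layout is `motif_index(start-end)` entries joined by `_`, as inferred from the excerpt; `parse_ms` and `unsegmented_gaps` are hypothetical helper names, not part of TRGT.

```python
import re

# MS value for the second allele, copied from the VCF excerpt in this thread.
MS_ALLELE_2 = (
    "1(0-1174)_2(1174-1201)_1(1201-1500)_7(1500-1541)_1(1541-2545)_"
    "7(2545-2551)_2(2551-2595)_6(2595-2618)_1(2618-2690)_4(2690-2700)_"
    "2(2700-2828)_0(2828-2846)_7(2846-2856)_6(2856-2873)_1(2873-2888)_"
    "0(2888-2942)_0(2958-2971)_2(2971-2979)_1(2979-3077)_0(3077-3165)_"
    "1(3184-3514)_0(3514-3534)"
)


def parse_ms(ms):
    """Return (motif_index, start, end) tuples from an MS string.

    Assumes the 'index(start-end)' layout seen in the excerpt.
    """
    pattern = re.compile(r"(\d+)\((\d+)-(\d+)\)")
    return [(int(m[1]), int(m[2]), int(m[3])) for m in pattern.finditer(ms)]


def unsegmented_gaps(segments, allele_length):
    """Return (start, end) intervals of the consensus not covered by any span."""
    gaps = []
    pos = 0
    for _, start, end in sorted(segments, key=lambda s: s[1]):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < allele_length:
        gaps.append((pos, allele_length))
    return gaps


if __name__ == "__main__":
    segments = parse_ms(MS_ALLELE_2)
    # 3534 is the AL value reported for this allele.
    for start, end in unsegmented_gaps(segments, 3534):
        print(f"uncovered: {start}-{end} ({end - start} bp)")
```

For the second allele in this thread, the sketch reports two uncovered stretches, 2942-2958 and 3165-3184. These may correspond to regions without an assigned motif in the plot, though that mapping is an assumption here.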

@egor-dolzhenko
Collaborator

Thank you for sharing this example, Jakob. Just to add to Guilherme’s reply: yes, your understanding is correct. ALLR is currently just the range of repeat sizes observed in reads, so in rare cases it might be inconsistent with the consensus allele sequence. We will add better size intervals to the list of future TRGT improvements.
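
As a rough way to spot such cases in a VCF, the following Python sketch compares each AL value with its ALLR interval. The FORMAT and sample strings are trimmed from the excerpt in this thread to the first four fields; `parse_sample` and `al_vs_allr` are hypothetical helper names, not TRGT functions, and the sketch assumes every allele has a numeric AL and a `low-high` ALLR entry.

```python
# First four FORMAT fields and values, trimmed from the excerpt in this thread.
FORMAT = "GT:AL:ALLR:SD"
SAMPLE = "1/2:2938,3534:2511-3181,3587-3944:29,6"


def parse_sample(format_str, sample_str):
    """Pair FORMAT keys with the sample's colon-separated values."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def al_vs_allr(fields):
    """Return (allele_number, AL, low, high, inside) for each allele.

    Assumes AL is a comma-separated list of integers and ALLR a
    comma-separated list of 'low-high' intervals in the same order.
    """
    lengths = [int(x) for x in fields["AL"].split(",")]
    ranges = [tuple(int(v) for v in r.split("-")) for r in fields["ALLR"].split(",")]
    report = []
    for number, (al, (low, high)) in enumerate(zip(lengths, ranges), start=1):
        report.append((number, al, low, high, low <= al <= high))
    return report


if __name__ == "__main__":
    for number, al, low, high, inside in al_vs_allr(parse_sample(FORMAT, SAMPLE)):
        status = "inside" if inside else "OUTSIDE"
        print(f"allele {number}: AL={al} is {status} ALLR {low}-{high}")
```

On the values from this thread, the first allele (2938) falls inside 2511-3181, while the second (3534) falls below 3587-3944, which is the inconsistency Jakob reported.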

Are you using ALLR to assess the confidence of an allele call? Or are you interested in profiling repeat heterogeneity / mosaicism?

Best wishes,
Egor

@jakobht96
Author

Hi both,

Thank you for your input, that is great. We are currently trying to validate the method for diagnostic purposes, and we would like to have some "confidence interval" or similar, to say how certain we are that the length is correct. We also look at the purity, which, as I understand it, is a measure of how well the reads fit the consensus, or how heterogeneous the reads are, right?
I can add to this example that this is actually a PureTarget run where the sample was loaded twice, because we see that RFC1 only gets a few reads in pathogenic repeats. Here we find that the consensus changes and more of the "grey" regions are resolved. But we want to find some measures that can tell our Clinical Laboratory Geneticists whether a result should be trusted or interpreted with uncertainty.

One more suggestion is to add some of the data from the VCF file as a legend to the plot figure (maybe you are working on that), and perhaps a separate figure with a histogram could be added?

To answer whether we look for heterogeneity: we have a case of that as well in FMR1, but in that case we look at both the allele and waterfall plots.

Once again, thank you. I will look forward to future updates. This tool has great potential!

@egor-dolzhenko
Collaborator

Good to know, thank you! We can help with defining some additional measures / visualizations that might help your geneticists assess a given repeat expansion call. Would you be open to moving this conversation to email so that we can discuss your data in a bit more detail?

To answer your other question, the purity score just measures how close your consensus allele sequence is to a perfect repeat. A purity score of 1.0 means that the allele is a perfect repeat, while a purity score close to 0.0 means that you are dealing with mostly non-repetitive sequence.

@jakobht96
Author

Hi Egor,

That would be very helpful. I have sent you an email.

// Jakob

@egor-dolzhenko
Collaborator

Thank you!
