Large expansions without reads covering the entire repeat region #44

Open
gabeng opened this issue Sep 13, 2024 · 4 comments

@gabeng

gabeng commented Sep 13, 2024

Hi,

Thank you for this great tool and for diligently answering questions here. If I understand correctly, TRGT only considers reads that span the entire repeat region. It therefore fails to report any haplotype where the repeat expansion exceeds the read length; is that correct?
Are there any plans to report evidence for these large repeats in the future? Even without spanning reads, you could at least report a lower limit on their size.

Regards,
Ben

@egor-dolzhenko
Collaborator

Hi Ben,

Thanks for a great question. That's right: TRGT only uses reads that span the entire repeat region. We are definitely planning to add support for reads that partially overlap the repeat in future versions of TRGT.
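For illustration only, here is a minimal sketch of the distinction discussed above: a read that is anchored in both flanks ("spanning") versus one anchored in a single flank that runs into the repeat ("flanking"), plus how a flanking read that ends inside the repeat could yield the kind of lower bound Ben asks about. This is not TRGT's code. The `Read` class, the `min_flank` threshold, and both helper functions are hypothetical, and the copy counting uses exact motif matches, so real HiFi reads with sequencing errors or motif interruptions would need a more tolerant method.

```python
from dataclasses import dataclass


# Hypothetical, simplified read record. Coordinates are 0-based, half-open
# reference positions; a real pipeline would take them from a BAM alignment.
@dataclass
class Read:
    ref_start: int
    ref_end: int
    seq: str


def classify(read: Read, repeat_start: int, repeat_end: int, min_flank: int = 50) -> str:
    """Label a read relative to the reference repeat interval [repeat_start, repeat_end)."""
    left = read.ref_start <= repeat_start - min_flank  # anchored in the left flank
    right = read.ref_end >= repeat_end + min_flank     # anchored in the right flank
    if left and right:
        return "spanning"  # covers the whole repeat plus flanks on both sides
    if left or right:
        return "flanking"  # covers one flank and runs into the repeat
    return "other"


def min_repeat_copies(seq: str, repeat_offset: int, motif: str) -> tuple[int, bool]:
    """Count consecutive exact motif copies starting at repeat_offset in the read.

    Returns (copies, reaches_read_end). If the run continues to the end of the
    read, the allele has at least `copies` motif copies: a lower bound, not a size.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    pos, copies = repeat_offset, 0
    while seq[pos:pos + k] == motif:
        copies += 1
        pos += k
    # Fewer than k bases left means the read ended while still inside the repeat.
    reaches_end = len(seq) - pos < k
    return copies, reaches_end
```

For example, `min_repeat_copies("ACGT" + "CAG" * 10 + "CA", 4, "CAG")` returns `(10, True)`: the read runs out while still in the repeat, so the allele is at least 10 CAG copies long, and its true length is unknown.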

Do you by any chance have a sample with a known pathogenic repeat expansion exceeding HiFi read length? Originally we planned to add support for very long repeats much earlier, but it turned out that all the very long expansions we had access to were detectable with the current TRGT approach. Perhaps long repeat expansions tend to be highly mosaic, which allows us to fully capture the expanded alleles within 15 kb+ reads? (This of course applies to known pathogenic repeats and not to other very long repeats in the human genome.)

Best wishes,
Egor

@gabeng
Author

gabeng commented Sep 14, 2024

Hi Egor,

Thanks for the quick response. I was hoping that you'd add this functionality.
I am looking at hybrid capture data with an average read length of 3 to 4 kb. If I extract the reads around the repeat, I may be able to share the data; I have to check.
I also noticed that very, very few reads support the presence of a large expansion compared to the other allele. My first guess was selection during library prep/capture, but I cannot rule out mosaicism. It's interesting that you see that correlation in whole-genome data. I do not know the exact size of the expansions, only a lower limit.

I am going to shelve my validation data for now, but will be happy to pick this up when you make modifications to the algorithm.

Thanks again!
Ben

@egor-dolzhenko
Collaborator

Hi Ben,

I see, thanks. Does your hybrid capture protocol involve PCR amplification? In my experience, PCR can lead to complete or nearly complete dropout of the expanded alleles. If you’d like, we could create a one-off version of TRGT that uses flanking reads to help evaluate your data. Let’s connect by email if this is something you’d like to explore.

Best wishes,
Egor

@gabeng
Author

gabeng commented Sep 16, 2024

Hi Egor,

Yes, like probably every hybrid capture protocol, this one includes a few cycles of PCR. Under-representation of expanded alleles is to be expected, you are right. That is why we are looking at cranking up sensitivity as much as possible. We will have to monitor the effect on precision.
I'd be happy to test any development version that you can throw at me. Thanks!
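To make the sensitivity trade-off concrete, a toy binomial model may help. It assumes, purely for illustration, that each read at the locus independently comes from the expanded allele with some probability (0.5 with no bias, lower if PCR or capture under-represents the expansion). This is not how TRGT or any capture pipeline models dropout; the function name and the parameter values in the usage block are hypothetical.

```python
from math import comb


def prob_at_least_k(depth: int, allele_fraction: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, allele_fraction).

    Toy model: each of `depth` reads independently comes from the expanded
    allele with probability `allele_fraction`. Returns the chance of seeing
    at least k reads supporting the expansion.
    """
    p = allele_fraction
    below_k = sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # Hypothetical numbers: 30 reads, expanded allele reduced to 5% of reads.
    for k in (1, 2, 3):
        print(k, round(prob_at_least_k(30, 0.05, k), 3))
```

Under this model, lowering the number of supporting reads required (smaller `k`) raises the chance of detecting an under-represented expansion, but it also lets calls rest on one or two reads that may be artifacts, which is where the effect on precision would show up.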

Regards,
Ben
