Why is detecting and genotyping Short Tandem Repeats (STRs) challenging? #21
Thank you for the questions. In many cases, identifying and counting motifs in reads is straightforward. But sometimes it gets more complicated because of mosaicism, changes in sequence composition, nested repeats, etc. Different tools resolve these challenges in different ways and may also be designed to profile different kinds of repeats. It makes sense to pick the tool that best aligns with the needs of your project. As for benchmarking, here is a recent paper that proposes a new benchmarking framework designed specifically for tandem repeats. Many groups that work on repeat expansions also sequence some samples with known expansions of the repeats they are interested in and then confirm that their tool of choice can detect them. I hope this response is helpful!
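To make the mosaicism point concrete, here is a toy sketch (not TRGT's algorithm) that counts the longest run of perfect `CAG` copies in each read of a hypothetical sample whose expanded allele is mosaic. The reads, the motif, and the helper `longest_perfect_run` are all made up for illustration. Because different cells carry expansions of different lengths, the per-read counts spread out, and collapsing them into a single number such as the median lands between the two alleles, at a value no read supports.

```python
from statistics import median


def longest_perfect_run(read: str, motif: str) -> int:
    """Return the longest run of back-to-back exact copies of `motif` in `read`."""
    k = len(motif)
    best = 0
    # A tandem run keeps a fixed reading frame, so scan each of the k frames separately.
    for frame in range(k):
        run = 0
        for i in range(frame, len(read) - k + 1, k):
            if read[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # any non-matching window breaks the run
    return best


# Hypothetical reads: three from a normal allele (20 CAG copies) and three from a
# mosaic expanded allele whose length differs from cell to cell.
reads = ["GG" + "CAG" * n + "TT" for n in (20, 20, 20, 45, 52, 60)]
counts = [longest_perfect_run(r, "CAG") for r in reads]

print(counts)          # [20, 20, 20, 45, 52, 60]
print(median(counts))  # 32.5, a "count" that no read supports
```

A tool that reports the full per-read distribution, or clusters reads into alleles before summarizing, avoids this particular pitfall; how each genotyper actually handles it differs.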
Hi @egor-dolzhenko,
Thank you for the question. Yes, Truvari is a sequence-level benchmark. In my opinion, evaluating the accuracy of motifs and their counts is a much more elusive task. For example, some tools might count only exact motif copies, while other tools might also detect imperfect motifs. Because of this, different tools might produce very different motif counts that would all be "correct". When it comes to resolving motif counts, it might be best to run a project-specific benchmarking study and pick the tool that best fits the needs of that project.
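A toy illustration of how the counting convention alone can change the answer, assuming a `CAG` repeat with a single `CAA` interruption. The helper `longest_run` is hypothetical: it scans every reading frame of the read with fixed-width windows and Hamming distance, whereas real genotypers typically rely on alignment or HMM-based methods.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def longest_run(read: str, motif: str, max_mismatches: int = 0) -> int:
    """Longest run of consecutive motif-sized windows within `max_mismatches` of `motif`."""
    k = len(motif)
    best = 0
    for frame in range(k):  # try every reading frame of the read
        run = 0
        for i in range(frame, len(read) - k + 1, k):
            if hamming(read[i:i + k], motif) <= max_mismatches:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Hypothetical read: 10 CAG copies, one CAA interruption, then 9 more CAG copies.
read = "TT" + "CAG" * 10 + "CAA" + "CAG" * 9 + "TT"

exact_run = longest_run(read, "CAG")                       # longest perfect run only
exact_total = read.count("CAG")                            # every perfect copy
tolerant_run = longest_run(read, "CAG", max_mismatches=1)  # imperfect copies allowed

print(exact_run, exact_total, tolerant_run)  # 10 19 20
```

All three numbers are defensible answers to "how many CAG copies are there?", so a motif-count benchmark has to fix a counting convention before comparing tools.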
Hi,

Thank you for developing the excellent TRGT tool. I've read your paper "Resolving the unsolved: Comprehensive assessment of tandem repeats at scale". To gain a better understanding, I've also read several other papers on STR detection and genotyping. However, I'm still confused by the following questions:

1. TRGT requires specifying the parameter `--repeats <REPEATS>` (a BED file with reference coordinates and the structure of tandem repeats). Since we know the structure and location of motifs on the reference genome, what are the challenges in detecting motifs and their repeat counts in reads?
2. What distinguishes TRGT from existing tools like straglr and RepeatHMM?
3. How can we compare the performance of TRGT with other STR detection and genotyping tools? Are there established and reliable benchmark datasets available for this purpose?
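As an illustration of what such a catalog encodes, here is a minimal parser sketch. It assumes a TRGT-style fourth column of semicolon-separated `KEY=VALUE` fields (`ID`, comma-separated `MOTIFS`, and `STRUC`); please check the TRGT repeat-catalog documentation for the authoritative format. The record, the class `RepeatDef`, and the function `parse_repeat_line` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RepeatDef:
    chrom: str
    start: int  # 0-based start, BED convention
    end: int    # exclusive end, BED convention
    fields: dict[str, str]

    @property
    def motifs(self) -> list[str]:
        # Assumed: MOTIFS holds a comma-separated list of motif sequences.
        return [m for m in self.fields.get("MOTIFS", "").split(",") if m]


def parse_repeat_line(line: str) -> RepeatDef:
    """Parse one tab-separated catalog line (assumed TRGT-style format)."""
    chrom, start, end, info = line.rstrip("\n").split("\t")[:4]
    fields = dict(item.split("=", 1) for item in info.split(";") if item)
    return RepeatDef(chrom, int(start), int(end), fields)


# Hypothetical record with two motifs separated by a short interruption.
rec = parse_repeat_line(
    "chr1\t1000\t1060\tID=example_TR;MOTIFS=CAG,CCG;STRUC=(CAG)nCAACAG(CCG)n"
)
print(rec.motifs, rec.fields["STRUC"], rec.end - rec.start)
```

Note that the catalog describes only the reference; the reads from a given sample may differ from it.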