The difference between two genotyper algorithms #50

ywzhang071394 · 2024-12-04T18:37:06Z

Hi,

Thank you for the nice tool! Could you give an introduction to the two genotyper algorithms, "size" and "cluster"?
I looked through your paper and github page but did not find any related info.

Thanks

pbsena · 2024-12-04T19:24:53Z

Hello,

We recommend the cluster genotyper when there are not many repeats to genotype. The cluster genotyper compares all STR sequences with each other and clusters them, whereas the size genotyper splits reads based on the STR sequence size in each read so as to maximize the difference between alleles relative to the difference within them. The cluster genotyper uses more information but is significantly slower, so for WGS applications we recommend using size. The size genotyper is described in the TRGT manuscript, whereas the paper describing the cluster genotyper algorithm is still a work in progress.

Hope this helps! Happy to clarify further.
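
To make the size-based idea above concrete, here is a minimal sketch, not TRGT's actual implementation. It assumes the input is one repeat length per spanning read and that the locus is heterozygous; the function name `split_by_size` and the read lengths are hypothetical. It tries every cut point in the sorted lengths and keeps the one with the smallest within-group spread, which for a fixed set of lengths is the same as the largest between-group spread.

```python
from statistics import mean


def split_by_size(lengths):
    """Split per-read repeat lengths into a shorter and a longer group.

    Illustrative only: every cut point in the sorted lengths is tried and
    the one with the smallest within-group sum of squared deviations wins.
    """
    ordered = sorted(lengths)
    if len(ordered) < 2:
        return ordered, []

    def sse(group):
        # Sum of squared deviations from the group mean.
        m = mean(group)
        return sum((x - m) ** 2 for x in group)

    best_cut = min(
        range(1, len(ordered)),
        key=lambda i: sse(ordered[:i]) + sse(ordered[i:]),
    )
    return ordered[:best_cut], ordered[best_cut:]


# Hypothetical repeat lengths (bp) from reads spanning one locus.
reads = [24, 25, 25, 26, 25, 61, 60, 62, 59]
short, long_ = split_by_size(reads)
print(short, long_)
```

A real genotyper would also have to handle homozygous loci and noisy reads; this sketch always returns two groups.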

ywzhang071394 · 2024-12-04T20:41:11Z

Thank you so much!
We previously detected a weird repeat expansion event with the 'size' algorithm, as shown below. The alternative sequence is the actual sequence downstream of "AATTTTCTATTTTTATTTTTATTTTT" (human T2T reference). How can it be classified as an expansion event? Could you help check it?
Thanks!

chr2 74292837 . AATTTTCTATTTTTATTTTTATTTTT AATTTTCTATTTTTATTTTTATTTTTGTAGAGACGAGGTCTTCCTATGTTGTCCAGGCTGGTCTTGAACTCCTGGGCTCAAGCAATCTGCCTGTCTTGGCCTCTCAAAATGTTGGCATTACAGGCATGAGCCACTGTGCCTAGCCCTTATTCCTTATTTCTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGGTCGGACTGCGGACTGCAGTGGCGCAATCTCGGCTCACTGCAAGCTCCGCTTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCGCGCCCGGCTAATTTTTTTTTTGTATTTTTA 0 . TRID=chr2_74292837_74292862_trsolve;END=74292862;MOTIFS=TATTTT;STRUC=(TATTTT)n GT:AL:ALLR:SD:MC:MS:AP:AM 0/1:25,359:23-25,358-361:70,7:4,14:0(0-25),0(0-24)_0(146-184)_0(337-359):0.88,0.189415:.,0.69

We also cannot find any supporting reads in IGV.

My command is:
/software/trgt genotype -g /data/human/t2t_chm13/chm13v2.0.fa -r /Longread/t2t/${i}_Tumor/04_phase/tumor_haplotagSV.bam -b /data/human/t2t_chm13/chm13v2.0_maskedY_rCRS.platinumTRs-v1.0.trgt.bed -k XY --output-prefix ${path}/Tumor_cluster -t 10
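
The per-allele values in the VCF record above are easier to read once unpacked. The sketch below is only a convenience parser; the FORMAT and sample strings are copied from the record, while the helper name `per_allele_fields` is hypothetical. The field meanings in the comments (AL allele length, SD spanning read depth, MC motif count, MS motif spans, AP allele purity) reflect my reading of the TRGT VCF documentation and should be checked against it.

```python
# FORMAT and sample columns copied from the VCF record in this thread.
FORMAT = "GT:AL:ALLR:SD:MC:MS:AP:AM"
SAMPLE = (
    "0/1:25,359:23-25,358-361:70,7:4,14:"
    "0(0-25),0(0-24)_0(146-184)_0(337-359):0.88,0.189415:.,0.69"
)


def per_allele_fields(fmt, sample):
    """Return the genotype and one dict of field values per allele."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    n_alleles = len(fields["AL"].split(","))
    alleles = [
        {k: v.split(",")[i] for k, v in fields.items() if k != "GT"}
        for i in range(n_alleles)
    ]
    return fields["GT"], alleles


gt, alleles = per_allele_fields(FORMAT, SAMPLE)
for allele in alleles:
    # Assumed meanings: AL length (bp), SD read depth, MC motif count, AP purity.
    print(allele["AL"], allele["SD"], allele["MC"], allele["AP"])
```

For this record the second allele has AL 359, SD 7, and AP 0.189415, and its MS spans cover only 0-24, 146-184, and 337-359 of the allele, so most of the 359 bp are not annotated as TATTTT motif.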

pbsena · 2024-12-05T12:02:46Z

Hello,

Without looking at the reads, my guess is that, because this region is repetitive, some reads aligned to the repetitive sequence downstream of this repeat when TRGT was looking for flanking bases. Maybe try increasing the --flank-len parameter to, say, 500?
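
The intuition behind a longer flank can be shown with a toy example. This is not how TRGT locates flanks; exact string matching stands in for alignment, and the read, flank sequences, and `find_all` helper are all made up for illustration. A short flank can match at more than one place in a repetitive read, while a longer flank matches only at the true repeat boundary.

```python
def find_all(text, pattern):
    """Return start positions of every occurrence of pattern in text."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


# Made-up read: left flank, a TATTTT repeat, the true right flank,
# filler sequence, then a downstream segment resembling the right flank.
left = "ACGTACGGCA"
repeat = "TATTTT" * 4
read = left + repeat + "GAGACGAGGT" + "CCTATGTTGT" + "GAGACGGAGT"

short_flank = "GAGACG"      # 6 bp: ambiguous in this read
long_flank = "GAGACGAGGT"   # 10 bp: unique in this read

print(find_all(read, short_flank))
print(find_all(read, long_flank))
```

If the second short-flank hit were taken as the repeat boundary, the apparent repeat would span 44 bp instead of the true 24 bp.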

ywzhang071394 · 2024-12-05T17:57:16Z

Thank you for the suggestion. I tested the cluster algorithm and found that this false-positive event was absent. It seems that the cluster algorithm improves repeat expansion detection a lot and is not as time-consuming as I expected.
Thanks a lot!

egor-dolzhenko · 2024-12-06T00:02:40Z

@ywzhang071394, thank you for letting us know. Would you be open to sharing a waterfall plot of this repeat with us? We could do it by email if you prefer.

ywzhang071394 · 2024-12-12T04:24:39Z

Hi,

Sorry, I need to reopen this issue because I want to debug the size algorithm. I tried increasing the --flank-len parameter to 500, but nothing improved. Could you comment on this?

ywzhang071394 closed this as completed Dec 5, 2024

ywzhang071394 reopened this Dec 12, 2024
