Hello,
Thanks for your wonderful tool, TRGT. I'm currently using version 1.1.1 with the TR catalog from Platinum Tandem Repeats, and I have some questions about the AP field in the output VCF.

One of my TR results looks like this:

Sequence:
GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCTCCCCTCATCACCTCCCCAGCCAC

[Plot of this TR]

The AP of this TR is "0.145455,0.145455", which is very low compared with most other TRs. The motif ACCC appears as two tandem copies (ACCCACCC) in two places, and those two places are separated by a stretch of sequence that is long compared with the motif length.

I would like to know how the AP is calculated here. 0.145455 looks like the result of 4*4 (four copies of ACCC) / 110 (length of the whole sequence). Also, should the output warn the user when there is a large break in the repeat, as in this TR? Some other STR results also pick up every part of a long sequence that matches the motif even though those parts are not actually "tandem", and these likewise end up with low AP values.
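To make the arithmetic in the question concrete, here is a minimal Python sketch that counts exact ACCC copies in the quoted allele and reports where the runs of back-to-back copies sit, along with the gap between runs. It only implements the hypothesis stated in the question (bases in perfect copies divided by allele length); it is not TRGT's AP calculation, and the helper names `perfect_copy_fraction` and `tandem_runs` are hypothetical.

```python
import re

# Allele sequence quoted in the question, copied as posted.
SEQ = "GCCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATCTCCCCTCATCACCTCCCCAGCCAC"


def perfect_copy_fraction(seq: str, motif: str) -> float:
    """Hypothesis from the question: bases in exact motif copies / allele length.

    str.count counts non-overlapping exact matches of the motif.
    """
    return seq.count(motif) * len(motif) / len(seq)


def tandem_runs(seq: str, motif: str) -> list[tuple[int, int, int]]:
    """Return (start, end, n_copies) for each run of back-to-back exact copies.

    Coordinates are 0-based with an exclusive end, as in Python slicing.
    """
    pattern = f"(?:{re.escape(motif)})+"  # one or more consecutive copies
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(motif))
        for m in re.finditer(pattern, seq)
    ]


runs = tandem_runs(SEQ, "ACCC")
for start, end, n in runs:
    print(f"{n} x ACCC at {start}-{end}")

# Distance between the end of one run and the start of the next.
gaps = [nxt[0] - prev[1] for prev, nxt in zip(runs, runs[1:])]
print("gaps between runs:", gaps)
print("perfect-copy fraction:", round(perfect_copy_fraction(SEQ, "ACCC"), 6))
```

With the sequence exactly as pasted, this finds two runs of two ACCC copies each, 40 bp apart. The pasted sequence is 111 bp long, so the ratio here comes out as 16/111 (about 0.144) rather than 16/110; since the reply in this thread describes AP as edit-distance based, this ratio is a rough check rather than the actual formula.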
Thank you for the question. The AP / purity field is meant to indicate how close an allele sequence is to being a perfect repeat composed of the specified motif(s). (The actual algorithm is based on computing the edit distance between the given sequence and the corresponding perfect repeat.)

It sounds like your understanding is correct: when the purity is low, the allele sequence contains a small number of perfect motif copies relative to its length. This can occur in several ways: the allele can contain a few perfect motif copies with the rest of the sequence not matching the motif at all, or there could be many imperfect motif copies scattered throughout the repeat sequence. The information about the location of these matches can be found in the MS field (described here).

I agree that it would be convenient to add additional output fields that summarize different repeat configurations (especially for low-purity repeats like in your example). This is something that we are continuing to work on. Did I answer your question?
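The edit-distance idea described in this reply can be illustrated with a small Python sketch. The exact normalization used here (1 minus the Levenshtein distance divided by the allele length, comparing against the best-matching rotation of the motif tiled to the same length) is an assumption made for illustration; it is not TRGT's implementation and need not reproduce the AP values TRGT reports. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance with unit-cost substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca from a
                cur[j - 1] + 1,            # insert cb into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = cur
    return prev[-1]


def perfect_repeat(motif: str, length: int) -> str:
    """Tile the motif and trim the result to exactly `length` bases."""
    return (motif * (length // len(motif) + 1))[:length]


def purity(seq: str, motif: str) -> float:
    """Hypothetical purity: 1 - (edit distance to closest perfect repeat) / len(seq).

    Every rotation of the motif is tried, so the phase at which the
    repeat starts does not affect the score.
    """
    if not seq:
        raise ValueError("empty allele sequence")
    rotations = {motif[k:] + motif[:k] for k in range(len(motif))}
    best = min(edit_distance(seq, perfect_repeat(r, len(seq))) for r in rotations)
    return 1.0 - best / len(seq)


print(purity("ACCCACCCACCC", "ACCC"))  # perfect repeat: 1.0
print(purity("ACCCACCCTCCC", "ACCC"))  # one substitution: 1 - 1/12
```

Trying every rotation keeps the score independent of where the allele starts relative to the motif. A real implementation might also need to handle multiple motifs or compare against perfect repeats of other lengths, which this sketch does not attempt.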