Interpreting EHDN motif-based analysis results #49
Comments
Hey @gspirito. Thanks for trying EHdn and I hope that you find it useful! I have been thinking about this question, and your approach of testing for an increased burden of repeat expansions using the motif-based rather than the locus-based analysis sounds reasonable. However, based on what you described, I wouldn’t take this as anything more than suggestive evidence. You might also consider trying some other (complementary) approaches, which may help provide additional supporting evidence. One idea would be to run a PCA and see if this sample is an outlier compared to the rest of your cohort. You could convert the motif normalised paired-IRR counts to a matrix to do this. However, if you go down this route, you may be better served by running ExpansionHunter with a genome-wide catalog. (@egor-dolzhenko may have some additional thoughts on this.)
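A minimal sketch of the PCA idea, assuming the normalised paired-IRR counts have already been arranged into a samples-by-motifs matrix. The function name `pca_scores` and the toy data are hypothetical and not part of EHdn; the PCA itself is done with a plain SVD in NumPy.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Project samples (rows) onto the top principal components."""
    x = np.asarray(counts, dtype=float)
    x = x - x.mean(axis=0)  # center each motif (column)
    # SVD of the centered matrix; u * s gives the per-sample PC scores
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy cohort (made-up numbers): 40 samples x 30 motifs
rng = np.random.default_rng(0)
counts = rng.normal(10.0, 1.0, size=(40, 30))
counts[7] += 5.0  # hypothetical sample with elevated counts across all motifs

scores = pca_scores(counts)
# Sample with the most extreme position along the first component
outlier = int(np.argmax(np.abs(scores[:, 0])))
print(outlier)
```

With real data, plotting the first two columns of `scores` should show whether one sample sits apart from the rest of the cohort. Like the Z-score counts, this would be suggestive rather than conclusive.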
Hi @mfbennett, thank you for the reply. I will try running some PCAs with the motif normalised paired-IRR counts. Regarding the analysis with ExpansionHunter and a catalog, I have a few questions:
Thank you
Hi @gspirito. You can get a genome-wide STR catalog for ExpansionHunter here: https://github.com/Illumina/RepeatCatalogs/releases/tag/v1.0.0. This catalog contains repeats with similar properties to known pathogenic repeats (polymorphism, complexity of the sequence surrounding the repeat, etc.). You could normalize the read counts by dividing each count by the locus depth (which ExpansionHunter reports) and then multiplying by the target depth. For example, if the number of in-repeat reads is 20 and the locus depth is 32x, the corresponding count normalized to 40x depth is 20 * 40 / 32 = 25. (Note that this very simplistic normalization procedure is best used when the depths are pretty similar across all the samples.)
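The normalization described above could be sketched like this. The function name `normalize_count` is hypothetical, and the count and locus depth are assumed to have been read from the ExpansionHunter output beforehand.

```python
def normalize_count(count, locus_depth, target_depth=40.0):
    """Scale a read count from the observed locus depth to a common target depth."""
    if locus_depth <= 0:
        raise ValueError("locus depth must be positive")
    return count * target_depth / locus_depth

# Worked example from the comment: 20 in-repeat reads at 32x, normalized to 40x
print(normalize_count(20, 32))  # 25.0
```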
Hi @mfbennett, thank you for your advice.
Hi, I wanted to ask some questions about the motif-based outlier analysis.
I have a cohort of 40 individuals (WGS), and I suspect that one of them may have an increased burden of repeat expansions compared to the other samples. Since I am not looking for expansions at specific loci, I ran a motif-based outlier analysis, labeling all samples as "case" in the manifest file.
The result is that one sample has 44 repeat motifs with Z-score > 3, while all other samples have between 0 and 5 motifs with Z-score > 3. Would it make sense to use this result as suggestive evidence for a general increased burden of repeat expansions in that sample? What would be a suitable Z-score cutoff value?
Thank you in advance.
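For reference, the per-sample tally described above can be sketched as follows. This assumes the outlier results have already been parsed into (sample, motif, Z-score) records; that record layout, the function name `motifs_above_cutoff` and the toy values are hypothetical and do not reflect the actual EHdn output format.

```python
from collections import Counter

def motifs_above_cutoff(records, cutoff=3.0):
    """Count, per sample, how many motifs have a Z-score above the cutoff."""
    counts = Counter()
    for sample, _motif, zscore in records:
        if zscore > cutoff:
            counts[sample] += 1
    return counts

# Toy records (made-up values, for illustration only)
records = [
    ("sample_A", "CAG", 4.2),
    ("sample_A", "CCG", 3.5),
    ("sample_A", "AAGGG", 1.1),
    ("sample_B", "CAG", 0.3),
    ("sample_B", "GGCCCC", 3.1),
]
tally = motifs_above_cutoff(records)
print(tally.most_common())
```

Re-running the tally with a few different values of `cutoff` shows whether the same sample stands out regardless of the threshold chosen.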