Difficulty understanding the output from EHDN #48
Comments
Thanks for the questions! You are looking at the supplementary TSV file generated by the "profile" command, right? If so, the fourth column says that the repeat unit (motif) is CCCCCGCCCCGGCCCCGGCCCCGGCCCCGGC... and the fifth column says that EHDN identified just a single read with that motif. This suggests that the repeat size is close to the read length. The het_str_size is a very approximate repeat size estimate, made under the assumption that the long repeat occurs on only one allele. In this case it equals 2, meaning that the program roughly estimated the size of the repeat to be 66 * 2 = 132bp. (Though since EHDN identified one in-repeat read, 132bp is likely an underestimate and the true size of the repeat is at least 150bp.) Also note that EHDN determines repeat units from the sequence of the read and, in general, estimates of motifs longer than 15bp can be prone to error. Did I answer your questions? Please let me know if I didn't explain something well.
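To make the arithmetic above concrete, here is a minimal Python sketch. The function names are hypothetical and not part of EHDN; the inputs (motif length 66, het_str_size 2, read length 150, one in-repeat read) are the values quoted in this thread. The lower-bound rule simply encodes the reasoning that a fully in-repeat read implies a repeat at least as long as the read.

```python
def approx_repeat_size_bp(motif_length, het_str_size):
    """Rough size estimate: motif length times het_str_size (number of motif copies)."""
    return motif_length * het_str_size


def size_lower_bound_bp(estimate_bp, read_length, num_in_repeat_reads):
    """Assumption for illustration: if at least one read lies entirely inside
    the repeat, the repeat cannot be shorter than the read length."""
    if num_in_repeat_reads > 0:
        return max(estimate_bp, read_length)
    return estimate_bp


estimate = approx_repeat_size_bp(motif_length=66, het_str_size=2)
bound = size_lower_bound_bp(estimate, read_length=150, num_in_repeat_reads=1)
print(estimate)  # 132
print(bound)     # 150
```

With a single supporting read, both numbers should be treated as very rough.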
Hello @egor-dolzhenko, Thank you so much for providing a detailed explanation. Yes, I am looking at the supplementary tsv file created by the "profile" command. I was actually trying to find a well-known ALS repeat, (GGCCCC)*, located at 9:27573528-27573546 in the C9ORF72 gene. When I ran ExpansionHunter (v5.0.0) on the same data, I got ~300 copies of the repeat, which should correspond to a repeat of at least 1800bp. But EHDN was not able to find it. Is there any other parameter setting that I could try? Thank you.
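For reference, the size quoted above follows from the copy number and the 6bp motif. A small Python check, using only the numbers from this thread (about 300 copies, motif GGCCCC, 150bp reads); the variable names are illustrative:

```python
motif = "GGCCCC"      # C9ORF72 repeat unit mentioned above
copies = 300          # approximate copy number reported by ExpansionHunter
read_length = 150     # read length of the sequencing data

repeat_size_bp = copies * len(motif)
print(repeat_size_bp)                # 1800
print(repeat_size_bp > read_length)  # True: far longer than a single read
```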
@sagnikbanerjee15 Thank you for letting me know. Would it be permissible for you to share parts of EHdn output for this sample? If yes, perhaps we could follow up by email?
Hello, Would it be possible to provide some support for this issue? When I last spoke with @egor-dolzhenko, he was going to send me the strawberry program to better locate the expansion sites. Since then he has moved on, so I was wondering if someone else could help me with that. Thanks.
Hello,
I have executed EHDN on a CRAM file with paired-end reads of length 150 with the parameters --min-anchor-mapq 0 and --max-irr-mapq 50. I started analyzing the locus.tsv file and came upon the entry
chr9 27573523 27573524 CCCCCGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGGCCCCGG 1 0.92 2
The length of the motif is 66, which is less than 150. But I was under the impression that EHDN would report only those repeats that are longer than the read length. The AnchoredIrrCount for this repeat was 1 and the IrrPairCount was 0. Also, could you please explain what the column het_str_size represents?
Thank you.