Not capturing ZNF713 after changing coordinates from hg38 to hg19 #19

Open
Tianyibian opened this issue Dec 1, 2023 · 8 comments

Comments

@Tianyibian

Hi, I downloaded the 'pathogenic_repeats.hg38.bed' file from (https://github.com/PacificBiosciences/trgt/blob/main/repeats/pathogenic_repeats.hg38.bed).
Because all my files were previously aligned to hg19, I used UCSC LiftOver to convert the coordinates in the BED file to hg19. I then followed the example, and the program successfully captured almost all of the genes except ZNF713: tandem repeats in this gene are not being genotyped or counted. I confirmed in IGV that my aligned reads do cover the ZNF713 region (using the coordinates I lifted over from the downloaded BED file), and these reads show a pattern matching (CGG)n repeats. However, the program still fails to detect them. This is confusing, since all other genes are processed correctly, and I'm unsure what might be causing it.

@hdashnow
Collaborator

hdashnow commented Dec 1, 2023

Could you please show the line from the hg19 bed file for ZNF713?

@Tianyibian
Author

chr7 55955294 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n
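A quick way to sanity-check a lifted-over line like the one above is to parse it and confirm that the interval and the annotation fields are consistent. The sketch below is only an illustration: the function name `check_repeat_line` is made up, it assumes MOTIFS is a comma-separated list and that each motif appears in STRUC as `(MOTIF)n` (true for this line, not necessarily for every repeat definition), and it splits on any whitespace because the pasted line uses spaces where a real BED file would use tabs.

```python
def check_repeat_line(line):
    """Return (record, problems) for one repeat-definition BED line."""
    columns = line.split()  # tolerant of tabs or spaces, as in the pasted line
    if len(columns) != 4:
        return None, [f"expected 4 columns, found {len(columns)}"]
    chrom, start, end, info = columns
    start, end = int(start), int(end)

    # The fourth column is a semicolon-separated list of KEY=VALUE pairs.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)

    problems = []
    for key in ("ID", "MOTIFS", "STRUC"):
        if key not in fields:
            problems.append(f"missing {key} field")
    if start >= end:
        problems.append("start must be smaller than end")

    # Assumption: every motif should show up in STRUC as "(MOTIF)n".
    struc = fields.get("STRUC", "")
    for motif in fields.get("MOTIFS", "").split(","):
        if motif and f"({motif})n" not in struc:
            problems.append(f"motif {motif} not found in STRUC")

    record = {"chrom": chrom, "start": start, "end": end,
              "length": end - start, **fields}
    return record, problems


line = "chr7 55955294 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n"
record, problems = check_repeat_line(line)
print(record["ID"], record["length"], problems)  # ZNF713 39 []
```

A check like this only confirms that the line is well formed; it says nothing about whether the hg19 reference sequence at those coordinates actually contains the repeat.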

@hdashnow
Collaborator

hdashnow commented Dec 1, 2023

That looks fine. Are there reads that completely span the locus? Does the locus appear in the output vcf?
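To make "completely span the locus" concrete, here is a minimal sketch that counts alignments covering the whole repeat interval, optionally with some flanking sequence on each side. The function name, the `flank` parameter, and the read coordinates are hypothetical; the exact criteria TRGT applies are not stated in this thread. The intervals are assumed to be 0-based, half-open reference coordinates, such as the `reference_start` and `reference_end` values that pysam reports for an aligned read.

```python
def count_spanning_reads(alignments, locus_start, locus_end, flank=0):
    """Count alignments covering [locus_start - flank, locus_end + flank).

    alignments: iterable of (ref_start, ref_end) pairs, 0-based half-open.
    """
    spanning = 0
    for ref_start, ref_end in alignments:
        if ref_start <= locus_start - flank and ref_end >= locus_end + flank:
            spanning += 1
    return spanning


# ZNF713 interval from the hg19 BED line in this thread.
locus_start, locus_end = 55955294, 55955333

# Made-up alignment intervals, for illustration only.
reads = [
    (55950000, 55960000),  # spans the locus with long flanks
    (55955290, 55958000),  # spans the locus, but with only 4 bp upstream
    (55955300, 55965000),  # starts inside the repeat
    (55940000, 55955320),  # ends inside the repeat
]

print(count_spanning_reads(reads, locus_start, locus_end))            # 2
print(count_spanning_reads(reads, locus_start, locus_end, flank=50))  # 1
```

Note that this interval test ignores deletions and soft clipping inside a read, so a read can pass it and still lack part of the flanking or repeat sequence.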

@Tianyibian
Author

Hi, no, they actually don't. They typically have a few bp of deletions at the front end. As for the output VCF, the line exists, but the informative fields containing the counts and genotypes are missing. Here is the record from the output VCF of one of the samples I tested:
chr7 55955295 . CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGT . 0 . TRID=ZNF713;END=55955333;MOTIFS=CGG;STRUC=(CGG)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
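Records like this one, where every sample field is `.`, can be picked out of a larger VCF with a few lines of Python. This is a rough sketch rather than a full VCF parser: the function name `missing_genotype_loci` is made up, it splits on whitespace (the record pasted above has spaces instead of tabs), and the second record in the example is invented purely to show a locus that would not be flagged.

```python
def missing_genotype_loci(vcf_lines):
    """Yield (TRID, chrom, pos) for records with an all-missing sample column."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split()
        chrom, pos, info = cols[0], int(cols[1]), cols[7]
        info_fields = dict(
            item.split("=", 1) for item in info.split(";") if "=" in item
        )
        # Sample columns start after the FORMAT column (index 8).
        for sample in cols[9:]:
            if all(value == "." for value in sample.split(":")):
                yield info_fields.get("TRID", "."), chrom, pos
                break


records = [
    # Record from this thread.
    "chr7 55955295 . CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGT . 0 . "
    "TRID=ZNF713;END=55955333;MOTIFS=CGG;STRUC=(CGG)n "
    "GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.",
    # Made-up record with a genotype, for contrast only.
    "chr1 100 . CAGCAG CAGCAGCAG 0 . "
    "TRID=EXAMPLE;END=105;MOTIFS=CAG;STRUC=(CAG)n GT:AL 1/1:9,9",
]

print(list(missing_genotype_loci(records)))  # [('ZNF713', 'chr7', 55955295)]
```

Running this over the VCF for each sample would show whether ZNF713 is the only locus affected.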


@Tianyibian
Author

I also tried extending the region upstream, testing with this definition:
chr7 55955240 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n
but the problem still persists.

@egor-dolzhenko
Collaborator

Could you please try running the latest (pre-release) version of TRGT available here with the -vv command line option to enable a more verbose output? Also, are you working with HiFi whole-genome sequencing data? And, if this is permissible, would you be open to sharing a slice of your BAM file containing this repeat? If yes, here is my email.

@Tianyibian
Author

Hi, thank you both for your help. I will test the pre-release version and post the results later. To answer your question, I am using HiFi WGS data. I will also share part of the BAM file that I tested shortly.
Thank you
Tianyi

@Tianyibian
Author

Hi, this problem is solved when using the pre-release version of TRGT with the -v option. With the current version of TRGT, however, even with the -v option, the issue of ZNF713 not being captured still exists.
Thank you so much for your help!
