-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modkit calling methylation on incompatible nucleotide #286
Comments
Hello @Ge0rges, Could you send me an IGV shot of "Region 2" ( Maybe email me a modBAM of the reads that align to that region so I can check the tags ( |
Here's the summary. Ran |
Thanks. The summaries look fine, you're not getting 4mC on A bases in the tags. My current hypothesis is that you have a lot of reads with cytosines mapping to a reference adenine and bringing along the 4mC call. For the IGV shot, could you add the modBAM of alignments? |
Here it is. Looks like perhaps this is more of an issue with my assembly than modkit? |
Looks like there is a pretty consistent T->G (reverse mapped A->C) transversion there. You have a couple more in this screen shot. That's actually pretty interesting. Have you tried polishing the assembly with medaka's new |
I have not tried the new polisher. Not against trying it. I can't really tell if the error is strand-biased, maybe a little? Do you have any understanding of how the polishing performs in these methylated cases? I could run this one through the polisher as a test. |
Yeah, I agree, it's not completely strand-biased but that doesn't mean it's not a methylation-induced error. The new medaka model is designed to fix cases where methylation causes errors, it's pretty quick - so it might be worth a shot. Plus you could look back at this position and see if the assembly changes. |
Alright, I'm running it. So if I'm following you, the modkit output assuming this reference is correct? |
Correct. You'll have to re-align the reads to the polished assembly (I think medaka might do this for you). Then use those alignments in modkit. But first I'd just look at this region on IGV and see if the assembly has changed. |
So modkit doesn't strictly require the reference to have a compatible base then? I'm trying to understand how to interpret this output while I wait for the medaka result. |
As you've seen |
Got it. So if I had continue this analysis with the current assembly, a simplification would be to pass a filter over the pileup to enforce the modification/base to match. I'm interested to see how Medaka performs! It might take 48h still as I don't have a GPU to run it on. Will keep you posted here. |
Sounds good. I've considered adding an argument to |
Yeah I definitely don't think it should be a default. That column might be a good in-between. For the longest time I was doing analyses that weren't sequence aware, which is why I only recently found this (I started looking at methylation around start codons). |
So after running medaka, it looks like at least at the position considered above there was no change. |
Hi @ArtRand,
Let me preface this by saying this may be a mistake on my end. However after checking, I have a suspicion that modkit is setting an incompatible methylation call/nucleotide pairing.
Here's my example.
Consider the following (concatenated, filtered) bed file of a certain example regions:
Region 1:
Region 2:
Here's the sequence of this contig:
So, the region is on the negative strand, so I first slice the sequence to
seq[3259: 3262]
obtainTCT
, whose complement is thenAGA
for region 1. Region 2 I getAT
.The text was updated successfully, but these errors were encountered: