Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No methylation information was calculated for some reads #299

Open
Wshengquan opened this issue Nov 13, 2024 · 3 comments
Open

No methylation information was calculated for some reads #299

Wshengquan opened this issue Nov 13, 2024 · 3 comments
Labels
troubleshooting workflow and data preparation questions

Comments

@Wshengquan
Copy link

Hi,I used this command to extract methylation information:
modkit pileup ${mod_bam} ${bedmethyl_cpg} --cpg --ref ${ref} --no-filtering
However, I found that the CpG methylation information in the result file did not appear in the form of positive and negative chain pairs. I then went back to the bam file to see if I didn't have enough data and didn't have enough reads to map to this location. Take this location for example:
5442cfec84055d16a98ee6c5931f228
The resulting file revealed only one reads with methylation information and it was on the positive chain. But when I looked at the bam file, there were 26 reads in this location with both positive and negative chains, and each one carried labels with methylation information: MM and ML.
fe100edd2a538a6afada0b2e4a6fe45
Why did this happen
Thanks a lot for your help

@ArtRand
Copy link
Contributor

ArtRand commented Nov 13, 2024

Hello @Wshengquan,

Do you mean that you never see the (-)-strand C calls or just not always? I usually see a pattern like this:

chr20   66405   66406   h       9       +       66405   66406   255,0,0 9       0.00    0       9       0       0       0       0       6
chr20   66405   66406   m       9       +       66405   66406   255,0,0 9       0.00    0       9       0       0       0       0       6
chr20   66406   66407   h       7       -       66406   66407   255,0,0 7       0.00    0       7       0       4       0       4       1
chr20   66406   66407   m       7       -       66406   66407   255,0,0 7       0.00    0       7       0       4       0       4       1
chr20   66594   66595   h       8       +       66594   66595   255,0,0 8       0.00    0       8       0       0       0       0       9
chr20   66594   66595   m       8       +       66594   66595   255,0,0 8       0.00    0       8       0       0       0       0       9
chr20   66595   66596   h       7       -       66595   66596   255,0,0 7       0.00    0       5       2       0       0       9       0
chr20   66595   66596   m       7       -       66595   66596   255,0,0 7       28.57   2       5       0       0       0       9       0
chr20   66830   66831   h       7       +       66830   66831   255,0,0 7       0.00    0       3       4       0       0       0       7
chr20   66830   66831   m       7       +       66830   66831   255,0,0 7       57.14   4       3       0       0       0       0       7
chr20   66831   66832   h       8       -       66831   66832   255,0,0 8       0.00    0       2       6       0       0       3       1
chr20   66831   66832   m       8       -       66831   66832   255,0,0 8       75.00   6       2       0       0       0       3       1
chr20   67134   67135   h       7       +       67134   67135   255,0,0 7       0.00    0       2       5       0       0       0       5
chr20   67134   67135   m       7       +       67134   67135   255,0,0 7       71.43   5       2       0       0       0       0       5
chr20   67135   67136   h       5       -       67135   67136   255,0,0 5       20.00   1       2       2       2       0       3       1
chr20   67135   67136   m       5       -       67135   67136   255,0,0 5       40.00   2       2       1       2       0       3       1
chr20   67856   67857   h       6       +       67856   67857   255,0,0 6       0.00    0       1       5       0       0       0       1
chr20   67856   67857   m       6       +       67856   67857   255,0,0 6       83.33   5       1       0       0       0       0       1
chr20   67857   67858   h       7       -       67857   67858   255,0,0 7       28.57   2       0       5       0       0       1       0
chr20   67857   67858   m       7       -       67857   67858   255,0,0 7       71.43   5       0       2       0       0       1       0
chr20   67858   67859   h       6       +       67858   67859   255,0,0 6       0.00    0       1       5       0       0       0       1
chr20   67858   67859   m       6       +       67858   67859   255,0,0 6       83.33   5       1       0       0       0       0       1
chr20   67859   67860   h       6       -       67859   67860   255,0,0 6       0.00    0       1       5       1       0       0       1
chr20   67859   67860   m       6       -       67859   67860   255,0,0 6       83.33   5       1       0       1       0       0       1

with a command like yours. To debug that particular position, could you produce a plot from IGV (or a similar browser) at that position with the input modBAM and modification coloring turned on? You may also run modkit extract calls ${bam} --include-bed ${bed} where the bed file has contents like this

1 15189 15190 . 0 -

which will extract only the reads aligning to the negative strand at that position. I bet the genome browser shot will help a lot.

@ArtRand ArtRand added the troubleshooting workflow and data preparation questions label Nov 13, 2024
@Wshengquan
Copy link
Author

I mean for cpg methylation sites, there are always pairs of positive and negative chains.
For example, in your results, here is a pair of cpg methylation sites:
chr20 66405 66406 h 9 + 66405 66406 255,0,0 9 0.00 0 9 0 0 0 0 6
chr20 66405 66406 m 9 + 66405 66406 255,0,0 9 0.00 0 9 0 0 0 0 6
chr20 66406 66407 h 7 - 66406 66407 255,0,0 7 0.00 0 7 0 4 0 4 1
chr20 66406 66407 m 7 - 66406 66407 255,0,0 7 0.00 0 7 0 4 0 4 1
But in my results, sometimes there are cpg sites with only positive or only negative chains,, such as the sites in my diagram:
image
I thought that due to my insufficient data, there were no redundant reads on the negative chain, but from my bam file, there were reads on the negative chain at this site, but modkit does not seem to recognize these reads. I also checked to see if any of these reads were labeled MM and found that they were.
image
So my question is why does modkit not recognize reads with methylation labels

@ArtRand
Copy link
Contributor

ArtRand commented Nov 21, 2024

Hello @Wshengquan,

It is possible that you have reads covering the position on the negative strand but for one reason or another they don't have modification calls at that position. My suggestion is to look at one of these positions where you don't see a pair of (+)-strand and (-)-strand modification calls on IGV (or modkit extract full). Sometimes you'll have a C>D mismatch preventing the modification calling model to make a prediction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
troubleshooting workflow and data preparation questions
Projects
None yet
Development

No branches or pull requests

2 participants