-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
m6A methylation analysis and explanation of output files #276
Comments
Hello @JhinJhinJhin, I apologize for the slow response, I've been OOO. Your commands for
When you use
As far as the scoring, please refer to the documentation on the models, for your application I suggest looking at the MAP-based p-value. You can sort the results either by the |
Thank you for your patient reply! But I still have some questions.
After I get the entropy, then how do I calculate if the hughs from two different samples are significantly different on a transcript? And I want to detect |
Hi, I'm also using modkit for DRS analysis, but I'm unsure about how to determine the filter threshold and --mod-thresholds for different base modifications (m6A, m5C, pseU). Is this the correct code to use?
If I don't specify the 'filter threshold' and '--mod-thresholds' parameters, what are the default threshold values? |
Hello @JhinJhinJhin,
I don't have a worked out solution for this, adding differential entropy detection is something I'm working on. I would start with plotting the distribution of difference in entropy between your samples.
You can use |
Hello @Taylorain, Your command looks correct. It will estimate the probability distributions for each of the modified primary bases and their respective chemical modifications. The table (schema here) can be used to determine a threshold value for each modified base and canonical call. There is documentation describing how to use the |
@ArtRand I get, thanks pretty much! |
Hello, @JhinJhinJhin, @ArtRand
As for your second question: "Detecting differential modification at single base positions" Using the m6A Dorado model, a site is considered significantly differentially modified if there is a statistically significant difference in modification rates between two conditions (e.g., condition A vs. condition B). The xPore documentation outlines an approach using Z-statistics to determine significance, although direct xPore analysis isn’t possible with Dorado basecalling outputs (e.g., .bam files from nanopore RNA/DNA-seq). However, you can apply similar statistical approaches in R or any language of your choice. I would say processing your modkit pileup results to generate .bed files for each sample, then applying statistical tests (e.g., p-values for each site) to filter for significant sites. If it helps, I have developed a pipeline that does these tasks, including identifying significant modification sites, motif detection (like DRACH or others), and standard RNA-seq analysis. Let me know if you'd like more details. |
Hello, @tamalarafat . |
Hello, @JhinJhinJhin |
Hello, @tamalarafat |
methylation distribution on read
I want to analyze the RNA m6A modification. I selected m6A_DRACH model when dorado basecaller.
Then I use modkit for downstream analysis.
modkit pileup $bam \ ${id}_pileup.bed \ --ref $ref \ -t 12 --log-filepath ${id}_pileup.log
bgzip -@ 4 -k $bed
tabix -@ 4 -p bed ${bed}.gz
modkit motif bed DM8.1.fa DRACH 2 1> DM8_DRACH.bed
modkit extract calls -t 8 $bam extract_calls/SDL-1_ref_calls.tsv --ref $genome --include-bed DM8_DRACH.bed
I use
modkit extract calls
command to generate a table of read-level base modification calls.read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length call_prob call_code base_qual ref_kmer query_kmer canonical_base modified_primary_base fail inferred within_alignment flag
ad2e3781-d1bf-4457-a6e5-e48638151a51 1177 901581 chr01 + - - 0 87 1299 0.9941406 - 24 TGTTT AAACA A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 1082 901676 chr01 + - - 0 87 1299 0.9980469 - 28 TGTCA TGACA A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 1008 901750 chr01 + - - 0 87 1299 0.9980469 - 23 GGTTT AAACC A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 955 901803 chr01 + - - 0 87 1299 0.9433594 - 12 TGTTC GAACA A A true false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 887 901881 chr01 + - - 0 87 1299 0.9980469 - 17 TGTCA TGACA A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 878 901890 chr01 + - - 0 87 1299 0.9980469 - 23 AGTTT AAACT A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 809 901960 chr01 + - - 0 87 1299 0.9980469 - 20 AGTCC GGACT A A false false true 16
ad2e3781-d1bf-4457-a6e5-e48638151a51 786 901983 chr01 + - - 0 87 1299 0.8808594 - 19 AGTCA TGACT A A true false true 16
c90cf7b9-674f-46d6-bfa4-936058b6ea0e 126 971421 chr01 + - - 0 68 1258 0.8222656 a 23 TGTTA TAACA A A true false true 16
c90cf7b9-674f-46d6-bfa4-936058b6ea0e 54 972330 chr01 + - - 0 68 1258 0.9980469 - 15 AGTTT AAACT A A false false true 16
c90cf7b9-674f-46d6-bfa4-936058b6ea0e 2 972382 chr01 + - - 0 68 1258 0.9980469 - 19 TGTCA TGACA A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 253 901581 chr01 + - - 2 80 340 0.9980469 - 19 TGTTT AAACA A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 176 970230 chr01 + - - 2 80 340 0.9980469 - 30 GGTCT AGACC A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 163 970243 chr01 + - - 2 80 340 0.9902344 - 31 TGTTA TAACA A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 89 970317 chr01 + - - 2 80 340 0.9980469 - 17 GGTTT AAACC A A false false true 16
6ade67ef-11c0-4ea6-8755-cbc0da1a4f1f 36 970370 chr01 + - - 2 80 340 0.9980469 - 27 TGTTC GAACA A A false false true 16
I would like to know what conditions satisfy the site that can be considered as a modified site on the read. Is it when 'fail' and 'inferred' are both false? And I’m not quite sure what ‘implicit canonical’ means either.
Detecting differential modification at single base positions
modkit dmr pair
-a SDL-1_pileup.bed.gz
-a SDT-1_pileup.bed.gz
-b SDL-2_pileup.bed.gz
-b SDT-1_pileup.bed.gz
-o SLSD_dmr/SLSD
--ref $genome
--base A
-t 8
-f
--log-filepath SLSD_dmr/dmr_multi.log
SLSD:
chr01 428377 428378 . -0.2595111954850844 a:0 1 a:0 1 a:0.00 a:0.00 0 0 1 0 1 0 50 50 1 0
chr01 428819 428820 . 0.25131442828090655 a:0 1 a:1 2 a:0.00 a:50.00 0 0.5 0.9321833846700683 -0.5 0.9321833846700683 -0.5 50 100 - -
chr01 428876 428877 . -0.29257296267155475 a:3 4 a:3 4 a:75.00 a:75.00 0.75 0.75 1 0 1 0 100 100 1,1 0,0
chr01 428877 428878 . -0.2595111954850844 a:1 1 a:1 1 a:100.00 a:100.00 1 1 1 0 1 0 50 50 1 0
chr01 428912 428913 . -0.3006875128982047 a:0 2 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 100 100 1,1 0,0
chr01 428921 428922 . -0.2595111954850844 a:0 1 a:0 1 a:0.00 a:0.00 0 0 1 0 1 0 50 50 1 0
chr01 428926 428927 . -0.27243263335211054 a:18 26 a:25 34 a:69.23 a:73.53 0.6923077 0.7352941 1 -0.042986425339366585 1 0.015837104072398134 100 100 1,1 -0.04029304029304026,0
chr01 428934 428935 . -0.33202943912385763 a:0 10 a:0 5 a:0.00 a:0.00 0 0 1 0 1 0 100 100 1,1 0,0
chr01 428944 428945 . 0.3431471740471608 a:45 45 a:45 46 a:100.00 a:97.83 1 0.9782609 1 0.021739130434782594 1 0.021739130434782594 100 100 1,1 0.0357142857142857,0
chr01 428951 428952 . 1.1360555327687223 a:11 36 a:5 36 a:30.56 a:13.89 0.30555555 0.1388889 0.1987868090862803 0.16666666666666669 0.17709312475070266 0.16666666666666669 100 100 0.10617958219713411,1 0.3000000chr01 429017 429018 . -0.2769449125416088 a:49 63 a:45 60 a:77.78 a:75.00 0.7777778 0.75 1 0.02777777777777779 1 0.04086021505376347 100 100 0.8874779988874301,1 0.07027649769585254,0
chr01 429027 429028 . -0.20933341083340906 a:37 54 a:38 52 a:68.52 a:73.08 0.6851852 0.7307692 1 -0.045584045584045496 1 -0.026353276353276334 100 100 0.781323132265617,1 -0.08333333333333337,0
chr01 429028 429029 . 0.24018299513881303 a:11 14 a:11 18 a:78.57 a:61.11 0.78571427 0.6111111 0.5489948392453672 0.17460317460317454 0.8185923129030523 0.10317460317460314 100 100 0.3876206189317763,1 0.3555555chr01 429029 429030 . 0.25131442828090655 a:0 1 a:1 2 a:0.00 a:50.00 0 0.5 0.9321833846700683 -0.5 0.9321833846700683 -0.5 50 100 - -
chr01 429077 429078 . -0.3295915950843096 a:44 61 a:47 64 a:72.13 a:73.44 0.72131145 0.734375 1 -0.01306352459016391 1 -0.017708333333333326 100 100 1,1 -0.025252525252525193,0
chr01 429078 429079 . -0.2776317365982791 a:0 1 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 50 100 - -
chr01 429192 429193 . -0.2584535893156925 a:14 52 a:15 49 a:26.92 a:30.61 0.26923078 0.30612245 1 -0.036891679748822626 1 -0.04326923076923078 100 100 0.8664138256652923,1 -0.10287081339712917,0
chr01 429196 429197 . 0.2908615565132848 a:5 16 a:2 14 a:31.25 a:14.29 0.3125 0.14285715 0.5291319892925755 0.16964285714285715 0.403690372946813 0.17857142857142858 100 100 0.21078940602940324,1 0.428571428571428chr01 429201 429202 . -0.3076466561820299 a:0 3 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 100 50 - -
chr01 429207 429208 . -0.3101721989665336 a:1 31 a:1 27 a:3.23 a:3.70 0.032258064 0.037037037 1 -0.004778972520908004 1 -0.005128205128205131 100 100 1,1 -0.024242424242424246,0
chr01 429218 429219 . -0.2994907503308539 a:24 27 a:30 33 a:88.89 a:90.91 0.8888889 0.90909094 1 -0.02020202020202022 1 0.016826923076923128 100 100 1,1 0,0
chr01 429223 429224 . -0.26578015463795346 a:1 15 a:1 10 a:6.67 a:10.00 0.06666667 0.1 1 -0.03333333333333334 1 -0.02857142857142858 100 100 1,1 0,0
chr01 429224 429225 . -0.3006875128982047 a:0 2 a:0 2 a:0.00 a:0.00 0 0 1 0 1 0 100 100 1,1 0,0
chr01 429229 429230 . 0.6238435756252585 a:0 17 a:1 11 a:0.00 a:9.09 0 0.09090909 1 -0.09090909090909091 1 -0.1 100 100 0.5762151737229898,1 -0.3333333333333333,0
chr01 429234 429235 . -0.1418632197203351 a:3 6 a:4 6 a:50.00 a:66.67 0.5 0.6666667 0.8376587524553095 -0.16666666666666663 0.837695883342009 -0.16666666666666669 100 100 0.7710310269619939,1 -0.25,0
chr01 429239 429240 . 0.0001373942800455552 a:678 701 a:700 730 a:96.72 a:95.89 0.9671897 0.9589041 1 0.0102679011129716 1 0.0102679011129716 100 100 1,1 0.020855904658721558,-0.00029547916871863755
chr01 429250 429251 . -0.27925811985412086 a:4 5 a:3 4 a:80.00 a:75.00 0.8 0.75 1 0.050000000000000044 0.7710310269619939 0.25 100 100 0.9321833846700683,1 0.5,0
chr01 429253 429254 . -0.2595111954850844 a:0 1 a:0 1 a:0.00 a:0.00 0 0 1 0 1 0 50 50 1 0
chr01 429264 429265 . 0.25131442828090655 a:1 2 a:0 1 a:50.00 a:0.00 0.5 0 0.9321833846700683 0.5 0.9321833846700683 0.5 50 50 0.9321833846700683 0.5
chr01 429330 429331 . -0.18012700383224 a:479 656 a:497 694 a:73.02 a:71.61 0.73018295 0.7161383 1 0.012090022653402976 1 0.022678026199152934 100 100 1,1 0.043030674504778044,0.001280409731113985
I don’t know which sites can be considered as sites with significant differential methylation . Can you give me an example?
The text was updated successfully, but these errors were encountered: