To predict RNA modification stoichiometry using Nanopolish, nanoRMS will need: i) Nanopolish eventalign output files, and ii) a list of predicted candidate RNA modification sites. It then performs the following steps:
-
- Collapse Nanopolish eventalign output
-
- Convert Nanopolish eventalign outputs into processed output for each 15-mer region
-
- Visualization of the per-read results (PCA, per-read current intensities) -- optional step, but highly recommended to see how your samples look like in terms of modified/unmodified reads
-
- Stoichiometry prediction, using either KMEANS or KNN.
Before you start, you first need to run Nanopolish index and Nanopolish eventalign on the raw FAST5 reads using the following command line:
nanopolish index -d fast5_file fastq_file -s sequencing_summary_file
#Index file should be in the same path as the fastq file
nanopolish eventalign \
--reads path/to/fastq_file \
--bam path/to/bam_file \
--genome path/to/reference_file.fasta \
--scale-events > output.txt
## Convert BAM to TSV
samtools view -h bamfile.bam | java -jar sam2tsv.jar -r reference_file > bamfile.bam.tsv
## Run EpiNano
python3.7 TSV_to_Variants_Freq.py3 -f bamfile.bam.tsv -t 10
Example using test data:
samtools view -h test_data/test.bam | java -jar sam2tsv.jar -r test_data/yeast_rRNA_ref > test.bam.tsv
python3.7 TSV_to_Variants_Freq.py3 -f test.bam.tsv -t 10
Rscript summed_errors.R <epinano_file> <summed_errors_output_file>
Example using test data:
Rscript summed_errors.R test_data/wt_epinano.csv wt
- a) Scatterplot representations
Rscript epinano_scatterplot.R <input1> <label1> <input2> <label2> <feature>
## Feature can be: "mis", "ins", "del", "q_mean", "q_median", "q_std" and "sum" if you are using the input with sum errors
Example using test data:
Rscript epinano_scatterplot.R test_data/wt_epinano.csv wt test_data/sn34ko_epinano.csv sn34ko mis
- b) Per-transcript representations
Rscript epinano_barplot.R input1 label1 input2 label2 feature
Example with test data:
Rscript epinano_barplot.R test_data/wt_epinano.csv wt test_data/sn34ko_epinano.csv sn34ko mis
Result: There are two regions that show distinct mismatch profiles when comparing WT and snR34-KO, one centered in 25s_rRNA:2880 and one centered in 25s_rRNA:2826. These are actually the two exact locations that are expected to be affected by snR34 depletion. You can check predicted target sites of snR34 as well as of other snoRNAs in yeast here.
Generate a collapsed Nanopolish event align output, by collapsing all the multiple observations for a given position from a same read.
python3 per_read_mean.py <event_align_file>
Example using test data:
python3 per_read_mean.py test_data/data1_eventalign_output.txt
The output of Nanopolish event align generated in the previous step is used as input in this script.
Rscript --vanilla nanopolish_window.R positions_file <input_table> <label>
Example using test data:
Rscript --vanilla nanopolish_window.R test_data/positions test_data/data1_eventalign_output.txt_processed_perpos_mean.csv data1
Rscript --vanilla density_nanopolish.R <window_file1> <window_file2> <window_file3(optional)> <window_file4(optional)>
Example using test data:
Rscript --vanilla nanopolish_density_plot.R test_data/sn34_window_file.tsv test_data/wt_window_file.tsv
Rscript --vanilla nanopolish_meanlineplot.R <window_file1> <window_file2> <window_file3(optional)> <window_file4(optional)>
Example using test data:
Rscript --vanilla nanopolish_meanlineplot.R test_data/sn34_window_file.tsv test_data/wt_window_file.tsv
Rscript --vanilla nanopolish_perreadlineplot.R <window_file1> <window_file2> <window_file3(optional)> <window_file4(optional)>
Example using test data:
Rscript --vanilla nanopolish_perreadlineplot.R test_data/sn34_window_file.tsv test_data/wt_window_file.tsv
Rscript --vanilla nanopolish_pca.R <window_file1.tsv> <window_file2.tsv> <window_file3.tsv(optional)> <window_file4.tsv(optional)>
Example using test data:
Rscript --vanilla nanopolish_pca.R test_data/sn34_window_file.tsv test_data/wt_window_file.tsv
Rscript --vanilla nanopolish_export_each_position.R <window_file1.tsv> <window_file2.tsv> <window_file3.tsv(optional)> <window_file4.tsv(optional)>
Example using test data:
Rscript --vanilla nanopolish_export_each_position.R test_data/bc2_window_file.tsv
R --vanilla < read_clustering.R --args <file1.tsv> <file2.tsv> kmeans
Example using test data:
R --vanilla < visualization_per_read/read_clustering.R --args visualization_per_read/test_data/25s_2880.wt.15mer.perread.h.tsv visualization_per_read/test_data/25s_2880.sn34.15mer.perread.h.tsv kmeans
R --vanilla < read_clustering.R --args <file1.ts>v <file2.tsv> kmeans_pca
Example using test data:
R --vanilla < visualization_per_read/read_clustering.R --args visualization_per_read/test_data/25s_2880.wt.15mer.perread.h.tsv visualization_per_read/test_data/25s_2880.sn34.15mer.perread.h.tsv kmeans_pca
Important note: this option should be only used when 0% and 100% (or similar) modified datasets are available for training, e.g. rRNAs coupled with KO condition.
R --vanilla < read_clustering.R --args <file1.tsv> <file2_UNM.tsv> <validation_set.tsv(optional)> knn
Example using test data:
R --vanilla < visualization_per_read/read_clustering.R --args visualization_per_read/test_data/25s_2880.wt.15mer.perread.h.tsv visualization_per_read/test_data/25s_2880.sn34.15mer.perread.h.tsv knn
Example using test data that includes predictions in independent validation data:
R --vanilla < visualization_per_read/read_clustering.R --args visualization_per_read/test_data/25s_2880.wt.15mer.perread.h.tsv visualization_per_read/test_data/25s_2880.sn34.15mer.perread.h.tsv visualization_per_read/test_data/25s_2880.sn3.15mer.perread.h.tsv visualization_per_read/test_data/25s_2880.sn36.15mer.perread.h.tsv knn_validation
(Note: PCA is not used for stoichiometry prediction when using KNN - the PCA has only been used for illustration purposes)