parallel_reditools.py error #9

Open
ojziff opened this issue Jun 8, 2021 · 2 comments

Comments

@ojziff

ojziff commented Jun 8, 2021

Dear @tizianoflati @tflati

I am very keen to use REDItools2, which seems to be a fantastic tool for analysing RNA editing and comes with very useful instructions.

I have the serial mode working, but because I am scanning the whole transcriptome for RNA editing I would like to use the parallel version to save time. However, I cannot get it to run; it fails within 5 seconds with this error:

Traceback (most recent call last):
  File "/camp/lab/luscomben/home/users/ziffo/bin/reditools2.0/src/cineca/parallel_reditools.py", line 183, in <module>
    if not os.path.isfile(coverage_file):
  File "/camp/home/ziffo/.conda/envs/mypython2.7/lib/python2.7/genericpath.py", line 37, in isfile
    st = os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
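The last frame shows os.path.isfile() being handed None, which suggests coverage_file was never set, i.e. no coverage file was given on the command line. Below is a minimal sketch of a friendlier pre-check. It assumes (hypothetically) that the path arrives through an argparse option stored as coverage_file; only the -G/--coverage-file flag name comes from this thread, and check_coverage_file is an illustrative helper, not REDItools2 code.

```python
import argparse
import os
import sys


def check_coverage_file(coverage_file):
    """Return an error message if the coverage file is unset or missing, else None."""
    # None means the option was never given. os.path.isfile(None) raises
    # TypeError (see the traceback) rather than returning False, so test it first.
    if coverage_file is None:
        return "no coverage file given (expected -G/--coverage-file)"
    if not os.path.isfile(coverage_file):
        return "coverage file {} does not exist".format(coverage_file)
    return None


def main(argv=None):
    # Hypothetical parser: only the -G/--coverage-file flag name is taken from
    # the REDItools2 discussion; the rest of the real CLI is ignored here.
    parser = argparse.ArgumentParser()
    parser.add_argument("-G", "--coverage-file", dest="coverage_file")
    args, _unknown = parser.parse_known_args(argv)
    error = check_coverage_file(args.coverage_file)
    if error:
        sys.stderr.write("[ERROR] " + error + "\n")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```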

I am using a slurm scheduling HPC cluster and a virtual conda environment. I ran the script using:

FASTA=/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa
sbatch -N 1 -c 8 --mem=40G -t 24:00:00 --wrap="parallel_reditools.py -f mn_nuc_d35_ctrl.bam -r $FASTA -o REDItools.para.nuc_ctrl.rna_table.txt -s 2 --strict" --mail-type=ALL,ARRAY_TASKS [email protected] --job-name=REDItoolspara
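For comparison, here is a sketch that assembles the same parallel_reditools.py call with the two coverage arguments described in the reply in this thread (-G/--coverage-file and -D/--coverage-dir). The helper name build_command, the coverage directory and the sample name are hypothetical; the <sample>.cov layout inside the coverage directory is inferred from the test log (test_results/coverage/SRR2135332.cov), not from the REDItools2 documentation.

```python
import os


def build_command(bam, fasta, output, coverage_dir, sample, strand="2"):
    """Build the argv list for parallel_reditools.py, including coverage options."""
    # Assumed layout (inferred from the test log): <coverage_dir>/<sample>.cov
    # holds the whole-sample coverage; per-chromosome files share the directory.
    coverage_file = os.path.join(coverage_dir, sample + ".cov")
    return [
        "parallel_reditools.py",
        "-f", bam,
        "-r", fasta,
        "-o", output,
        "-s", strand,
        "--strict",
        "-G", coverage_file,   # coverage file of the sample
        "-D", coverage_dir,    # directory with the per-chromosome coverage files
    ]


if __name__ == "__main__":
    cmd = build_command(
        bam="mn_nuc_d35_ctrl.bam",
        fasta="/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa",
        output="REDItools.para.nuc_ctrl.rna_table.txt",
        coverage_dir="coverage/",   # hypothetical; trailing slash mirrors the log paths
        sample="mn_nuc_d35_ctrl",   # hypothetical sample name
    )
    print(" ".join(cmd))
```

The printed line could go inside --wrap="..." in place of the original call. Note that the test run below launches the script through mpirun, which the direct call above does not.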

It is also worth noting that running the parallel test produces an error saying SRR2135332.chr21.cov does not exist:

Launching REDItool on SRR2135332 (output_file=test_results/output/parallel_table.txt.gz)
Tue  8 Jun 09:48:31 BST 2021
[STATS] [COVERAGE] START=Tue 8 Jun 09:48:31 BST 2021 [1623142111]
[STATS] Creating single chromosome coverage files [Tue 8 Jun 09:48:31 BST 2021]
CHROMOSOMES=1
CHUNK SIZE=1
NEW BATCH [1-1]
Calculating coverage file for chromosome chr21 = test_results/coverage/chr21
[STATS] Creating complete file test_results/coverage/SRR2135332.cov [Tue 8 Jun 09:48:32 BST 2021]
[STATS] Finished creating coverage data [Tue 8 Jun 09:48:32 BST 2021]
[STATS] [COVERAGE] START=Tue 8 Jun 09:48:31 BST 2021 [1623142111] END=Tue 8 Jun 09:48:32 BST 2021 [1623142112] ELAPSED=1 HUMAN=00:00:01
START:Tue 8 Jun 09:48:32 BST 2021
[ERROR] Coverage file test_results/coverage/SRR2135332.chr21.cov not existing!
[ERROR] Coverage file test_results/coverage/SRR2135332.chr21.cov not existing!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
 
  Process name: [[8839,1],1]
  Exit code:    1
--------------------------------------------------------------------------
[STATS] [PARALLEL] START=Tue 8 Jun 09:48:32 BST 2021 [1623142112] END=Tue 8 Jun 09:48:37 BST 2021 [1623142117] ELAPSED=5 HUMAN=00:00:05
Merging files in test_results/temp/ using  threads and writing to output=2
FILE LIST NOT EXISTING OR EMPTY: test_results/temp//files.txt
[STATS] [MERGE] START=Tue 8 Jun 09:48:37 BST 2021 [1623142117] END=Tue 8 Jun 09:48:37 BST 2021 [1623142117] ELAPSED=0 HUMAN=00:00:00
END:Tue 8 Jun 09:48:37 BST 2021
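In this log the coverage step reports writing test_results/coverage/chr21, while the parallel step looks for test_results/coverage/SRR2135332.chr21.cov. A small pre-flight check can list which per-chromosome files the parallel step would not find, before MPI is launched. This is a sketch: the <sample>.<chrom>.cov pattern is read off the error message above, and missing_coverage_files is a hypothetical helper, not part of REDItools2.

```python
import os


def missing_coverage_files(coverage_dir, sample, chromosomes):
    """Return the expected per-chromosome coverage paths that do not exist."""
    missing = []
    for chrom in chromosomes:
        # Naming pattern taken from the error message:
        # <coverage_dir>/<sample>.<chrom>.cov
        path = os.path.join(coverage_dir, "{}.{}.cov".format(sample, chrom))
        if not os.path.isfile(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_coverage_files("test_results/coverage", "SRR2135332", ["chr21"]):
        print("[MISSING] " + path)
```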

I created the conda environment with:

conda create -n mypython2.7 # create empty environment
conda activate mypython2.7
conda install -c conda-forge python=2.7 openmpi r-rjava
conda install -c bioconda samtools=1.8 tabix htslib bedtools bamutil java-jdk blat
git clone https://github.com/tflati/reditools2.0.git
cd reditools2.0
pip install -r requirements.txt --user

I would be extremely grateful if you could point me in the right direction on how to get parallel_reditools.py working.

Many thanks!
Oliver

@SammiLyu

Hi @ojziff

I am running into the same issue when testing, so I just want to follow up on this. Were you ever able to solve this error?

Thank you,
Sammi

@oligomyeggo

oligomyeggo commented Mar 23, 2023

Hi @ojziff and @SammiLyu, I also ran into the issue you were describing, and on closer inspection of the documentation I believe I have figured it out. In section 5.2 the developers state: "The parallel version leverages on the existence of coverage information which reports for each position the number of supporting reads."

Digging into their test files: after running the parallel_test.sh script, the test_results/coverage directory should contain an SRR2135332.cov file, whether or not the test completes successfully (it did not in my case, but I still got the .cov file to inspect). Running parallel_reditools.py on your own data requires a file in a similar format.

Check out the extract_coverage.sh script, which generates the coverage files needed to run the parallel version of REDItools. Specifically, you need to provide two coverage-related arguments: -G/--coverage-file, the coverage file of the sample to analyze, and -D/--coverage-dir, the coverage directory containing that sample's coverage divided by chromosome. Both can be generated by running extract_coverage.sh, which uses samtools depth to compute the read depth at each position.
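To illustrate what "divided by chromosome" means, here is a sketch that splits samtools depth output (one tab-separated line per position: chromosome, position, depth) into one file per chromosome. It is only an aid to understanding: the exact file names and contents REDItools2 expects are whatever extract_coverage.sh produces, and split_depth_by_chromosome is a hypothetical helper. Naming each file after its chromosome is an assumption based on the "test_results/coverage/chr21" line in the test log.

```python
import os
import sys


def split_depth_by_chromosome(depth_lines, out_dir):
    """Write samtools-depth lines into one file per chromosome in out_dir.

    depth_lines: iterable of "chrom<TAB>pos<TAB>depth" strings.
    Returns {chrom: number_of_lines_written}.
    """
    counts = {}
    handles = {}
    try:
        for line in depth_lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            chrom = line.split("\t", 1)[0]
            if chrom not in handles:
                # Assumed naming: the file is called just <chrom>.
                handles[chrom] = open(os.path.join(out_dir, chrom), "w")
                counts[chrom] = 0
            handles[chrom].write(line + "\n")
            counts[chrom] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts


if __name__ == "__main__":
    # Example: samtools depth sample.bam | python split_depth.py coverage_dir
    split_depth_by_chromosome(sys.stdin, sys.argv[1])
```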

Hopefully this helps, both you and anyone else who stumbles across the same issue!
