You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello! I'm working with a program called ROADIES (link) that uses PASTA. ROADIES basically produces guide trees from provided genome assemblies by selecting a bunch of random "c-genes" (non-recombining sites within the genomes), finding each c-gene in each genome assembly, and aligning the matches to the c-gene with PASTA to produce a gene tree for each c-gene. These gene trees are then merged to create a species tree.
However, I keep encountering an issue - the program exits due to an error with PASTA when the "c-gene" happens to be a sequence from a highly paralogous gene family . Examples: KCNJ potassium transporters and olfactory receptor genes. I cannot for the life of me understand why the program crashes -- I have confirmed that there is adequate memory and CPU allocated to the program, and that this issue doesn't seem to be on the ROADIES end (it's a PASTA error).
Here's an example of the error message encountered when one of the c-genes (in this case, a randomly selected "gene 213" that corresponds to a sequence from EPPK1). EPPK1 is a member of a family of cytoskeletal elements.
Error in rule pasta:
jobid: 710
input: /lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa
output: /lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa.aln
shell:
if [[ `grep -n '>' /lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa | wc -l` -gt 15 ]] || [[ `awk 'BEGIN{l=0;n=0;st=0}{if (substr($0,1,1) == ">") {st=1} else {st=2}; if(st==1) {n+=1} else if(st==2) {l+=length($0)}} END{if (n>0) {print int((l+n-1)/n)} else {print 0} }' /lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa` -gt 750 ]]
then
input_file=/lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa
output_file=/lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa.aln
reference=""
all_matched=true
while IFS= read -r line; do
line=$(echo "$line" | tr '[:lower:]' '[:upper:]')
if [[ "$line" != ">"* ]]; then
if [ -z "$reference" ]; then
reference="$line"
elif [ "$line" != "$reference" ]; then
all_matched=false
break
fi
fi
done < "$input_file"
if [ "$all_matched" = true ]; then
cp "$input_file" "$output_file"
else
run_pasta.py -i /lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa -j gene_213 --alignment-suffix=fa.aln --num-cpus 4
fi
fi
touch /lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa.aln
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job pasta since they might be corrupted:
/lab/solexa_page/finn/ROADIES/ROADIES_output_10-24B/genes/gene_213.fa.aln
Has anyone encountered this error before / can anyone explain to me what this section of code is trying to do, and why it might be failing?
Thank you so much for any help!
The text was updated successfully, but these errors were encountered:
Hello! I'm working with a program called ROADIES (link) that uses PASTA. ROADIES basically produces guide trees from provided genome assemblies by selecting a bunch of random "c-genes" (non-recombining sites within the genomes), finding each c-gene in each genome assembly, and aligning the matches to the c-gene with PASTA to produce a gene tree for each c-gene. These gene trees are then merged to create a species tree.
However, I keep encountering an issue - the program exits due to an error with PASTA when the "c-gene" happens to be a sequence from a highly paralogous gene family . Examples: KCNJ potassium transporters and olfactory receptor genes. I cannot for the life of me understand why the program crashes -- I have confirmed that there is adequate memory and CPU allocated to the program, and that this issue doesn't seem to be on the ROADIES end (it's a PASTA error).
Here's an example of the error message encountered when one of the c-genes (in this case, a randomly selected "gene 213" that corresponds to a sequence from EPPK1). EPPK1 is a member of a family of cytoskeletal elements.
Has anyone encountered this error before / can anyone explain to me what this section of code is trying to do, and why it might be failing?
Thank you so much for any help!
The text was updated successfully, but these errors were encountered: