diff --git a/tools/repeatexplorer2/repex_full_clustering.xml b/tools/repeatexplorer2/repex_full_clustering.xml
index b325ff5..0ede73a 100644
--- a/tools/repeatexplorer2/repex_full_clustering.xml
+++ b/tools/repeatexplorer2/repex_full_clustering.xml
@@ -9,42 +9,46 @@
export PYTHONHASHSEED=0 &&
/repex_tarean/seqclust
+ '$paired'
#if $subsample_size:
--sample '${subsample_size}'
#end if
- --output_dir=tarean_output
- --logfile='${log}'
- --cleanup '$paired'
--taxon '$taxon'
+ --output_dir=output
+ --cpu \${GALAXY_SLOTS:-1}
+ ${FastaFile}
+
+ &&
+
+ tar -cvf '${ReportArchive}' --directory=output .
+
- #if $advanced_options.advanced:
- --mincl $advanced_options.size_threshold $advanced_options.keep_names $advanced_options.automatic_filtering -D $advanced_options.blastx.options_blastx
- --assembly_min $advanced_options.assembly_min_cluster_size
+ ## #if $advanced_options.advanced:
+ ## --mincl $advanced_options.size_threshold $advanced_options.keep_names $advanced_options.automatic_filtering -D $advanced_options.blastx.options_blastx
+ ## --assembly_min $advanced_options.assembly_min_cluster_size
- #if $advanced_options.comparative.options_comparative:
- --prefix_length $advanced_options.comparative.prefix_length
- #end if
+ ## #if $advanced_options.comparative.options_comparative:
+ ## --prefix_length $advanced_options.comparative.prefix_length
+ ## #end if
- #if $advanced_options.custom_library.options_custom_library:
- -d $advanced_options.custom_library.library extra_database
- #end if
+ ## #if $advanced_options.custom_library.options_custom_library:
+ ## -d $advanced_options.custom_library.library extra_database
+ ## #end if
- #if $advanced_options.options.options:
- -opt $advanced_options.options.options
- #end if
- #end if
- ${FastaFile}
+ ## #if $advanced_options.options.options:
+ ## -opt $advanced_options.options.options
+ ## #end if
+ ## #end if
]]>
Graphical summary of the clustering results. Bars represent superclusters, with their heights and widths corresponding to the numbers of reads in the superclusters (y-axis) and to their proportions in all analyzed reads (x-axis), respectively. Rectangles inside the supercluster bars represent individual clusters. If the filtering of abundant satellites was performed, the affected clusters are shown in green, and their sizes correspond to the adjusted values. Blue and pink background panels show proportions of reads that were clustered and remained single, respectively. Top clusters are on the left of the dotted line.
Number of input reads: 10000
Number of analyzed reads: 10000
Proportion of reads in top clusters : 14 %
Cluster merging: No
Paired-end reads: Yes
+ Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) -
+ RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics 29:792-793.
+ Classification of repetitive elements using REXdb:
+ Neumann, P., Novak, P., Hostakova, N., Macas, J. (2019) – Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10:1.
+ Clustering Summary
+
+ Run information:
+
+ Available analyses:
+
+ Supplementary files:
+
+
+ How to cite
+
+ Novak, P., Neumann, P., Macas, J. (2010) - Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11:378.
+
+Using TAREAN for satellite repeat detection and characterization:
+
+ Novak, P., Robledillo, L.A.,Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) -
+ TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acid Research 45:e111
+
+
+--------------------------------------------------------------------------
+PIPELINE VERSION : devel-0.3.8-2917(e753f81)
+
+PROTEIN DATABASE VERSION : protein_database_viridiplantae_v3.0.fasta
+ md5 checksum : a36362f4e8b024f1ce97589aac1e6f1a
+
+DNA DATABASE VERSION : dna_database_masked.fasta
+ md5 checksum : 86bab7cdd3e70374cd756de13680240d
+--------------------------------------------------------------------------
+
+
Minimal number of reads in cluster to be considered top cluster : 20
+
+Reserved Memory : 31G
+
+Maximum number of processable reads with the reserved memory : 1557523
diff --git a/tools/repeatexplorer2/test-data/test2_out.html b/tools/repeatexplorer2/test-data/test2_out.html
new file mode 100644
index 0000000..e3e1770
--- /dev/null
+++ b/tools/repeatexplorer2/test-data/test2_out.html
@@ -0,0 +1,64 @@
+
+
+
+
+Graphical summary of the clustering results. Bars represent superclusters, with their heights and widths corresponding to the numbers of reads in the superclusters (y-axis) and to their proportions in all analyzed reads (x-axis), respectively. Rectangles inside the supercluster bars represent individual clusters. If the filtering of abundant satellites was performed, the affected clusters are shown in green, and their sizes correspond to the adjusted values. Blue and pink background panels show proportions of reads that were clustered and remained single, respectively. Top clusters are on the left of the dotted line.
+Number of input reads: 10000
+
+Number of analyzed reads: 5000
+
+Proportion of reads in top clusters : 8.3 %
+
+Cluster merging: No
+
+Paired-end reads: Yes
+
+
+ Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) -
+ RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics 29:792-793.
+
+
+Classification of repetitive elements using REXdb:
+Neumann, P., Novak, P., Hostakova, N., Macas, J. (2019) – Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10:1.
+
+
+The principle of repeat identification implemented in the RepeatExplorer:
+
+ Novak, P., Neumann, P., Macas, J. (2010) - Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11:378.
+
+Using TAREAN for satellite repeat detection and characterization:
+
+ Novak, P., Robledillo, L.A.,Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) -
+ TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acid Research 45:e111
+
+
+--------------------------------------------------------------------------
+PIPELINE VERSION : devel-0.3.8-2917(e753f81)
+
+PROTEIN DATABASE VERSION : protein_database_viridiplantae_v3.0.fasta
+ md5 checksum : a36362f4e8b024f1ce97589aac1e6f1a
+
+DNA DATABASE VERSION : dna_database_masked.fasta
+ md5 checksum : 86bab7cdd3e70374cd756de13680240d
+--------------------------------------------------------------------------
+
+
+Minimal number of reads in cluster to be considered top cluster : 20
+
+Reserved Memory : 31G
+
+Maximum number of processable reads with the reserved memory : 1557523