diff --git a/tools/repeatexplorer2/repex_full_clustering.xml b/tools/repeatexplorer2/repex_full_clustering.xml
index b325ff5..0ede73a 100644
--- a/tools/repeatexplorer2/repex_full_clustering.xml
+++ b/tools/repeatexplorer2/repex_full_clustering.xml
@@ -9,42 +9,46 @@
  export PYTHONHASHSEED=0 && /repex_tarean/seqclust
+ '$paired'
  #if $subsample_size:
  --sample '${subsample_size}'
  #end if
- --output_dir=tarean_output
- --logfile='${log}'
- --cleanup '$paired' --taxon '$taxon'
+ --output_dir=output
+ --cpu \${GALAXY_SLOTS:-1}
+ ${FastaFile}
+
+ &&
+
+ tar -cvf '${ReportArchive}' --directory=output .
+
- #if $advanced_options.advanced:
- --mincl $advanced_options.size_threshold $advanced_options.keep_names $advanced_options.automatic_filtering -D $advanced_options.blastx.options_blastx
- --assembly_min $advanced_options.assembly_min_cluster_size
+ ## #if $advanced_options.advanced:
+ ## --mincl $advanced_options.size_threshold $advanced_options.keep_names $advanced_options.automatic_filtering -D $advanced_options.blastx.options_blastx
+ ## --assembly_min $advanced_options.assembly_min_cluster_size
- #if $advanced_options.comparative.options_comparative:
- --prefix_length $advanced_options.comparative.prefix_length
- #end if
+ ## #if $advanced_options.comparative.options_comparative:
+ ## --prefix_length $advanced_options.comparative.prefix_length
+ ## #end if
- #if $advanced_options.custom_library.options_custom_library:
- -d $advanced_options.custom_library.library extra_database
- #end if
+ ## #if $advanced_options.custom_library.options_custom_library:
+ ## -d $advanced_options.custom_library.library extra_database
+ ## #end if
- #if $advanced_options.options.options:
- -opt $advanced_options.options.options
- #end if
- #end if
- ${FastaFile}
+ ## #if $advanced_options.options.options:
+ ## -opt $advanced_options.options.options
+ ## #end if
+ ## #end if
  ]]>
- - + + -
@@ -65,13 +69,40 @@
-
- - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Clustering summary + + + +

Clustering Summary

+

Graphical summary of the clustering results. Bars represent superclusters, with their heights and widths corresponding to the numbers of reads in the superclusters (y-axis) and to their proportions in all analyzed reads (x-axis), respectively. Rectangles inside the supercluster bars represent individual clusters. If the filtering of abundant satellites was performed, the affected clusters are shown in green, and their sizes correspond to the adjusted values. Blue and pink background panels show proportions of reads that were clustered and remained single, respectively. Top clusters are on the left of the dotted line.




+

Run information:

+ +

Number of input reads: 10000

+ +

Number of analyzed reads: 10000

+ +

Proportion of reads in top clusters : 14 %

+ +

Cluster merging: No

+ +

Paired-end reads: Yes

+ +

Available analyses:

+

Tandem repeat analysis

Cluster annotation

Supercluster annotation

Repeat annotation summary

+

Supplementary files:

+

CLUSTER_TABLE.csv

SUPERCLUSTER_TABLE.csv

contigs.fasta


+ +

How to cite

+

+ Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) - + RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics 29:792-793. +

+ +

Classification of repetitive elements using REXdb:

+

Neumann, P., Novak, P., Hostakova, N., Macas, J. (2019) – Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10:1.

+ +

+The principle of repeat identification implemented in the RepeatExplorer: +

+ Novak, P., Neumann, P., Macas, J. (2010) - Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11:378. +

+Using TAREAN for satellite repeat detection and characterization: +

+ Novak, P., Robledillo, L.A.,Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) - + TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acid Research 45:e111 +

+

+

Details:

+
+--------------------------------------------------------------------------
+PIPELINE VERSION         : devel-0.3.8-2917(e753f81)
+
+PROTEIN DATABASE VERSION : protein_database_viridiplantae_v3.0.fasta
+            md5 checksum : a36362f4e8b024f1ce97589aac1e6f1a
+
+DNA DATABASE VERSION     : dna_database_masked.fasta
+            md5 checksum : 86bab7cdd3e70374cd756de13680240d
+--------------------------------------------------------------------------
+
+

Minimal number of reads in cluster to be considered top cluster : 20

+ +

Reserved Memory : 31G

+ +

Maximum number of processable reads with the reserved memory : 1557523

diff --git a/tools/repeatexplorer2/test-data/test2_out.html b/tools/repeatexplorer2/test-data/test2_out.html
new file mode 100644
index 0000000..e3e1770
--- /dev/null
+++ b/tools/repeatexplorer2/test-data/test2_out.html
@@ -0,0 +1,64 @@
+ + + + + Clustering summary + + + +

Clustering Summary

+

Graphical summary of the clustering results. Bars represent superclusters, with their heights and widths corresponding to the numbers of reads in the superclusters (y-axis) and to their proportions in all analyzed reads (x-axis), respectively. Rectangles inside the supercluster bars represent individual clusters. If the filtering of abundant satellites was performed, the affected clusters are shown in green, and their sizes correspond to the adjusted values. Blue and pink background panels show proportions of reads that were clustered and remained single, respectively. Top clusters are on the left of the dotted line.




+

Run information:

+ +

Number of input reads: 10000

+ +

Number of analyzed reads: 5000

+ +

Proportion of reads in top clusters : 8.3 %

+ +

Cluster merging: No

+ +

Paired-end reads: Yes

+ +

Available analyses:

+

Tandem repeat analysis

Cluster annotation

Supercluster annotation

Repeat annotation summary

+

Supplementary files:

+

CLUSTER_TABLE.csv

SUPERCLUSTER_TABLE.csv

contigs.fasta


+ +

How to cite

+

+ Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) - + RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics 29:792-793. +

+ +

Classification of repetitive elements using REXdb:

+

Neumann, P., Novak, P., Hostakova, N., Macas, J. (2019) – Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10:1.

+ +

+The principle of repeat identification implemented in the RepeatExplorer: +

+ Novak, P., Neumann, P., Macas, J. (2010) - Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11:378. +

+Using TAREAN for satellite repeat detection and characterization: +

+ Novak, P., Robledillo, L.A.,Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) - + TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acid Research 45:e111 +

+

+

Details:

+
+--------------------------------------------------------------------------
+PIPELINE VERSION         : devel-0.3.8-2917(e753f81)
+
+PROTEIN DATABASE VERSION : protein_database_viridiplantae_v3.0.fasta
+            md5 checksum : a36362f4e8b024f1ce97589aac1e6f1a
+
+DNA DATABASE VERSION     : dna_database_masked.fasta
+            md5 checksum : 86bab7cdd3e70374cd756de13680240d
+--------------------------------------------------------------------------
+
+

Minimal number of reads in cluster to be considered top cluster : 20

+ +

Reserved Memory : 31G

+ +

Maximum number of processable reads with the reserved memory : 1557523