id: assembly
title: Count and Normalisation
tabs:
- id: tools
title: Tools
heading_html: >
Common tools are listed here; you can search for more in the full tool panel on the left.
content:
- title_html: <code>RaceID</code> - Initial processing using RaceID
description_html: >
<p>
Performs filtering, normalisation, and confounder removal to generate a normalised and filtered count matrix of single-cell RNA data.
</p>
inputs:
- datatypes:
- count_matrix
button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fraceid_filtnormconf%2Fraceid_filtnormconf%2F0.2.3+galaxy3"
- title_html: <code>Flye</code> - assembly with PacBio or Nanopore data
description_html: >
<p>
<em>de novo</em> assembly of single-molecule sequencing reads, designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies.
</p>
inputs:
- datatypes:
- fasta
- fastq
button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Fflye%2Fflye"
- title_html: <code>Unicycler</code> - assembly with Illumina, PacBio or Nanopore data - bacteria only
description_html: >
<p>
Hybrid assembly pipeline for bacterial genomes that uses both Illumina reads and long reads (PacBio or Nanopore).
</p>
inputs:
- datatypes:
- fastq
button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Funicycler%2Funicycler"
- title_html: <code>Salsa</code> - scaffold assembly with HiC data
description_html: >
<p>
Salsa is a scaffolding tool based on a computational method that exploits the genomic proximity information in Hi-C data sets for long-range scaffolding of <em>de novo</em> genome assemblies.
</p>
inputs:
- datatypes:
- fasta
button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fsalsa%2Fsalsa"
- title_html: <code>Quast</code> - assess genome assembly quality
description_html: >
<p>
QUAST (QUality ASsessment Tool) evaluates genome assemblies by computing various metrics. If you have one or more genome assemblies, you can assess their quality with QUAST. It works with or without a reference genome.
</p>
inputs:
- datatypes:
- fasta
button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fquast%2Fquast"
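QUAST reports many metrics; one of the most commonly quoted is N50. As a minimal sketch (not QUAST's own implementation), the Python below reads contig lengths from plain, uncompressed FASTA lines and computes the N50 of those lengths. The function names `fasta_lengths` and `n50` are hypothetical helpers for illustration.

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L together
    make up at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    example = [">c1", "A" * 100, ">c2", "C" * 80, ">c3", "G" * 50,
               ">c4", "T" * 30, ">c5", "A" * 20]
    lengths = fasta_lengths(example)
    print(lengths, n50(lengths))  # [100, 80, 50, 30, 20] 80
```

Sorting longest first means the first contig at which the running total reaches half the assembly length is the N50; any longer cut-off would cover less than half.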
- title_html: <code>Busco</code> - assess genome assembly quality
description_html: >
<p>
BUSCO: assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. The tool attempts to provide a quantitative assessment of the completeness in terms of the expected gene content of a genome assembly, transcriptome, or annotated gene set.
</p>
inputs:
- datatypes:
- fasta
button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fbusco%2Fbusco"
- id: workflows
title: Workflows
heading_html: >
A workflow is a series of Galaxy tools that have been linked together to perform a specific analysis. You can use and customize the example workflows below.
<a href="https://galaxyproject.org/learn/advanced-workflow/" target="_blank">Learn more.</a>
content:
subsections:
- id: pacbio
title: Assembly with PacBio HiFi data
content:
- title_html: About these workflows
description_html: >
<p>
This <a href="https://australianbiocommons.github.io/how-to-guides/genome_assembly/hifi_assembly" target="_blank">How-to Guide</a> describes the steps required to assemble your genome on the Galaxy Australia platform using multiple workflows.
</p>
- title_html: BAM to FASTQ + QC v1.0
description_html: >
<p>
Convert a BAM file to FASTQ format to perform QC analysis (required if your data is in BAM format).
</p>
inputs:
- datatypes:
- bam
label: PacBio <em>subreads.bam</em>
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=220"
view_link: https://workflowhub.eu/workflows/220
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
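Each import button in this file points at Galaxy's `trs_import` route with `trs_server`, an optional `run_form`, and `trs_id` query parameters. For anyone maintaining these entries, a small helper can generate the links consistently. This is a sketch that reproduces the pattern already used in this file; the function name `trs_import_link` and the example base URL are hypothetical, and in the real file `{{ galaxy_base_url }}` is filled in by the page template.

```python
from urllib.parse import urlencode


def trs_import_link(galaxy_base_url: str, trs_server: str, trs_id: str,
                    run_form: bool = False) -> str:
    """Build a trs_import link in the same shape as the button_link values here."""
    params = {"trs_server": trs_server}
    if run_form:
        # The WorkflowHub links in this file include run_form=true
        params["run_form"] = "true"
    params["trs_id"] = trs_id
    # Leave "/" and ":" unescaped, as in the Dockstore links; "#" becomes %23
    query = urlencode(params, safe="/:")
    return f"{galaxy_base_url.rstrip('/')}/workflows/trs_import?{query}"


if __name__ == "__main__":
    base = "https://galaxy.example.org"  # hypothetical base URL
    print(trs_import_link(base, "workflowhub.eu", "220", run_form=True))
    print(trs_import_link(base, "dockstore.org",
                          "#workflow/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main"))
```

Building the query from a dictionary keeps the parameter order identical to the existing links, which makes generated entries easy to diff against hand-written ones.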
- title_html: PacBio HiFi genome assembly using hifiasm v2.1
description_html: >
<p>
Assemble a genome using PacBio HiFi reads.
</p>
inputs:
- datatypes:
- fastqsanger
label: HiFi reads
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=221"
view_link: https://workflowhub.eu/workflows/221
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
- title_html: Purge duplicates from hifiasm assembly v1.0
description_html: >
<p>
Optional workflow to purge duplicates from the contig assembly.
</p>
inputs:
- datatypes:
- fastqsanger
label: HiFi reads
- datatypes:
- fasta
label: Primary assembly contigs
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=237"
view_link: https://workflowhub.eu/workflows/237
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
- title_html: Genome assessment post-assembly
description_html: >
<p>
Evaluate the quality of your genome assembly with a comprehensive report including <code>FASTA stats</code>, <code>BUSCO</code>, <code>QUAST</code>, <code>Meryl</code> and <code>Merqury</code>.
</p>
inputs:
- datatypes:
- fasta
label: Primary assembly contigs
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=403"
view_link: https://workflowhub.eu/workflows/403
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
- id: nanopore
title: Assembly with Nanopore data and polishing with Illumina data
content:
- title_html: About these workflows
description_html: >
<p>
This <a href="https://training.galaxyproject.org/training-material/topics/assembly/tutorials/largegenome/tutorial.html" target="_blank"> tutorial </a> describes the steps required to assemble a genome on Galaxy with Nanopore and Illumina data.
</p>
- title_html: Flye assembly with Nanopore data
description_html: >
<p>
Assemble Nanopore long reads. This workflow can be run alone or as part of a combined workflow for large genome assembly.
</p>
inputs:
- datatypes:
- fastqsanger
label: Long reads (may be raw, filtered and/or corrected)
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=225"
view_link: https://workflowhub.eu/workflows/225
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
- title_html: Assembly polishing
description_html: >
<p>
Polishes (corrects) an assembly using long reads (<code>Racon</code> and <code>Medaka</code>) and short reads (<code>Racon</code>).
</p>
inputs:
- datatypes:
- fasta
label: Assembly to polish
- datatypes:
- fastq
label: Long reads (those used in assembly)
- datatypes:
- fastq
label: Short reads to be used for polishing (R1 only)
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=226"
view_link: https://workflowhub.eu/workflows/226
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
- title_html: Assess genome quality
description_html: >
<p>
Assesses the quality of the genome assembly: generates statistics, determines whether expected genes are present, and aligns contigs to a reference genome.
</p>
inputs:
- datatypes:
- fasta
label: Polished assembly
- datatypes:
- fasta
label: Reference genome assembly (e.g. related species)
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=229"
view_link: https://workflowhub.eu/workflows/229
view_tip: View in WorkflowHub
button_tip: Import to Galaxy Australia
- id: hic
title: Assembly with PacBio HiFi and HiC data
content:
- title_html: About these workflows
description_html: >
<p>
These workflows have been developed as part of the global Vertebrate Genomes Project (VGP). A guide to using them in Galaxy Australia can be found <a href="/vgp-workflows.md" target="_blank">here</a>. A complete guide to the individual workflows and sample results can be found <a href="https://galaxyproject.org/projects/vgp/workflows/" target="_blank">here</a>. There are many different ways to use these workflows in practice; for a comprehensive example, see this <a href="https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html" target="_blank">Galaxy tutorial</a>.
</p>
- title_html: Kmer profiling
description_html: >
<p>
This workflow produces a Meryl database and GenomeScope outputs that are used to determine parameters for subsequent workflows and to assess the quality of genome assemblies. Specifically, it provides information about genomic complexity, such as genome size, heterozygosity and repeat content, as well as about data quality.
</p>
inputs:
- datatypes:
- fastq
label: PacBio HiFi reads
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/kmer-profiling-hifi-VGP1/main"
view_link: https://dockstore.org/workflows/github.com/iwc-workflows/kmer-profiling-hifi-VGP1/main:main
view_tip: View in Dockstore
button_tip: Import to Galaxy Australia
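GenomeScope estimates genome size, heterozygosity and repeat content by fitting a model to the k-mer histogram that Meryl produces. The core idea behind the genome-size estimate can be shown with a much simpler back-of-the-envelope calculation: count k-mers, build a histogram of how many distinct k-mers occur at each multiplicity, discard low-multiplicity k-mers (mostly sequencing errors), and divide the total number of k-mers by the multiplicity at the histogram's main peak. This is a simplified sketch, not GenomeScope's method: it ignores heterozygosity, repeats and reverse complements (Meryl counts canonical k-mers), and the error cut-off `min_multiplicity` is an assumed parameter.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int) -> Counter:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist: Dict[int, int], min_multiplicity: int = 2) -> float:
    """Naive estimate: total k-mers divided by the coverage at the histogram peak.

    k-mers seen fewer than min_multiplicity times (an assumed cut-off)
    are treated as sequencing errors and ignored.
    """
    usable = {m: n for m, n in hist.items() if m >= min_multiplicity}
    if not usable:
        raise ValueError("no k-mers above the error cut-off")
    peak = max(usable, key=usable.get)  # multiplicity with the most distinct k-mers
    total_kmers = sum(m * n for m, n in usable.items())
    return total_kmers / peak


if __name__ == "__main__":
    # Toy data: the same 4-base sequence read twice, with k = 2
    hist = kmer_histogram(["ACGT", "ACGT"], k=2)
    # 3.0, i.e. the number of 2-mer positions in the 4-base sequence
    print(dict(hist), estimate_genome_size(hist))
```

On real data the estimate counts k-mer positions (genome length minus k plus one), which is effectively the genome length for any realistic genome.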
- title_html: HiFi assembly and HiC phasing
description_html: >
<p>
This workflow uses <code>hifiasm</code> (HiC mode) to generate HiC-phased haplotypes (<code>hap1</code> and <code>hap2</code>). This is in contrast to its default mode, which generates primary and alternate pseudohaplotype assemblies. This workflow includes three tools for evaluating assembly quality: <code>gfastats</code>, <code>BUSCO</code> and <code>Merqury</code>.
</p>
<p>
<small> Note: if you have multiple input files for each HiC set, they need to be concatenated. The forward files must be concatenated in the same order as the reverse files. </small>
</p>
inputs:
- datatypes:
- fasta
label: PacBio HiFi reads
- datatypes:
- fastq
label: HiC reads (forward)
- datatypes:
- fastq
label: HiC reads (reverse)
- datatypes:
- meryldb
label: <code>Meryl</code> kmer database
- datatypes:
- txt
label: <code>GenomeScope</code> genome profile summary
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Assembly-Hifi-HiC-phasing-VGP4/main"
view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Assembly-Hifi-HiC-phasing-VGP4/main:main
view_tip: View in Dockstore
button_tip: Import to Galaxy Australia
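The concatenation note for this workflow can be checked before uploading. The sketch below assumes paired files are named with `_R1` and `_R2` tags (a hypothetical convention; adjust the tags to your own file names). It sorts the forward files, derives each reverse mate from its forward file name so that both lists share one ordering, and then concatenates each list byte for byte. The helper names `paired_order` and `concatenate` are hypothetical.

```python
import shutil
from typing import List, Sequence, Tuple


def paired_order(forward: Sequence[str], reverse: Sequence[str],
                 fwd_tag: str = "_R1", rev_tag: str = "_R2") -> Tuple[List[str], List[str]]:
    """Return forward and reverse file lists in matching order.

    Each reverse mate is derived from its forward file by swapping the last
    occurrence of fwd_tag for rev_tag, so both lists share one ordering.
    """
    reverse_set = set(reverse)
    fwd_sorted = sorted(forward)
    rev_sorted = []
    for f in fwd_sorted:
        head, sep, tail = f.rpartition(fwd_tag)
        if not sep:
            raise ValueError(f"{f} has no {fwd_tag} tag")
        mate = head + rev_tag + tail
        if mate not in reverse_set:
            raise ValueError(f"no reverse mate {mate} for {f}")
        rev_sorted.append(mate)
    if len(set(rev_sorted)) != len(reverse_set):
        raise ValueError("some reverse files have no forward mate")
    return fwd_sorted, rev_sorted


def concatenate(paths: Sequence[str], out_path: str) -> None:
    """Write the files in paths, in the given order, into one output file."""
    with open(out_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    fwd, rev = paired_order(["run2_R1.fastq", "run1_R1.fastq"],
                            ["run1_R2.fastq", "run2_R2.fastq"])
    print(fwd, rev)  # ['run1_R1.fastq', 'run2_R1.fastq'] ['run1_R2.fastq', 'run2_R2.fastq']
    # Then: concatenate(fwd, "hic_forward.fastq"); concatenate(rev, "hic_reverse.fastq")
```

Deriving the reverse list from the forward list, rather than sorting both independently, means a missing or misnamed mate raises an error instead of silently shifting the pairing.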
- title_html: HiC scaffolding
description_html: >
<p>
This workflow scaffolds the assembly contigs using information from HiC data.
</p>
inputs:
- datatypes:
- gfa
label: Assembly of haplotype 1
- datatypes:
- fastq
label: HiC forward reads
- datatypes:
- fastq
label: HiC reverse reads
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main"
view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main:main
view_tip: View in Dockstore
button_tip: Import to Galaxy Australia
- title_html: Decontamination
description_html: >
<p>
This workflow identifies and removes contaminants from the assembly.
</p>
inputs:
- datatypes:
- fasta
label: Assembly
button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Assembly-decontamination-VGP9/main:v0.1"
view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Assembly-decontamination-VGP9/main:v0.1
view_tip: View in Dockstore
button_tip: Import to Galaxy Australia
- id: help
title: Help
content:
- title_html: Can I use Galaxy Australia to assemble a large genome?
description_html: >
<p>
Yes. Galaxy Australia has assembly tools for small prokaryote genomes as well as larger eukaryote genomes. We are continually adding new tools and optimising them for large genome assemblies. This means providing enough computing power to run data-intensive tools, as well as configuring aspects such as parallelisation.
</p>
<p>
Please contact us if:
</p>
<ul>
<li>you need to increase your data storage limit</li>
<li>there is a tool you wish to request</li>
<li>a tool appears to be broken or running slowly</li>
</ul>
button_html: Request support
button_link: /request
- title_html: How can I learn about genome assembly?
description_html: >
<ul>
<li>See the tutorials in this Help section. They cover different approaches to genome assembly.</li>
<li>Read the methods in scientific papers about genome assembly, particularly those about genomes with characteristics similar to those in your project.</li>
<li>See the Workflows section for examples of different approaches to genome assembly; these cover different sequencing data types and a variety of tools.</li>
</ul>
- title_html: Genome assembly overview
description_html: >
<p>
Genome assembly can be a very involved process. A typical genome assembly procedure might look like this:
</p>
<ul>
<li>Data QC - check the quality and characteristics of your sequencing reads.</li>
<li>Kmer counting - to determine genome characteristics such as ploidy and size.</li>
<li>Data preparation - trimming and filtering sequencing reads if required.</li>
<li>Assembly - for large genomes, this is usually done with long sequencing reads from PacBio or Nanopore.</li>
<li>Polishing - the assembly may be polished (corrected) with long and/or short (Illumina) reads.</li>
<li>Scaffolding - the assembly contigs may be joined together with other sequencing data such as HiC.</li>
<li>Assessment - at any stage, the assembly can be assessed for number of contigs, number of base pairs, whether expected genes are present, and many other metrics.</li>
<li>Annotation - identify features on the genome assembly such as gene names and locations.</li>
</ul>
<img class="img-fluid" src="/static/home/labs/genome/static/assembly-overview.png" alt="Genome assembly flowchart">
<p class="text-center">
A graphical representation of genome assembly
</p>
- title_html: Which tools should I use?
description_html: >
<p>
There is no single best set of tools to recommend: new tools are developed constantly, sequencing technology improves rapidly, and many genomes have never been sequenced before, so their characteristics and quirks are unknown. The "Tools" tab in this section lists commonly used tools that could be a good starting point. You will find other tools in recent publications or used in workflows.
</p>
<p>
You can also search for tools in Galaxy's tool panel. If they aren't installed on Galaxy Australia, you can <a href="/request/tool">request installation</a> of a tool.
</p>
<p>
We recommend testing a tool on a small data set first, and checking that the results make sense, before running it on your full data set.
</p>
- title_html: Tutorials
description_html: >
<p>
Find 15+ Galaxy training tutorials <a href="https://training.galaxyproject.org/training-material/topics/assembly/" target="_blank">here</a>.
</p>
<p>
<a href="https://training.galaxyproject.org/training-material/topics/assembly/tutorials/get-started-genome-assembly/slides.html#1" target="_blank"> Introduction to genome assembly and annotation (slides) </a>
</p>
<p>
<a href="https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html" target="_blank"> Vertebrate genome assembly pipeline (tutorial) </a>
</p>
<p>
<a href="https://training.galaxyproject.org/training-material/topics/assembly/tutorials/largegenome/tutorial.html" target="_blank"> Nanopore and Illumina genome assembly (tutorial) </a>
</p>
<p>
<a href="https://gxy.io/GTN:T00165" target="_blank"> Share workflows and results with workflow reports (tutorial) </a>
</p>
- title_html: How can I assess the quality of my genome assembly?
description_html: >
<p>
Once a genome has been assembled, it is important to assess the quality of the assembly. As a first step, this quality control (QC) can be performed using the workflow described in the tutorial below.
</p>
button_html: Workflow tutorial
button_link: https://australianbiocommons.github.io/how-to-guides/genome_assembly/assembly_qc
- title_html: Galaxy Australia support
description_html: >
<p>
Any user of Galaxy Australia can request support through an online form.
</p>
button_html: Request support
button_link: /request/support