Merge pull request galaxyproject#441 from clsiguret/bacterial_genome_annotation

Add 'Bacterial genome annotation' workflow
mvdbeek authored Jun 19, 2024
2 parents f14dce1 + ae04820 commit bc289fd
Showing 5 changed files with 1,320 additions and 0 deletions.
@@ -0,0 +1,19 @@
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    publish: true
    primaryDescriptorPath: /bacterial_genome_annotation.ga
    testParameterFiles:
      - /bacterial_genome_annotation-tests.yml
    authors:
      - name: ABRomics
        email: [email protected]
      - name: abromics-consortium
        url: https://www.abromics.fr/
      - name: Pierre Marin
        alternateName: pimarin
        orcid: 0000-0002-8304-138X
      - name: Clea Siguret
        alternateName: clsiguret
        orcid: 0009-0005-6140-0379
@@ -0,0 +1,5 @@
# Changelog

## [1.0] - 14-06-2024

- First release
36 changes: 36 additions & 0 deletions workflows/bacterial_genomics/bacterial_genome_annotation/README.md
@@ -0,0 +1,36 @@
# Bacterial genome annotation workflow (v1.0)

This workflow takes an assembled bacterial genome in FASTA format (any FASTA file is accepted) and executes the following steps:
1. Genomic annotation
    - **Bakta** to predict CDS and small proteins (sORF)
2. Integron identification
    - **IntegronFinder2** to identify CALIN elements, In0 elements, and complete integrons
3. Plasmid gene identification
    - **Plasmidfinder** to identify and type plasmid sequences
4. Insertion sequence (IS) detection
    - **ISEScan** to detect IS elements
5. Aggregating outputs into a single JSON file
    - **ToolDistillator** to extract and aggregate information from the different tool outputs into JSON-parsable files

## Inputs

1. Assembled bacterial genome in FASTA format.

## Outputs

1. Genomic annotation:
    - genome annotation in tabular, GFF and several other formats
    - annotation plot
    - identified nucleotide and protein sequences
    - summary of identified genomic elements
2. Integron identification:
    - integron identification in tabular format and a summary
3. Plasmid gene identification:
    - identified plasmid genes and associated BLAST hits
4. Insertion sequence (IS) detection:
    - IS element list in tabular format
    - IS hits in FASTA format
    - ORF hits in protein and nucleotide FASTA format
    - IS annotation in GFF format
5. Aggregating outputs:
    - JSON file with information about the outputs of **Bakta**, **IntegronFinder2**, **Plasmidfinder**, **ISEScan**
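
As a minimal sketch of how the aggregated JSON might be inspected, the snippet below reports which of the four tool names appear anywhere in such a file. The JSON layout is not described here, so the code searches the serialized text instead of assuming specific keys. The function name `tools_mentioned`, the file name, and the sample payload (including its `analysis_software_name` key) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Tool names as listed in the README; matching is case-insensitive.
TOOLS = ("bakta", "integronfinder2", "plasmidfinder", "isescan")


def tools_mentioned(json_path):
    """Return the names in TOOLS that occur anywhere in the JSON file."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Re-serialize so the search covers keys and values at every nesting level.
    flat = json.dumps(data).lower()
    return [tool for tool in TOOLS if tool in flat]


if __name__ == "__main__":
    # Hypothetical sample standing in for the real aggregated output.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "tooldistillator_summarize.json"
        payload = [{"analysis_software_name": "bakta"}]  # assumed key name
        sample.write_text(json.dumps(payload), encoding="utf-8")
        print(tools_mentioned(sample))
```

A text search like this only confirms that a tool is referenced at all. Reading specific results would require knowing the actual schema of the ToolDistillator output.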
@@ -0,0 +1,61 @@
- doc: Test outline for bacterial_genome_annotation.ga
  job:
    Input sequence fasta:
      class: File
      path: https://zenodo.org/records/11488310/files/shovill_contigs_fasta
    Select a plasmid detection database: plasmidfinder_81c11f4_2023_12_04
    Select a bacterial genome annotation database: V5.0_2023-02-20
    Select a AMRFinderPlus database: amrfinderplus_V3.12_2024-05-02.2
  outputs:
    integronfinder2_logfile_text:
      assert:
        has_text:
          text: "Writing out results for replicon"
    integronfinder2_summary:
      assert:
        has_n_columns:
          n: 6
    integronfinder2_results_tabular:
      assert:
        has_n_columns:
          n: 14
    bakta_hypothetical_tabular:
      assert:
        has_n_columns:
          n: 9
    bakta_annotation_json:
      assert:
        has_text:
          text: "aa_hexdigest"
    bakta_annotation_tabular:
      assert:
        has_n_columns:
          n: 9
    isescan_results_tabular:
      assert:
        has_n_columns:
          n: 24
    isescan_summary_tabular:
      assert:
        has_text:
          text: "nIS"
    isescan_logfile_text:
      assert:
        has_text:
          text: "Both complete and partial IS elements are reported."
    plasmidfinder_result_json:
      assert:
        has_text:
          text: "positions_in_contig"
    plasmidfinder_results_tabular:
      assert:
        has_n_columns:
          n: 8
    tooldistillator_summarize:
      assert:
        has_text:
          text: "CDS12738(DOp1)"
        has_text:
          text: "CALIN"
        has_text:
          text: "insertion_sequence"
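
The two assertion types used above can be approximated locally in a few lines of Python. This is a rough sketch, not Galaxy's implementation. The helper names mirror the assertion names in the test file, but their behavior here is an assumption: `has_text` does a plain substring match, and `has_n_columns` counts tab-separated fields on the first line only. The sample strings are hypothetical stand-ins for downloaded workflow outputs.

```python
def has_text(content, text):
    """Return True if `text` occurs as a substring of `content`."""
    return text in content


def has_n_columns(content, n, sep="\t"):
    """Return True if the first line of `content` has exactly `n` fields."""
    lines = content.splitlines()
    if not lines:
        return False
    return len(lines[0].split(sep)) == n


if __name__ == "__main__":
    # Hypothetical stand-ins for real output files.
    log = "Writing out results for replicon contig00001\n"
    summary = "\t".join(["a", "b", "c", "d", "e", "f"]) + "\n"
    print(has_text(log, "Writing out results for replicon"))  # True
    print(has_n_columns(summary, 6))  # True
```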