phabox initial commit #68

Merged
merged 4 commits into from
Nov 12, 2024
24 changes: 24 additions & 0 deletions tools/phabox/.shed.yml
@@ -0,0 +1,24 @@
categories:
- Metagenomics
description: Identify and analyze phage contigs in metagenomic data
long_description: |
  PhaBOX can comprehensively identify and analyze phage contigs in metagenomic
  data. It supports integrated phage analysis, including phage contig
  identification from the metagenomic assembly, lifestyle prediction, taxonomic
  classification, and host prediction.
name: phabox
owner: ufz
homepage_url: https://github.com/KennthShang/PhaBOX
remote_repository_url: https://github.com/Helmholtz-UFZ/ufz-galaxy-tools/blob/main/tools/phabox
type: unrestricted
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for phabox task: {{ tool_name }}."
suite:
  name: "suite_phabox"
  description: "A suite of tools that brings the phabox project into Galaxy."
  long_description: |
    PhaBOX can comprehensively identify and analyze phage contigs in metagenomic
    data. It supports integrated phage analysis, including phage contig
    identification from the metagenomic assembly, lifestyle prediction, taxonomic
    classification, and host prediction.
59 changes: 59 additions & 0 deletions tools/phabox/cherry.xml
@@ -0,0 +1,59 @@
<tool id="phabox_cherry" name="PhaBOX cherry" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.1" license="MIT">
<description>Host prediction</description>
<macros>
<import>macros.xml</import>
</macros>
<xrefs>
<xref type="bio.tools">phabox</xref>
</xrefs>
<requirements>
<requirement type="package" version="@TOOL_VERSION@">phabox</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
@CRISPR_PRE@
phabox2 --task cherry
@GENERAL@
@NETWORK@
@CRISPR@
]]></command>
<inputs>
<expand macro="general"/>
<expand macro="network"/>
<expand macro="crispr"/>
</inputs>
<outputs>
<data name="out" format="tabular" from_work_dir="output/final_prediction/cherry_prediction.tsv"/>
</outputs>
<tests>
<test>
<param name="dbdir" value="phaboxdb"/>
<param name="contigs" value="example_contigs.fa"/>
<output name="out">
<assert_contents>
<has_line line="Accession&#9;Length&#9;Host&#9;CHERRYScore&#9;Method&#9;Host_NCBI_lineage&#9;Host_GTDB_lineage"/>
<has_n_lines n="11"/>
<has_n_columns n="7"/>
</assert_contents>
</output>
</test>
</tests>
<help><![CDATA[
Predict hosts for viruses.

**Output**

A tabular dataset with the following columns:

- Accession: the accession or the name of the input contigs.
- Length: the length of input contigs.
- Host: the predicted host (NCBI taxonomy) of the contigs. '-' means unknown host.
- CHERRYScore: the predicted score from the model.
- Method:
- CRISPR-based(MAG): CRISPRs alignment results from provided MAG (if any)
- CRISPR-based(DB): CRISPRs alignment results from database.
- AAI-based: predicting host based on virus-simil
]]></help>
<expand macro="citations">
<citation type="doi">10.1093/bib/bbac182</citation>
</expand>
</tool>
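Note on consuming the cherry output: the following is a minimal sketch, not part of this PR, of how the table might be read downstream. It assumes the seven-column header asserted in the test above and the documented convention that `-` in the Host column means unknown host. The function name `contigs_with_host` and the sample rows are hypothetical and invented for illustration only.

```python
import csv
import io

# Header taken from the <has_line> assertion in the cherry.xml test.
EXPECTED_HEADER = [
    "Accession", "Length", "Host", "CHERRYScore", "Method",
    "Host_NCBI_lineage", "Host_GTDB_lineage",
]


def contigs_with_host(handle):
    """Map accession -> predicted host, skipping contigs with unknown host ('-')."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return {row["Accession"]: row["Host"] for row in reader if row["Host"] != "-"}


# Invented sample data (not real PhaBOX output), used only to exercise the function.
sample = "\n".join([
    "\t".join(EXPECTED_HEADER),
    "\t".join(["contig_1", "5000", "Escherichia", "0.98", "AAI-based", "-", "-"]),
    "\t".join(["contig_2", "3000", "-", "0.00", "-", "-", "-"]),
]) + "\n"

hosts = contigs_with_host(io.StringIO(sample))
print(hosts)
```

Checking the header up front means a change in the upstream column layout fails loudly instead of silently shifting columns.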
58 changes: 58 additions & 0 deletions tools/phabox/contamination.xml
@@ -0,0 +1,58 @@
<tool id="phabox_contamination" name="PhaBOX contamination" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.1" license="MIT">
<description>Contamination/provirus detection</description>
<macros>
<import>macros.xml</import>
</macros>
<xrefs>
<xref type="bio.tools">phabox</xref>
</xrefs>
<requirements>
<requirement type="package" version="@TOOL_VERSION@">phabox</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
phabox2 --task contamination
@GENERAL@
@CONTAMINATION@
]]></command>
<inputs>
<expand macro="general"/>
<expand macro="contamination"/>
</inputs>
<outputs>
<data name="out" format="tabular" from_work_dir="output/final_prediction/contamination_prediction.tsv"/>
</outputs>
<tests>
<test>
<param name="dbdir" value="phaboxdb"/>
<param name="contigs" value="example_contigs.fa"/>
<output name="out">
<assert_contents>
<has_line line="Accession&#9;Length&#9;Total_genes&#9;Viral_genes&#9;Prokaryotic_genes&#9;Kmer_freq&#9;Contamination&#9;Provirus&#9;Pure_viral"/>
<has_n_lines n="11"/>
<has_n_columns n="9"/>
</assert_contents>
</output>
</test>
</tests>
<help><![CDATA[

Check for contamination and proviruses.

**Output**

A tabular dataset with the following columns:

- Accession: the accession or the name of the input contigs.
- Length: the length of input contigs.
- Total_genes: number of genes in the contigs (predicted by prodigal-gv)
- Viral_genes: number of viral marker genes
- Prokaryotic_genes: number of prokaryotic marker genes
- Kmer_freq: average frequency of 20-mers, used to estimate the copy number of the genes; for 99.9% of viruses, Kmer_freq is below 1.25.
- Contamination:
- Provirus: Whether the sequence is a provirus
- Pure_viral: High quality or Medium quality or Low quality


]]></help>
<expand macro="citations"/>
</tool>
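Note on the Kmer_freq column: the help text states that 99.9% of viruses have a Kmer_freq below 1.25. The sketch below is an illustration rather than part of this PR. It lists contigs at or above that value, assuming the column holds a plain decimal number. The function name `flag_high_kmer_freq`, the use of 1.25 as a hard cutoff, and every value in the sample rows are assumptions made for the example.

```python
import csv
import io

# Value quoted in the tool help; treating it as a hard cutoff is an assumption.
KMER_FREQ_CUTOFF = 1.25

# Header taken from the <has_line> assertion in the contamination.xml test.
HEADER = [
    "Accession", "Length", "Total_genes", "Viral_genes", "Prokaryotic_genes",
    "Kmer_freq", "Contamination", "Provirus", "Pure_viral",
]


def flag_high_kmer_freq(handle, cutoff=KMER_FREQ_CUTOFF):
    """Return accessions whose Kmer_freq is at or above the cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["Accession"] for row in reader if float(row["Kmer_freq"]) >= cutoff]


# Invented placeholder rows (not real PhaBOX output).
sample = "\n".join([
    "\t".join(HEADER),
    "\t".join(["contig_a", "12000", "15", "10", "0", "1.02", "0", "No", "High quality"]),
    "\t".join(["contig_b", "8000", "9", "1", "4", "1.80", "30", "No", "Low quality"]),
]) + "\n"

flagged = flag_high_kmer_freq(io.StringIO(sample))
print(flagged)
```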
118 changes: 118 additions & 0 deletions tools/phabox/end_to_end.xml
@@ -0,0 +1,118 @@
<tool id="phabox_end_to_end" name="PhaBOX end to end" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.1" license="MIT">
<description></description>
<macros>
<import>macros.xml</import>
</macros>
<xrefs>
<xref type="bio.tools">phabox</xref>
</xrefs>
<requirements>
<requirement type="package" version="@TOOL_VERSION@">phabox</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
@CRISPR_PRE@
phabox2 --task end_to_end
@GENERAL@
@PHAMER@
@NETWORK@
@CRISPR@
]]></command>
<inputs>
<expand macro="general"/>
<expand macro="phamer"/>
<expand macro="network"/>
<expand macro="crispr"/>
<param name="supplements" type="select" optional="true" multiple="true" label="Output supplementary collections">
<option value="phamer">phamer</option>
<option value="phagcn">phagcn</option>
<option value="cherry">cherry</option>
</param>
</inputs>
<outputs>
<data name="end_to_end_out" format="tabular" from_work_dir="output/final_prediction/final_prediction_summary.tsv"/>
<expand macro="supp_out" task="phamer"/>
<expand macro="supp_out" task="phagcn"/>
<expand macro="supp_out" task="cherry"/>
<!-- final_prediction
├── final_prediction_summary.tsv
├── phamer_supplementary
│ ├── all_predicted_contigs.fa
│ ├── all_predicted_protein.fa
│ ├── gene_annotation.tsv || outputs of phavip
│ ├── predicted_virus.fa
│ ├── predicted_virus_protein.fa
│ ├── alignment_results.tab
│ └── uncertain_sequences_for_contamination_task.fa || please run contamination task
├── phagcn_supplementary
│ ├── phagcn_network_edges.tsv
│ └── phagcn_network_nodes.tsv
├── cherry_supplementary
│ ├── cherry_network_edges.tsv
│ └── cherry_network_nodes.tsv
└── phatyp_supplementary


├── final_prediction
│   ├── cherry_prediction.tsv
│   ├── cherry_supplementary
│   │   ├── cherry_network_edges.tsv
│   │   ├── cherry_network_nodes.tsv
│   │   └── CRISPRs_alignment_DB.tsv
│   ├── final_prediction_summary.tsv
│   ├── phagcn_prediction.tsv
│   ├── phagcn_supplementary
│   │   ├── phagcn_network_edges.tsv
│   │   └── phagcn_network_nodes.tsv
│   ├── phamer_prediction.tsv
│   ├── phamer_supplementary
│   │   ├── alignment_results.tab
│   │   ├── all_predicted_contigs.fa
│   │   ├── all_predicted_protein.fa
│   │   ├── gene_annotation.tsv
│   │   ├── predicted_virus.fa
│   │   ├── predicted_virus_protein.fa
│   │   └── uncertain_sequences_for_contamination_task.fa
│   ├── phatyp_prediction.tsv
│   ├── phatyp_supplementary
│   └── phavip_prediction.tsv -->

</outputs>
<tests>
<test expect_num_outputs="1">
<param name="dbdir" value="phaboxdb"/>
<param name="contigs" value="example_contigs.fa"/>
<output name="end_to_end_out">
<assert_contents>
<has_n_lines n="11"/>
<has_n_columns n="17"/>
<has_text text="Accession&#9;Length&#9;Pred&#9;Proportion&#9;PhaMerScore&#9;PhaMerConfidence&#9;Lineage&#9;PhaGCNScore&#9;Genus&#9;GenusCluster&#9;TYPE&#9;PhaTYPScore&#9;Host&#9;CHERRYScore&#9;Method&#9;Host_NCBI_lineage&#9;Host_GTDB_lineage"/>
</assert_contents>
</output>
</test>
</tests>
<help><![CDATA[

.. class:: infomark

**What it does**

Runs the phabox2 pipeline

- phamer
- phagcn
- cherry
- phatyp

Usage
.....


**Input**


**Output**


]]></help>
<expand macro="citations"/>
</tool>
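Note on the supplementary outputs: the `supplements` select offers phamer, phagcn and cherry, and the directory tree in the XML comment shows a `<task>_supplementary` folder for each under `final_prediction`. The sketch below only illustrates that mapping. The `supp_out` macro itself is not shown in this diff, so how it actually collects files is unknown, and the helper name `supplementary_dirs` is hypothetical.

```python
from pathlib import PurePosixPath

# Working-directory prefix used by the from_work_dir attributes in the wrappers.
FINAL_DIR = PurePosixPath("output/final_prediction")

# Option values of the "supplements" select parameter in end_to_end.xml.
SUPPLEMENT_TASKS = ("phamer", "phagcn", "cherry")


def supplementary_dirs(selected):
    """Map each selected task to its supplementary directory, in option order."""
    unknown = set(selected) - set(SUPPLEMENT_TASKS)
    if unknown:
        raise ValueError(f"unknown task(s): {sorted(unknown)}")
    return {
        task: FINAL_DIR / f"{task}_supplementary"
        for task in SUPPLEMENT_TASKS
        if task in selected
    }


dirs = supplementary_dirs(["cherry", "phamer"])
for task, path in dirs.items():
    print(task, path)
```

With nothing selected the mapping is empty, which matches the test above expecting exactly one output (`expect_num_outputs="1"`).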