RepliDecPlus

RepliDecPlus integrates several tools for predicting the phage replication cycle.

Currently supported tools: RepliDec, PhaBOX/phaTYP, BACPHLIP, DeePhage.

Introduction

RepliDecPlus works in three steps:

1. Run the individual tools.
   - RepliDec is used for complete genomes and metagenomic assemblies;
   - PhaBOX/phaTYP and DeePhage are used for metagenomic assemblies;
   - BACPHLIP is used for complete genomes.
2. Collect the results and scores from these tools.
   - PhaBOX/phaTYP and DeePhage treat each sequence as a separate query, so sequences from the same bin can receive different replication cycle predictions. After each tool has run, a custom script therefore computes a single replication cycle for all input sequences in the same bin.
3. Use an in-house scoring system to recalculate the confidence of the final prediction.
   - Based on the evaluation results, the scoring system weights the confidence reported by each tool and combines the weighted results into the final prediction.
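
Steps 2 and 3 can be illustrated with a minimal sketch. The actual script and scoring scheme are not described here, so everything below is an assumption: the function names `bin_consensus` and `weighted_final`, the majority-vote rule for bins, the label names, and the tool weights are hypothetical and are not the values RepliDecPlus uses.

```python
from collections import Counter, defaultdict


def bin_consensus(per_sequence_calls):
    """Collapse per-sequence calls into one call per bin by majority vote.

    per_sequence_calls: iterable of (bin_id, label) pairs, one per sequence.
    Returns {bin_id: (label, fraction of sequences supporting that label)}.
    Majority vote is an assumption, not the documented RepliDecPlus rule.
    """
    by_bin = defaultdict(list)
    for bin_id, label in per_sequence_calls:
        by_bin[bin_id].append(label)
    result = {}
    for bin_id, labels in by_bin.items():
        label, count = Counter(labels).most_common(1)[0]
        result[bin_id] = (label, count / len(labels))
    return result


def weighted_final(tool_calls, weights):
    """Combine one call per tool into a final label and a normalized score.

    tool_calls: {tool_name: label}; weights: {tool_name: weight}.
    Tools without a weight contribute nothing. Weights are hypothetical.
    """
    totals = defaultdict(float)
    for tool, label in tool_calls.items():
        totals[label] += weights.get(tool, 0.0)
    total_weight = sum(totals.values())
    if total_weight == 0:
        return None, 0.0
    best = max(totals, key=totals.get)
    return best, totals[best] / total_weight


if __name__ == "__main__":
    # Three sequences from the same bin, called independently by one tool.
    print(bin_consensus([("bin1", "temperate"), ("bin1", "temperate"), ("bin1", "virulent")]))
    # One bin-level call per tool, combined with illustrative weights.
    print(weighted_final(
        {"replidec": "temperate", "bacphlip": "virulent", "phatyp": "temperate"},
        {"replidec": 2.0, "bacphlip": 1.0, "phatyp": 1.0},
    ))
```

The normalized score is simply the share of total weight behind the winning label, which gives a value between 0 and 1 that is easy to compare across samples.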

Installation

The environment is prepared with Conda, so please install Conda first.

1. Conda installation (skip this step if Conda is already installed)

## linux
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

For other platforms, follow the instructions at: https://docs.conda.io/projects/miniconda/en/latest/

Note: some of the tools run only on Linux, so we recommend using a Linux-based system.

2. Clone the RepliDecPlus Git repository and set up the Conda environments and all necessary dependencies

git clone https://github.com/pengSherryYel/ReplidecPlus.git
cd ReplidecPlus
sh ./prepare_env.sh

prepare_env.sh both creates the environments and installs all required packages.

Once it finishes successfully, five Conda environments will have been created. All five names start with "RP".

RP_base: main environment
RP_bacphlip: environment for BACPHLIP
RP_deephage: environment for DeePhage
RP_phabox: environment for PhaTYP/PhaBOX
RP_replidec: environment for RepliDec

Usage

Currently supported tools: RepliDec, PhaBOX/phaTYP, BACPHLIP, PhageAI, DeePhage.

Quick start

conda activate RP_base
python ./ReplidecPlus.py -i input.txt -r -p -b -a -d -t 10

INPUT (TEXT OR FASTA file) (`-i`)

TEXT

To support binning results, RepliDecPlus takes a text file as input (`-i`). This is a two-column, tab-separated file:
1. First column: the sample ID, which is used as the identifier in the output files.
2. Second column: the path to the nucleotide sequence file.
```
###
NC_001447.1      $path/NC_001447.1.fasta
NC_023556.1      $path/NC_023556.1.fasta
```
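
As an illustration of the format, the following sketch reads such a list into `(sample_id, sequence_path)` pairs. It is not part of RepliDecPlus: the function name `read_input_list` is hypothetical, and skipping lines that start with `#` is an assumption based on the `###` line in the example above.

```python
import csv
import io


def read_input_list(handle):
    """Yield (sample_id, sequence_path) pairs from a two-column, tab-separated file.

    Blank lines and lines starting with '#' are skipped (an assumption).
    Any other line that does not have exactly two columns raises ValueError.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not any(field.strip() for field in row) or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {len(row)}: {row}")
        sample_id, path = (field.strip() for field in row)
        yield sample_id, path


if __name__ == "__main__":
    # The paths below are placeholders for illustration only.
    example = "###\nNC_001447.1\t/data/NC_001447.1.fasta\nNC_023556.1\t/data/NC_023556.1.fasta\n"
    for sample_id, path in read_input_list(io.StringIO(example)):
        print(sample_id, path)
```

A check like this catches space-separated lines early, which is a common mistake when the list is written by hand.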
FASTA

RepliDecPlus cannot use a FASTA file directly. A script is provided to convert a FASTA file into the text format:
```
cd utility
sh fasta2list.sh your_query_seq.fasta sequence.list 
```
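
For readers who want to see what such a conversion could look like, here is a rough Python sketch. It is not a reimplementation of `fasta2list.sh`, whose internals are not shown here. It assumes that each FASTA record is written to its own file and that the first word of the header becomes the sample ID; the function name `fasta_to_list` and the directory layout are hypothetical.

```python
from pathlib import Path


def fasta_to_list(fasta_path, list_path, split_dir):
    """Write each FASTA record to its own file and list them in a two-column TSV.

    Assumes header IDs are non-empty and safe to use as file names
    (for example, no '/' characters). Returns the (sample_id, path) entries.
    """
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    out = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                # Use the first whitespace-delimited token of the header as the ID.
                seq_id = line[1:].split()[0]
                record_path = split_dir / f"{seq_id}.fasta"
                out = open(record_path, "w")
                entries.append((seq_id, record_path.resolve()))
            # Lines before the first header are ignored.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    with open(list_path, "w") as listing:
        for seq_id, record_path in entries:
            listing.write(f"{seq_id}\t{record_path}\n")
    return entries
```

Writing absolute paths into the list keeps it usable regardless of the directory RepliDecPlus is later run from.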

Output (`-o`)

Four folders and two main output files are created under the path set by `-o` (default: the current working directory).

Folders: store the results from each tool

bacphlip
deephage
phabox
replidec

Files: main outputs

ReplidecPlus.summary.detail.txt

Merged prediction details from each tool.

ReplidecPlus.summary.final.txt

Final prediction from the merged, weighted results of all tools.

Parameters

Usage: python RepliPhage.py -i  -r -p -b -a -d

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -i I                  input file, two cloumn. sample seqence_path. tab sepearte.
  -o O                  path to deposit output folder and temporary files, will create if doesn't exist [default= working directory]
  -t T                  thread number used in each software
  -r, --replidec        run replidec
  -rd {all,prokaryote}, --replidec_db {all,prokaryote}
                        define replidec database
  -rp REPLIDEC_PARA, --replidec_parameter REPLIDEC_PARA
                        define replidec parameter
  -rf, --replidecF      force rerun replidec
  -d, --deephage        run deephage
  -df, --deephageF      force rerun deephage
  -b, --bacphlip        run bacphlip
  -bf, --bacphlipF      force rerun bacphlipF
  -p, --phabox          run phaTYP from PhaBOX
  -pp PHABOX_PARA, --phabox_parameter PHABOX_PARA
                        define phabox parameter
  -pf, --phaboxF        force rerun phaTYP

Example

#!/usr/bin/bash
conda activate RP_base
cd example
sh ../utility/fasta2list.sh sequences.fasta sequence.list sequence_split 
python ../ReplidecPlus.py -i sequence.list -o example_repliplus -t 4 -r -b -p -d

Known issues

The minimum input sequence length is 3 kb. Shorter sequences significantly reduce prediction accuracy.
RepliDecPlus can take a long time on very large datasets. If possible, split the input query sequences into smaller sets and run them in parallel, which saves a lot of time.
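
Both points can be handled before launching a run. The sketch below is one possible approach, not part of RepliDecPlus: `record_lengths` reports per-record lengths so that sequences under 3 kb can be spotted, `split_by_sample` distributes the rows of an input list over several chunks while keeping rows with the same sample ID together, and `build_command` assembles a command from the flags listed under Parameters. All three function names and the chunk and output names are hypothetical.

```python
MIN_LENGTH = 3000  # minimum sequence length in bp, as stated above


def record_lengths(lines):
    """Return {record_id: length} for FASTA text given as an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # First word of the header is the ID; empty headers map to "".
            current = (line[1:].split() or [""])[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def split_by_sample(rows, n_chunks):
    """Distribute (sample_id, path) rows over at most n_chunks chunks.

    Rows sharing a sample ID stay in the same chunk, so a bin is never
    split across runs. Groups are assigned round-robin in input order.
    """
    groups = {}
    for sample_id, path in rows:
        groups.setdefault(sample_id, []).append((sample_id, path))
    n_chunks = max(1, min(n_chunks, len(groups)))
    chunks = [[] for _ in range(n_chunks)]
    for i, group in enumerate(groups.values()):
        chunks[i % n_chunks].extend(group)
    return chunks


def build_command(list_path, out_dir, threads):
    """Build a RepliDecPlus command line using the documented flags."""
    return ["python", "ReplidecPlus.py", "-i", list_path, "-o", out_dir,
            "-t", str(threads), "-r", "-b", "-p", "-d"]


if __name__ == "__main__":
    lengths = record_lengths([">contig_1", "ACGT" * 10])
    print([name for name, n in lengths.items() if n < MIN_LENGTH])  # too short
    rows = [("bin1", "a.fa"), ("bin1", "b.fa"), ("bin2", "c.fa")]
    for i, chunk in enumerate(split_by_sample(rows, 2)):
        print(chunk, " ".join(build_command(f"chunk_{i}.list", f"out_{i}", 4)))
```

Each chunk would be written to its own list file and run with its own `-o` directory so that the parallel runs do not overwrite each other's results.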
