sbx_virus_id is a sunbeam extension for identifying viruses in samples. The pipeline assembles contigs with MEGAHIT or SPAdes and identifies viral sequences with Cenote-Taker2 or VirSorter2.
N.B. If you use MEGAHIT for assembly, sbx_assembly must also be installed.
sunbeam extend https://github.com/sunbeam-labs/sbx_virus_id.git
Install blast db:
conda create -n blast
conda activate blast
conda install -c bioconda blast
mkdir refseq_select_prot/
cd refseq_select_prot/
perl `which update_blastdb.pl` --decompress refseq_select_prot
Install viral blast db:
Activate the blast conda environment created above (conda activate blast), then:
mkdir viral_prot/ && cd viral_prot/
wget https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz && gzip -d viral.1.protein.faa.gz
makeblastdb -in viral.1.protein.faa -parse_seqids -title "viral" -dbtype prot
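Before pointing blast_db at a new database, it can help to confirm that the index files were actually written. The function below is a minimal sketch, not part of sbx_virus_id or BLAST: blast_prot_db_exists is a hypothetical helper name, and it assumes the usual protein index layout (a .pin file for a single-volume database, a .pal alias file or .00.pin for a multi-volume one).

```shell
# Hypothetical helper, not part of sbx_virus_id or BLAST itself.
# Returns 0 if a protein BLAST database appears to exist at the given prefix.
blast_prot_db_exists() {
    db_prefix="$1"
    # Single-volume databases have <prefix>.pin; multi-volume ones have an
    # alias file <prefix>.pal or per-volume indexes such as <prefix>.00.pin.
    for index_file in "$db_prefix.pin" "$db_prefix.pal" "$db_prefix.00.pin"; do
        if [ -f "$index_file" ]; then
            return 0
        fi
    done
    return 1
}

# Example (run from inside viral_prot/):
# blast_prot_db_exists viral.1.protein.faa && echo "database found"
```

The prefix passed here (for example viral.1.protein.faa) is the same value that would go into blast_db.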
Run with sunbeam on the target all_virus_id:
sunbeam run --profile /path/to/project/ all_virus_id
Options for config.yml:
- blast_db: path to the BLAST db (default: "") (NOTE: this should be the database file itself, not just the directory it's in)
- blastx_threads: number of threads for running blastx (default: 4)
- bowtie2_build_threads: number of threads for running bowtie2-build (default: 4)
- cenote_taker2_db: path to the Cenote-Taker2 db (default: "") (NOTE: this should be a directory)
- virsorter_db: path to the VirSorter2 db (default: "") (NOTE: this should be a directory)
- include_phages: whether to include phages in the output (default: False)
- use_spades: whether to use SPAdes instead of MEGAHIT (default: False)
- use_virsorter: whether to use VirSorter2 instead of Cenote-Taker2 (default: False)
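The file-versus-directory notes on the three path options are easy to get wrong, so a quick pre-flight check can be useful. This is only a sketch: check_option_path is a hypothetical helper, not something sunbeam or sbx_virus_id provides, and it checks only the kind of path, not the database contents.

```shell
# Hypothetical pre-flight check; not part of sunbeam or sbx_virus_id.
# Usage: check_option_path <option_name> <path>
check_option_path() {
    option="$1"
    opt_path="$2"
    case "$option" in
        blast_db)
            # blast_db must name the database itself, so a directory is wrong.
            if [ -d "$opt_path" ]; then
                echo "blast_db: $opt_path is a directory; give the database path inside it"
                return 1
            fi
            ;;
        cenote_taker2_db|virsorter_db)
            # These two options must point at directories.
            if [ ! -d "$opt_path" ]; then
                echo "$option: $opt_path is not a directory"
                return 1
            fi
            ;;
        *)
            echo "unknown option: $option"
            return 1
            ;;
    esac
    echo "$option: ok"
}

# Example with placeholder paths:
# check_option_path blast_db /path/to/viral_prot/viral.1.protein.faa
# check_option_path virsorter_db /path/to/virsorter_db
```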
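Alternatively, install the extension manually with git, then append its default options to your sunbeam config:
git clone https://github.com/sunbeam-labs/sbx_virus_id.git extensions/sbx_virus_id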
cd extensions/sbx_virus_id
cat config.yml >> /path/to/sunbeam_config.yml
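Running the cat command above twice would append the options twice. A guarded append is one way to avoid that. The sketch below uses a hypothetical helper, append_config_once, and assumes the first line of config.yml is a distinctive top-level key that would not otherwise appear in the target file.

```shell
# Hypothetical helper; not part of sunbeam.
# Appends src to dest unless the first line of src is already present in dest.
append_config_once() {
    src="$1"
    dest="$2"
    first_line=$(head -n 1 "$src")
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if grep -qxF -- "$first_line" "$dest" 2>/dev/null; then
        echo "already present, skipping"
    else
        cat "$src" >> "$dest"
        echo "appended"
    fi
}

# Example:
# append_config_once config.yml /path/to/sunbeam_config.yml
```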