GitHub - chiulab/SURPIrt-dist

SURPIrt v1.0

Hardware & Software Requirements

• Linux server (tested on Ubuntu 16.04 with 512 GB memory and an 18 TB shared disk volume)

Additional Software Dependencies

• Python interpreter (tested with Python v2.7.12)
• Perl interpreter (tested with Perl v5.22.1)

Required Scripts

Linux shell scripts
	• blast_ncores.sh [OBFUSCATED]
	• extractAlltoFast.sh [OBFUSCATED]
	• FastaToTab.csh [OBFUSCATED]
	• filterOverlapSAM.sh [OBFUSCATED] 
	• split_barcodes.sh [OBFUSCATED]
	• SURPIrt.sh [OBFUSCATED]
	• SURPIrt_viz.sh [OBFUSCATED]
	• TabToFasta.csh [OBFUSCATED]
	• taxonomy_annotation.sh [OBFUSCATED]

Python scripts
	• classify_annotated.py [BINARIZED]
	• mask_primers.py [BINARIZED]
	• parse_overlapping.py [BINARIZED]
	• trim_primers.py [BINARIZED]

Perl scripts
	• fasta_to_fastq.pl
	• subtractBlastFromFasta.pl

C/C++ executables
	• fqextract_5m

Instructions for Installing and Running the SURPIrt Software

The reference databases used by SURPIrt for identification of human, bacterial, fungal, and parasitic reads and for taxonomy lookup are not provided in the GitHub distribution. They will need to be regenerated as follows:

• The reference headers for the fasta reference database are provided in the /reference_headers subdirectory. Note that the reference headers can be in either gi or accession number format and also may include extraneous descriptive text. Use the reference headers to reconstruct the individual fasta files and place them in the directory structure as described in the README file.

• The subdirectory /taxonomy_files contains the CSV-formatted file lineages-2019-01-20.csv. Instructions for generating the second taxonomy file, nucl_all_sorted_LCall.txt, are provided in the README file. Both files will need to be placed in the $taxonomy_folder (default /reference/surpirt/taxonomy).
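
Before starting a run, it can help to confirm that both taxonomy files are in place. The following is a minimal sketch of such a check, not part of the SURPIrt distribution: the function name missing_taxonomy_files is hypothetical, and the file names and default folder are taken from the description above. It only tests that each file exists and is non-empty; it does not validate the contents.

```shell
# Sketch: list the taxonomy files that are missing (or empty) in a folder.
# The function name is a hypothetical choice for this example; the file
# names and default path are the ones named in this README.
missing_taxonomy_files() {
    folder="${1:-/reference/surpirt/taxonomy}"
    for f in lineages-2019-01-20.csv nucl_all_sorted_LCall.txt; do
        # -s is true when the file exists and has a size greater than zero
        if [ ! -s "$folder/$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling missing_taxonomy_files with no argument checks the default folder; passing a path checks that folder instead. The function prints nothing when both files are present.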

Once the human/microbial reference and taxonomy lookup databases have been generated and placed in their appropriate directories, the pipeline can be run using the SURPIrt.sh script with the following command-line switches:

SURPIrt version 1.0

This program will run the SURPIrt pipeline.

Command Line Switches:

	-h	Show this help & ignore all other switches

	-r	Specify reference folder [optional - default: "/reference/surpirt"]

	-f 	Specify input FASTQ [required]

	-v	Execute pipeline in virus-only mode

		This is implemented for speed, if only looking for viruses.

	-w	Create files necessary for SURPIviz

	-x	Execute pipeline in verification-only mode.

		This mode will verify all database locations, but not execute the pipeline.

	-t	Specify number of threads to use [optional - will be set to number of cores if unspecified]

	-c	Specify config file [optional]

		This switch is used to initiate a SURPIrt run using a specified config file. Any parameters 
		in the config file will supersede default parameters within the pipeline.
		
		When using a config file, it is best to avoid using other command-line parameters. Instead, all
		parameters should be included with the config file.

	-z	Create default config file. [optional] (specify fastq filename)
		This option will create a standard .config file, and go file.
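
To illustrate how the switches above combine, here is a small sketch that assembles a SURPIrt.sh command line and prints it without running anything. The helper surpirt_cmd and the file name sample.fastq are hypothetical, introduced only for this example. Whether -x can be combined with -f, -t, and -r in this way is an assumption based on the help text, not something confirmed here.

```shell
# Sketch: build a SURPIrt.sh command line from the documented switches.
# surpirt_cmd is a hypothetical helper, not part of the distribution.
# Usage: surpirt_cmd FASTQ [THREADS] [REFERENCE_FOLDER] [EXTRA_SWITCH...]
# The body runs in a subshell, so its variables do not leak to the caller.
surpirt_cmd() (
    fastq="$1"
    threads="${2:-}"
    ref="${3:-}"
    cmd="SURPIrt.sh -f $fastq"                   # -f is the only required switch
    if [ -n "$threads" ]; then cmd="$cmd -t $threads"; fi
    if [ -n "$ref" ]; then cmd="$cmd -r $ref"; fi
    if [ "$#" -gt 3 ]; then shift 3; cmd="$cmd $*"; fi
    printf '%s\n' "$cmd"
)

# Print a verification-only command, then the corresponding full run.
surpirt_cmd sample.fastq 8 /reference/surpirt -x
surpirt_cmd sample.fastq 8 /reference/surpirt
```

The helper does no quoting, so it is only suitable for paths without spaces. Printing the verification-only command first reflects one possible workflow: check the database locations with -x, then repeat the same command without it.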

Test Run

A sample test file named ZIKV-nohuman.fastq is provided. It is a metagenomic run of a ZIKV clinical sample with the human reads removed [n=517 sequences].
Using the default reference directory of /reference/surpirt, run the SURPIrt.sh script from the command line with the following parameters (using 8 threads/cores):

SURPIrt.sh -f "ZIKV-nohuman.fastq" -t 8
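
As an optional sanity check before the test run, the number of reads in the sample file can be counted. The sketch below assumes an uncompressed FASTQ file with the common four-lines-per-record layout (no wrapped sequence lines). The function name count_fastq_reads is hypothetical and not part of the distribution.

```shell
# Sketch: count records in an uncompressed FASTQ file, assuming exactly
# four lines per record. count_fastq_reads is a hypothetical helper.
count_fastq_reads() {
    awk 'END {
        if (NR % 4 != 0) {
            # A line count that is not a multiple of 4 suggests a truncated
            # file or a layout other than the one assumed here.
            print "line count is not a multiple of 4" > "/dev/stderr"
            exit 1
        }
        print NR / 4
    }' "$1"
}
```

If ZIKV-nohuman.fastq matches the description above and uses four-line records, `count_fastq_reads ZIKV-nohuman.fastq` should report 517.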
