STREAMclean - Download and filter reads from the SRA in a streaming way

The Challenge

The SRA is full of sequencing data. 🎉
Tons of
- sequencing platforms
- experiment types (genomic, transcriptomic, metagenomic, you name it)
- read qualities
Great, lots of data to play around with, but…
- often you don't want all the data from an experiment
  - saving 100s of read sets takes lots of space
  - files contain contaminants 😭
  - you only want individual genomes out of a metagenome
The big question: How can we easily get only the interesting parts of SRA sets?

Our solution

Get reference genomes of interest (or of contaminants) out of RefSeq to create a reference database.
Stream the data right out of the SRA and use magicblast to compare the reads to our reference database.
Only save those reads you actually want!
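The filtering idea above can be sketched in a few lines. This is a minimal illustration, not the code in streamin_sam_to_reads.py: it assumes the aligner output is plain SAM text and relies only on the SAM flag bit 0x4 (read unmapped). The function name `filter_sam_reads` and the demo records are made up for this example.

```python
UNMAPPED_FLAG = 0x4  # SAM spec: bit set when the read has no alignment

# Tiny hand-written SAM snippet (hypothetical data, for illustration only).
SAM_DEMO = [
    "@HD\tVN:1.0\n",
    "read1\t0\tref1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
]


def filter_sam_reads(sam_lines, keep_mapped=True):
    """Yield (read_name, sequence) for reads that pass the filter.

    keep_mapped=True  -> whitelist mode: keep reads that hit the reference
    keep_mapped=False -> blacklist mode: keep reads that did NOT hit it
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        is_mapped = not (flag & UNMAPPED_FLAG)
        if is_mapped == keep_mapped:
            yield name, seq


print(list(filter_sam_reads(SAM_DEMO)))                     # whitelist mode
print(list(filter_sam_reads(SAM_DEMO, keep_mapped=False)))  # blacklist mode
```

Because the function consumes lines lazily, it can read from a pipe while the aligner is still running, so nothing but the kept reads has to touch the disk. Two caveats: a read can appear in several SAM records, and for reverse-strand alignments SAM stores the reverse complement of the original read, so a real implementation would need to handle both.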

How to install it:

Clone this repository: git clone https://github.com/NCBI-Hackathons/STREAMclean
Install the required Python libraries: pip install -r requirements.txt
Download magicblast from NCBI.
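After installing, it can help to confirm that the external tools are reachable on your PATH. The check below is a small sketch, not part of STREAMclean; the executable names in `REQUIRED_TOOLS` are assumptions (magicblast and makeblastdb as shipped with Magic-BLAST, ncbi-genome-download as installed by pip) and may differ on your system.

```python
import shutil

# Assumed executable names; adjust if your installation differs.
REQUIRED_TOOLS = ("magicblast", "makeblastdb", "ncbi-genome-download")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```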

How to use it: Bash wrapper script

./mapper_wrapper.sh -d test1 -i bacteria -s SRR4420340

or more specifically

./mapper_wrapper.sh -d test1 -i "-t 199310 viral" -s SRR4420340

This will:

Download specified reference genomes using the ncbi-genome-download package.
Create a magic-blast database of the collected reference genomes.
Map the SRA accessions against the whitelist/blacklist reference database.
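The three steps above could be chained roughly as follows. This sketch only builds the command lines and does not run them, and it is not a transcript of what mapper_wrapper.sh does internally. The mapping of -d, -i and -s to a database name, a genome selection and an SRA accession is inferred from the examples, and the intermediate file names (`<db>.fasta`, `<accession>.sam`) are hypothetical.

```python
import shlex


def build_pipeline(db_name, genome_selection, sra_accession):
    """Return the three commands as argument lists (nothing is executed)."""
    # 1. Download reference genomes as FASTA; genome_selection is passed
    #    through unchanged, e.g. "bacteria" or "-t 199310 viral".
    download = ["ncbi-genome-download", "-F", "fasta",
                *shlex.split(genome_selection)]
    # 2. Build a nucleotide BLAST database. Assumes the downloaded files
    #    were decompressed and concatenated into <db_name>.fasta first
    #    (that step is not shown here).
    make_db = ["makeblastdb", "-in", f"{db_name}.fasta",
               "-dbtype", "nucl", "-parse_seqids", "-out", db_name]
    # 3. Let magicblast stream the reads of the accession from the SRA
    #    and align them against the database.
    align = ["magicblast", "-sra", sra_accession,
             "-db", db_name, "-out", f"{sra_accession}.sam"]
    return [download, make_db, align]


for command in build_pipeline("test1", "-t 199310 viral", "SRR4420340"):
    print(shlex.join(command))
```

Keeping the commands as lists makes it easy to hand each one to `subprocess.run` later without worrying about shell quoting.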
