GitHub - C3BI-pasteur-fr/taxo_pack: Analyze and visualize the taxonomic abundance of set of sequences

C3BI-pasteur-fr / taxo_pack Public

Notifications You must be signed in to change notification settings
Fork 1
Star 0

Analyze and visualize the taxonomic abundance of set of sequences

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
src		src
test		test
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README		README
setup.py		setup.py

Repository files navigation

All programs permit to analyze and visualize the taxonomic abundance of set of sequences compared against a sequence database with BLAST.

First, taxoptimizer.py parse the blast output report and add the NCBI Taxonomy database information to each HSP. It requires the bsddb3 python library
(https://pypi.python.org/pypi/bsddb3/), the Berkeley DB library (http://www.oracle.com),  the Golden program
(https://github.com/C3BI-pasteur-fr/golden) and the taxodb_ncbi (https://github.com/C3BI-pasteur-fr/taxodb_ncbi) or/and the taxo_rrna 
(https://github.com/C3BI-pasteur-fr/taxo_rrna) program to work. 

And, rankoptimizer.py analyze the taxonomy abundance of a set of sequences, pre-process by the taxoptimizer.py program, and format result with Krona,
an interactive metagenomic visualization in a Web browser (https://github.com/marbl/Krona). 

Finaly, kronaextract.py extract sub-list of Query ID from a set of sequences matching a given taxon name and/or their offset number in the rankoptimizer.py output.

INSTALL:
 $ tar xfvz taxo_pack-2.0.tar.gz
 $ cd taxo_pack-2.0
 $ python setut.py install 
 
Downloading and Installing prerequisites:
----------------------------------------
 
 1) Install https://github.com/C3BI-pasteur-fr/golden
 2) Install https://github.com/C3BI-pasteur-fr/taxodb_ncbi and/or https://github.com/C3BI-pasteur-fr/taxo_rrna
 3) Install Berkeley DB library from http://www.oracle.com
 4) Install https://pypi.python.org/pypi/bsddb3 

DATA PREPRATION:
---------------
Creation of database indexes with goldin (https://github.com/C3BI-pasteur-fr/golden)
 $ cd my_working_directory
 $ mkdir uniprot
 $ tar xfvz ~/taxo_pack-2.0/test/part_of_uniprot.db.tar.gz
 $ mv part_of_uniprot.db uniprot
 $ goldin uniprot uniprot/part_of_uniprot.db
 $ export GOLDENDATA='/user_path/my_working_directory'

Creation of the NCBI Taxonomy Database in Berkeley DB library (https://github.com/C3BI-pasteur-fr/taxodb_ncbi)
 $ taxodb.py -n names.dmp -d nodes.dmp -b ncbi_osVSoc.bdb 
Creation of the RRNA taxonomy in Berkeley DB library (https://github.com/C3BI-pasteur-fr/taxo_rrna)
$ python taxodb_rrna.py -i current_GREENGENES_gg16S_unaligned.fasta -n greengenes -b gg_accVosoc.bdb
$ python taxodb_rrna.py -i LSURef_111_tax_silva.fasta -n silva_lsu -b silva_lsu_accVosoc.bdb
$ python taxodb_rrna.py -i SSURef_111_NR_tax_silva.fasta -n silva_ssu  -b silva_ssu_accVosoc.bdb

 Note: A compress ncbi_taxodb.bdb file is in taxo_pack-2.0/test repository 

You have two possibilities to use the Berkeley DB library.
The first one is to specify the bdb file on the command line with the -b option of the taxoptimizer program and the second one is to to store bdb files in one ore
more specific directories. In that case, you have to define NCBITAXODB_BDB, GGTAXODB_BDB or/and SILVATAXODB_BDB environment variable(s). 

Download the krona javascript library
$ wget https://github.com/marbl/Krona/blob/master/KronaTools/src/krona-2.0.js
You have two possibilities to use the krona, javascript library.
The first one is to specify the krona-2.0.js file on the command line with the -s option of the rankoptimizer program and the second one is to to store krona-2.0.js file
in one specific directory. In that case, you have to define RANKOPTIMIZERSHARE environment variable.


Don't forget to define RANKOPTIMIZERLIB and GOLDENDATA environment variables which respectively correspond to the rankoptimizer.py and Golden program environments. 


USAGE
-----
Get help:

 $ python taxoptimizer.py -h
 $ python rankoptimizer.py -h
 $ python kronaextract.py -h

 $ taxoptimizer.py -i ~/taxo_pack-2.0/test/sequence_test.blast.m8 -o sequence_test.taxo -b ~/taxo_pack-2.0/test/ncbi_taxodb.bdb -t ncbi
 
 $ rankoptimizer.py -i ~/taxo_pack-2.0/test/sequence_test.tr.bl8.taxo -k R_k.xml -t R_t.txt -v R_v.html -V R_Vj.html -j R_j.json -p R_p.dmp -a -s krona-2.0.js
 $ kronaextract.py -i R_k.xml -n 'Retroviridae'  -o Retroviridae.out -s Retroviridae