Skip to content

Analyze and visualize the taxonomic abundance of set of sequences

License

Notifications You must be signed in to change notification settings

C3BI-pasteur-fr/taxo_pack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

71 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

All programs permit to analyze and visualize the taxonomic abundance of set of sequences compared against a sequence database with BLAST.

First, taxoptimizer.py parse the blast output report and add the NCBI Taxonomy database information to each HSP. It requires the bsddb3 python library
(https://pypi.python.org/pypi/bsddb3/), the Berkeley DB library (http://www.oracle.com),  the Golden program
(https://github.com/C3BI-pasteur-fr/golden) and the taxodb_ncbi (https://github.com/C3BI-pasteur-fr/taxodb_ncbi) or/and the taxo_rrna 
(https://github.com/C3BI-pasteur-fr/taxo_rrna) program to work. 

And, rankoptimizer.py analyze the taxonomy abundance of a set of sequences, pre-process by the taxoptimizer.py program, and format result with Krona,
an interactive metagenomic visualization in a Web browser (https://github.com/marbl/Krona). 

Finaly, kronaextract.py extract sub-list of Query ID from a set of sequences matching a given taxon name and/or their offset number in the rankoptimizer.py output.

INSTALL:
 $ tar xfvz taxo_pack-2.0.tar.gz
 $ cd taxo_pack-2.0
 $ python setut.py install 
 
Downloading and Installing prerequisites:
----------------------------------------
 
 1) Install https://github.com/C3BI-pasteur-fr/golden
 2) Install https://github.com/C3BI-pasteur-fr/taxodb_ncbi and/or https://github.com/C3BI-pasteur-fr/taxo_rrna
 3) Install Berkeley DB library from http://www.oracle.com
 4) Install https://pypi.python.org/pypi/bsddb3 

DATA PREPRATION:
---------------
Creation of database indexes with goldin (https://github.com/C3BI-pasteur-fr/golden)
 $ cd my_working_directory
 $ mkdir uniprot
 $ tar xfvz ~/taxo_pack-2.0/test/part_of_uniprot.db.tar.gz
 $ mv part_of_uniprot.db uniprot
 $ goldin uniprot uniprot/part_of_uniprot.db
 $ export GOLDENDATA='/user_path/my_working_directory'

Creation of the NCBI Taxonomy Database in Berkeley DB library (https://github.com/C3BI-pasteur-fr/taxodb_ncbi)
 $ taxodb.py -n names.dmp -d nodes.dmp -b ncbi_osVSoc.bdb 
Creation of the RRNA taxonomy in Berkeley DB library (https://github.com/C3BI-pasteur-fr/taxo_rrna)
$ python taxodb_rrna.py -i current_GREENGENES_gg16S_unaligned.fasta -n greengenes -b gg_accVosoc.bdb
$ python taxodb_rrna.py -i LSURef_111_tax_silva.fasta -n silva_lsu -b silva_lsu_accVosoc.bdb
$ python taxodb_rrna.py -i SSURef_111_NR_tax_silva.fasta -n silva_ssu  -b silva_ssu_accVosoc.bdb

 Note: A compress ncbi_taxodb.bdb file is in taxo_pack-2.0/test repository 

You have two possibilities to use the Berkeley DB library.
The first one is to specify the bdb file on the command line with the -b option of the taxoptimizer program and the second one is to to store bdb files in one ore
more specific directories. In that case, you have to define NCBITAXODB_BDB, GGTAXODB_BDB or/and SILVATAXODB_BDB environment variable(s). 

Download the krona javascript library
$ wget https://github.com/marbl/Krona/blob/master/KronaTools/src/krona-2.0.js
You have two possibilities to use the krona, javascript library.
The first one is to specify the krona-2.0.js file on the command line with the -s option of the rankoptimizer program and the second one is to to store krona-2.0.js file
in one specific directory. In that case, you have to define RANKOPTIMIZERSHARE environment variable.


Don't forget to define RANKOPTIMIZERLIB and GOLDENDATA environment variables which respectively correspond to the rankoptimizer.py and Golden program environments. 


USAGE
-----
Get help:

 $ python taxoptimizer.py -h
 $ python rankoptimizer.py -h
 $ python kronaextract.py -h

 $ taxoptimizer.py -i ~/taxo_pack-2.0/test/sequence_test.blast.m8 -o sequence_test.taxo -b ~/taxo_pack-2.0/test/ncbi_taxodb.bdb -t ncbi
 
 $ rankoptimizer.py -i ~/taxo_pack-2.0/test/sequence_test.tr.bl8.taxo -k R_k.xml -t R_t.txt -v R_v.html -V R_Vj.html -j R_j.json -p R_p.dmp -a -s krona-2.0.js
 $ kronaextract.py -i R_k.xml -n 'Retroviridae'  -o Retroviridae.out -s Retroviridae


About

Analyze and visualize the taxonomic abundance of set of sequences

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages