Analyze and visualize the taxonomic abundance of a set of sequences
C3BI-pasteur-fr/taxo_pack
All programs in this package analyze and visualize the taxonomic abundance of a set of sequences compared against a sequence database with BLAST.

First, taxoptimizer.py parses the BLAST output report and adds the NCBI Taxonomy database information to each HSP. It requires the bsddb3 Python library (https://pypi.python.org/pypi/bsddb3/), the Berkeley DB library (http://www.oracle.com), the Golden program (https://github.com/C3BI-pasteur-fr/golden), and the taxodb_ncbi (https://github.com/C3BI-pasteur-fr/taxodb_ncbi) and/or taxo_rrna (https://github.com/C3BI-pasteur-fr/taxo_rrna) programs.

Then, rankoptimizer.py analyzes the taxonomic abundance of a set of sequences pre-processed by taxoptimizer.py and formats the result with Krona, an interactive metagenomic visualization tool that runs in a Web browser (https://github.com/marbl/Krona).

Finally, kronaextract.py extracts a sub-list of Query IDs from a set of sequences matching a given taxon name and/or their offset number in the rankoptimizer.py output.

INSTALL:

    $ tar xfvz taxo_pack-2.0.tar.gz
    $ cd taxo_pack-2.0
    $ python setup.py install

Downloading and Installing prerequisites:
----------------------------------------
1) Install https://github.com/C3BI-pasteur-fr/golden
2) Install https://github.com/C3BI-pasteur-fr/taxodb_ncbi and/or https://github.com/C3BI-pasteur-fr/taxo_rrna
3) Install Berkeley DB library from http://www.oracle.com
4) Install https://pypi.python.org/pypi/bsddb3

DATA PREPARATION:
----------------
Creation of database indexes with goldin (https://github.com/C3BI-pasteur-fr/golden)

    $ cd my_working_directory
    $ mkdir uniprot
    $ tar xfvz ~/taxo_pack-2.0/test/part_of_uniprot.db.tar.gz
    $ mv part_of_uniprot.db uniprot
    $ goldin uniprot uniprot/part_of_uniprot.db
    $ export GOLDENDATA='/user_path/my_working_directory'

Creation of the NCBI Taxonomy Database in Berkeley DB library (https://github.com/C3BI-pasteur-fr/taxodb_ncbi)

    $ taxodb.py -n names.dmp -d nodes.dmp -b ncbi_osVSoc.bdb

Creation of the RRNA taxonomy in Berkeley DB library (https://github.com/C3BI-pasteur-fr/taxo_rrna)

    $ python taxodb_rrna.py -i current_GREENGENES_gg16S_unaligned.fasta -n greengenes -b gg_accVosoc.bdb
    $ python taxodb_rrna.py -i LSURef_111_tax_silva.fasta -n silva_lsu -b silva_lsu_accVosoc.bdb
    $ python taxodb_rrna.py -i SSURef_111_NR_tax_silva.fasta -n silva_ssu -b silva_ssu_accVosoc.bdb

Note: A compressed ncbi_taxodb.bdb file is in the taxo_pack-2.0/test directory.

There are two ways to give the programs access to the Berkeley DB files. The first is to specify the bdb file on the command line with the -b option of the taxoptimizer program. The second is to store the bdb files in one or more specific directories; in that case, you have to define the NCBITAXODB_BDB, GGTAXODB_BDB and/or SILVATAXODB_BDB environment variable(s).

Download the krona javascript library

    $ wget https://github.com/marbl/Krona/blob/master/KronaTools/src/krona-2.0.js

There are two ways to use the krona javascript library. The first is to specify the krona-2.0.js file on the command line with the -s option of the rankoptimizer program. The second is to store the krona-2.0.js file in one specific directory; in that case, you have to define the RANKOPTIMIZERSHARE environment variable.

Don't forget to define the RANKOPTIMIZERLIB and GOLDENDATA environment variables, which correspond to the rankoptimizer.py and Golden program environments, respectively.

USAGE
-----
Get help:

    $ python taxoptimizer.py -h
    $ python rankoptimizer.py -h
    $ python kronaextract.py -h

Run the programs on the test data:

    $ taxoptimizer.py -i ~/taxo_pack-2.0/test/sequence_test.blast.m8 -o sequence_test.taxo -b ~/taxo_pack-2.0/test/ncbi_taxodb.bdb -t ncbi
    $ rankoptimizer.py -i ~/taxo_pack-2.0/test/sequence_test.tr.bl8.taxo -k R_k.xml -t R_t.txt -v R_v.html -V R_Vj.html -j R_j.json -p R_p.dmp -a -s krona-2.0.js
    $ kronaextract.py -i R_k.xml -n 'Retroviridae' -o Retroviridae.out -s Retroviridae
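Because several steps above depend on environment variables, it can help to check them before running the programs. The following is a minimal sketch and not part of taxo_pack: the function name `check_environment` and the split into "expected" and "optional" variables are assumptions based only on the text above (GOLDENDATA and RANKOPTIMIZERLIB are the ones the README says not to forget; the others can be replaced by the -b and -s command-line options).

```python
import os

# Variable names are taken from the README text. The grouping below is an
# assumption made for this sketch, not something the programs enforce.
EXPECTED = ("GOLDENDATA", "RANKOPTIMIZERLIB")
OPTIONAL = ("NCBITAXODB_BDB", "GGTAXODB_BDB", "SILVATAXODB_BDB",
            "RANKOPTIMIZERSHARE")


def check_environment(environ=None):
    """Return two lists of variable names: (missing_expected, unset_optional).

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    A variable set to an empty string is treated as unset.
    """
    env = os.environ if environ is None else environ
    missing_expected = [name for name in EXPECTED if not env.get(name)]
    unset_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing_expected, unset_optional


if __name__ == "__main__":
    missing, unset = check_environment()
    for name in missing:
        print("not set: %s" % name)
    for name in unset:
        print("not set (use the -b or -s option instead): %s" % name)
```

Running the script prints one line per variable that is not set. An empty output means every variable named in this README has a value, which says nothing about whether the paths they point to are valid.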