The metaSNV pipeline performs variant calling on aligned metagenomic samples.
Via Git:
git clone [email protected]:costea/metaSNV.git
or download a zip file of the repository.
- Python-2.7 or above
- numpy
- pandas
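
Before building, it can help to confirm that the Python modules listed above are importable in the interpreter you plan to use. The following is a minimal sketch, not part of metaSNV: the helper name `missing_modules` is hypothetical, and it assumes a Python 3 interpreter (it relies on `importlib.util.find_spec`).

```python
import importlib.util
import sys


def missing_modules(names=("numpy", "pandas")):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report the interpreter version and any module that is not installed.
    print("Python", sys.version.split()[0])
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
```

If the script reports a missing module, install it with your package manager of choice before running the pipeline.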
On an Ubuntu/Debian system, the following sequence of commands will install all required packages (the first two are only necessary if you have not already enabled the universe repository):
sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) universe"
sudo apt-get update
sudo apt-get install libhts-dev libboost-dev
If you use anaconda, you can create an environment with all necessary dependencies using the following commands:
conda create --name metaSNV boost htslib pkg-config numpy pandas
source activate metaSNV
export CFLAGS=-I$CONDA_ENV_PATH/include
export LD_LIBRARY_PATH=$CONDA_ENV_PATH/lib:$LD_LIBRARY_PATH
If you do not have a C++ compiler, anaconda can also install G++:
conda create --name metaSNV boost htslib pkg-config numpy pandas
source activate metaSNV
# Add this command:
conda install gcc
export CFLAGS=-I$CONDA_ENV_PATH/include
export LD_LIBRARY_PATH=$CONDA_ENV_PATH/lib:$LD_LIBRARY_PATH
make
- 'all_samples' = a list of all BAM files, one /path/2/sample.bam per line (no duplicates)
- 'ref_db' = the reference database in fasta format (e.g. a multi-sequence fasta)
- 'gen_pos' = a list with start and end positions for each sequence in the reference (format: sequence\_id start end)
- 'db_ann' = a gene annotation file for the reference database (format: ).
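
The first and third inputs above can be generated with a few lines of Python. This is only a sketch under stated assumptions, not the metaSNV tooling: both function names are hypothetical, and for 'gen_pos' it assumes that each sequence spans from position 1 to its full length and that the three columns are tab-separated. Check these against your own reference before relying on them.

```python
import os


def write_sample_list(bam_dir, out_path):
    """Write the absolute path of every *.bam file under bam_dir, one per line."""
    paths = set()  # a set guarantees that no path is listed twice
    for root, _dirs, files in os.walk(bam_dir):
        for name in files:
            if name.endswith(".bam"):
                paths.add(os.path.abspath(os.path.join(root, name)))
    ordered = sorted(paths)
    with open(out_path, "w") as handle:
        for path in ordered:
            handle.write(path + "\n")
    return ordered


def write_gen_pos(fasta_path, out_path):
    """Write 'sequence_id start end' for each record of a FASTA file.

    Assumption: start is 1, end is the sequence length, columns are tab-separated.
    """
    records = []  # each entry is [sequence_id, length]
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:].split()
                # The sequence id is the first word of the header line.
                records.append([header[0] if header else "", 0])
            elif line and records:
                records[-1][1] += len(line)
    with open(out_path, "w") as handle:
        for seq_id, length in records:
            handle.write("%s\t%d\t%d\n" % (seq_id, 1, length))
    return [(seq_id, 1, length) for seq_id, length in records]
```

For example, `write_sample_list("EXAMPLE/samples", "sample_list")` would produce a file suitable as 'all_samples', assuming the BAM files live under that directory.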
./getRefDB.sh
metaSNV.py project_dir/ all_samples ref_db [options]
Note: requires SNP calling (Part II) to be done!
metaSNV_post.py project_dir [options]
$ ./getRefDB.sh
Select freeze9, as the tutorial files have been mapped against this freeze.
$ cd EXAMPLE
$ ./getSamplesScript.sh
$ cd ..
$ find `pwd`/EXAMPLE/samples -name "*.bam" > sample_list
$ python metaSNV.py tutorial sample_list db/freeze9.genomes.RepGenomesv9.fna --threads 8
$ python metaSNV_post.py tutorial
Voila! Your distances will be in the tutorial/distances folder. Enjoy!
If you want to process a large number of samples on a cluster, metaSNV can print out the commands that need to be run, and you can decide how to schedule and manage them.
$ python metaSNV.py tutorial sample_list db/freeze9.genomes.RepGenomesv9.fna --n_splits 8 --print-commands
Note the addition of the "--print-commands" option. This will print out the one-liners that you need to run. When they have finished, run the same command again.
$ python metaSNV.py tutorial sample_list db/freeze9.genomes.RepGenomesv9.fna --n_splits 8 --print-commands
This will calculate the "load balancing" and give you the commands for running the SNV calling.
$ python metaSNV_post.py tutorial
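
How you schedule the printed commands is up to you. If you have no cluster scheduler at hand, one option is to run them locally, a few at a time. The sketch below is not part of metaSNV: it assumes you have saved the printed one-liners into a text file, one command per line (the file name `commands.txt` in the usage note is hypothetical), and both function names are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_commands(path):
    """Return the non-empty, non-comment lines of a command file."""
    with open(path) as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith("#")]


def run_commands(commands, workers=4):
    """Run each shell command, at most `workers` at a time.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(command):
        # shell=True because the printed one-liners are complete shell commands.
        return subprocess.call(command, shell=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

A possible use is `codes = run_commands(read_commands("commands.txt"), workers=8)`, followed by a check that every exit code is 0 before re-running metaSNV.py for the next step.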