-
Notifications
You must be signed in to change notification settings - Fork 24
FreeBayes Lab
Log into the acf, get a qsub job, and go to your bwa folder.
cd /lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/3_bwa
Index the newest bam file. (Indexing the bam file is needed for many types of software to operate on a bam file)
../../../../software/samtools-1.10/bin/samtools \
index \
SRR2584866.bam
And launch the visualization:
/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools \
tview \
SRR2584866.bam \
Ecoli_REL606.fasta
Some instructions:
- '?' for help
- 'g' then '=10,000' to go to a particular region (this latter instruction only works if you have a single reference sequence).
- 'q' to quit
Try to find a real variant in the viewer. For example, type g, then
=1600
=16500
=29500
Create a new FreeBayes directory. Back out of 3_bwa and
mkdir 4_freebayes
We need to add read groups to our sam files. This was possible to do while we were running BWA but I forgot. Also, we can name them something a bit more easy to remember.
/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools addreplacerg \
-r "ID:SRR2584863" \
-r "SM:15kgen" \
-o 15kgen.bam \
../3_bwa/SRR2584863.bam
/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools addreplacerg \
-r "ID:SRR2584866" \
-r "SM:50kgen" \
-o 50kgen.bam \
../3_bwa/SRR2584866.bam
/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools addreplacerg \
-r "ID:SRR2589044" \
-r "SM:5kgen" \
-o 5kgen.bam \
../3_bwa/SRR2589044.bam
Now we need to mark duplicate reads. We'll use sambamba. Another option is Picard tools. This is pretty memory intensive, so we'll need a submission script
#PBS -S /bin/bash
#PBS -A ACF-UTK0138
#PBS -l nodes=1:ppn=2,walltime=01:00:00
#PBS -l advres=epp622.13776065
#PBS -N markdups
#PBS -m abe
#PBS -M [email protected]
cd $PBS_O_WORKDIR
/lustre/haven/proj/UTK0138/software/sambamba-0.7.1-linux-static markdup 5kgen.bam 5kgen.dedup.bam
/lustre/haven/proj/UTK0138/software/sambamba-0.7.1-linux-static markdup 15kgen.bam 15kgen.dedup.bam
/lustre/haven/proj/UTK0138/software/sambamba-0.7.1-linux-static markdup 50kgen.bam 50kgen.dedup.bam
/lustre/haven/proj/UTK0138/software/freebayes/bin/freebayes \
-f ../3_bwa/Ecoli_REL606.fasta \
-p 1 \
*.dedup.bam > Ecoli.vcf
We can filter our results with bcftools
/lustre/haven/proj/UTK0138/software/bcftools-1.10.2/bin/bcftools filter -e 'QUAL<20' Ecoli.vcf > Ecoli.filtered20.vcf
How many variants were filtered?
Viewing our alignment files is really helpful to evaluate how our SNP calling performed.
First, let's index our genome file
cd ../3_bwa
../../../../software/samtools-1.10/bin/samtools faidx Ecoli_REL606.fasta
Lets transfer our bam, bai and genome files to our laptops, then we can load into IGV
Open a new terminal that is not connected to the ACF. Your initial commands to get to your desktop may vary slightly
cd Desktop
mkdir Ecoli
cd ecoli
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/4_freebayes/*dedup.bam .
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/4_freebayes/*dedup.bam.bai .
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/3_bwa/Ecoli_REL606.fasta .
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/3_bwa/Ecoli_REL606.fasta.bwt .