Skip to content

FreeBayes Lab

Meg Staton edited this page Sep 23, 2020 · 7 revisions

Log into the acf, get a qsub job, and go to your bwa folder.

cd /lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/3_bwa

Visualize alignments with samtools

Index the newest bam file. (Indexing the bam file is needed for many types of software to operate on a bam file)

../../../../software/samtools-1.10/bin/samtools \
index \
SRR2584866.bam

And launch the visualization:

/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools  \
tview \
SRR2584866.bam \
Ecoli_REL606.fasta

Some instructions:

  • '?' for help
  • 'g' then '=10,000' to go to a particular region (this latter instruction only works if you have a single reference sequence).
  • 'q' to quit

Try to find a real variant in the viewer. For example, type g, then =1600 =16500 =29500

Prep for FreeBayes

Create a new FreeBayes directory. Back out of 3_bwa and

mkdir 4_freebayes

We need to add read groups to our sam files. This was possible to do while we were running BWA but I forgot. Also, we can name them something a bit more easy to remember.

/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools addreplacerg \
-r "ID:SRR2584863" \
-r "SM:15kgen" \
-o 15kgen.bam \
../3_bwa/SRR2584863.bam

/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools addreplacerg \
-r "ID:SRR2584866" \
-r "SM:50kgen" \
-o 50kgen.bam \
../3_bwa/SRR2584866.bam 

/lustre/haven/proj/UTK0138/software/samtools-1.10/bin/samtools addreplacerg \
-r "ID:SRR2589044" \
-r "SM:5kgen" \
-o 5kgen.bam \
../3_bwa/SRR2589044.bam 

Now we need to mark duplicate reads. We'll use sambamba. Another option is Picard tools. This is pretty memory intensive, so we'll need a submission script

#PBS -S /bin/bash
#PBS -A ACF-UTK0138
#PBS -l nodes=1:ppn=2,walltime=01:00:00
#PBS -l advres=epp622.13776065
#PBS -N markdups
#PBS -m abe
#PBS -M [email protected]

cd $PBS_O_WORKDIR

/lustre/haven/proj/UTK0138/software/sambamba-0.7.1-linux-static markdup 5kgen.bam 5kgen.dedup.bam
/lustre/haven/proj/UTK0138/software/sambamba-0.7.1-linux-static markdup 15kgen.bam 15kgen.dedup.bam
/lustre/haven/proj/UTK0138/software/sambamba-0.7.1-linux-static markdup 50kgen.bam 50kgen.dedup.bam

Run FreeBayes

/lustre/haven/proj/UTK0138/software/freebayes/bin/freebayes \
-f ../3_bwa/Ecoli_REL606.fasta \
-p 1 \
*.dedup.bam > Ecoli.vcf

We can filter our results with bcftools

/lustre/haven/proj/UTK0138/software/bcftools-1.10.2/bin/bcftools filter -e 'QUAL<20' Ecoli.vcf > Ecoli.filtered20.vcf

How many variants were filtered?

IGV

Viewing our alignment files is really helpful to evaluate how our SNP calling performed.

First, let's index our genome file

cd ../3_bwa
../../../../software/samtools-1.10/bin/samtools faidx Ecoli_REL606.fasta

Lets transfer our bam, bai and genome files to our laptops, then we can load into IGV

Open a new terminal that is not connected to the ACF. Your initial commands to get to your desktop may vary slightly

cd Desktop
mkdir Ecoli
cd ecoli
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/4_freebayes/*dedup.bam .
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/4_freebayes/*dedup.bam.bai .
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/3_bwa/Ecoli_REL606.fasta .
scp [email protected]:/lustre/haven/proj/UTK0138/yourusername/e_coli/analysis/3_bwa/Ecoli_REL606.fasta.bwt .