BUSCO, Dot Plots, SyRI

Lavadav edited this page Oct 14, 2022 · 24 revisions

Slides for this lab are here

BUSCO

BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It is a tool for measuring the completeness of genome and transcriptome assemblies. The genes used for this assessment are selected from orthologous groups that are present as single-copy orthologs in at least 90% of the species in a lineage.

mkdir 4_Busco
cd 4_Busco

Install BUSCO

conda create -n busco -c conda-forge -c bioconda busco=5.3.2
conda activate busco

Let's run BUSCO

ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap1_subset_9.fa .
ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap2_subset_9.fa .

busco -i hap1_subset_9.fa -m genome -l embryophyta -c 2 --out Dogwood_hap1.BUSCO
busco -i hap2_subset_9.fa -m genome -l embryophyta -c 2 --out Dogwood_hap2.BUSCO

conda deactivate
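
Each run writes its results to the directory named by --out. As a minimal sketch for pulling the one-line score out of each run, the helper below assumes that BUSCO 5 places a short_summary*.txt file at the top of each output directory and that the score line starts with "C:". The name busco_scores is hypothetical and not part of BUSCO.

```shell
# busco_scores: hypothetical helper, not part of BUSCO.
# Prints "<output dir><TAB><score line>" for each BUSCO output directory given.
busco_scores() {
    for dir in "$@"; do
        # Assumption: BUSCO 5 names its summary short_summary*.txt
        for f in "$dir"/short_summary*.txt; do
            [ -e "$f" ] || continue   # the glob matched nothing; skip
            # Keep only the first line that starts with "C:<digit>" and strip whitespace
            score=$(grep -m1 -E '^[[:space:]]*C:[0-9]' "$f" | tr -d '[:space:]')
            printf '%s\t%s\n' "$dir" "$score"
        done
    done
}

# Usage (after both runs above have finished):
#   busco_scores Dogwood_hap1.BUSCO Dogwood_hap2.BUSCO
```

In BUSCO notation, C is complete, S is complete and single-copy, D is complete and duplicated, F is fragmented, M is missing, and n is the total number of BUSCO groups searched.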

Dot Plot

D-GENIES is an online tool that generates a dot plot comparing two genomes. Similarity, repeats, inversions, and breaks between the assemblies can be assessed from the plot.

Input Files: Hap1 and Hap2 Fasta

D-GENIES can be accessed at: https://dgenies.toulouse.inra.fr/
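
Before uploading, it can help to confirm that both FASTA files contain the sequences you expect. The sketch below counts sequences and total bases with awk. The name fasta_stats is hypothetical, and the helper assumes plain (uncompressed) FASTA input.

```shell
# fasta_stats: hypothetical helper for a quick sanity check of FASTA files.
# Prints "<file><TAB><number of sequences><TAB><total bases>" per file.
fasta_stats() {
    for fa in "$@"; do
        awk -v name="$fa" '
            /^>/ { nseq++; next }                       # header line: one more sequence
                 { gsub(/[[:space:]]/, "")              # drop stray whitespace (e.g. \r)
                   bases += length($0) }                # sequence line: add its length
            END  { printf "%s\t%d\t%.0f\n", name, nseq, bases }
        ' "$fa"
    done
}

# Usage:
#   fasta_stats hap1_subset_9.fa hap2_subset_9.fa
```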

SyRI

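SyRI (Synteny and Rearrangement Identifier) compares whole-genome assemblies. From an alignment of two assemblies, it identifies syntenic regions, structural rearrangements (inversions, translocations, duplications), and local variations such as SNPs and indels.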

mkdir 5_Syri
cd 5_Syri

ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap1_subset_9.fa .
ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap2_subset_9.fa .

## Align Hap2 (query) to Hap1 (reference) using minimap2
/sphinx_local/software/minimap2-2.24_x64-linux/minimap2 \
        -ax asm5 \
        --eqx \
        -o Dogwood_hap1-vs-hap2.sam \
        -t 2 \
        hap1_subset_9.fa \
        hap2_subset_9.fa \
        >& minimap2_output

## Convert SAM into BAM format
spack load /r67sol
samtools view -b -@ 1 Dogwood_hap1-vs-hap2.sam > Dogwood_hap1-vs-hap2.bam

## Delete the SAM file to save space
rm Dogwood_hap1-vs-hap2.sam

## Install SyRI via Conda
conda create -n syri_env -c bioconda syri
conda activate syri_env

## Running SyRI on BAM file
syri \
     -c Dogwood_hap1-vs-hap2.bam \
     -r hap1_subset_9.fa \
     -q hap2_subset_9.fa \
     -F B --cigar --nc 3

conda deactivate
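
The run above writes its results to syri.out in the current directory. As a rough way to see what was found, the sketch below tallies records by annotation type. It assumes syri.out is tab-separated with the annotation type (for example SYN, INV, TRANS, DUP, SNP) in column 11, which is how the SyRI documentation describes the default output; check this against your version. The name syri_type_counts is hypothetical.

```shell
# syri_type_counts: hypothetical helper, not part of SyRI.
# Prints "<annotation type><TAB><count>" for a syri.out file, sorted by type.
syri_type_counts() {
    # Assumption: column 11 of syri.out holds the annotation type
    awk -F'\t' '
        { n[$11]++ }
        END { for (t in n) printf "%s\t%d\n", t, n[t] }
    ' "$1" | sort
}

# Usage:
#   syri_type_counts syri.out
```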

## SyRI does not draw figures itself; install plotsr to plot its output
conda create -n plotsr -c bioconda plotsr

## Copy the "genomes.txt" file for plotsr to the current directory
cp /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/5_SyRI/genomes.txt .

## Plot Figure
conda activate plotsr

plotsr --sr syri.out --genomes genomes.txt -o Dogwood_Hap1-vs-Hap2_Chr9.png -H 8 -W 10 -d 300

conda deactivate
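
The contents of the genomes.txt file copied above are not shown on this page. If you do not have access to that path, a sketch of building an equivalent file is below. It assumes the layout described in the plotsr README: a tab-separated file with one line per assembly giving the FASTA path, a display name, and optional tags such as lw (line width), listed in the same order as the alignment (reference first). The helper name write_genomes_txt and the labels Hap1 and Hap2 are hypothetical.

```shell
# write_genomes_txt: hypothetical helper, not part of plotsr.
# Arguments: OUT REF_FASTA REF_LABEL QRY_FASTA QRY_LABEL
write_genomes_txt() {
    {
        printf '#file\tname\ttags\n'               # header line (a comment for plotsr)
        printf '%s\t%s\tlw:1.5\n' "$2" "$3"        # reference assembly first
        printf '%s\t%s\tlw:1.5\n' "$4" "$5"        # then the query assembly
    } > "$1"
}

# Usage (labels are placeholders):
#   write_genomes_txt genomes.txt hap1_subset_9.fa Hap1 hap2_subset_9.fa Hap2
```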