BUSCO, Dot Plots, SyRI
Lavadav edited this page Oct 14, 2022 · 24 revisions
Slides for this lab are here
BUSCO stands for Benchmarking Universal Single-Copy Orthologs. BUSCO is a tool used to assess the completeness of genome and transcriptome assemblies. The genes used for this assessment are selected from orthologous groups that are present as single-copy orthologs in at least 90% of the species.
mkdir 4_Busco
cd 4_Busco
conda create -n busco -c conda-forge -c bioconda busco=5.3.2
conda activate busco
Let's run BUSCO
ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap1_subset_9.fa .
ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap2_subset_9.fa .
busco -i hap1_subset_9.fa -m genome -l embryophyta -c 2 --out Dogwood_hap1.BUSCO
busco -i hap2_subset_9.fa -m genome -l embryophyta -c 2 --out Dogwood_hap2.BUSCO
conda deactivate
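Once both BUSCO runs have finished, the headline result is a single score line of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:.., where C is complete, S single-copy, D duplicated, F fragmented, M missing, and n the number of BUSCO genes searched. The helper below is a minimal sketch for pulling that line out of each run. It assumes that BUSCO writes a text file whose name starts with short_summary into each --out directory; the function name busco_scores is made up for this example.

```shell
# Print "<output dir><TAB><score line>" for each BUSCO output directory.
# Assumption: each --out directory holds a file named short_summary*.txt
# containing one line that starts with "C:" (the BUSCO score notation).
busco_scores() {
    local dir summary score
    for dir in "$@"; do
        for summary in "$dir"/short_summary*.txt; do
            # Skip directories where the glob matched nothing
            [ -e "$summary" ] || continue
            # Take the first score line and strip surrounding whitespace
            score=$(grep -m1 -E '^[[:space:]]*C:[0-9.]+%' "$summary" | tr -d '[:space:]')
            printf '%s\t%s\n' "$dir" "$score"
        done
    done
}
```

For this lab it could be called as: busco_scores Dogwood_hap1.BUSCO Dogwood_hap2.BUSCO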
D-GENIES is an online web tool that generates a dot plot comparison between two genomes. Similarity, repeats, inversions, and breaks can be assessed from the plot.
Input files: Hap1 and Hap2 FASTA
D-GENIES can be accessed at: https://dgenies.toulouse.inra.fr/
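Before uploading the two FASTA files, it can help to know which sequences they contain and how long each one is, since those names and lengths are what the dot plot axes represent. The following is a small sketch using only awk; the function name fasta_lengths is hypothetical and not part of any tool used in this lab.

```shell
# Print "<sequence name><TAB><length>" for every record in a FASTA file.
# The name is the first word after ">"; the length is the total number of
# characters on the sequence lines that follow the header.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # drop the leading ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

Example call: fasta_lengths hap1_subset_9.fa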
SyRI (Synteny and Rearrangement Identifier) compares whole-genome assemblies. It identifies syntenic regions, structural rearrangements, deletions, and local variations between them.
mkdir 5_Syri
cd 5_Syri
ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap1_subset_9.fa .
ln -s /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/4_busco/hap2_subset_9.fa .
## Alignment of Hap1 and Hap2 using minimap2
/sphinx_local/software/minimap2-2.24_x64-linux/minimap2 \
-ax asm5 \
--eqx \
-o Dogwood_hap1-vs-hap2.sam \
-t 2 \
hap1_subset_9.fa \
hap2_subset_9.fa \
>& minimap2_output
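The --eqx option asks minimap2 to write = (match) and X (mismatch) in the CIGAR strings instead of the ambiguous M, which matters for the later SyRI step that reads the CIGAR. A quick sanity check of the SAM file could look like the sketch below; the function name sam_summary is invented for this example, and it only inspects column 6 (CIGAR) of the non-header lines.

```shell
# Count alignment records in a SAM file and how many of them still
# contain an "M" operation in the CIGAR string (column 6).
# With --eqx, the second number is expected to be 0.
sam_summary() {
    awk -F'\t' '
        !/^@/ {                      # skip header lines
            total++
            if ($6 ~ /M/) with_m++
        }
        END { printf "alignments=%d cigar_with_M=%d\n", total, with_m }
    ' "$1"
}
```

Example call: sam_summary Dogwood_hap1-vs-hap2.sam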
## Convert SAM into BAM format
spack load /r67sol
samtools view -b -@ 1 Dogwood_hap1-vs-hap2.sam > Dogwood_hap1-vs-hap2.bam
## Delete the SAM file to save space
rm Dogwood_hap1-vs-hap2.sam
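A more cautious variant of the conversion and clean-up steps is to delete the SAM file only after the BAM has been written and passes a basic integrity check. The sketch below wraps this in a hypothetical function, sam_to_bam_checked; it assumes samtools is already on the PATH (for example after the spack load step) and relies on samtools quickcheck, which verifies that the file has a valid header and is not truncated.

```shell
# Convert <name>.sam to <name>.bam and remove the SAM file only if the
# conversion succeeded and the BAM passes "samtools quickcheck".
sam_to_bam_checked() {
    local sam=$1
    local bam=${sam%.sam}.bam
    samtools view -b -@ 1 "$sam" > "$bam" || return 1
    samtools quickcheck "$bam" || return 1
    rm "$sam"
}
```

Example call: sam_to_bam_checked Dogwood_hap1-vs-hap2.sam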
## Install SyRI via Conda
conda create -n syri_env -c bioconda syri
conda activate syri_env
## Running SyRI on BAM file
syri \
-c Dogwood_hap1-vs-hap2.bam \
-r hap1_subset_9.fa \
-q hap2_subset_9.fa \
-F B --cigar --nc 3
conda deactivate
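SyRI writes its main results to syri.out, a tab-separated table with one row per annotated region or variant. A quick way to get an overview is to count rows per annotation type. The sketch below assumes that the annotation type (for example SYN, INV, SNP) is in column 11 of syri.out; check this against the SyRI documentation for the installed version. The function name syri_type_counts is made up for this example.

```shell
# Count how many rows of a SyRI output table fall into each annotation type.
# Assumption: the annotation type is stored in column 11 of the file.
syri_type_counts() {
    awk -F'\t' '
        { n[$11]++ }
        END { for (t in n) print t "\t" n[t] }
    ' "$1" | LC_ALL=C sort    # sort by type name for stable output
}
```

Example call: syri_type_counts syri.out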
## plotsr is needed to generate an image from the SyRI output
conda create -n plotsr -c bioconda plotsr
## Copy the "genomes.txt" file for plotsr to the current directory
cp /pickett_shared/teaching/EPP622_Fall2022/long_read/analysis/5_SyRI/genomes.txt .
## Plot Figure
conda activate plotsr
plotsr --sr syri.out --genomes genomes.txt -o Dogwood_Hap1-vs-Hap2_Chr9.png -H 8 -W 10 -d 300
conda deactivate
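The genomes.txt file passed to plotsr tells it which FASTA files were compared and how to label them. As a rough illustration of what such a file can look like, the sketch below prints a tab-separated table with a #file/name/tags header, listing the reference genome first and the query second. The contents of the course-provided genomes.txt are not shown on this page, so the names, the lw:1.5 line-width tag, and the function make_genomes_table are all assumptions for illustration.

```shell
# Print a plotsr-style genomes table to standard output.
# Arguments: <reference FASTA> <reference label> <query FASTA> <query label>
# Assumption: plotsr expects tab-separated columns "file", "name", "tags",
# with genomes listed in the order in which they were compared.
make_genomes_table() {
    printf '#file\tname\ttags\n'
    printf '%s\t%s\tlw:1.5\n' "$1" "$2"
    printf '%s\t%s\tlw:1.5\n' "$3" "$4"
}
```

Example call, writing to a separate file so the copied genomes.txt is not overwritten: make_genomes_table hap1_subset_9.fa Hap1 hap2_subset_9.fa Hap2 > my_genomes.txt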