California Conservation Genomics Project (CCGP) repository for the genome assembly working group.
This repository contains scripts used for the CCGP's reference genome assembly efforts.
CCGP reference genomes are assembled following a protocol adapted from Rhie et al. (2021). Assemblies are built from PacBio HiFi long-read data and scaffolded with proximity ligation/chromatin conformation capture data (HiC or OmniC; Dovetail Genomics). Our minimum target reference genome quality is 6.7.Q40, and in most cases we expect to reach 7.C.Q50 or better (see Table 1 in Rhie et al. (2021)).
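As a rough illustration of the quality targets above, the sketch below parses the `x.y.Q` shorthand and checks a set of assembly metrics against it. It assumes our reading of Table 1 in Rhie et al. (2021): `x` and `y` are the log10 of the minimum contig and scaffold NG50 in bp, `C` in the second position means chromosome-scale scaffolds, and `Q` is the minimum Phred-scaled base quality. The names `QualityTarget`, `parse_target`, and `meets_target` are hypothetical and are not part of the scripts in this repository.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityTarget:
    contig_exp: int              # log10 of the minimum contig NG50 (bp)
    scaffold_exp: Optional[int]  # log10 of the minimum scaffold NG50 (bp); None means "C"
    qv: int                      # minimum Phred-scaled base quality


def parse_target(code: str) -> QualityTarget:
    """Parse a shorthand such as '6.7.Q40' or '7.C.Q50' (assumed format)."""
    match = re.fullmatch(r"(\d+)\.(\d+|C)\.Q(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognized quality code: {code!r}")
    contig, scaffold, qv = match.groups()
    scaffold_exp = None if scaffold == "C" else int(scaffold)
    return QualityTarget(int(contig), scaffold_exp, int(qv))


def meets_target(target: QualityTarget, contig_ng50: int, scaffold_ng50: int,
                 qv: float, chromosome_scale: bool = False) -> bool:
    """Return True if the given metrics satisfy every part of the target."""
    if contig_ng50 < 10 ** target.contig_exp or qv < target.qv:
        return False
    if target.scaffold_exp is None:
        # "C": the caller decides whether scaffolds are chromosome-scale.
        return chromosome_scale
    return scaffold_ng50 >= 10 ** target.scaffold_exp
```

For example, `meets_target(parse_target("6.7.Q40"), 2_500_000, 30_000_000, 45.2)` evaluates to `True`, because the contig NG50 is above 1 Mb, the scaffold NG50 is above 10 Mb, and the QV is above 40.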
Here is the overview of our current pipeline:
The pipeline has gone through multiple versions since the beginning of the project; this is an overview of how it has evolved:
Color blocks:
- Yellow: Sequencing data types
- Dark gray: Fixed processes
- Light gray: Optional processes
- Blue: Iterative step
- PacBio HiFi
  - PacBio adapter filtering
  - K-mer counting with meryl
  - Genome size, heterozygosity, and repeat content estimation
  - Coverage validation (calculation of expected coverage given the sequencing data)
- HiC/OmniC
  - Library QC with Dovetail Genomics tools
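The coverage validation step listed above comes down to simple arithmetic: total sequenced bases divided by the estimated genome size. The snippet below is a minimal sketch of that calculation, assuming read lengths have already been extracted from the HiFi data and that the genome size comes from the k-mer based estimate. The function name `expected_coverage` is hypothetical and does not refer to a script in this repository.

```python
from typing import Iterable


def expected_coverage(read_lengths: Iterable[int], genome_size_bp: int) -> float:
    """Expected sequencing depth: total read bases / estimated genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


# Toy numbers for illustration only: four 15 kb reads over a 20 kb genome.
coverage = expected_coverage([15_000] * 4, 20_000)
print(f"expected coverage: {coverage:.1f}x")
```

Comparing this value against the coverage the assembler needs is left to the user; no threshold is implied here.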
- Contig assembly with HiFiasm
  - We run HiFiasm in single or HiC mode, depending on the datasets available and the ploidy.
- Alignment of HiFi data with minimap2 and purging with purge_dups
- Alignments with Arima Genomics Mapping Pipeline
- Scaffolding with SALSA
- Generation and visualization of contact maps
  - HiGlass
    - Generation of tracks
      - HiFi coverage
      - HiC/OmniC coverage
      - Genome assembly mappability
      - Gap description
  - PretextSuite
- Gap closing with YAGCloser, based on long reads that span gaps
- Mitogenome assembly pipeline or MitoHiFi
- Organelle filtering from nuclear assemblies
- Contamination screening with Blobtools
- Contiguity metrics (contig and scaffold N50)
- BUSCO scores
- Per-base quality and k-mer completeness
- Frameshift errors
- Gap description
- Genome mappability
- Mapping quality
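Two of the metrics above are easy to illustrate. The sketch below computes N50 (or NG50, when a genome size is supplied) from a list of sequence lengths, and converts a per-base error rate to a Phred-scaled quality value. It is a minimal example under the standard definitions of these metrics, not the code used in this repository; the function names `n50` and `phred_qv` are hypothetical.

```python
import math
from typing import Optional, Sequence


def n50(lengths: Sequence[int], genome_size: Optional[int] = None) -> int:
    """Length L such that sequences >= L cover half of the reference total.

    With genome_size=None the reference total is the assembly size (N50);
    otherwise it is the supplied genome size (NG50).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half = (genome_size if genome_size is not None else sum(lengths)) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # the assembly covers less than half of the genome size


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality: QV = -10 * log10(per-base error rate)."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)
```

For instance, a per-base error rate of 1 in 10,000 corresponds to Q40, and 1 in 100,000 to Q50.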
- For further information about our project and efforts, please visit the CCGP website
- For more information about the project, you can also check this: