v1.0.0 – Gwaihir the Windlord
Overview
The pipeline takes a CSV file that lists the assembly accession numbers, the Ensembl species names (which may differ from the Tree of Life ones) and the output directories.
Assembly accession numbers are optional: if one is missing, the pipeline assumes it can be retrieved from a file named ACCESSION in the standard location on disk.
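The accession fallback described above could be sketched in Python as follows. This is only an illustration, not the pipeline's own code: the column names (`outdir`, `ensembl_species_name`, `assembly_accession`) are hypothetical, and because the "standard location" is not spelled out here, the caller supplies a function that says where the ACCESSION file lives.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterator, TextIO


def read_accession_file(path: Path) -> str:
    """Return the first non-empty line of an ACCESSION file."""
    for line in path.read_text().splitlines():
        if line.strip():
            return line.strip()
    raise ValueError(f"{path} contains no accession")


def resolve_rows(
    handle: TextIO,
    locate_accession_file: Callable[[Dict[str, str]], Path],
) -> Iterator[Dict[str, str]]:
    """Yield CSV rows, filling in missing accessions from ACCESSION files.

    `assembly_accession` is a hypothetical column name; adapt it to the
    header actually used by the samplesheet.
    """
    for row in csv.DictReader(handle):
        accession = (row.get("assembly_accession") or "").strip()
        if not accession:
            # Fallback: read the accession from the file on disk.
            accession = read_accession_file(locate_accession_file(row))
        yield {**row, "assembly_accession": accession}
```

A caller that assumes (purely for the example) that the ACCESSION file sits inside the output directory would pass `lambda row: Path(row["outdir"]) / "ACCESSION"` as the second argument.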
The pipeline downloads the repeat annotation as a masked FASTA file and also provides it as a BED file.
All files are compressed with bgzip, and indexed with samtools faidx or tabix.
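As a rough illustration of the compression and indexing described above, the helper below builds the command lines for one file. It is a sketch under assumptions: the function names are hypothetical, the choice of indexer is guessed from the file extension, and the pipeline itself may invoke the tools differently.

```python
import subprocess
from pathlib import Path
from typing import List


def compress_and_index_commands(path: Path) -> List[List[str]]:
    """Return the bgzip command and the matching index command for a file.

    Assumption: FASTA files end in .fa/.fasta and BED files in .bed.
    """
    compressed = f"{path}.gz"  # bgzip replaces FILE with FILE.gz by default
    suffix = path.suffix.lower()
    if suffix in {".fa", ".fasta"}:
        index = ["samtools", "faidx", compressed]
    elif suffix == ".bed":
        index = ["tabix", "-p", "bed", compressed]
    else:
        raise ValueError(f"Don't know how to index {path}")
    return [["bgzip", str(path)], index]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it possible to inspect or log the commands without having the tools installed.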
Steps involved:
- Download the masked FASTA file from Ensembl.
- Extract the coordinates of the masked regions into a BED file.
- Compress and index the BED file with bgzip and tabix.
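The coordinate extraction step can be illustrated with a short Python sketch. It assumes the FASTA file is soft-masked, i.e. repeats are written in lowercase; a hard-masked file (repeats replaced by N) would need a different pattern. The function names are hypothetical and this is not necessarily how the pipeline implements the step.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: masked (repeat) bases are lowercase letters.
LOWERCASE_RUN = re.compile(r"[a-z]+")


def masked_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (sequence name, start, end) for each lowercase run.

    Coordinates are 0-based and half-open, as in the BED format.
    Runs that continue across FASTA line breaks are merged.
    """
    name = ""
    offset = 0  # bases of the current sequence read so far
    pending: Optional[Tuple[int, int]] = None  # last run, not yet emitted
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if pending is not None:
                yield (name, *pending)
            # Assumes a non-empty header; the name is its first word.
            name = line[1:].split()[0]
            offset, pending = 0, None
            continue
        for match in LOWERCASE_RUN.finditer(line):
            start, end = offset + match.start(), offset + match.end()
            if pending is not None and pending[1] == start:
                pending = (pending[0], end)  # run continues from previous line
            else:
                if pending is not None:
                    yield (name, *pending)
                pending = (start, end)
        offset += len(line)
    if pending is not None:
        yield (name, *pending)


def to_bed_line(interval: Tuple[str, int, int]) -> str:
    """Format one interval as a tab-separated BED line."""
    chrom, start, end = interval
    return f"{chrom}\t{start}\t{end}"
```

Because intervals are emitted in file order, the resulting BED records are already sorted by position within each sequence, which is what tabix expects.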
Dependencies
All dependencies are automatically fetched by Singularity.
- bgzip
- samtools
- tabix
- python3
- wget
- awk
- gzip