
v1.0.0 – Gwaihir the Windlord

@muffato muffato released this 19 Oct 00:54
· 97 commits to main since this release

Overview

The pipeline takes a CSV file that lists, for each assembly, the assembly accession number, the Ensembl species name (which may differ from the Tree of Life one), and the output directory.
The assembly accession number is optional. If it is missing, the pipeline assumes it can be retrieved from a file named ACCESSION in the standard location on disk.
The pipeline retrieves the repeat annotation as a masked FASTA file, and derives from it a BED file of the masked regions.
All files are compressed with bgzip, and indexed with samtools faidx or tabix.
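
As an illustration of the accession fallback described above, here is a minimal Python sketch. It is not the pipeline's own code: the column names (`outdir`, `ensembl_species_name`, `assembly_accession`) and the assumption that the ACCESSION file sits directly inside the output directory are hypothetical, chosen only to make the example concrete.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, TextIO


def read_samplesheet(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one record per CSV row, filling in a missing accession.

    Hypothetical column names: outdir, ensembl_species_name,
    assembly_accession (the last one may be empty or absent).
    """
    for row in csv.DictReader(handle):
        outdir = Path(row["outdir"].strip())
        accession = (row.get("assembly_accession") or "").strip()
        if not accession:
            # Assumption: the fallback file is <outdir>/ACCESSION and
            # contains nothing but the accession number.
            accession = (outdir / "ACCESSION").read_text().strip()
        yield {
            "outdir": str(outdir),
            "ensembl_species_name": row["ensembl_species_name"].strip(),
            "assembly_accession": accession,
        }
```

Calling `list(read_samplesheet(open("samplesheet.csv")))` would return one dictionary per assembly, with the accession taken from the CSV when present and from the ACCESSION file otherwise.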

Steps involved:

  • Download the masked FASTA file from Ensembl.
  • Extract the coordinates of the masked regions into a BED file.
  • Compress and index the BED file with bgzip and tabix.
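
The extraction step can be sketched in Python as follows. This is an illustration only, not the pipeline's implementation: it assumes the FASTA file is soft-masked (repeats written in lowercase), and the function names `masked_regions` and `write_bed` are hypothetical. Coordinates follow the BED convention (0-based start, exclusive end).

```python
import re
from typing import Iterator, Optional, TextIO, Tuple

# Assumption: repeats are soft-masked, i.e. written in lowercase letters.
MASKED_RUN = re.compile(r"[a-z]+")

Region = Tuple[str, int, int]


def masked_regions(fasta: TextIO) -> Iterator[Region]:
    """Yield (sequence name, start, end) for every lowercase run."""
    name = ""
    offset = 0  # bases of the current sequence read so far
    pending: Optional[Tuple[int, int]] = None  # last run, may continue on next line
    for raw in fasta:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New sequence: flush the open run and reset the state.
            if pending is not None:
                yield (name, pending[0], pending[1])
            name = line[1:].split()[0]
            offset, pending = 0, None
            continue
        for match in MASKED_RUN.finditer(line):
            start, end = offset + match.start(), offset + match.end()
            if pending is not None and pending[1] == start:
                # The run continues across a line break: extend it.
                pending = (pending[0], end)
            else:
                if pending is not None:
                    yield (name, pending[0], pending[1])
                pending = (start, end)
        offset += len(line)
    if pending is not None:
        yield (name, pending[0], pending[1])


def write_bed(fasta: TextIO, bed: TextIO) -> None:
    """Write the masked regions of a FASTA stream as tab-separated BED lines."""
    for name, start, end in masked_regions(fasta):
        bed.write(f"{name}\t{start}\t{end}\n")
```

The resulting BED file would then be compressed with bgzip and indexed with tabix, as in the last step above. Tracking a pending run, rather than emitting each match immediately, keeps a repeat that spans several FASTA lines as a single BED interval.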

Dependencies

All dependencies are automatically fetched by Singularity.

  • bgzip
  • samtools
  • tabix
  • python3
  • wget
  • awk
  • gzip