REPET-Slurm

A collection of scripts for running the REPET pipeline on a cluster that uses the SLURM resource manager and a module system.

Caveats/Warnings

  1. FASTA Format
    • Header
      • Recommended format: ">XX_i" (XX = letters, i = a number)
      • Avoid spaces and symbols such as "=;:|"
    • Sequence lines of 60 bp or fewer
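
The caveats above can be checked before a run. The following is a minimal sketch, not part of REPET or this repository: check_fasta is a hypothetical helper that flags headers that do not match the recommended ">XX_i" pattern and sequence lines longer than 60 characters. It treats the recommended header format as a strict rule, which is an assumption made for illustration.

```shell
# check_fasta FILE
# Hypothetical helper (not shipped with REPET): prints one message per
# problem and returns non-zero if any problem was found.
check_fasta() {
    awk '
        /^>/ {
            # Recommended header: ">" + letters + "_" + digits, nothing else.
            if ($0 !~ /^>[A-Za-z]+_[0-9]+$/) {
                printf "line %d: non-standard header: %s\n", NR, $0
                bad = 1
            }
            next
        }
        length($0) > 60 {
            printf "line %d: sequence line has %d characters (max 60)\n", NR, length($0)
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_fasta genome.fa` (genome.fa being a placeholder name) prints nothing and returns 0 when every header and sequence line passes.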

Prerequisite Files

TEdenovo

  1. Host genome (FASTA format)
  2. REPET-specific Pfam HMM File
  3. rDNA (FASTA format) of host genome
  4. RepBase Amino Acid Database
  5. RepBase Nucleotide Database
  6. cDNA of host genome (FASTA format)

A RepeatScout bank can also be provided, but it needs additional pre-processing before it can be used in the pipeline. See the TEdenovo tuto webpage or the text file included with REPET. These scripts currently do NOT perform these pre-processing steps.

TEannot

  1. Host genome (FASTA format)
  2. TE library (FASTA format)
    • from TEdenovo or another source
  3. RepBase Amino Acid Database
  4. RepBase Nucleotide Database
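
Before starting either pipeline, it can help to confirm that every file in the prerequisite lists above is present. The following is a minimal sketch: require_files is a hypothetical helper, and the file names in the usage comment are placeholders, not names that REPET requires.

```shell
# require_files FILE...
# Hypothetical helper: reports each file that is missing or empty on
# stderr and returns non-zero if at least one was reported.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (placeholder file names, adjust to your project):
#   require_files genome.fa te_library.fa repbase_aa.fa repbase_nt.fa
```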

Getting Started

TEdenovo

  1. Clone the repository and copy the default configuration.
$ git clone https://github.com/stajichlab/REPET-slurm
$ cd REPET-slurm/TEdenovo
$ cp /path/to/REPET/config/TEdenovo.cfg .
  2. Change the settings in TEdenovo.cfg and TEdenovo_AllSteps.sh to match your environment/project.
  3. Copy/link the prerequisite files into the TEdenovo folder.
  4. Run the pipeline with sh TEdenovo_AllSteps.sh, or submit it with sbatch TEdenovo_AllSteps.sh.
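
The step of linking prerequisite files into the TEdenovo folder could be sketched as follows. link_inputs is a hypothetical helper, not part of this repository. It resolves each source to an absolute path first so the symbolic links still resolve from inside the destination folder.

```shell
# link_inputs DEST_DIR FILE...
# Hypothetical helper: creates a symbolic link in DEST_DIR for each FILE.
link_inputs() {
    dest=$1
    shift
    for src in "$@"; do
        # Absolute directory of the source file; stop if it does not exist.
        dir=$(cd "$(dirname "$src")" && pwd) || return 1
        ln -s "$dir/$(basename "$src")" "$dest/" || return 1
    done
}

# Usage (placeholder paths):
#   link_inputs REPET-slurm/TEdenovo /data/genome.fa /data/rDNA.fa
```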

TEannot

If you already ran TEdenovo, then skip step 1.

  1. Clone the repository and copy the default configuration.
$ git clone https://github.com/stajichlab/REPET-slurm
$ cd REPET-slurm/TEannot
$ cp /path/to/REPET/config/TEannot.cfg .
  2. Change the settings in TEannot.cfg and TEannot_AllSteps.sh to match your environment/project.
  3. Copy/link the prerequisite files into the TEannot folder.
    • The TE library must be named <project_name>_refTEs.fa
  4. Run the pipeline with sh TEannot_AllSteps.sh, or submit it with sbatch TEannot_AllSteps.sh.
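
The naming rule for the TE library can be applied with a small script. The following is a minimal sketch with two hypothetical helpers. It assumes that <project_name> is the same project name you set in TEannot.cfg, which the text above does not state explicitly.

```shell
# te_library_name PROJECT
# Prints the file name required for the TE library of PROJECT.
te_library_name() {
    printf '%s_refTEs.fa\n' "$1"
}

# stage_te_library PROJECT LIBRARY
# Hypothetical helper: copies LIBRARY into the current folder under the
# required name. Run it from inside the TEannot folder.
stage_te_library() {
    cp "$2" "$(te_library_name "$1")"
}

# Usage (placeholder names):
#   cd REPET-slurm/TEannot
#   stage_te_library MyProject /data/my_te_library.fa
```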