LTRpred(ict): a pipeline for automated functional annotation of LTR retrotransposons for comparative genomics studies
An easy way to perform de novo functional annotation of LTR retrotransposons from any genome assembly in fasta format.
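```r
# install the current version of LTRpred on your system
install.packages(c("BiocManager", "remotes"))
# BiocManager installs "user/repo" names from GitHub (via remotes)
# together with the required Bioconductor dependencies
BiocManager::install("HajkD/LTRpred")
```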
The fastest way to generate an LTR retrotransposon prediction for a genome of interest (after installing all prerequisite command-line tools) is to use the LTRpred() function and rely on its default parameters. In the following example, an LTR transposon prediction is performed for parts of the human Y chromosome.
```r
# load LTRpred package
library(LTRpred)
# de novo LTR transposon prediction for (parts of) the human Y chromosome
LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))
```
When running your own genome, please specify genome.file = "path/to/your/genome.fasta" instead of system.file(..., package = "LTRpred"). The command system.file(..., package = "LTRpred") merely references the path to the example file stored in the LTRpred package itself.
This tutorial introduces users to LTRpred:
Users can also read the tutorials within RStudio:
```r
library(LTRpred)
browseVignettes("LTRpred")
```
I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.
Furthermore, if you find bugs or need additional (more flexible) functionality in parts of this package, please let me know:
https://github.com/HajkD/LTRpred/issues
In the LTRpred framework users can find:
- de novo prediction of LTR retrotransposons (nested, overlapping, or pure template) using LTRharvest and LTRdigest
- annotation of predicted LTR retrotransposons using Dfam or Repbase as reference
- solo LTR prediction based on specialized BLAST searches
- LTR retrotransposon family clustering using vsearch
- open reading frame prediction in LTR retrotransposons using usearch
- age estimation of predicted LTR retrotransposons in Mya (not implemented yet, but soon to come)
- CHH, CHG, CG, ... content quantification in predicted LTR retrotransposons
- filtering for (potentially) functional LTR retrotransposons
- quality assessment of input genomes used to predict LTR retrotransposons
- run LTRpred on entire kingdoms of life using only one command (see ?LTRpred.meta)
- perform meta-genomics studies customized for LTR retrotransposons
- cluster LTR retrotransposons within and between species
- quantify the diversity space of LTR retrotransposons for entire kingdoms of life
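Age estimation is listed above as not yet implemented in LTRpred. For orientation, a widely used approach in the literature converts the similarity between the 5' and 3' LTRs of an element into an insertion time: the divergence p = 1 - similarity is corrected with the Jukes-Cantor model, K = -3/4 ln(1 - 4p/3), and the age is T = K / (2r) for a substitution rate r per site per year. The base-R sketch below assumes exactly this model; the function name ltr_age_mya() is hypothetical, the rate 1.3e-8 is only a placeholder that should be replaced by a rate appropriate for your species, and none of this is LTRpred's (future) implementation.

```r
# Convert LTR similarity (in percent) into an approximate insertion age in Mya.
# Assumptions: both LTRs were identical at insertion and then diverged
# neutrally; divergence is corrected with the Jukes-Cantor model.
# Hypothetical helper for illustration; not part of the LTRpred API.
ltr_age_mya <- function(similarity, rate) {
  p <- 1 - similarity / 100                 # proportion of differing sites
  k <- -0.75 * log(1 - (4 / 3) * p)         # Jukes-Cantor corrected distance
  k / (2 * rate) / 1e6                      # years -> million years
}

# placeholder rate (substitutions per site per year); adjust for your species
ltr_age_mya(c(100, 99, 95), rate = 1.3e-8)
```

The factor 2 reflects that both LTRs accumulate substitutions independently after insertion. Similarities of 25% or less (p of 0.75 or more) fall outside the Jukes-Cantor model and yield Inf or NaN.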
- LTRpred(): Major pipeline to predict LTR retrotransposons in a given genome
- LTRpred.meta(): Perform meta-analyses with LTRpred
- meta.summarize(): Summarize (concatenate) all predictions of an LTRpred.meta() run
- meta.apply(): Apply functions to meta data generated by LTRpred()
- LTRharvest(): Run LTRharvest to predict putative LTR retrotransposons
- LTRdigest(): Run LTRdigest to predict putative LTR retrotransposons
- CLUSTpred(): Cluster sequences with VSEARCH
- cluster.members(): Select members of a specific cluster
- clust2fasta(): Export sequences of TEs belonging to the same cluster to fasta files
- AllPairwiseAlign(): Compute all pairwise (global) alignments with VSEARCH
- filter.uc(): Filter for cluster members
- SimMatAbundance(): Compute histogram shape similarity between species
- ltr.cn(): Detect solo LTR copies of predicted LTR transposons
- cn2bed(): Write copy number estimation results to BED file format
- filter.jumpers(): Detect LTR retrotransposons that are potential jumpers
- tidy.datasheet(): Select the most important columns of 'LTRpred' output for further analytics
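filter.jumpers() selects candidates that may still be able to transpose. As a generic illustration of this kind of filtering (not the criteria filter.jumpers() actually applies), the sketch below keeps predictions with high LTR similarity, i.e. presumably recent insertions, that also carry an annotated protein domain. The helper filter_candidates_sketch(), the column names ltr_similarity and protein_domain, the example domain names, and the 95% cutoff are all hypothetical.

```r
# Keep candidates that look young (nearly identical LTRs) and that carry at
# least one annotated protein domain. Column names and the default cutoff
# are illustrative assumptions, not the criteria used by filter.jumpers().
filter_candidates_sketch <- function(pred, min_similarity = 95) {
  keep <- pred$ltr_similarity >= min_similarity &  # high LTR similarity
    !is.na(pred$protein_domain) &                  # a domain was annotated
    nzchar(pred$protein_domain)                    # ... and it is not empty
  pred[which(keep), , drop = FALSE]                # which() drops NA rows
}

# toy prediction table (hypothetical values)
pred <- data.frame(
  ID             = c("LTR_1", "LTR_2", "LTR_3"),
  ltr_similarity = c(99.1, 97.0, 88.4),
  protein_domain = c("RVT_1", NA, "rve"),
  stringsAsFactors = FALSE
)
filter_candidates_sketch(pred)
```

Exposing the cutoff as an argument makes it easy to check how sensitive the number of retained candidates is to this choice.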
- read.prediction(): Import the output of LTRharvest or LTRdigest
- read.tabout(): Import the information sheet returned by LTRdigest
- read.orfs(): Read the output of ORFpred()
- read.seqs(): Import sequences of predicted LTR transposons
- read.ltrpred(): Import the data sheet file generated by LTRpred()
- read.uc(): Read a file in USEARCH cluster format
- read.blast6out(): Read a file in blast6out format generated by USEARCH or VSEARCH
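read.uc() imports files in the USEARCH cluster (.uc) format, which VSEARCH also writes. The format is a headerless, tab-separated table with ten fields per line: record type (S = centroid, H = hit, C = cluster summary, N = no hit), cluster number, sequence length (or cluster size for C records), percent identity (H records), strand, two unused fields, a compressed alignment, the query label, and the target label; unset fields are written as '*'. The base-R sketch below only illustrates the format; read_uc_sketch() and its column names are hypothetical and need not match what read.uc() returns.

```r
# Minimal base-R reader for the USEARCH/VSEARCH cluster (.uc) format.
# Hypothetical helper; column names are our own labels.
read_uc_sketch <- function(file = NULL, text = NULL) {
  cols <- c("type", "cluster", "size", "identity", "strand",
            "unused1", "unused2", "alignment", "query", "target")
  args <- list(header = FALSE, sep = "\t", col.names = cols,
               na.strings = "*",          # '*' marks an unset field
               quote = "", comment.char = "",
               stringsAsFactors = FALSE)
  if (is.null(text)) args$file <- file else args$text <- text
  do.call(utils::read.table, args)
}

# toy .uc content (hypothetical labels and values)
uc <- read_uc_sketch(text = c(
  "S\t0\t500\t*\t*\t*\t*\t*\tLTR_1\t*",
  "H\t0\t498\t97.5\t+\t0\t0\t498M\tLTR_2\tLTR_1",
  "C\t0\t2\t*\t*\t*\t*\t*\tLTR_1\t*"
))

# members of cluster 0: the centroid (S) plus its hits (H)
subset(uc, type %in% c("S", "H") & cluster == 0, select = c(query, identity))
```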
- pred2bed(): Format LTR prediction data to BED file format
- pred2fasta(): Save the sequences of the predicted LTR transposons in a fasta file
- pred2gff(): Format LTR prediction data to GFF3 file format
- pred2annotation(): Match LTRharvest, LTRdigest, or LTRpred predictions with a given annotation file in GFF3 format
- pred2csv(): Format LTR prediction data to CSV file format
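pred2bed() and cn2bed() write BED files. One detail to keep in mind when converting predictions to BED is that BED uses 0-based, half-open intervals, whereas GFF3 (and most R/Bioconductor objects) use 1-based, inclusive coordinates, so only the start coordinate shifts by one. The sketch below shows this conversion for a toy table; pred_to_bed_sketch() and the input column names (chromosome, start, end, ID, strand) are hypothetical and do not describe the internals of pred2bed().

```r
# Convert 1-based, inclusive coordinates into a BED6 table
# (0-based, half-open). Hypothetical helper and input column names.
pred_to_bed_sketch <- function(pred) {
  data.frame(
    chrom      = pred$chromosome,
    chromStart = pred$start - 1L,  # 1-based start -> 0-based start
    chromEnd   = pred$end,         # inclusive end equals half-open end
    name       = pred$ID,
    score      = 0L,               # placeholder score (BED6 column 5)
    strand     = pred$strand,
    stringsAsFactors = FALSE
  )
}

# toy predictions (hypothetical values, for illustration only)
pred <- data.frame(
  chromosome = "chrY",
  start  = c(1001L, 5201L),
  end    = c(6000L, 9800L),
  ID     = c("LTR_1", "LTR_2"),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)
bed <- pred_to_bed_sketch(pred)

# BED files are tab-separated, without header, quotes, or row names
out <- tempfile(fileext = ".bed")
write.table(bed, file = out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```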
- ORFpred(): Open reading frame prediction in putative LTR transposons
- dfam.query(): Annotation of de novo predicted LTR transposons via Dfam searches
- read.dfam(): Import Dfam query output
- repbase.clean(): Clean the initial Repbase database for BLAST
- repbase.query(): Query Repbase to annotate putative LTRs
- repbase.filter(): Filter the Repbase query output
- motif.count(): Low-level function to detect motifs in strings
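The CHH, CHG, and CG content quantification listed among the features relies on counting short sequence motifs, which is what motif.count() is described as doing. The sketch below illustrates the idea in base R on the forward strand only: CG is a C followed by G, CHG is a C, then any base except G (H = A, C, or T), then G, and CHH is a C followed by two non-G bases. Zero-width lookahead patterns let overlapping occurrences count. The helper count_methylation_contexts() is hypothetical and is not the LTRpred implementation.

```r
# Count cytosine contexts (CG, CHG, CHH) on the forward strand of a DNA
# sequence. H = A, C, or T. Hypothetical helper; not part of the LTRpred API.
count_methylation_contexts <- function(seq) {
  seq <- toupper(seq)
  patterns <- c(
    CG  = "(?=CG)",           # C followed by G
    CHG = "(?=C[ACT]G)",      # C, one non-G base, then G
    CHH = "(?=C[ACT][ACT])"   # C followed by two non-G bases
  )
  vapply(patterns, function(p) {
    hits <- gregexpr(p, seq, perl = TRUE)[[1]]
    # gregexpr() returns -1 when there is no match
    if (hits[1] == -1) 0L else length(hits)
  }, integer(1))
}

count_methylation_contexts("CGACTGCCATTCG")
```

A full analysis would also scan the reverse complement, since cytosines on the opposite strand are methylation targets as well.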
- plot_ltrsim_individual(): Plot the age distribution of predicted LTR transposons
- plot_ltrwidth_individual(): Plot the width distribution of putative LTR transposons or LTRs for individual species
- plot_ltrwidth_species(): Plot the width distribution of putative LTR transposons or LTRs for all species
- plot_ltrwidth_kingdom(): Plot the width distribution of putative LTR transposons or LTRs for all kingdoms
- plot_copynumber_individual(): Plot the copy number distribution of putative LTR transposons or LTRs for individual species
- plot_copynumber_species(): Plot the copy number distribution of putative LTR transposons or LTRs for all species
- plot_copynumber_kingdom(): Plot the copy number distribution of putative LTR transposons or LTRs for all kingdoms
- plotLTRRange(): Plot genomic ranges of putative LTR transposons
- PlotSimCount(): Plot LTR similarity vs. predicted LTR count
- plotSize(): Plot genome size vs. LTR transposon count
- plotSizeJumpers(): Plot genome size vs. LTR transposon count for jumpers
- plotFamily(): Visualize the superfamily distribution of predicted LTR retrotransposons
- plotDomain(): Visualize the protein domain distribution of predicted LTR retrotransposons
- plotCN(): Plot correlation between LTR copy number and methylation context
- plotCluster(): Plot correlation between cluster number and any other variable
- PlotInterSpeciesCluster(): Plot inter-species similarity between TEs (for a specific cluster)
- PlotMainInterSpeciesCluster(): Plot inter-species similarity between TEs (for the top n clusters)
- bcolor(): Beautiful colors for plots
- file.move(): Move folders from one location to another
- get.pred.filenames(): Retrieve file names of files generated by LTRpred
- get.seqs(): Quickly retrieve the sequences of a 'Biostrings' object
- ws.wrap.path(): Wrap whitespace in paths
- rename.fasta(): rename.fasta
I would like to thank the Paszkowski team for incredible support and motivating discussions that led to the realization of this project.