TreeGrafter

This repository contains the code for InterPro's implementation of TreeGrafter (1).

Unlike the original implementation, we use EPA-ng (2) instead of RAxML (3) to graft the sequence to the annotated family tree.

Getting started

Download and extract PANTHER data:

$ wget http://data.pantherdb.org/ftp/downloads/TreeGrafter/PANTHER17.0_data.tar.gz
$ tar -zxvf PANTHER17.0_data.tar.gz

Prepare PANTHER annotations. This is only required once:

$ python treegrafter/treegrafter.py prepare PANTHER17.0_data

Run hmmsearch (4) on your input sequences:

$ hmmsearch PANTHER17.0_data/famhmm/binHmm query.fasta > hits.out

Run TreeGrafter. The run command takes at least three positional arguments, in this order:

the query sequence file
the hmmsearch output file
the directory of prepared PANTHER data

$ python treegrafter/treegrafter.py run query.fasta hits.out PANTHER17.0_data > predictions.tsv

Options

The following options are available for treegrafter.py run:

Option	Description
-e	E-value cutoff (default: disabled)
-o	output file (instead of the standard output)
--epa-ng	path to the EPA-ng binary (if not in PATH)
-t	number of threads for EPA-ng to use (default: 1)
-T	path where a temporary directory is created
--keep	keep temporary directory (default: disabled)
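
When TreeGrafter is driven from a script, it can help to assemble the command line programmatically. The sketch below is a hypothetical helper (build_run_command is not part of TreeGrafter) that maps the options in the table above to an argument list. It assumes the script lives at treegrafter/treegrafter.py and that options may be placed before the positional arguments.

```python
import shlex
from typing import List, Optional


def build_run_command(
    query: str,
    hits: str,
    data_dir: str,
    evalue: Optional[float] = None,
    output: Optional[str] = None,
    epa_ng: Optional[str] = None,
    threads: int = 1,
    tmp_dir: Optional[str] = None,
    keep: bool = False,
    script: str = "treegrafter/treegrafter.py",  # assumed script location
) -> List[str]:
    """Assemble the argument list for `treegrafter.py run` (hypothetical helper)."""
    cmd = ["python", script, "run"]
    if evalue is not None:
        cmd += ["-e", str(evalue)]       # E-value cutoff
    if output is not None:
        cmd += ["-o", output]            # write to a file instead of stdout
    if epa_ng is not None:
        cmd += ["--epa-ng", epa_ng]      # EPA-ng binary, if not in PATH
    if threads != 1:
        cmd += ["-t", str(threads)]      # threads for EPA-ng
    if tmp_dir is not None:
        cmd += ["-T", tmp_dir]           # where the temporary directory is created
    if keep:
        cmd.append("--keep")             # keep the temporary directory
    # Positional arguments: query sequences, hmmsearch output, prepared data.
    cmd += [query, hits, data_dir]
    return cmd


cmd = build_run_command(
    "query.fasta", "hits.out", "PANTHER17.0_data",
    evalue=0.001, output="predictions.tsv", threads=4,
)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the run.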

Output format

The columns of the output TSV are:

Col	Type	Description
1	string	Query ID
2	string	Predicted PANTHER subfamily (if any) or best matched PANTHER family
3	float	Sequence bit score
4	float	Sequence E-Value
5	float	Domain bit score
6	float	Domain E-Value
7	integer	Start of local alignment (with respect to the query profile)
8	integer	End of local alignment (with respect to the query profile)
9	integer	Start of local alignment (with respect to the target sequence)
10	integer	End of local alignment (with respect to the target sequence)
11	integer	Start of the envelope of the domain's location (on the target sequence)
12	integer	End of the envelope of the domain's location (on the target sequence)
13	string	Node of the reference tree onto which the sequence was grafted
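
For downstream processing, the thirteen columns can be read into typed records. The following is a minimal sketch, not part of TreeGrafter: the Prediction class, its field names, and read_predictions are hypothetical. It assumes the file has no header row and exactly 13 tab-separated columns per line. The sample row is made up for illustration.

```python
import io
from typing import IO, Iterator, NamedTuple


class Prediction(NamedTuple):
    """One output row; field names are illustrative, not defined by TreeGrafter."""
    query_id: str
    panther_id: str
    sequence_score: float
    sequence_evalue: float
    domain_score: float
    domain_evalue: float
    hmm_start: int
    hmm_end: int
    ali_start: int
    ali_end: int
    env_start: int
    env_end: int
    node: str


def read_predictions(handle: IO[str]) -> Iterator[Prediction]:
    """Yield one Prediction per non-empty line of a TreeGrafter TSV."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 13:
            raise ValueError(
                f"line {line_number}: expected 13 columns, got {len(fields)}"
            )
        yield Prediction(
            fields[0],
            fields[1],
            *(float(value) for value in fields[2:6]),   # columns 3-6: scores and E-values
            *(int(value) for value in fields[6:12]),    # columns 7-12: coordinates
            fields[12],
        )


# Made-up row, for illustration only.
SAMPLE = "seq1\tPTHR10000:SF1\t250.5\t1e-75\t249.9\t2e-75\t1\t200\t5\t210\t3\t212\tAN12\n"

for prediction in read_predictions(io.StringIO(SAMPLE)):
    print(prediction.query_id, prediction.panther_id, prediction.node)
```

With a real file, open predictions.tsv in text mode and pass the file handle to read_predictions.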

Docker

TreeGrafter is available as a Docker image. PANTHER data must be provided to the container through a bind mount. Assuming the PANTHER17.0_data directory is in your current working directory, pass -v "$(pwd)":/mnt so that the PANTHER data are available at /mnt/PANTHER17.0_data inside the container.

To prepare PANTHER data:

$ docker run --rm -v "$(pwd)":/mnt interpro/treegrafter prepare /mnt/PANTHER17.0_data

To search your sequences:

$ docker run --rm -v "$(pwd)":/mnt interpro/treegrafter search /mnt/query.fasta /mnt/PANTHER17.0_data /mnt/predictions.tsv
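
Because every path passed to the container must be expressed relative to the mount point, translating host paths by hand is easy to get wrong. The sketch below shows one way to do the translation and build the search command shown above. The helper names (to_container_path, docker_search_command) are hypothetical. It assumes all files sit under a single host directory that is mounted at /mnt.

```python
from pathlib import Path, PurePosixPath
from typing import List

MOUNT_POINT = PurePosixPath("/mnt")   # container side of the bind mount
IMAGE = "interpro/treegrafter"


def to_container_path(host_path: Path, host_root: Path) -> str:
    """Map a host path under host_root to its location under /mnt."""
    # relative_to raises ValueError if host_path is outside host_root.
    relative = host_path.resolve().relative_to(host_root.resolve())
    return str(MOUNT_POINT / relative.as_posix())


def docker_search_command(
    host_root: Path, query: Path, data_dir: Path, output: Path
) -> List[str]:
    """Build the `docker run ... search` argument list (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_root.resolve()}:{MOUNT_POINT}",
        IMAGE, "search",
        to_container_path(query, host_root),
        to_container_path(data_dir, host_root),
        to_container_path(output, host_root),
    ]


root = Path.cwd()
print(docker_search_command(
    root,
    root / "query.fasta",
    root / "PANTHER17.0_data",
    root / "predictions.tsv",
))
```

A path outside the mounted directory raises ValueError instead of silently producing a path the container cannot see.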

References

1. Haiming Tang, Robert D Finn, Paul D Thomas, TreeGrafter: phylogenetic tree-based annotation of proteins with Gene Ontology terms and other annotations, Bioinformatics, Volume 35, Issue 3, February 2019, Pages 518–520, https://doi.org/10.1093/bioinformatics/bty625
2. Pierre Barbera, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, Alexandros Stamatakis, EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences, Systematic Biology, Volume 68, Issue 2, March 2019, Pages 365–369, https://doi.org/10.1093/sysbio/syy054
3. Alexandros Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics, Volume 30, Issue 9, May 2014, Pages 1312–1313, https://doi.org/10.1093/bioinformatics/btu033
4. HMMER, http://hmmer.org/
