PhyDCA

PhyDCA is a Julia package that implements the inference of the phyletic couplings presented in the paper "A multi-scale coevolutionary approach to predict protein-protein interactions" by Giancarlo Croce, Thomas Gueudré, Maria Virginia Ruiz Cuevas, Victoria Keidel, Matteo Figliuzzi, Hendrik Szurmant, and Martin Weigt (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006891).

Phylogenetic profiling is a classic bioinformatics technique in which the joint presence or joint absence of two traits across large numbers of species is used to infer a meaningful biological connection between them (see Phylogenetic profiling).

We revisit the classical ideas of phylogenetic profiling by introducing the novel concept of phyletic couplings, which can be estimated via a global statistical modelling approach (inspired by direct coupling analysis, DCA).

The following figure shows a schematic representation of the inference of phyletic couplings.

Installation

To install the package, run 'julia' in the terminal and type the command:

    julia> Pkg.clone("https://github.com/GiancarloCroce/PhyDCA")

Input file

The first step is to prepare the input file in the right format. Two formats are supported at the moment:

1) Genomes in terms of protein families

In this case you need to construct a file describing the composition of each genome in terms of protein families: the first column of the file must be the name of the species, while the remaining columns contain the protein families included in the genome.

For example, the file "test/phylo_data_ecoli.txt" has the following structure:

ACAM1       PF00011 PF00011 PF00023 PF00027 PF00034 PF00011 PF00042 PF00043
ACCPU       PF00011 PF00015 PF00027 PF00034 PF00037
....
ZINIC       PF00037 PF00109 PF00111 PF00115 PF00116 PF00146
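As an illustration of this format, the following minimal sketch (assuming Julia 1.x) reads such lines into a dictionary that maps each species to its protein families. The helpers `parse_genome_line` and `read_genomes` are hypothetical names introduced here, not PhyDCA functions, and dropping repeated families is an assumption based on the fact that only presence or absence is used later.

```julia
# Sketch: parse a "genomes in terms of protein families" file.
# `parse_genome_line` and `read_genomes` are hypothetical helpers, not PhyDCA functions.

# Split one line into the species name and the distinct families it lists.
function parse_genome_line(line::AbstractString)
    fields = split(line)                        # split on any run of whitespace
    species = String(fields[1])                 # first column: species name
    families = unique(String.(fields[2:end]))   # remaining columns, repeats dropped (assumption)
    return species, families
end

# Read every non-empty line of an IO stream into a species => families dictionary.
function read_genomes(io::IO)
    genomes = Dict{String,Vector{String}}()
    for line in eachline(io)
        isempty(strip(line)) && continue        # skip blank lines
        species, families = parse_genome_line(line)
        genomes[species] = families
    end
    return genomes
end

# Two lines taken from the example above, read from an in-memory buffer.
example = """
ACAM1       PF00011 PF00011 PF00023 PF00027 PF00034 PF00011 PF00042 PF00043
ACCPU       PF00011 PF00015 PF00027 PF00034 PF00037
"""
genomes = read_genomes(IOBuffer(example))
println(genomes["ACAM1"])
```

To read a real file with the same sketch, the buffer could be replaced by `open(read_genomes, "test/phylo_data_ecoli.txt")`.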

2) Phylogenetic profile matrix

It is also possible to provide the phylogenetic profile matrix (PPM) directly as the input file: a binary matrix P whose entries capture the presence (Pij = 1) or absence (Pij = 0) of domain j in genome i, with i = 1, ..., M (the number of genomes) and j = 1, ..., N (the number of domains).

Consequently, each domain (a column of the PPM) is represented by a long binary vector with one digit for each genome.

See, as an example, the file "test/phylo_matrix_ecoli.txt".
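The definition of P can be made concrete with a small sketch (assuming Julia 1.x) that builds the matrix from a species-to-families mapping. The helper `profile_matrix` is a hypothetical name, the alphabetical ordering of rows and columns is an arbitrary choice, and the toy data are a shortened subset of the example lines shown earlier; none of this is claimed to match the package's internals or the layout of "test/phylo_matrix_ecoli.txt".

```julia
# Sketch: build a binary phylogenetic profile matrix P (M genomes x N domains).
# `profile_matrix` is a hypothetical helper, not part of PhyDCA.

function profile_matrix(genomes::Dict{String,Vector{String}})
    species = sort(collect(keys(genomes)))                  # row labels (genomes)
    domains = sort(unique(vcat(values(genomes)...)))        # column labels (domains)
    column = Dict(d => j for (j, d) in enumerate(domains))  # domain => column index
    P = zeros(Int, length(species), length(domains))
    for (i, s) in enumerate(species), d in genomes[s]
        P[i, column[d]] = 1                                 # domain d is present in genome s
    end
    return P, species, domains
end

# Toy data: a shortened subset of the example genomes.
genomes = Dict(
    "ACAM1" => ["PF00011", "PF00023", "PF00027"],
    "ACCPU" => ["PF00011", "PF00015", "PF00027"],
    "ZINIC" => ["PF00037"],
)
P, species, domains = profile_matrix(genomes)

# Each column of P is the presence/absence profile of one domain.
println(domains[1], " => ", P[:, 1])
```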

Usage

Full documentation is not available yet, but here are some usage examples to get started.

To run the program, type 'julia' in the terminal and load the module:

    julia> using PhyDCA

The software provides two main functions:

phydca(filename_data::String, PhyloDCA.PhylogenticDistance): use this if the input file "filename_data" is in the first format (genomes in terms of protein families)
phydca_matrix(filename_matrix::String, PhyloDCA.PhylogenticDistance): use this if the input file "filename_matrix" is a phylogenetic profile matrix

Next, decide which phylogenetic distance to use for the analysis (a list of all supported phylogenetic distances is given in the section "Supported Phylogenetic Distances").

For example, to use the "phyletic couplings inferred with mean field DCA", run:

    julia> ecoli_results = phydca("phylo_data_ecoli.txt",mfDCA())

Output

The output "ecoli_results" is of type PhyDCA.PhyloOut and has 6 fields:

list_domains: a list of all protein families
list_species: a list of all species
PhyloProfile: the phylogenetic profile matrix
PhyloDistance: the distance matrix between protein domains
result_sorted: a (String, String, Float) vector containing the candidate domain-domain connections sorted in descending order
result_unsort: a (String, String, Float) vector containing the candidate domain-domain connections, not sorted
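The relationship between the two result vectors can be sketched as follows (assuming Julia 1.x). This is not the package's code: the toy vector only mimics the (String, String, Float) layout described above, its scores are made up for illustration, and sorting on the third element is an assumption about what "descending order" refers to. The helper `top_k` is a hypothetical name.

```julia
# Sketch: how a sorted result vector could be derived from an unsorted one.
# The entries below are made-up toy values, not output of PhyDCA.
result_unsort = [
    ("PF00011", "PF00023", 0.12),
    ("PF00011", "PF00027", 0.87),
    ("PF00023", "PF00027", 0.45),
]

# Sort by the third element (the Float value) in descending order (assumption).
result_sorted = sort(result_unsort, by = t -> t[3], rev = true)

# Hypothetical helper: keep at most the first k candidate connections.
top_k(results, k) = results[1:min(k, length(results))]

for (a, b, score) in top_k(result_sorted, 2)
    println(a, "  ", b, "  ", score)
end
```

With the package loaded, the corresponding vectors would be read from the fields listed above, for example `ecoli_results.result_sorted`.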

Supported Phylogenetic Distances

For the sake of comparison, several phylogenetic distances have been included in the code:

Hamming distance [ Hamming() ]
Pearson Correlation [ Correlation() ]
p-value of Fisher's exact test [ pValue() ]
Phylogenetic couplings from the Mean Field DCA [ mfDCA() ]
Phylogenetic couplings from the pseudo-likelihood DCA [ plmDCA() ]
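To give an idea of what some of these measures compute, here is a minimal sketch (assuming Julia 1.x and only the standard library) of the Hamming distance, the Pearson correlation and a textbook naive mean-field coupling estimate on a toy profile matrix. The function names `hamming`, `pearson` and `meanfield_couplings`, the pseudocount value `lambda` and the toy matrix are all assumptions made for illustration; the package's own Hamming(), Correlation() and mfDCA() may differ in normalization, regularization and scoring.

```julia
using Statistics      # cor
using LinearAlgebra   # inv

# Hamming distance: number of genomes in which two profiles differ.
hamming(x, y) = count(x .!= y)

# Pearson correlation between two presence/absence profiles.
pearson(x, y) = cor(Float64.(x), Float64.(y))

# Naive mean-field couplings for binary profiles: J = -inv(C), where C is the
# covariance matrix computed from pseudocount-regularized frequencies.
# `lambda` is an assumed pseudocount weight, not a value taken from PhyDCA.
function meanfield_couplings(P::AbstractMatrix; lambda = 0.5)
    M, N = size(P)
    X = Float64.(P)
    f1 = (1 - lambda) .* vec(sum(X, dims = 1)) ./ M .+ lambda / 2  # frequency of each domain
    f2 = (1 - lambda) .* (X' * X) ./ M .+ lambda / 4               # joint frequency of each pair
    for i in 1:N
        f2[i, i] = f1[i]              # for binary x, x^2 == x, so f_ii equals f_i
    end
    C = f2 .- f1 * f1'                # connected correlation (covariance) matrix
    return -inv(C)                    # only the off-diagonal entries are couplings
end

# Toy profile matrix: 5 genomes (rows) x 3 domains (columns).
P = [1 1 0;
     1 1 0;
     0 0 1;
     1 0 1;
     0 0 0]

println(hamming(P[:, 1], P[:, 2]))
println(pearson(P[:, 1], P[:, 2]))
J = meanfield_couplings(P)
println(J[1, 2])
```

The Hamming distance and the correlation are local measures that look at one pair of columns at a time, whereas the mean-field estimate inverts the full covariance matrix and therefore depends on all domains at once, which is the "global" aspect mentioned in the introduction.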

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
