Create Toxo implementation and its documentation
Squashed commit of the following:

commit ab35f1a
Author: Christian Ponte <[email protected]>
Date:   Tue Jan 22 08:05:47 2019 +0100

    Include suggestions from Maria J. Martín and Jorge González

commit a427bb7
Author: Christian Ponte <[email protected]>
Date:   Mon Jan 21 14:26:40 2019 +0100

    Minor changes in readme

commit dffc4ef
Author: Christian Ponte <[email protected]>
Date:   Mon Jan 21 13:39:22 2019 +0100

    Replace math expressions with dynamically generated images

commit a73214d
Author: Christian Ponte <[email protected]>
Date:   Mon Jan 21 13:09:27 2019 +0100

    Update README

    Added installation instructions, usage notes, and classes documentation.

commit 2b4154a
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 13:45:45 2019 +0100

    Update PTable class documentation

commit 1ee0c18
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 13:30:44 2019 +0100

    Update generate_models sample script with Model and PTable changes

commit abcadfa
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 13:29:54 2019 +0100

    Remove unused private static method counter

commit b5b993e
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 13:28:16 2019 +0100

    Change PTable text output format to csv to be consistent with Model input format

commit fb17906
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 13:06:53 2019 +0100

    Change PTable implementation

    -PTable no longer stores the penetrance values using double precision, instead the symbolic expressions are used internally.
    -Simplified static methods for formatting ID.
    -Changed how the calculated Model variable values are stored in PTable, using a Map object instead of direct variables.

commit a882fb9
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 13:05:03 2019 +0100

    Fix the multiple solutions problem in find_max_prevalence and find_max_heritability Model methods

commit 7d4ef26
Author: Christian Ponte <[email protected]>
Date:   Fri Jan 18 12:16:40 2019 +0100

    Modify genotype_probabilities to use symbolic expressions for fractions, avoiding losses in decimal precision

commit a383958
Author: Christian Ponte <[email protected]>
Date:   Thu Jan 17 13:49:27 2019 +0100

    Update PTable class constructor and variable names

commit 899b416
Author: Christian Ponte <[email protected]>
Date:   Thu Jan 17 10:45:13 2019 +0100

    Rename PT class to PTable for simplicity

commit 764e117
Author: Christian Ponte <[email protected]>
Date:   Thu Jan 17 10:43:11 2019 +0100

    Rewrite find_max_prevalence method

commit 79857b1
Author: Christian Ponte <[email protected]>
Date:   Thu Jan 17 10:31:50 2019 +0100

    Rewrite find_max_heritability method

commit 5d9b109
Author: Christian Ponte <[email protected]>
Date:   Thu Jan 17 09:04:20 2019 +0100

    Move genotype_probabilities function outside of class Model

commit 6856efa
Author: Christian Ponte <[email protected]>
Date:   Wed Jan 16 14:30:37 2019 +0100

    Update Model class constructor

commit 423ca15
Author: Christian Ponte <[email protected]>
Date:   Wed Jan 16 13:46:18 2019 +0100

    Reformatted model files as csv

commit d3dfa5a
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 14:28:27 2018 +0100

    Add basic README

commit 391f800
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 13:15:07 2018 +0100

    Add sample script

commit 3705ba2
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 13:14:24 2018 +0100

    Fix bad MException ID in Model

commit 0812fe1
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 13:02:47 2018 +0100

    Add name property to Model

commit e743920
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 12:31:17 2018 +0100

    Add documentation to class PT

commit 40be863
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 11:18:39 2018 +0100

    Add header to all models with author reference

commit 03af8cb
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 11:11:55 2018 +0100

    Add documentation to Model class, allow comments in model file

commit 19be981
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 11:11:38 2018 +0100

    Add documentation to function nfold

commit ca50e5f
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 09:00:53 2018 +0100

    Add find_max_heritability method to Model class

commit d5d9beb
Author: Christian Ponte <[email protected]>
Date:   Mon Nov 26 08:47:54 2018 +0100

    Fix penetrance table output format

    -Added more digits to decimal numbers
    -Fixed genotype string error in plaintext output

commit dea1087
Author: Christian Ponte <[email protected]>
Date:   Fri Nov 23 18:22:17 2018 +0100

    Create class Model, encapsulating model operations

commit 878fb34
Author: Christian Ponte <[email protected]>
Date:   Fri Nov 23 18:21:56 2018 +0100

    Create class PT, encapsulating all penetrance table operations

commit 82f8539
Author: Christian Ponte <[email protected]>
Date:   Fri Nov 23 18:21:32 2018 +0100

    Create function nfold

commit b66b9df
Author: Christian Ponte <[email protected]>
Date:   Fri Nov 23 13:02:26 2018 +0100

    Add sample epistatic models from Marchini et al. 2005 for second, third and fourth order

commit 4876875
Author: Christian Ponte <[email protected]>
Date:   Fri Nov 23 10:39:52 2018 +0100

    Add .gitattributes

commit 2e7cdc3
Author: Christian Ponte <[email protected]>
Date:   Fri Nov 23 10:38:11 2018 +0100

    Add .gitignore
chponte committed Jan 22, 2019
1 parent 41519c2 commit 0e2650b
Showing 17 changed files with 895 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Line endings
* text eol=lf
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Jupyter Notebook
.ipynb_checkpoints

### Jetbrains
# All project-related files
.idea
198 changes: 198 additions & 0 deletions README.rst
@@ -0,0 +1,198 @@
=====================================
Toxo
=====================================

Toxo is an object-oriented MATLAB library for calculating penetrance tables of epistasis models of any interaction order. It focuses on bivariate epistasis models, that is, models where the penetrance is an expression of two variables. The user specifies the model, its desired heritability (or prevalence) and Minor Allele Frequency (MAF), and the library maximizes the resulting table's prevalence (or heritability).

Toxo includes, as an example, the `models <https://github.com/chponte/toxo/blob/master/models/>`__ proposed by Marchini *et al*. [1]_ together with a `script <https://github.com/chponte/toxo/blob/master/generate_models.m>`__ to generate penetrance tables derived from those models.

Requirements
-------------------------------------

* MATLAB (checked against version R2018a; it is likely to work on many others).


Installation
-------------------------------------

1) Download the latest Toxo release from `here <https://github.com/chponte/toxo/releases/latest>`__.
2) Unzip the contents of the file.
3) Add the ``src/`` folder into your MATLAB environment or script:

.. code:: matlab

    addpath('path/to/src/folder');

Usage
-------------------------------------

Using Toxo starts with defining a model in CSV-like format. Not all models are supported by Toxo, so make sure the desired model complies with the two requirements described below. The values of the model variables are then calculated, and the resulting penetrance table is written to a text file. A complete working example can be found `here <https://github.com/chponte/toxo/blob/master/generate_models.m>`__.

Model requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order for Toxo to calculate the values of the two variables for which the prevalence (or heritability) is maximum, two requirements must be met:

1) Penetrance expressions are monotonically non-decreasing polynomials over the positive real numbers. Polynomials that meet this criterion have non-negative partial derivatives for all positive real values of both variables. For example, the polynomial |e1| is monotonically non-decreasing because |e2| and |e3| are positive for x > 0 and y > 0.
2) Polynomials can be sorted unequivocally over the positive real numbers. This can be demonstrated analytically for any model by comparing its polynomials in pairs. As an example, the demonstration that |e4| is greater than or equal to |e5| for positive real numbers would be:

.. |e1| image:: https://latex.codecogs.com/gif.latex?x%281&plus;y%29%5E2
    :align: bottom

.. |e2| image:: https://latex.codecogs.com/gif.latex?%5Ctfrac%7B%5Cpartial%7D%7B%5Cpartial%20x%7D%5Cbig%28x%281&plus;y%29%5E2%5Cbig%29%20%3D%20%281&plus;y%29%5E2
    :align: bottom

.. |e3| image:: https://latex.codecogs.com/gif.latex?%5Ctfrac%7B%5Cpartial%7D%7B%5Cpartial%20y%7D%5Cbig%28x%281&plus;y%29%5E2%5Cbig%29%20%3D%20x%282y%20&plus;%202%29
    :align: bottom

.. |e4| image:: https://latex.codecogs.com/gif.latex?x%281&plus;y%29%5E4
    :align: bottom

.. |e5| image:: https://latex.codecogs.com/gif.latex?x%281&plus;y%29%5E3
    :align: bottom

.. figure:: https://latex.codecogs.com/gif.latex?x%281&plus;y%29%5E4%20%26%5Cge%20x%281&plus;y%29%5E3
    :align: center

.. figure:: https://latex.codecogs.com/gif.latex?%281&plus;y%29%5E4%20%26%5Cge%20%281&plus;y%29%5E3
    :align: center

.. figure:: https://latex.codecogs.com/gif.latex?1&plus;y%20%26%5Cge%201
    :align: center

.. figure:: https://latex.codecogs.com/gif.latex?y%20%26%5Cge%200
    :align: center
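
Both requirements can also be explored symbolically before writing a model file. The following is a minimal sketch, assuming the Symbolic Math Toolbox is installed; it is not part of Toxo, and the variable names are illustrative only.

.. code:: matlab

    % Declare the two model variables as positive reals
    syms x y positive

    % Penetrance expressions taken from the examples above
    f  = x*(1+y)^2;
    g4 = x*(1+y)^4;
    g3 = x*(1+y)^3;

    % Requirement 1: both partial derivatives are non-negative
    dfdx = diff(f, x);   % equals (1+y)^2
    dfdy = diff(f, y);   % equals x*(2*y + 2)
    is_monotonic = isAlways(dfdx >= 0) && isAlways(dfdy >= 0);

    % Requirement 2: the sign of the difference is fixed, so the pair can be sorted
    difference_factors = factor(g4 - g3);
    is_sorted = isAlways(g4 - g3 >= 0);

``isAlways`` returns false with a warning when it cannot decide a condition, so a false result does not by itself disprove the property. Inspecting the factors of the difference is a useful fallback in that case.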

Model description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Models read by Toxo are formatted in CSV-like style, where each row represents a genotype combination and its associated penetrance (separated by a comma). The order in which the rows appear does not matter. Empty lines or lines starting with '#' (comments) are ignored.

Genotypes are represented as two characters, one for each allele of the genotype. Alleles of the same locus use the same alphabetic letter, and capitalization distinguishes the minor (lowercase) allele from the major (uppercase) allele. There is no limit on the size of the genotype combination.

Penetrance expressions are functions of two variables. Variables can take any alphabetic name, but for simplicity we will name them x and y.

An example model would be:

.. code:: text

    # Model name
    AABB, x
    AABb, x
    AAbb, x
    AaBB, x
    AaBb, x*(1+y)
    Aabb, x*(1+y)
    aaBB, x
    aaBb, x*(1+y)
    aabb, x*(1+y)

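
To make the format rules concrete, the sketch below reads such a file with plain MATLAB functions. It only illustrates the rules described above (comma-separated rows, ignored empty lines and comments, two characters per locus) and is not Toxo's own parser. The file name and all variable names are hypothetical.

.. code:: matlab

    % Read the whole model file as text (hypothetical file name)
    raw = fileread('sample_model.csv');
    rows = strtrim(strsplit(raw, newline));

    genotypes = {};
    expressions = {};
    for i = 1:numel(rows)
        row = rows{i};
        % Skip empty lines and comments
        if isempty(row) || startsWith(row, '#')
            continue;
        end
        % Split "genotype, penetrance" at the first comma
        parts = strsplit(row, ',');
        genotypes{end + 1} = strtrim(parts{1});
        expressions{end + 1} = strtrim(strjoin(parts(2:end), ','));
    end

    % Two characters per locus, so the order is half the genotype length
    n_loci = strlength(genotypes{1}) / 2;
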
Penetrance table calculation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the model is defined, obtaining the penetrance table is straightforward. First, the model is imported into MATLAB using the Model class. Then, the penetrance table is obtained. The last step is to save the calculated table to a file. The following code snippet exemplifies the process:


.. code:: matlab

    model = 'sample_model.csv';
    output = 'penetrance_table.csv';
    maf = 0.25;
    prevalence = 0.1;
    m = toxo.Model(model);
    p = m.find_max_heritability(maf, prevalence);
    p.write(output, toxo.PTable.format_csv);

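
The complementary workflow fixes the heritability and maximizes the prevalence instead. The sketch below uses only methods and constants documented in this README. It assumes that ``src/`` is already on the MATLAB path, that ``sample_model.csv`` exists, and that a single table is returned. The numeric values are arbitrary.

.. code:: matlab

    m = toxo.Model('sample_model.csv');

    % Fix MAF and heritability, and let Toxo maximize the prevalence
    maf = 0.25;
    heritability = 0.1;
    p = m.find_max_prevalence(maf, heritability);

    % Inspect the metrics of the resulting table
    fprintf('Prevalence:   %f\n', p.prevalence());
    fprintf('Heritability: %f\n', p.heritability());

    % Save the table in GAMETES format instead of CSV
    p.write('penetrance_table.txt', toxo.PTable.format_gametes);
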
Classes in Toxo
-------------------------------------
Toxo implements two main classes, Model_ and PTable_, which encapsulate all the functionality:

Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Model is a symbolic representation of an epistasis model. It is responsible for reading the model, parsing the text file and converting the penetrance strings to symbolic expressions. It offers two methods to calculate penetrance tables which maximize the associated prevalence or heritability under certain constraints.

Attributes
"""""""""""""""""""""""""""""""""""""
name : ``String``
    Name of the model.
order : ``Integer``
    Number of loci involved in the epistatic model.
penetrances : ``Array of symbolic``
    Array of symbolic expressions, representing the epistatic model.
variables : ``Array of symbolic``
    List of all variables contained in all symbolic expressions.

Methods
"""""""""""""""""""""""""""""""""""""
Model(path)
    Construct an instance of this class from the given model.

    - ``path`` : ``String`` - Path to the model CSV file.
find_max_prevalence(maf, h)
    Calculate the penetrance table(s) of the model with the maximum admissible prevalence given its MAF and heritability.

    - ``maf`` : ``Double`` - MAF of the resulting penetrance table.
    - ``h`` : ``Double`` - Heritability of the resulting penetrance table.
    - ``output`` : ``toxo.PTable`` - Resulting penetrance table.
find_max_heritability(maf, p)
    Calculate the penetrance table(s) of the model with the maximum admissible heritability given its MAF and prevalence.

    - ``maf`` : ``Double`` - MAF of the resulting penetrance table.
    - ``p`` : ``Double`` - Prevalence of the resulting penetrance table.
    - ``output`` : ``toxo.PTable`` - Resulting penetrance table.
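
The attributes listed above can be inspected right after construction, which helps confirm that a model file was parsed as intended. The sketch below guards the solver call in the same way as the ``generate_models.m`` sample script, since not every combination of MAF and prevalence admits a solution. The file name and numeric values are hypothetical.

.. code:: matlab

    m = toxo.Model('sample_model.csv');

    % Check what was parsed from the file
    fprintf('Model %s involves %d loci\n', m.name, m.order);
    disp(m.variables);
    disp(m.penetrances);

    % The solver may fail for some parameter combinations
    try
        pt = m.find_max_heritability(0.25, 0.1);
    catch ME
        warning('Could not solve model %s: %s', m.name, ME.message);
    end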

PTable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Numeric representation of a penetrance table. This class provides methods to calculate several metrics, as well as a method to write the table to a file in several formats.

Static constants
"""""""""""""""""""""""""""""""""""""
format_csv : ``Integer``
    Represents the CSV output format, taken as a parameter in the write method.
format_gametes : ``Integer``
    Represents the GAMETES output format, taken as a parameter in the write method.

Attributes
"""""""""""""""""""""""""""""""""""""
order : ``Integer``
    Number of loci involved in the penetrance table.
maf : ``Double``
    Common MAF of all loci involved in the interaction.
vars : ``Map``
    Values of the variables present in the original model.
gp : ``Array of symbolic``
    Genotype probabilities table array.
pt : ``Array of symbolic``
    Penetrances table array.
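
As an illustration of what a genotype probabilities table contains, the sketch below builds one numerically. It assumes Hardy-Weinberg equilibrium and independent loci sharing the same MAF, which this README does not state explicitly. Toxo stores these values as symbolic expressions, whereas the sketch uses plain doubles. All names are illustrative.

.. code:: matlab

    maf = 0.25;    % minor allele frequency shared by all loci
    n_loci = 2;    % interaction order

    % Single-locus probabilities for the genotypes AA, Aa and aa
    single = [(1 - maf)^2, 2 * maf * (1 - maf), maf^2];

    % Combine independent loci with the Kronecker product
    gp = 1;
    for k = 1:n_loci
        gp = kron(gp, single);
    end

    % gp now holds 3^n_loci probabilities ordered as AABB, AABb, AAbb, AaBB, ...
    total = sum(gp);   % equals 1 up to rounding error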

Methods
"""""""""""""""""""""""""""""""""""""
PTable(model, maf, values)
    Create a penetrance table from a given Model, using the MAF and variable values desired.

    - ``model`` : ``toxo.Model`` - Model from which the table is constructed.
    - ``maf`` : ``Double`` - MAF of the penetrance table.
    - ``values`` : ``Array of double`` - Values of the variables in Model.
prevalence( )
    Calculate the prevalence of the penetrance table.

    - ``output`` : ``Double`` - Prevalence of the table.
heritability( )
    Calculate the heritability of the penetrance table.

    - ``output`` : ``Double`` - Heritability of the table.
write(path, format)
    Write the penetrance table to a text file using a specific output format.

    - ``path`` : ``String`` - Path of the file to which the table is written.
    - ``format`` : ``Integer`` - Format to use for the output.
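
The two metrics can be reproduced by hand for a small table. The sketch below assumes the definitions commonly used for penetrance tables: prevalence is the probability-weighted mean penetrance, and heritability is the probability-weighted variance of the penetrance divided by ``K * (1 - K)``, where ``K`` is the prevalence. The README does not state the formulas, so treat them as an assumption. The genotype probabilities assume Hardy-Weinberg equilibrium, and the values of ``x`` and ``y`` are arbitrary.

.. code:: matlab

    % Genotype probabilities for two independent loci with MAF 0.25
    maf = 0.25;
    single = [(1 - maf)^2, 2 * maf * (1 - maf), maf^2];
    gp = kron(single, single);

    % Penetrances of the example model from "Model description",
    % in the order AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb
    x = 0.05;
    y = 1;
    base = x;
    inter = x * (1 + y);
    pt = [base, base, base, base, inter, inter, base, inter, inter];

    % Prevalence: expected penetrance over all genotype combinations
    prevalence = sum(gp .* pt);

    % Heritability: penetrance variance relative to prevalence * (1 - prevalence)
    heritability = sum(gp .* (pt - prevalence).^2) / (prevalence * (1 - prevalence));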

Troubleshooting
-------------------------------------

If you are having trouble using Toxo, encounter any error or would like to see some additional functionality implemented, feel free to open an `Issue <https://github.com/chponte/toxo/issues>`_.

References
-------------------------------------

.. [1] Marchini, Jonathan, Peter Donnelly, and Lon R. Cardon. 2005. "Genome-Wide Strategies for Detecting Multiple Loci That Influence Complex Diseases". Nature Genetics 37 (4): 413. https://doi.org/10.1038/ng1537.
38 changes: 38 additions & 0 deletions generate_models.m
@@ -0,0 +1,38 @@
%% Import Toxo library
addpath('src/');

%% Penetrance table generation
% Read all the models from models/ folder
models = {};
for m = dir('models')'
    if ~m.isdir
        models{end + 1} = toxo.Model(fullfile(m.folder, m.name));
    end
end

% Create a list of MAFs and heritabilities to test
maf = [0.1, 0.25, 0.4];
h = [0.1, 0.25, 0.5, 0.8];

% Create the output folder (fullfile adds the platform's file separator)
output_folder = "output";
if ~isfolder(output_folder)
    mkdir(output_folder);
end

% Find the associated penetrance tables and write the results into files
for m = models
    for i = maf
        for j = h
            try
                pt = m{:}.find_max_prevalence(i, j);
                file_name = sprintf("%s_%.2f_h%.2f.txt", m{:}.name, i, j);
                pt.write(fullfile(output_folder, file_name), toxo.PTable.format_gametes);
            catch ME
                disp(ME.message);
                warning("Unable to generate model %s with MAF %f and heritability %f.\n", m{:}.name, i, j);
                continue;
            end
        end
    end
end
16 changes: 16 additions & 0 deletions models/additive_2.csv
@@ -0,0 +1,16 @@
# Jonathan Marchini, Peter Donnelly, and Lon R. Cardon
# Genome-Wide Strategies for Detecting Multiple Loci That Influence Complex Diseases
# Nature Genetics 37, no. 4 (April 2005): 413-17
# https://doi.org/10.1038/ng1537.
#
# 2-way additive epistatic model

AABB, x
AABb, x*(1+y)
AAbb, x*(1+y)^2
AaBB, x*(1+y)
AaBb, x*(1+y)^2
Aabb, x*(1+y)^3
aaBB, x*(1+y)^2
aaBb, x*(1+y)^3
aabb, x*(1+y)^4
34 changes: 34 additions & 0 deletions models/additive_3.csv
@@ -0,0 +1,34 @@
# Jonathan Marchini, Peter Donnelly, and Lon R. Cardon
# Genome-Wide Strategies for Detecting Multiple Loci That Influence Complex Diseases
# Nature Genetics 37, no. 4 (April 2005): 413-17
# https://doi.org/10.1038/ng1537.
#
# 3-way additive epistatic model

AABBCC, x
AABBCc, x*(1+y)
AABBcc, x*(1+y)^2
AABbCC, x*(1+y)
AABbCc, x*(1+y)^2
AABbcc, x*(1+y)^3
AAbbCC, x*(1+y)^2
AAbbCc, x*(1+y)^3
AAbbcc, x*(1+y)^4
AaBBCC, x*(1+y)
AaBBCc, x*(1+y)^2
AaBBcc, x*(1+y)^3
AaBbCC, x*(1+y)^2
AaBbCc, x*(1+y)^3
AaBbcc, x*(1+y)^4
AabbCC, x*(1+y)^3
AabbCc, x*(1+y)^4
Aabbcc, x*(1+y)^5
aaBBCC, x*(1+y)^2
aaBBCc, x*(1+y)^3
aaBBcc, x*(1+y)^4
aaBbCC, x*(1+y)^3
aaBbCc, x*(1+y)^4
aaBbcc, x*(1+y)^5
aabbCC, x*(1+y)^4
aabbCc, x*(1+y)^5
aabbcc, x*(1+y)^6
88 changes: 88 additions & 0 deletions models/additive_4.csv
@@ -0,0 +1,88 @@
# Jonathan Marchini, Peter Donnelly, and Lon R. Cardon
# Genome-Wide Strategies for Detecting Multiple Loci That Influence Complex Diseases
# Nature Genetics 37, no. 4 (April 2005): 413-17
# https://doi.org/10.1038/ng1537.
#
# 4-way additive epistatic model

AABBCCDD, x
AABBCCDd, x*(1+y)
AABBCCdd, x*(1+y)^2
AABBCcDD, x*(1+y)
AABBCcDd, x*(1+y)^2
AABBCcdd, x*(1+y)^3
AABBccDD, x*(1+y)^2
AABBccDd, x*(1+y)^3
AABBccdd, x*(1+y)^4
AABbCCDD, x*(1+y)
AABbCCDd, x*(1+y)^2
AABbCCdd, x*(1+y)^3
AABbCcDD, x*(1+y)^2
AABbCcDd, x*(1+y)^3
AABbCcdd, x*(1+y)^4
AABbccDD, x*(1+y)^3
AABbccDd, x*(1+y)^4
AABbccdd, x*(1+y)^5
AAbbCCDD, x*(1+y)^2
AAbbCCDd, x*(1+y)^3
AAbbCCdd, x*(1+y)^4
AAbbCcDD, x*(1+y)^3
AAbbCcDd, x*(1+y)^4
AAbbCcdd, x*(1+y)^5
AAbbccDD, x*(1+y)^4
AAbbccDd, x*(1+y)^5
AAbbccdd, x*(1+y)^6
AaBBCCDD, x*(1+y)
AaBBCCDd, x*(1+y)^2
AaBBCCdd, x*(1+y)^3
AaBBCcDD, x*(1+y)^2
AaBBCcDd, x*(1+y)^3
AaBBCcdd, x*(1+y)^4
AaBBccDD, x*(1+y)^3
AaBBccDd, x*(1+y)^4
AaBBccdd, x*(1+y)^5
AaBbCCDD, x*(1+y)^2
AaBbCCDd, x*(1+y)^3
AaBbCCdd, x*(1+y)^4
AaBbCcDD, x*(1+y)^3
AaBbCcDd, x*(1+y)^4
AaBbCcdd, x*(1+y)^5
AaBbccDD, x*(1+y)^4
AaBbccDd, x*(1+y)^5
AaBbccdd, x*(1+y)^6
AabbCCDD, x*(1+y)^3
AabbCCDd, x*(1+y)^4
AabbCCdd, x*(1+y)^5
AabbCcDD, x*(1+y)^4
AabbCcDd, x*(1+y)^5
AabbCcdd, x*(1+y)^6
AabbccDD, x*(1+y)^5
AabbccDd, x*(1+y)^6
Aabbccdd, x*(1+y)^7
aaBBCCDD, x*(1+y)^2
aaBBCCDd, x*(1+y)^3
aaBBCCdd, x*(1+y)^4
aaBBCcDD, x*(1+y)^3
aaBBCcDd, x*(1+y)^4
aaBBCcdd, x*(1+y)^5
aaBBccDD, x*(1+y)^4
aaBBccDd, x*(1+y)^5
aaBBccdd, x*(1+y)^6
aaBbCCDD, x*(1+y)^3
aaBbCCDd, x*(1+y)^4
aaBbCCdd, x*(1+y)^5
aaBbCcDD, x*(1+y)^4
aaBbCcDd, x*(1+y)^5
aaBbCcdd, x*(1+y)^6
aaBbccDD, x*(1+y)^5
aaBbccDd, x*(1+y)^6
aaBbccdd, x*(1+y)^7
aabbCCDD, x*(1+y)^4
aabbCCDd, x*(1+y)^5
aabbCCdd, x*(1+y)^6
aabbCcDD, x*(1+y)^5
aabbCcDd, x*(1+y)^6
aabbCcdd, x*(1+y)^7
aabbccDD, x*(1+y)^6
aabbccDd, x*(1+y)^7
aabbccdd, x*(1+y)^8