For next release
lemieuxl committed Jun 22, 2016
2 parents 5221a86 + 28c24bd commit 795c7db
Showing 63 changed files with 6,225 additions and 2,965 deletions.
2 changes: 1 addition & 1 deletion .coveragerc
@@ -1,6 +1,6 @@
[run]
branch = True
include = genipe*
source = genipe

[report]
exclude_lines =
6 changes: 4 additions & 2 deletions .travis.yml
@@ -1,9 +1,9 @@
language: python
python:
- "3.3"
- "3.4"
- "3.5"
before_install:
- "wget http://repo.continuum.io/miniconda/Miniconda3-3.4.2-Linux-x86_64.sh -O miniconda.sh"
- "wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh"
- "bash miniconda.sh -b -p $HOME/miniconda"
- "export PATH=$HOME/miniconda/bin:$PATH"
- "hash -r"
@@ -15,10 +15,12 @@ before_install:
- "conda info -a"
- "python --version"
install:
- "conda install -q nomkl"
- "conda install -q jinja2"
- "conda install -q numpy"
- "conda install -q pandas"
- "conda install -q scipy"
- "conda install -q patsy"
- "conda install -q statsmodels"
- "pip install --no-deps pyfaidx"
- "pip install --no-deps lifelines"
78 changes: 59 additions & 19 deletions README.mkd
@@ -18,8 +18,8 @@ Full documentation is available at

## Installation

We recommend installing the package in a Python 3 virtual environment. There
are two ways to install: `pip` or `conda`.
We recommend installing the package in a Python 3.4 (or latest) virtual
environment. There are two ways to install: `pip` or `conda`.

```bash
# Using pip
@@ -41,12 +41,12 @@ The complete installation procedure is available in the

### Dependencies

The tool requires a standard [Python](http://python.org/) 3 installation with
the following modules:
The tool requires a standard [Python](http://python.org/) 3.4 (or latest)
installation with the following modules:

* `numpy` version 1.8.2 and latest
* `Jinja2` version 2.7.3 and latest
* `pandas` version 0.15.2 and latest
* `pandas` version 0.17.0 and latest
* `setuptools` version 12.0.5 and latest

The tool requires the binaries for
@@ -62,6 +62,7 @@ and Cox's regressions), `genipe` requires the following Python modules:

* `Matplotlib` version 1.4.2 or latest
* `scipy` version 0.15.1 or latest
* `patsy` version 0.4.1 or latest
* `statsmodels` version 0.6.1 or latest
* `lifelines` version 0.7.0 or latest
* `Biopython` version 1.65 or latest
@@ -99,20 +100,27 @@ analysis.
```console
$ genipe-launcher --help
usage: genipe-launcher [-h] [-v] [--debug] [--thread THREAD] --bfile PREFIX
[--reference FILE] [--output-dir DIR] [--bgzip]
[--use-drmaa] [--drmaa-config FILE] [--preamble FILE]
[--reference FILE] [--chrom CHROM [CHROM ...]]
[--output-dir DIR] [--bgzip] [--use-drmaa]
[--drmaa-config FILE] [--preamble FILE]
[--shapeit-bin BINARY] [--shapeit-thread INT]
[--plink-bin BINARY] [--impute2-bin BINARY]
[--segment-length BP] --hap-template TEMPLATE
--legend-template TEMPLATE --map-template TEMPLATE
--sample-file FILE [--filtering-rules RULE [RULE ...]]
[--probability FLOAT] [--completion FLOAT]
[--info FLOAT] [--report-number NB]
[--report-title TITLE] [--report-author AUTHOR]
[--shapeit-extra OPTIONS] [--plink-bin BINARY]
[--hap-template TEMPLATE] [--legend-template TEMPLATE]
[--map-template TEMPLATE] --sample-file FILE
[--hap-nonPAR FILE] [--hap-PAR1 FILE] [--hap-PAR2 FILE]
[--legend-nonPAR FILE] [--legend-PAR1 FILE]
[--legend-PAR2 FILE] [--map-nonPAR FILE]
[--map-PAR1 FILE] [--map-PAR2 FILE]
[--impute2-bin BINARY] [--segment-length BP]
[--filtering-rules RULE [RULE ...]]
[--impute2-extra OPTIONS] [--probability FLOAT]
[--completion FLOAT] [--info FLOAT]
[--report-number NB] [--report-title TITLE]
[--report-author AUTHOR]
[--report-background BACKGROUND]

Execute the genome-wide imputation pipeline. This script is part of the
'genipe' package, version 1.2.3.
'genipe' package, version 1.3.0.

optional arguments:
-h, --help show this help message and exit
@@ -127,6 +135,8 @@ Input Options:
reference files) (optional).

Output Options:
--chrom CHROM [CHROM ...]
The chromosomes to process.
--output-dir DIR The name of the output directory. [genipe]
--bgzip Use bgzip to compress the impute2 files.

@@ -144,13 +154,15 @@ HPC Options:
SHAPEIT Options:
--shapeit-bin BINARY The SHAPEIT binary if it's not in the path.
--shapeit-thread INT The number of thread for phasing. [1]
--shapeit-extra OPTIONS
SHAPEIT extra parameters. Put extra parameters between
single or normal quotes (e.g. --shapeit-extra '--
states 100 --window 2').

Plink Options:
--plink-bin BINARY The Plink binary if it's not in the path.

IMPUTE2 Options:
--impute2-bin BINARY The IMPUTE2 binary if it's not in the path.
--segment-length BP The length of a single segment for imputation. [5e+06]
IMPUTE2 Autosomal Reference:
--hap-template TEMPLATE
The template for IMPUTE2's haplotype files (replace
the chromosome number by '{chrom}', e.g.
@@ -164,8 +176,36 @@ IMPUTE2 Options:
chromosome number by '{chrom}', e.g.
'genetic_map_chr{chrom}_combined_b37.txt').
--sample-file FILE The name of IMPUTE2's sample file.

IMPUTE2 Chromosome X Reference:
--hap-nonPAR FILE The IMPUTE2's haplotype file for the non-
pseudoautosomal region of chromosome 23.
--hap-PAR1 FILE The IMPUTE2's haplotype file for the first
pseudoautosomal region of chromosome 23.
--hap-PAR2 FILE The IMPUTE2's haplotype file for the second
pseudoautosomal region of chromosome 23.
--legend-nonPAR FILE The IMPUTE2's legend file for the non-pseudoautosomal
region of chromosome 23.
--legend-PAR1 FILE The IMPUTE2's legend file for the first
pseudoautosomal region of chromosome 23.
--legend-PAR2 FILE The IMPUTE2's legend file for the second
pseudoautosomal region of chromosome 23.
--map-nonPAR FILE The IMPUTE2's map file for the non-pseudoautosomal
region of chromosome 23.
--map-PAR1 FILE The IMPUTE2's map file for the first pseudoautosomal
region of chromosome 23.
--map-PAR2 FILE The IMPUTE2's map file for the second pseudoautosomal
region of chromosome 23.

IMPUTE2 Options:
--impute2-bin BINARY The IMPUTE2 binary if it's not in the path.
--segment-length BP The length of a single segment for imputation. [5e+06]
--filtering-rules RULE [RULE ...]
IMPUTE2 filtering rules (optional).
--impute2-extra OPTIONS
IMPUTE2 extra parameters. Put the extra parameters
between single or normal quotes (e.g. --impute2-extra
'-buffer 250 -Ne 20000').

IMPUTE2 Merger Options:
--probability FLOAT The probability threshold for no calls. [<0.9]
@@ -223,7 +263,7 @@ usage: imputed-stats [-h] [-v] {cox,linear,logistic,mixedlm,skat} ...

Performs statistical analysis on imputed data (either SKAT analysis, or
linear, logistic or survival regression). This script is part of the 'genipe'
package, version 1.2.3).
package, version 1.3.0.

optional arguments:
-h, --help show this help message and exit
79 changes: 79 additions & 0 deletions conda_build.sh
@@ -0,0 +1,79 @@
#!/usr/bin/env bash

# Getting genipe's version to build
genipe_version=$1
if [ -z $genipe_version ]
then
    echo "usage: $0 VERSION" 1>&2
    exit 1
fi

# Creating a directory for the build module
mkdir -p conda_dist

# Creating a directory for the skeleton
mkdir -p skeleton
pushd skeleton

# Creating the skeleton
conda skeleton pypi genipe --version $genipe_version

# Checking that fetching genipe was successful
if [ $? -ne 0 ]
then
    echo "Error when creating skeleton for genipe version $genipe_version" 1>&2
    exit 1
fi

# The different python versions and platforms
python_versions="3.4 3.5"
platforms="linux-32 linux-64 osx-64"

# Building
for python_version in $python_versions
do
    # Building
    conda build --python $python_version genipe &> log.txt

    # Checking the build was completed
    if [ $? -ne 0 ]
    then
        cat log.txt
        echo "Error when building genipe $genipe_version (python" \
             "$python_version)" 1>&2
        exit 1
    fi

    # Fetching the file name of the build
    filename=$(egrep "^# [$] anaconda upload \S+$" log.txt | cut -d " " -f 5)

    # Checking the file exists
    if [ -z $filename ]||[ ! -e $filename ]
    then
        echo "Problem fetching file $filename" 1>&2
        exit 1
    fi

    # Converting to the different platforms
    for platform in $platforms
    do
        conda convert -p $platform $filename -o ../conda_dist

        # Checking the conversion was completed
        if [ $? -ne 0 ]
        then
            echo "Problem converting genipe $genipe_version (python" \
                 "$python_version) to $platform" 1>&2
            exit 1
        fi

    done
done

popd
rm -rf skeleton

# Indexing
pushd conda_dist
conda index *
popd
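The file-name step in the script above (the `egrep ... | cut` pipeline over `log.txt`) can be sketched in Python as follows. This is an illustration only: the function name `find_built_package` and the sample log text are made up for the example, and the sketch assumes, as the script does, that the build log contains a line of the form `# $ anaconda upload <path>`.

```python
import re

# Same shape as the script's egrep pattern: a line that reads
# "# $ anaconda upload <path>", with the path as a single token.
UPLOAD_LINE = re.compile(r"^# \$ anaconda upload (\S+)$", re.MULTILINE)


def find_built_package(log_text):
    """Return the package path named in the build log, or None if absent."""
    match = UPLOAD_LINE.search(log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt, invented for this example.
sample_log = (
    "BUILD START: genipe\n"
    "# If you want to upload this package to anaconda.org later, type:\n"
    "# $ anaconda upload /tmp/conda-bld/linux-64/genipe-py35_0.tar.bz2\n"
)

package_path = find_built_package(sample_log)
missing_path = find_built_package("no upload hint in this log\n")
print(package_path)
print(missing_path)
```

Returning `None` when no line matches mirrors the script's `-z $filename` check, which aborts the build when the path cannot be found.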
Binary file removed docs/_static/images/Linear_Walltime.png
Binary file not shown.
Binary file removed docs/_static/images/Linear_Walltime_Plink.png
Binary file not shown.
Binary file removed docs/_static/images/Logistic_Walltime.png
Binary file not shown.
Binary file removed docs/_static/images/Logistic_Walltime_Plink.png
Binary file not shown.
Binary file added docs/_static/images/MixedLM_TS_Diff.png
Binary file removed docs/_static/images/MixedLM_Walltime.png
Binary file not shown.
Binary file removed docs/_static/images/Survival_Walltime.png
Binary file not shown.
Binary file added docs/_static/images/execution_time.png
Binary file modified docs/_static/tutorial/phenotypes_mixedlm.txt.bz2
Binary file not shown.
45 changes: 45 additions & 0 deletions docs/execution_time.rst
@@ -0,0 +1,45 @@

.. _stats-exec-time:

Statistical Analysis Execution Time
====================================

GWAS analysis of imputed markers is computationally intensive. While it is
feasible to run such analyses on some simple models like linear and logistic
regression, more complex models like Cox regression and mixed linear models
require more computing power or specialized implementations.

We have optimized the mixed linear model analysis to significantly decrease
computation time. Using a two-step approach (as described by Sikorska *et al.*,
2015 [doi: `10.1038/ejhg.2015.1
<http://www.nature.com/ejhg/journal/v23/n10/abs/ejhg20151a.html>`_]), the
execution time is comparable to a simple linear regression. Prior to
optimization, the analysis of chromosome 2 took 53 hours for 33 sub-analyses
with 6 threads each (198 threads in total).

The following figure shows the execution time for a typical imputation analysis
of chromosome 2, imputed for 5,045 samples. Chromosome 2 comprised a total of
1,170,797 loci, of which 961,019 were of sufficient quality and 528,932 had a
MAF higher than 1%. The black dashed line is the execution time for Plink.

.. figure:: _static/images/execution_time.png
    :align: center
    :width: 70%
    :alt: Statistical analysis execution time.

.. note::

    On some installations, when executing the analysis with *n* threads,
    *OPENBLAS* automatically uses all the CPUs for each thread, such that the
    load quickly increases to *n* times the number of CPUs. Such a high load
    slows down the analysis considerably.

    To avoid this, always export the following environment variable and specify
    the total number of threads using the ``--nb-process`` option.

    .. code-block:: bash

        export OPENBLAS_NUM_THREADS=1

We are planning to optimize Cox's proportional hazards regression in the
near future.
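
The thread-capping advice in the note can also be applied from a Python wrapper, as in the minimal sketch below. It is an illustration under stated assumptions, not part of genipe: the helper names `blas_limited_env` and `worst_case_threads` are hypothetical, a child Python interpreter stands in for the real analysis command, and the figures of 6 processes and 16 CPUs are arbitrary.

```python
import os
import subprocess
import sys


def blas_limited_env(base_env=None):
    """Return a copy of the environment with OpenBLAS capped at one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


def worst_case_threads(nb_process, nb_cpus, blas_threads=None):
    """Upper bound on busy threads when each process runs its own BLAS pool.

    With no cap (blas_threads=None), every process may use all the CPUs,
    which gives the "n times the number of CPUs" load described in the note.
    """
    per_process = nb_cpus if blas_threads is None else blas_threads
    return nb_process * per_process


# A child interpreter stands in for the analysis command (hypothetical);
# it only prints the value it inherited from the capped environment.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"],
    env=blas_limited_env(),
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
child_value = child.stdout.strip()

print(child_value)
print(worst_case_threads(6, 16))                  # uncapped: 6 x 16
print(worst_case_threads(6, 16, blas_threads=1))  # capped: 6 x 1
```

Passing a copied environment to the child, instead of mutating `os.environ`, keeps the cap local to the launched analysis.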