pyGenClean is an informatics tool that facilitates and standardizes the genetic
data clean up pipeline for genotyping array data. In conjunction with a source
batch-queuing system, the tool minimizes data manipulation errors, accelerates
the completion of the data clean up process, and provides informative graphics
and metrics to guide decision making for statistical analysis.
If you use pyGenClean in your project, please cite the published paper
describing the tool:
Lemieux Perreault LP, Provost S, Legault MA, Barhdadi A, Dubé MP (2013) pyGenClean: efficient tool for genetic data clean up before association testing. Bioinformatics, 29(13): 1704-1705 [DOI:10.1093/bioinformatics/btt261]
Documentation is available from http://lemieuxl.github.io/pyGenClean/.
Here are the dependencies that must be installed before pyGenClean:
- Python (version 2.7)
- numpy (version 1.6.2 or later)
- matplotlib (version 1.2.0 or later)
- scipy (version 0.11.0 or later)
- scikit-learn (version 0.12.1 or later)
- Jinja2 (version 2.8 or later)
- PLINK (version 1.07)
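
The minimum versions listed above can be checked before installation with a short script. The following is a minimal sketch and is not part of pyGenClean: the names `MINIMUM_VERSIONS`, `parse_version`, `meets_minimum`, and `check_dependencies` are hypothetical helpers introduced here for illustration. The script assumes each package exposes a `__version__` attribute, and it does not check the Python interpreter version or PLINK, which is an external program rather than a Python package.

```python
from __future__ import print_function

import importlib
import re

# Minimum versions copied from the dependency list above. Keys are import
# names: scikit-learn is imported as "sklearn" and Jinja2 as "jinja2".
MINIMUM_VERSIONS = {
    "numpy": "1.6.2",
    "matplotlib": "1.2.0",
    "scipy": "0.11.0",
    "sklearn": "0.12.1",
    "jinja2": "2.8",
}


def parse_version(text):
    """Convert '1.6.2' or '1.2.0rc1' to a tuple of ints such as (1, 6, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '2.8' and '2.8.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each import name to a short status string."""
    report = {}
    for name, minimum in sorted(minimums.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "unknown version"
        elif meets_minimum(installed, minimum):
            report[name] = "ok ({0})".format(installed)
        else:
            report[name] = "too old ({0} < {1})".format(installed, minimum)
    return report


if __name__ == "__main__":
    for name, status in sorted(check_dependencies().items()):
        print("{0:<12} {1}".format(name, status))
```

Run as a script, it prints one status line per package. Any package reported as missing or too old should be installed or upgraded before installing pyGenClean.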
For Linux users, we recommend installing pyGenClean in a Python virtualenv
(virtual environment).
pyGenClean should work on Windows and macOS, although it has not been fully
tested on those platforms. It has been tried on Windows XP (32-bit) and
Windows 7 (64-bit, but with a 32-bit Python 2.7 installation) without known
problems.
For a step-by-step installation on both Linux and Windows operating systems, see
the pyGenClean documentation at http://lemieuxl.github.io/pyGenClean/.