Commit

Merge branch 'dev'
lemieuxl committed Nov 4, 2015
2 parents fce5cd3 + 0069b04 commit 87c1989
Showing 5 changed files with 594 additions and 98 deletions.
4 changes: 2 additions & 2 deletions README.mkd
@@ -69,9 +69,9 @@ To test the module, just perform the following command:
```python
>>> import pyplink
>>> pyplink.test()
.......................
............................................
----------------------------------------------------------------------
Ran 23 tests in 0.149s
Ran 44 tests in 0.468s

OK
```
159 changes: 153 additions & 6 deletions demo/PyPlink Demo.ipynb
@@ -55,7 +55,9 @@
" * [*Counting the allele frequency of markers*](#Counting-the-allele-frequency-of-markers)\n",
"\n",
"\n",
"* [**Writing binary pedfile**](#Writing-binary-pedfile)"
"* [**Writing binary pedfile**](#Writing-binary-pedfile)\n",
" * [SNP-major format](#SNP-major-format)\n",
" * [INDIVIDUAL-major format](#INDIVIDUAL-major-format)"
]
},
{
@@ -823,7 +825,9 @@
"source": [
"## Writing binary pedfile\n",
"\n",
"The following example shows how to write a binary file using the `PyPlink` module.\n",
"### *SNP-major* format\n",
"\n",
"The following example shows how to write a binary file using the `PyPlink` module. The *SNP-major* format is the default: the binary file is written one marker at a time.\n",
"\n",
"> Note that `PyPlink` only writes the `BED` file. The user is required to create the `FAM` and `BIM` files."
]
@@ -832,7 +836,7 @@
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
"collapsed": false
},
"outputs": [],
"source": [
@@ -846,7 +850,7 @@
"# Writing the BED file using PyPlink\n",
"with PyPlink(\"test_output\", \"w\") as pedfile:\n",
"    for genotypes in all_genotypes:\n",
"        pedfile.write_marker(genotypes)\n",
"        pedfile.write_genotypes(genotypes)\n",
"\n",
"# Writing a dummy FAM file\n",
"with open(\"test_output.fam\", \"w\") as fam_file:\n",
@@ -1142,7 +1146,7 @@
"\n",
"Skipping web check... [ --noweb ] \n",
"Writing this text to log file [ plink.log ]\n",
"Analysis started: Tue Nov 3 11:12:07 2015\n",
"Analysis started: Wed Nov 4 14:50:58 2015\n",
"\n",
"Options in effect:\n",
"\t--noweb\n",
@@ -1167,7 +1171,7 @@
"10 founders and 0 non-founders found\n",
"Writing allele frequencies (founders-only) to [ plink.frq ] \n",
"\n",
"Analysis finished: Tue Nov 3 11:12:10 2015\n",
"Analysis finished: Wed Nov 4 14:50:58 2015\n",
"\n"
]
}
@@ -1205,6 +1209,149 @@
"with open(\"plink.frq\", \"r\") as i_file:\n",
"    print(i_file.read(), end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### *INDIVIDUAL-major* format\n",
"\n",
"The following example shows how to write a binary file in *INDIVIDUAL-major* format using the `PyPlink` module. In this format, the binary file is written one sample at a time.\n",
"\n",
"**Files in *INDIVIDUAL-major* format are not readable by `PyPlink`.** You need to convert them using *Plink*.\n",
"\n",
"> Note that `PyPlink` only writes the `BED` file. The user is required to create the `FAM` and `BIM` files."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The genotypes for 3 markers and 10 samples (INDIVIDUAL-major)\n",
"all_genotypes = [\n",
"    [ 0, 0, 0],\n",
"    [ 0, 0, 0],\n",
"    [ 0, 1, 0],\n",
"    [ 1, 1, 0],\n",
"    [ 0, 0, 1],\n",
"    [ 0, 0, 1],\n",
"    [-1, 0, 0],\n",
"    [ 2, 1, 0],\n",
"    [ 1, 2, 0],\n",
"    [ 0, 0, 1],\n",
"]\n",
"\n",
"# Writing the BED file using PyPlink\n",
"with PyPlink(\"test_output_2\", \"w\", bed_format=\"INDIVIDUAL-major\") as pedfile:\n",
"    for genotypes in all_genotypes:\n",
"        pedfile.write_genotypes(genotypes)\n",
"\n",
"# Writing a dummy FAM file\n",
"with open(\"test_output_2.fam\", \"w\") as fam_file:\n",
"    for i in range(10):\n",
"        print(\"family_{}\".format(i+1), \"sample_{}\".format(i+1), \"0\", \"0\", \"0\", \"-9\",\n",
"              sep=\" \", file=fam_file)\n",
"\n",
"# Writing a dummy BIM file\n",
"with open(\"test_output_2.bim\", \"w\") as bim_file:\n",
"    for i in range(3):\n",
"        print(\"1\", \"marker_{}\".format(i+1), \"0\", i+1, \"A\", \"T\",\n",
"              sep=\"\\t\", file=bim_file)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"@----------------------------------------------------------@\n",
"| PLINK! | v1.07 | 10/Aug/2009 |\n",
"|----------------------------------------------------------|\n",
"| (C) 2009 Shaun Purcell, GNU General Public License, v2 |\n",
"|----------------------------------------------------------|\n",
"| For documentation, citation & bug-report instructions: |\n",
"| http://pngu.mgh.harvard.edu/purcell/plink/ |\n",
"@----------------------------------------------------------@\n",
"\n",
"Skipping web check... [ --noweb ] \n",
"Writing this text to log file [ plink_2.log ]\n",
"Analysis started: Wed Nov 4 14:50:58 2015\n",
"\n",
"Options in effect:\n",
"\t--noweb\n",
"\t--bfile test_output_2\n",
"\t--freq\n",
"\t--out plink_2\n",
"\n",
"Reading map (extended format) from [ test_output_2.bim ] \n",
"3 markers to be included from [ test_output_2.bim ]\n",
"Reading pedigree information from [ test_output_2.fam ] \n",
"10 individuals read from [ test_output_2.fam ] \n",
"0 individuals with nonmissing phenotypes\n",
"Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)\n",
"Missing phenotype value is also -9\n",
"0 cases, 0 controls and 10 missing\n",
"0 males, 0 females, and 10 of unspecified sex\n",
"Warning, found 10 individuals with ambiguous sex codes\n",
"These individuals will be set to missing ( or use --allow-no-sex )\n",
"Writing list of these individuals to [ plink_2.nosex ]\n",
"Reading genotype bitfile from [ test_output_2.bed ] \n",
"Detected that binary PED file is v1.00 individual-major mode\n",
"Before frequency and genotyping pruning, there are 3 SNPs\n",
"Converting data to SNP-major format\n",
"10 founders and 0 non-founders found\n",
"Writing allele frequencies (founders-only) to [ plink_2.frq ] \n",
"\n",
"Analysis finished: Wed Nov 4 14:50:58 2015\n",
"\n"
]
}
],
"source": [
"from subprocess import Popen, PIPE\n",
"\n",
"# Computing frequencies\n",
"proc = Popen([\"plink\", \"--noweb\", \"--bfile\", \"test_output_2\", \"--freq\", \"--out\", \"plink_2\"],\n",
"             stdout=PIPE, stderr=PIPE)\n",
"outs, errs = proc.communicate()\n",
"print(outs.decode(), end=\"\")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" CHR SNP A1 A2 MAF NCHROBS\n",
" 1 marker_1 A T 0.2222 18\n",
" 1 marker_2 A T 0.25 20\n",
" 1 marker_3 A T 0.15 20\n"
]
}
],
"source": [
"with open(\"plink_2.frq\", \"r\") as i_file:\n",
"    print(i_file.read(), end=\"\")"
]
}
],
"metadata": {
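Note on the two layouts: the INDIVIDUAL-major cell added in the notebook passes one row per sample to `write_genotypes`, whereas the SNP-major cell passes one row per marker. The sketch below is plain Python and does not use `PyPlink`. It transposes the INDIVIDUAL-major rows into marker rows and recomputes the frequencies by hand. The helper names `to_snp_major` and `a1_frequency` are hypothetical. It assumes that each genotype value counts copies of the A1 allele and that `-1` marks a missing call, which appears consistent with the `plink_2.frq` output shown in the diff.

```python
# Genotypes copied from the INDIVIDUAL-major notebook cell:
# one row per sample, one column per marker, -1 = missing.
individual_major = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [-1, 0, 0],
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 1],
]


def to_snp_major(rows):
    """Transpose sample rows into marker rows (hypothetical helper)."""
    return [list(marker) for marker in zip(*rows)]


def a1_frequency(marker):
    """Return (A1 allele frequency, number of observed chromosomes).

    Assumption: each value is the number of A1 copies; -1 is missing.
    """
    observed = [g for g in marker if g != -1]
    n_chrom = 2 * len(observed)
    return sum(observed) / n_chrom, n_chrom


snp_major = to_snp_major(individual_major)
for i, marker in enumerate(snp_major, start=1):
    freq, n_chrom = a1_frequency(marker)
    print("marker_{}".format(i), round(freq, 4), n_chrom)
```

Under these assumptions the loop prints 0.2222 with 18 chromosomes for `marker_1`, 0.25 with 20 for `marker_2`, and 0.15 with 20 for `marker_3`, the same values Plink reports after converting the file to SNP-major format.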
