diff --git a/README.md b/README.md index f34cc1d..3a99244 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,3 @@ # PolrGWAS + Genome-wide association studies (GWAS) for ordered categorical responses diff --git a/REQUIRE b/REQUIRE index 8e7734d..d7d7932 100644 --- a/REQUIRE +++ b/REQUIRE @@ -1,3 +1,7 @@ +CSV +DataFrames +Distributions PolrModels +Reexport SnpArrays StatsModels diff --git a/docs/polrgwas.ipynb b/docs/polrgwas.ipynb index 13d8efd..78304c2 100644 --- a/docs/polrgwas.ipynb +++ b/docs/polrgwas.ipynb @@ -11,24 +11,97 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "PolrGWAS.jl is a Julia package for performing genome-wide association studies for ordered categorical phenotypes. Install the package by\n", + "PolrGWAS.jl is a Julia package for performing genome-wide association studies for ordered categorical phenotypes. \n", + "\n", + "This package requires Julia v0.7.0 or later. The package has not yet been registered and must be installed using the repository location. Start julia and use the ] key to switch to the package manager REPL\n", "```julia\n", - "Pkg.clone(\"git@github.com:Hua-Zhou/PolrGWAS.git\")\n", + "(v0.7) pkg> add https://github.com/Hua-Zhou/PolrGWAS.git#juliav0.7\n", "```" ] }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Julia Version 0.7.0\n", + "Commit a4cb80f3ed (2018-08-08 06:46 UTC)\n", + "Platform Info:\n", + " OS: macOS (x86_64-apple-darwin14.5.0)\n", + " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", + " WORD_SIZE: 64\n", + " LIBM: libopenlibm\n", + " LLVM: libLLVM-6.0.0 (ORCJIT, skylake)\n", + "Environment:\n", + " JULIA_EDITOR = code\n" + ] + } + ], + "source": [ + "versioninfo()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "using BenchmarkTools, CSV, PolrGWAS, SnpArrays" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic usage\n", "\n", - "Suppose covariates and phenotype are available in a csv file `covariate.txt`. Variable `trait` is the ordered categorical phenotypes coded as integers 1 to 4. We want to include variable `sex` as the covariate in GWAS." + "The following command performs GWAS using the proportional odds logistic regression." ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For documentation, type `?polrgwas` in Julia REPL." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Formula\n", + "\n", + "In the formula, e.g., `@formula(trait ~ 0 + sex)`, it's important to **exclude** the intercept because proportional odds model automatically incorporates intercepts for modeling purpose." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Input files\n", + "\n", + "`polrgwas` expects two input files: one for responses and covariates, the other the Plink files for genotypes.\n", + "\n", + "Covariates and phenotype are available in a csv file `covariate.txt`, which has one header line for variable names. Variable `trait` is the ordered categorical phenotypes coded as integers 1 to 4. We want to include variable `sex` as the covariate in GWAS." + ] + }, + { + "cell_type": "code", + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -71,119 +144,561 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "-rw-r--r-- 1 huazhou staff 1128171 Jun 19 2017 ../data/hapmap3.bed\n", - "-rw-r--r-- 1 huazhou staff 388672 Jun 19 2017 ../data/hapmap3.bim\n", - "-rw-r--r-- 1 huazhou staff 7136 Jun 19 2017 ../data/hapmap3.fam\n", - "-rw-r--r-- 1 huazhou staff 332960 Jun 19 2017 ../data/hapmap3.map\n" + "-rw-r--r-- 1 huazhou staff 1128171 Sep 4 12:03 ../data/hapmap3.bed\n", + "-rw-r--r-- 1 huazhou staff 388672 Sep 4 12:03 ../data/hapmap3.bim\n", + "-rw-r--r-- 1 huazhou staff 7136 Sep 4 12:03 ../data/hapmap3.fam\n" ] } ], "source": [ - ";ls -l ../data/hapmap3.\"*\"" + ";ls -l ../data/hapmap3.bed ../data/hapmap3.bim ../data/hapmap3.fam" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The following command performs GWAS using the proportional odds logistic regression." + "There are 324 samples at 13,928 SNPs." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(324, 13928)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "size(SnpArray(\"../data/hapmap3.bed\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Output files" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`polrgwas` outputs two files: `polrgwas.nullmodel.txt` and `polrgwas.scoretest.txt`. \n", + "\n", + "* `polrgwas.nullmodel.txt` lists the estimated null model (without SNPs). " + ] + }, + { + "cell_type": "code", + "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "StatsModels.DataFrameRegressionModel{PolrModels.PolrModel{Int64,Float64,GLM.LogitLink},Array{Float64,2}}\n", + "StatsModels.DataFrameRegressionModel{PolrModel{Int64,Float64,LogitLink},Array{Float64,2}}\n", "\n", "Formula: trait ~ +sex\n", "\n", "Coefficients:\n", " Estimate Std.Error t value Pr(>|t|)\n", - "θ1 -1.48554 0.358711 -4.14131 <1e-4\n", - "θ2 -0.569325 0.340645 -1.67132 0.0956\n", - "θ3 0.429823 0.339263 1.26693 0.2061\n", - "β1 0.424691 0.21391 1.98537 0.0480\n" + "θ1 -1.48564 0.358713 -4.14157 <1e-4\n", + "θ2 -0.569479 0.340649 -1.67175 0.0956\n", + "θ3 0.429815 0.339266 1.2669 0.2061\n", + "β1 0.424656 0.213911 1.9852 0.0480\n" ] - }, + } + ], + "source": [ + ";cat polrgwas.nullmodel.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* `polrgwas.scoretest.txt` tallies the SNPs and their pvalues. " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ { - "name": "stderr", + "name": "stdout", "output_type": "stream", "text": [ - "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mv1.0 BED file detected\n", - "\u001b[39m" + "chr,pos,snpid,maf,pval\n", + "1,554484,rs10458597,0.0,1.0\n", + "1,758311,rs12562034,0.07763975155279501,0.003001230754791795\n", + "1,967643,rs2710875,0.32407407407407407,2.5117214960313727e-5\n", + "1,1168108,rs11260566,0.19158878504672894,1.13731122530895e-5\n", + "1,1375074,rs1312568,0.441358024691358,0.008317358366815325\n", + "1,1588771,rs35154105,0.0,1.0\n", + "1,1789051,rs16824508,0.00462962962962965,0.5274428530031774\n", + "1,1990452,rs2678939,0.4537037037037037,0.29988429741739986\n", + "1,2194615,rs7553178,0.22685185185185186,0.16436415589171793\n" ] - }, + } + ], + "source": [ + ";head polrgwas.scoretest.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The prefix `polrgwas` can be changed by the `outfile` keyword, e.g.,\n", + "```julia\n", + "polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", outfile=\"testdata\")\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this moderate-sized data set, `polrgwas` takes less than 0.2 second." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - " 0.804299 seconds (10.19 M allocations: 183.621 MiB, 22.96% gc time)\n" + " 128.138 ms (639488 allocations: 29.14 MiB)\n" ] } ], "source": [ - "using PolrGWAS\n", + "@btime(polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\"))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "rm(\"polrgwas.scoretest.txt\")\n", + "rm(\"polrgwas.nullmodel.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Link functions\n", "\n", - "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\";\n", - " covartype = [String, String, String, String, Float64, Int])" + "The `link` keyword argument of `polrgwas` can take value `LogitLink()` (default), `ProbitLink()` (ordred Probit model), or `CloglogLink()` (proportional hazards model).\n", + "\n", + "E.g., to perform GWAS using the ordred Probit model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", link=ProbitLink(), outfile=\"opm\")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "StatsModels.DataFrameRegressionModel{PolrModel{Int64,Float64,ProbitLink},Array{Float64,2}}\n", + "\n", + "Formula: trait ~ +sex\n", + "\n", + "Coefficients:\n", + " Estimate Std.Error t value Pr(>|t|)\n", + "θ1 -0.866156 0.210746 -4.10995 <1e-4\n", + "θ2 -0.359878 0.20552 -1.75106 0.0809\n", + "θ3 0.247054 0.205135 1.20435 0.2293\n", + "β1 0.251058 0.128212 1.95814 0.0511\n" + ] + } + ], + "source": [ + ";cat opm.nullmodel.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chr,pos,snpid,maf,pval\n", + "1,554484,rs10458597,0.0,1.0\n", + "1,758311,rs12562034,0.07763975155279501,0.0064504259191418\n", + "1,967643,rs2710875,0.32407407407407407,1.578504254844719e-5\n", + "1,1168108,rs11260566,0.19158878504672894,4.979075119252628e-6\n", + "1,1375074,rs1312568,0.441358024691358,0.004566574021452637\n", + "1,1588771,rs35154105,0.0,1.0\n", + "1,1789051,rs16824508,0.00462962962962965,0.4819721449163142\n", + "1,1990452,rs2678939,0.4537037037037037,0.33240414006705726\n", + "1,2194615,rs7553178,0.22685185185185186,0.25299085946476113\n" + ] + } + ], + "source": [ + ";head opm.scoretest.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "rm(\"opm.nullmodel.txt\")\n", + "rm(\"opm.scoretest.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "It outputs two files: \n", - "* `polrgwas.nullmodel.txt` lists the estimated regression model. " + "## SNP and/or sample masks\n", + "\n", + "In practice, we often perform GWAS on selected SNPs and/or selected samples. They can be specified by the `colinds` and `rowinds` keywords of `polrgwas` function.\n", + "\n", + "For example, to perform GWAS on SNPs with minor allele frequency (MAF) above 0.05" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "13928-element BitArray{1}:\n", + " false\n", + " true\n", + " true\n", + " true\n", + " true\n", + " false\n", + " false\n", + " true\n", + " true\n", + " true\n", + " false\n", + " true\n", + " true\n", + " ⋮\n", + " true\n", + " true\n", + " true\n", + " true\n", + " true\n", + " true\n", + " true\n", + " true\n", + " true\n", + " false\n", + " false\n", + " false" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# create SNP mask\n", + "snpinds = maf(SnpArray(\"../data/hapmap3.bed\")) .≥ 0.05" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.226078 seconds (829.18 k allocations: 39.024 MiB, 3.50% gc time)\n" + ] + } + ], + "source": [ + "# GWAS on selected SNPs\n", + "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", colinds = snpinds, outfile=\"commonvariant\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "StatsModels.DataFrameRegressionModel{PolrModels.PolrModel{Int64,Float64,GLM.LogitLink},Array{Float64,2}}\n", + "StatsModels.DataFrameRegressionModel{PolrModel{Int64,Float64,LogitLink},Array{Float64,2}}\n", "\n", "Formula: trait ~ +sex\n", "\n", "Coefficients:\n", " Estimate Std.Error t value Pr(>|t|)\n", - "θ1 -1.48554 0.358711 -4.14131 <1e-4\n", - "θ2 -0.569325 0.340645 -1.67132 0.0956\n", - "θ3 0.429823 0.339263 1.26693 0.2061\n", - "β1 0.424691 0.21391 1.98537 0.0480\n" + "θ1 -1.48564 0.358713 -4.14157 <1e-4\n", + "θ2 -0.569479 0.340649 -1.67175 0.0956\n", + "θ3 0.429815 0.339266 1.2669 0.2061\n", + "β1 0.424656 0.213911 1.9852 0.0480\n" ] } ], "source": [ - ";cat polrgwas.nullmodel.txt" + ";cat commonvariant.nullmodel.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chr,pos,snpid,maf,pval\n", + "1,758311,rs12562034,0.07763975155279501,0.003001230754791795\n", + "1,967643,rs2710875,0.32407407407407407,2.5117214960313727e-5\n", + "1,1168108,rs11260566,0.19158878504672894,1.13731122530895e-5\n", + "1,1375074,rs1312568,0.441358024691358,0.008317358366815325\n", + "1,1990452,rs2678939,0.4537037037037037,0.29988429741739986\n", + "1,2194615,rs7553178,0.22685185185185186,0.16436415589171793\n", + "1,2396747,rs13376356,0.1448598130841121,0.5372089713885622\n", + "1,2823603,rs1563468,0.4830246913580247,0.23123684490822347\n", + "1,3025087,rs6690373,0.2538699690402477,0.700092366400877\n", + "1,3431124,rs12093117,0.1099071207430341,0.4271132018338309\n", + "1,3633945,rs10910017,0.22187500000000004,0.9141485352635613\n", + "1,4096895,rs6702633,0.4752321981424149,0.006373000780228055\n", + "1,4297388,rs684965,0.3055555555555556,0.09402646589417148\n", + "1,4498133,rs11809295,0.0993788819875776,0.0856953578572345\n", + "1,4698713,rs578528,0.32407407407407407,0.06883563182592009\n", + "1,4899946,rs4654471,0.3580246913580247,0.2267196199789606\n", + "1,5100369,rs6681148,0.13157894736842102,0.16154955342712946\n", + "1,5302730,rs10799197,0.4287925696594427,0.6769491555716189\n", + "1,5502779,rs10796400,0.2314814814814815,0.2449957392353961\n" + ] + } + ], + "source": [ + ";head -20 commonvariant.scoretest.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(countlines(\"commonvariant.scoretest.txt\"), count(snpinds)) = (12086, 12085)\n" + ] + }, + { + "data": { + "text/plain": [ + "(12086, 12085)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# extra headline in commonvariant.scoretest.txt\n", + "@show countlines(\"commonvariant.scoretest.txt\"), count(snpinds)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "rm(\"commonvariant.scoretest.txt\")\n", + "rm(\"commonvariant.nullmodel.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "* `polrgwas.scoretest.txt` lists the SNPs and their pvalues. " + "User should be particularly careful when using the `rowinds` keyword. Selected rows in SnpArray should exactly match the samples in the null model. Otherwise the results are meaningless." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Likelihood ratio test (LRT)\n", + "\n", + "By default, `polrgwas` calculates the score test p-value for each SNP. Score test is fast because it doesn't require fitting alternative model. User can request likelihood ratio test (LRT) using keyword `test=:LRT`. LRT is much slower but may be more powerful than score test." ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 19.778470 seconds (8.79 M allocations: 2.064 GiB, 1.94% gc time)\n" + ] + } + ], + "source": [ + "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", test=:LRT, outfile=\"lrt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Test result is output to `outfile.lrttest.txt` file" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chr,pos,snpid,maf,effect,pval\n", + "1,554484,rs10458597,0.0,0.0,1.0\n", + "1,758311,rs12562034,0.07763975155279501,-1.0057833719544362,0.0019185836579804134\n", + "1,967643,rs2710875,0.32407407407407407,-0.6488560566295053,1.805050556976133e-5\n", + "1,1168108,rs11260566,0.19158878504672894,-0.9157225669357916,5.8733847126852225e-6\n", + "1,1375074,rs1312568,0.441358024691358,-0.3318136652577252,0.008081022577831297\n", + "1,1588771,rs35154105,0.0,0.0,1.0\n", + "1,1789051,rs16824508,0.00462962962962965,-0.7338026388700327,0.5169027130128576\n", + "1,1990452,rs2678939,0.4537037037037037,-0.1358649923181972,0.29946402200910055\n", + "1,2194615,rs7553178,0.22685185185185186,-0.2512075640440133,0.16151069094439868\n", + "1,2396747,rs13376356,0.1448598130841121,0.12946142026273552,0.5387338201469207\n", + "1,2623158,rs28753913,0.0,0.0,1.0\n", + "1,2823603,rs1563468,0.4830246913580247,0.15515329587697385,0.2312300208157546\n", + "1,3025087,rs6690373,0.2538699690402477,-0.05966638389967735,0.6995722170701131\n", + "1,3225416,rs12043519,0.029320987654321007,1.1761887120778431,0.002016744167744886\n", + "1,3431124,rs12093117,0.1099071207430341,0.18242332995458113,0.43052012312944177\n", + "1,3633945,rs10910017,0.22187500000000004,-0.017935692627049582,0.9142024828415289\n", + "1,3895935,rs34770924,0.024691358024691357,0.0009448575482692606,0.9980482910119114\n", + "1,4096895,rs6702633,0.4752321981424149,0.4230874102563219,0.0063052493446845445\n", + "1,4297388,rs684965,0.3055555555555556,-0.27091872232225395,0.09226258176075279\n" + ] + } + ], + "source": [ + ";head -20 lrt.lrttest.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note the extra `effect` column, which is the effect size (regression coefficient) for each SNP. " + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "rm(\"lrt.lrttest.txt\")\n", + "rm(\"lrt.nullmodel.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this case, GWAS by score test takes less than 0.2 second, while GWAS by LRT takes about 20 seconds. About 100 fold difference in run time. \n", + "\n", + "## Score for screening, LRT for power \n", + "\n", + "For large data sets, a practical solution is to perform score test first, then re-do LRT for the most promising SNPs according to score test p-values.\n", + "\n", + "**Step 1**: Perform score test GWAS, results in hapmap.scoretest.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.232919 seconds (717.09 k allocations: 33.227 MiB, 11.72% gc time)\n" + ] + } + ], + "source": [ + "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", test=:score, outfile=\"hapmap\", verbose=false)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, "metadata": {}, "outputs": [ { @@ -192,47 +707,377 @@ "text": [ "chr,pos,snpid,maf,pval\n", "1,554484,rs10458597,0.0,1.0\n", - "1,758311,rs12562034,0.07763975155279501,0.003438362807748468\n", - "1,967643,rs2710875,0.32407407407407407,2.5147324918949065e-5\n", - "1,1168108,rs11260566,0.19158878504672894,1.2475005761914856e-5\n", - "1,1375074,rs1312568,0.441358024691358,0.00832247910423294\n", + "1,758311,rs12562034,0.07763975155279501,0.003001230754791795\n", + "1,967643,rs2710875,0.32407407407407407,2.5117214960313727e-5\n", + "1,1168108,rs11260566,0.19158878504672894,1.13731122530895e-5\n", + "1,1375074,rs1312568,0.441358024691358,0.008317358366815325\n", "1,1588771,rs35154105,0.0,1.0\n", - "1,1789051,rs16824508,0.00462962962962965,0.5281305094274626\n", - "1,1990452,rs2678939,0.4537037037037037,0.2999869793592329\n", - "1,2194615,rs7553178,0.22685185185185186,0.16443985731509447\n" + "1,1789051,rs16824508,0.00462962962962965,0.5274428530031774\n", + "1,1990452,rs2678939,0.4537037037037037,0.29988429741739986\n", + "1,2194615,rs7553178,0.22685185185185186,0.16436415589171793\n", + "1,2396747,rs13376356,0.1448598130841121,0.5372089713885622\n", + "1,2623158,rs28753913,0.0,1.0\n", + "1,2823603,rs1563468,0.4830246913580247,0.23123684490822347\n", + "1,3025087,rs6690373,0.2538699690402477,0.700092366400877\n", + "1,3225416,rs12043519,0.029320987654321007,0.001067589332379961\n", + "1,3431124,rs12093117,0.1099071207430341,0.4271132018338309\n", + "1,3633945,rs10910017,0.22187500000000004,0.9141485352635613\n", + "1,3895935,rs34770924,0.024691358024691357,0.9989137356071247\n", + "1,4096895,rs6702633,0.4752321981424149,0.006373000780228055\n", + "1,4297388,rs684965,0.3055555555555556,0.09402646589417148\n" ] } ], "source": [ - ";head polrgwas.scoretest.txt" + ";head -20 hapmap.scoretest.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Input files" + "**Step 2**: Sort score test p-values and find top 10 SNPs" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "13928-element Array{Union{Missing, Float64},1}:\n", + " 1.0 \n", + " 0.003001230754791795 \n", + " 2.5117214960313727e-5\n", + " 1.13731122530895e-5 \n", + " 0.008317358366815325 \n", + " 1.0 \n", + " 0.5274428530031774 \n", + " 0.29988429741739986 \n", + " 0.16436415589171793 \n", + " 0.5372089713885622 \n", + " 1.0 \n", + " 0.23123684490822347 \n", + " 0.700092366400877 \n", + " ⋮ \n", + " 0.4533525658437616 \n", + " 0.6409263039143044 \n", + " 0.15819888818088343 \n", + " 0.5643246521154451 \n", + " 0.5016172691333742 \n", + " 0.16467049275061096 \n", + " 0.6977871465129508 \n", + " 0.9120338117720298 \n", + " 0.6977871465129944 \n", + " 0.3136740371197593 \n", + " 0.2477983233224063 \n", + " 0.24848633880732612 " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scorepvals = CSV.read(\"hapmap.scoretest.txt\")[5]" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "10-element Array{Int64,1}:\n", + " 6063\n", + " 13544\n", + " 5071\n", + " 2458\n", + " 4\n", + " 3291\n", + " 13737\n", + " 4183\n", + " 3\n", + " 12727" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tophits = sortperm(scorepvals)[1:10]" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "10-element Array{Union{Missing, Float64},1}:\n", + " 1.4416637280589482e-6\n", + " 5.127005866008025e-6 \n", + " 5.910845635149386e-6 \n", + " 8.756466364522906e-6 \n", + " 1.13731122530895e-5 \n", + " 1.7228784857112124e-5\n", + " 2.008908938410486e-5 \n", + " 2.283909776161149e-5 \n", + " 2.5117214960313727e-5\n", + " 2.949378996752777e-5 " + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scorepvals[tophits]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Output files" + "**Step 3**: Re-do LRT on top hits." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0.186250 seconds (410.55 k allocations: 21.398 MiB, 4.34% gc time)\n" + ] + } + ], + "source": [ + "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", colinds=tophits, test=:LRT, outfile=\"hapmap\", verbose=false)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chr,pos,snpid,maf,effect,pval\n", + "1,967643,rs2710875,0.32407407407407407,-0.6488560566295053,1.805050556976133e-5\n", + "1,1168108,rs11260566,0.19158878504672894,-0.9157225669357916,5.8733847126852225e-6\n", + "3,36821790,rs4678553,0.23456790123456794,0.7424952268973525,1.1303825016262592e-5\n", + "4,11017683,rs16881446,0.27554179566563464,-0.7870581482955515,1.1105427468799613e-5\n", + "5,3739190,rs12521166,0.0679012345679012,1.1468852997925303,4.7812882296576845e-5\n", + "6,7574576,rs1885466,0.17746913580246915,0.8750621092263018,7.272346896740631e-6\n", + "7,41152376,rs28880,0.3379629629629629,-0.8146339024453509,9.180126530294943e-7\n", + "20,39225952,rs2076145,0.04475308641975306,1.4412437719976467,5.94831595593157e-5\n", + "23,81247423,rs5923282,0.0030864197530864335,24.22432454878159,0.0016846467294181053\n", + "23,121048059,rs1937165,0.4380804953560371,0.5392313636256603,1.9754643855527356e-5\n" + ] + } + ], + "source": [ + ";head -20 hapmap.lrttest.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "rm(\"hapmap.nullmodel.txt\")\n", + "rm(\"hapmap.lrttest.txt\")\n", + "rm(\"hapmap.scoretest.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "## GxE or other interactions\n", + "\n", + "In many applications, we want to test SNP effect and/or its interaction with other terms. `testformula` keyword specifies the test unit **besides** the covariates in `nullformula`. \n", + "\n", + "In following example, keyword `testformula=@formula(trait ~ 0 + snp + snp & sex)` instructs `polrgwas` to test joint effect of `snp` and `snp & sex` interaction." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 2.483772 seconds (8.83 M allocations: 732.630 MiB, 5.93% gc time)\n" + ] + } + ], + "source": [ + "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", \n", + " outfile=\"GxE\", testformula=@formula(trait ~ 0 + snp + snp & sex))" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chr,pos,snpid,maf,pval\n", + "1,554484,rs10458597,0.0,1.0\n", + "1,758311,rs12562034,0.07763975155279501,0.011788370857064464\n", + "1,967643,rs2710875,0.32407407407407407,0.0001389561913412981\n", + "1,1168108,rs11260566,0.19158878504672894,4.4462425016377604e-5\n", + "1,1375074,rs1312568,0.441358024691358,0.029488132169322546\n", + "1,1588771,rs35154105,0.0,1.0\n", + "1,1789051,rs16824508,0.00462962962962965,0.2922197942259074\n", + "1,1990452,rs2678939,0.4537037037037037,0.3766426135602989\n", + "1,2194615,rs7553178,0.22685185185185186,0.30692923752492324\n", + "1,2396747,rs13376356,0.1448598130841121,0.8215334624145896\n", + "1,2623158,rs28753913,0.0,1.0\n", + "1,2823603,rs1563468,0.4830246913580247,0.3959379791900657\n", + "1,3025087,rs6690373,0.2538699690402477,0.9284920594746979\n", + "1,3225416,rs12043519,0.029320987654321007,0.0046530294043420975\n", + "1,3431124,rs12093117,0.1099071207430341,0.47569653308676063\n", + "1,3633945,rs10910017,0.22187500000000004,0.9880330237520744\n", + "1,3895935,rs34770924,0.024691358024691357,0.3750554140393971\n", + "1,4096895,rs6702633,0.4752321981424149,0.012651490097329717\n", + "1,4297388,rs684965,0.3055555555555556,0.1477240111436921\n" + ] + } + ], + "source": [ + ";head -20 GxE.scoretest.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "rm(\"GxE.nullmodel.txt\")\n", + "rm(\"GxE.scoretest.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To test the linear and quadratic SNP effects jointly" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 1.959710 seconds (7.51 M allocations: 526.572 MiB, 5.96% gc time)\n" + ] + } + ], + "source": [ + "@time polrgwas(@formula(trait ~ 0 + sex), \"../data/covariate.txt\", \"../data/hapmap3\", \n", + " outfile=\"quadratic\", testformula=@formula(trait ~ 0 + snp + snp & snp))" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chr,pos,snpid,maf,pval\n", + "1,554484,rs10458597,0.0,1.0\n", + "1,758311,rs12562034,0.07763975155279501,0.012236157097696293\n", + "1,967643,rs2710875,0.32407407407407407,1.0\n", + "1,1168108,rs11260566,0.19158878504672894,6.553991614163971e-5\n", + "1,1375074,rs1312568,0.441358024691358,1.0\n", + "1,1588771,rs35154105,0.0,1.0\n", + "1,1789051,rs16824508,0.00462962962962965,1.0\n", + "1,1990452,rs2678939,0.4537037037037037,1.0\n", + "1,2194615,rs7553178,0.22685185185185186,1.0\n", + "1,2396747,rs13376356,0.1448598130841121,1.0\n", + "1,2623158,rs28753913,0.0,1.0\n", + "1,2823603,rs1563468,0.4830246913580247,1.0\n", + "1,3025087,rs6690373,0.2538699690402477,0.9284972163883947\n", + "1,3225416,rs12043519,0.029320987654321007,1.0\n", + "1,3431124,rs12093117,0.1099071207430341,1.0\n", + "1,3633945,rs10910017,0.22187500000000004,1.0\n", + "1,3895935,rs34770924,0.024691358024691357,1.0\n", + "1,4096895,rs6702633,0.4752321981424149,1.0\n", + "1,4297388,rs684965,0.3055555555555556,1.0\n" + ] + } + ], + "source": [ + ";head -20 quadratic.scoretest.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "rm(\"quadratic.nullmodel.txt\")\n", + "rm(\"quadratic.scoretest.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Docker\n", + "\n", + "For ease of using PolrGWAS, we provides a Dockerfile so users don't need to install Julia and required packages. Only Docker app needs to be installed. Following is tested on ???" ] } ], "metadata": { "kernelspec": { - "display_name": "Julia 0.6.2", + "display_name": "Julia 0.7.0", "language": "julia", - "name": "julia-0.6" + "name": "julia-0.7" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", - "version": "0.6.2" + "version": "0.7.0" }, "toc": { "colors": { diff --git a/src/PolrGWAS.jl b/src/PolrGWAS.jl index 6eba261..12b6029 100644 --- a/src/PolrGWAS.jl +++ b/src/PolrGWAS.jl @@ -1,12 +1,12 @@ +__precompile__() + module PolrGWAS -using CSV, DataFrames, Reexport, SnpArrays +using CSV, DataFrames, Distributions, Reexport, SnpArrays @reexport using PolrModels +export polrgwas -export - polrgwas - -include("polrgwas_score.jl") +include("gwas.jl") end diff --git a/src/gwas.jl b/src/gwas.jl new file mode 100644 index 0000000..e4375a0 --- /dev/null +++ b/src/gwas.jl @@ -0,0 +1,153 @@ +""" + polrgwas(nullformula, covfile, plkfile) + polrgwas(nullformula, df, plkfile) + +# Positional arguments +- `nullformula::Formula`: formula for the null model. +- `covfile::AbstractString`: covariate file with one header line. One column + should be the ordered categorical phenotype coded as integers starting from 1. +- `df::DataFrame`: DataFrame containing response and regressors. +- `plkfile::AbstractString`: Plink file name without the bed, fam, or bim + extensions. If `plkfile==nothing`, only null model is fitted. + +# Keyword arguments +- `outfile::AbstractString`: output file prefix; default is `"polrgwas"``. Two output files + `prefix.nullmodel.txt` and `prefix.scoretest.txt` (or `prefix.lrttest.txt`) will be written. +- `covtype::Vector{DataType}`: type information for `covarfile`. This is useful + when `CSV.read(covarfile)` has parsing errors. +- `testformula::Formula`: formula for test unit. Default is `@formula(~ 0 + snp)`. +- `test::Symbol`: `:score` (default) or `:LRT`. +- `link::GLM.Link`: `LogitLink()` (default), `ProbitLink()`, `CauchitLink()`, + or `CloglogLink()` +- `colinds::Union{Nothing,AbstractVector{<:Integer}}`: SNP indices. +- `rowinds::Union{Nothing,AbstractVector{<:Integer}}`: sample indices for bed file. +- `solver`: A solver supported by MathProgBase. Default is `NLoptSolver(algorithm=:LD_SLSQP, maxeval=4000)`. + Another common choice is `IpoptSolver(print_level=0)`. +- `verbose::Bool`: default is `false`. +""" +function polrgwas( + # positional arguments + nullformula::Formula, + covfile::AbstractString, + plkfile::Union{Nothing,AbstractString} = nothing; + # keyword arguments + covtype::Union{Nothing,Vector{DataType}} = nothing, + kwargs... + ) + covdf = CSV.read(covfile; types=covtype) + polrgwas(nullformula, covdf, plkfile; kwargs...) +end + +function polrgwas( + # positional arguments + nullformula::Formula, + df::DataFrame, + plkfile::Union{Nothing,AbstractString} = nothing; + # keyword arguments + testformula::Formula=@eval(@formula($(nullformula.lhs) ~ 0 + snp)), + outfile::AbstractString = "polrgwas", + link::GLM.Link = LogitLink(), + test::Symbol = :score, + colinds::Union{Nothing,AbstractVector{<:Integer}} = nothing, + rowinds::Union{Nothing,AbstractVector{<:Integer}} = nothing, + solver = NLoptSolver(algorithm=:LD_SLSQP, maxeval=4000), + verbose::Bool = false + ) + # fit null model + nm = polr(nullformula, df, link, solver) + verbose && show(nm) + open(outfile * ".nullmodel.txt", "w") do io + show(io, nm) + end + plkfile == nothing && (return nothing) + # selected rows should match nobs in null model + rinds = something(rowinds, 1:countlines(plkfile * ".fam")) + nrows = eltype(rinds) == Bool ? count(rinds) : length(rinds) + nrows == nobs(nm) || throw(ArgumentError("number of samples in SnpArray does not match null model")) + # create SNP mask vector + if colinds == nothing + cmask = trues(countlines(plkfile * ".bim")) + elseif eltype(colinds) == Bool + cmask = colinds + else + cmask = falses(countlines(plkfile * ".bim")) + cmask[colinds] .= true + end + # dataframe for alternative model + dfalt = df + dfalt[:snp] = zeros(size(df, 1)) + # extra columns in design matrix to be tested + Z = similar(ModelMatrix(ModelFrame(testformula, dfalt)).m) + # carry out score or LRT test SNP by SNP + snponly = testformula.rhs == :(0 + snp) + genomat = SnpArrays.SnpArray(plkfile * ".bed") + mafreq = SnpArrays.maf(genomat) # TODO: need to calibrate according to rowinds + if test == :score + ts = PolrScoreTest(nm.model, Z) + open(outfile * ".scoretest.txt", "w") do io + println(io, "chr,pos,snpid,maf,pval") + for (j, row) in enumerate(eachline(plkfile * ".bim")) + cmask[j] || continue + if mafreq[j] == 0 + pval = 1.0 + else + if snponly + copyto!(ts.Z, @view(genomat[rinds, j]), impute = true) + else # snp + other terms + copyto!(dfalt[:snp], @view(genomat[rinds, j]), impute = true) + ts.Z[:] = ModelMatrix(ModelFrame(testformula, dfalt)).m + end + pval = polrtest(ts) + end + snpj = split(row) + println(io, snpj[1], ",", snpj[4], ",", snpj[2], ",", mafreq[j], ",", pval) + end + end + elseif test == :LRT + nulldev = deviance(nm.model) + Xaug = [nm.model.X Z] + q = size(Z, 2) + γ̂ = Vector{Float64}(undef, q) # effect size for columns being tested + open(outfile * ".lrttest.txt", "w") do io + if snponly + println(io, "chr,pos,snpid,maf,effect,pval") + else + print(io, "chr,pos,snpid,maf,") + for j in 1:q + print(io, "effect", j, ",") + end + println(io, "pval") + end + for (j, row) in enumerate(eachline(plkfile * ".bim")) + cmask[j] || continue + if mafreq[j] == 0 + fill!(γ̂, 0) + pval = 1.0 + else + if snponly + copyto!(@view(Xaug[:, nm.model.p+1]), @view(genomat[rinds, j]), impute = true) + else # snp + other terms + copyto!(dfalt[:snp], @view(genomat[rinds, j]), impute = true) + Xaug[:, nm.model.p+1:end] = ModelMatrix(ModelFrame(testformula, dfalt)).m + end + altmodel = polr(Xaug, nm.model.Y, nm.model.link, solver, wts = nm.model.wts) + copyto!(γ̂, 1, altmodel.β, nm.model.p + 1, q) + pval = ccdf(Chisq(q), nulldev - deviance(altmodel)) + end + snpj = split(row) + if snponly + println(io, snpj[1], ",", snpj[4], ",", snpj[2], ",", mafreq[j], ",", γ̂[1], ",", pval) + else + print(io, snpj[1], ",", snpj[4], ",", snpj[2], ",", mafreq[j], ",") + for j in 1:q + print(io, γ̂[j], ",") + end + println(io, pval) + end + end + end + else + throw(ArgumentError("unrecognized test $test")) + end + nothing +end diff --git a/src/polrgwas_score.jl b/src/polrgwas_score.jl deleted file mode 100644 index 4ed9758..0000000 --- a/src/polrgwas_score.jl +++ /dev/null @@ -1,58 +0,0 @@ -""" - polrgwas() - -# Position arguments - -- `covarfile::AbstractString`: covariate file. One column should be the -ordered categorical phenotype coded as integers starting from 1. - -- `plinkfile::AbstractString`: Plink file names without extension. - -# Keyword arguments - -- `outptfile::AbstractString`: output file prefix. Two output files -`prefix.nullmodel.txt` and `prefix.scoretest.txt` will be written. - -- `covartype::Vector{DataType}`: type information for `covarfile`. This is useful -when `CSV.read(covarfile)` has parsing errors. - -- `test::Symbol`: default `:score`. - -- `link::GLM.Link`: `LogitLink()` (default), `ProbitLink()`, `CauchitLink()`, -or `CloglogLink()` -""" -function polrgwas( - # position arguments - formula, - covarfile::AbstractString=nothing, - plinkfile::AbstractString = nothing; - # keyword arguments - covartype::Vector{DataType} = nothing, - outptfile::AbstractString = "polrgwas", - test::Symbol = :score, - link::GLM.Link = LogitLink(), - verbose::Bool = true - ) - - # fit null model - covdf = CSV.read(covarfile; types=covartype) - nm = polr(formula, covdf, link) - verbose && show(nm) - nmout = open(outptfile * ".nullmodel.txt", "w") - show(nmout, nm) - close(nmout) - isempty(plinkfile) && (return nothing) - - # carry out score test - snpdata = SnpData(plinkfile) - genomat = snpdata.snpmatrix - ts = PolrScoreTest(nm.model, zeros(snpdata.people, 1)) - outfile = open(outptfile * ".scoretest.txt", "w") - println(outfile, "chr,pos,snpid,maf,pval") - for s in 1:snpdata.snps - copy!(ts.Z, genomat[:, s:s]; impute = true) - pval = polrtest(ts) - println(outfile, "$(snpdata.chromosome[s]),$(snpdata.basepairs[s]),$(snpdata.snpid[s]),$(snpdata.maf[s]),$pval") - end - close(outfile) -end diff --git a/test/runtests.jl b/test/runtests.jl index 9eab5ed..9dab62e 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -1,10 +1,66 @@ -using CSV, PolrGWAS, StatsModels +using PolrGWAS, Test, CSV const datadir = joinpath(dirname(@__FILE__), "..", "data") +const covfile = datadir * "/covariate.txt" +const plkfile = datadir * "/hapmap3" -@time polrgwas( - @formula(trait ~ 0 + sex), - datadir * "/covariate.txt", - datadir * "/hapmap3"; - covartype = [String, String, String, String, Float64, Int] - ) +@testset "score test" begin + @time polrgwas(@formula(trait ~ 0 + sex), covfile, plkfile, test = :score) + @test isfile("polrgwas.nullmodel.txt") + @test isfile("polrgwas.scoretest.txt") + scorepvals = CSV.read("polrgwas.scoretest.txt")[5][1:5] + @test all(scorepvals .≈ [1.0, 3.00123075e-3, 2.51172149e-5, 1.137311225e-5, 8.31735837e-3]) + rm("polrgwas.nullmodel.txt") + rm("polrgwas.scoretest.txt") +end + +@testset "LRT test" begin + @time polrgwas(@formula(trait ~ 0 + sex), covfile, plkfile, test = :LRT) + @test isfile("polrgwas.nullmodel.txt") + @test isfile("polrgwas.lrttest.txt") + lrtpvals = CSV.read("polrgwas.lrttest.txt")[6][1:5] + @test all(lrtpvals .≈ [1.0, 1.91858366e-3, 1.80505056e-5, 5.87338471e-6, 8.08102258e-3]) + rm("polrgwas.nullmodel.txt") + rm("polrgwas.lrttest.txt") +end + +@testset "link" begin + polrgwas(@formula(trait ~ 0 + sex), covfile, plkfile, link=ProbitLink(), outfile="opm") + @test isfile("opm.nullmodel.txt") + @test isfile("opm.scoretest.txt") + scorepvals = CSV.read("opm.scoretest.txt")[5][1:5] + @test all(scorepvals .≈ [1.0, 6.45042592e-3, 1.57850425e-5, 4.97907512e-6, 4.56657402e-3]) + rm("opm.nullmodel.txt") + rm("opm.scoretest.txt") +end + +@testset "mask" begin + polrgwas(@formula(trait ~ 0 + sex), covfile, plkfile, colinds=1:5, outfile="first5snps") + @test isfile("first5snps.nullmodel.txt") + @test isfile("first5snps.scoretest.txt") + @test countlines("first5snps.scoretest.txt") == 6 + scorepvals = CSV.read("first5snps.scoretest.txt")[5] + @test all(scorepvals .≈ [1.0, 3.00123075e-3, 2.51172149e-5, 1.137311225e-5, 8.31735837e-3]) + rm("first5snps.nullmodel.txt") + rm("first5snps.scoretest.txt") +end + +@testset "test formula" begin + # score test + polrgwas(@formula(trait ~ 0 + sex), covfile, plkfile, outfile="GxE", testformula=@formula(trait ~ 0 + snp + snp & sex)) + @test isfile("GxE.nullmodel.txt") + @test isfile("GxE.scoretest.txt") + scorepvals = CSV.read("GxE.scoretest.txt")[5][1:5] + @test all(scorepvals .≈ [1.0, 1.17883709e-2, 1.38956191e-4, 4.44624250e-5, 2.94881322e-2]) + rm("GxE.nullmodel.txt") + rm("GxE.scoretest.txt") + # LRT, only first 5 SNPs + polrgwas(@formula(trait ~ 0 + sex), covfile, plkfile, outfile="GxE", + testformula=@formula(trait ~ 0 + snp + snp & sex), test=:LRT, colinds=1:5) + @test isfile("GxE.nullmodel.txt") + @test isfile("GxE.lrttest.txt") + lrtpvals = CSV.read("GxE.lrttest.txt")[end] + @test all(lrtpvals .≈ [1.0, 7.22410973e-3, 1.01730983e-4, 1.88174211e-5, 2.88295705e-2]) + rm("GxE.nullmodel.txt") + rm("GxE.lrttest.txt") +end