Added reference to example reports in technical documentation
Ollie committed Aug 28, 2024
1 parent 627ccc4 commit 4c8570e
Showing 2 changed files with 30 additions and 16 deletions.
8 changes: 4 additions & 4 deletions docs/pipeline_technical.Rmd
@@ -179,7 +179,7 @@ MegaPRS uses a range of priors (lasso, ridge, bolt, BayesR) for SNP effects, run

#### PRS-CS

PRS-CS, a Bayesian method using a continuous shrinkage prior, specifies a range of global shrinkage parameters (phi), generating multiple sets of genetic effects for polygenic scoring. Its 'auto' model estimates the optimal parameter directly from GWAS summary statistics, negating the need for an external dataset. In GenoPred, PRS-CS is run using the script [pgs_methods/prscs.R](https://github.com/opain/GenoPred/blob/master/Scripts/pgs_methods/prscs.R). GenoPred specifies four phi parameters (1e-6, 1e-4, 1e-2, 1) and the auto model. By default, GenoPred uses the PRS-CS provided 1KG-derived LD matrix data, matching the population of the GWAS sample. The user can select the UKB-derived LD matrix data to be used using the `prscs_ldref` parameter in the `configfile`. 1KG is used by default as PGS based on Yengo et al. sumstats performed significantly better in the OpenSNP target sample, when using the 1KG reference data (this may differ for other GWAS).
PRS-CS, a Bayesian method using a continuous shrinkage prior, specifies a range of global shrinkage parameters (phi), generating multiple sets of genetic effects for polygenic scoring. Its 'auto' model estimates the optimal parameter directly from GWAS summary statistics, negating the need for an external dataset. In GenoPred, PRS-CS is run using the script [pgs_methods/prscs.R](https://github.com/opain/GenoPred/blob/master/Scripts/pgs_methods/prscs.R). By default, GenoPred specifies four phi parameters (1e-6, 1e-4, 1e-2, 1) and the auto model, but the user can modify this behaviour using the prscs_phi parameter in the configfile. By default, GenoPred uses the PRS-CS provided 1KG-derived LD matrix data, matching the population of the GWAS sample. The user can select the UKB-derived LD matrix data to be used using the `prscs_ldref` parameter in the `configfile`. 1KG is used by default as PGS based on Yengo et al. sumstats performed significantly better in the OpenSNP target sample, when using the 1KG reference data (this may differ for other GWAS).

***

@@ -229,7 +229,7 @@ Target genotype QC is performed using the [format_target.R](https://github.com/o

## Ancestry Inference

Target samples then undergo ancestry inference, using the [Ancestry_identifier.R](https://github.com/opain/GenoPred/blob/master/Scripts/Ancestry_identifier/Ancestry_identifier.R) script, estimating the probability that each target individual matches each reference population (AFR = African, AMR = Admixed American, EAS = East Asian, EUR = European, CSA = Central and South Asian, MID = Middle Eastern). Population membership was predicted using a reference trained elastic net model consisting of the first six reference-projected genetic principal components. Principal components were defined in the reference dataset using variants present in the target dataset with a minor allele frequency >0.05, missingness <0.02 and Hardy-Weinberg p-value >1×10-6 (if target sample size <100, then only missingness threshold is applied in the target). LD pruning for independent variants is then performed in PLINK after removal of long-range LD regions (ref), using a window size of 1000, step size of 5, and r2 threshold of 0.2. The A multinomial elastic net model predicting super population membership in the reference is derived in using the glmnet R package, with model performance assessed using 5-fold cross validation. The reference-derived principal components are then projected into the target dataset, and the reference-derived elastic net model is used to predict population membership in target. By default, target individuals are assigned to a population if the predicted probability was >0.95, but the user can modify this threshold using the `ancestry_prob_thresh` parameter in the config file.
Target samples then undergo ancestry inference, using the [Ancestry_identifier.R](https://github.com/opain/GenoPred/blob/master/Scripts/Ancestry_identifier/Ancestry_identifier.R) script, estimating the probability that each target individual matches each reference population (AFR = African, AMR = Admixed American, EAS = East Asian, EUR = European, CSA = Central and South Asian, MID = Middle Eastern). Population membership was predicted using a reference trained elastic net model consisting of the first six reference-projected genetic principal components. Principal components were defined in the reference dataset using variants present in the target dataset with a minor allele frequency >0.05, missingness <0.02 and Hardy-Weinberg p-value >1×10-6 (if target sample size <100, then only missingness threshold is applied in the target). LD pruning for independent variants is then performed in PLINK after removal of long-range LD regions (ref), using a window size of 1000, step size of 5, and r2 threshold of 0.2. The A multinomial elastic net model predicting super population membership in the reference is derived in using the glmnet R package, with model performance assessed using 5-fold cross validation. The reference-derived principal components are then projected into the target dataset, and the reference-derived elastic net model is used to predict population membership in target. By default, target individuals are assigned to a population if the predicted probability was >0.95, but the user can modify this threshold using the ancestry_prob_thresh parameter in the config file. If an individual does not have a predicted probability greater than the ancestry_prob_thresh parameter, then they will be excluded from downstream polygenic scoring. If the ancestry_prob_thresh parameter is low, then an individual may be assigned to multiple reference populations, and they will have polygenic scores that have been standardised according to each assigned reference population. 
In this case, the individual-level report created by GenoPred will present polygenic scores standardised according to the reference population with the highest predicted probability.
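
The assignment rule described above can be sketched as follows. This is a minimal illustration, not GenoPred's actual code: the function name `assign_populations` and the dictionary input are hypothetical, and the only behaviour taken from the text is that a population is assigned when its predicted probability exceeds `ancestry_prob_thresh` (0.95 by default), that individuals with no assignment are excluded from scoring, and that the individual-level report uses the assigned population with the highest predicted probability.

```python
from typing import Dict, List, Optional, Tuple


def assign_populations(
    probs: Dict[str, float], thresh: float = 0.95
) -> Tuple[List[str], Optional[str]]:
    """Hypothetical helper illustrating the thresholding rule.

    probs  : predicted probability per reference population for one individual
    thresh : stands in for the ancestry_prob_thresh config parameter
    Returns (assigned populations, population used in the individual report).
    """
    # A population is assigned when its probability is strictly above the threshold.
    assigned = [pop for pop, p in probs.items() if p > thresh]

    # No population passes: the individual is excluded from polygenic scoring.
    if not assigned:
        return [], None

    # Several may pass when the threshold is low; the report uses the most probable one.
    report_pop = max(assigned, key=lambda pop: probs[pop])
    return assigned, report_pop


if __name__ == "__main__":
    example = {"EUR": 0.55, "AMR": 0.40, "AFR": 0.05}
    print(assign_populations(example))        # default threshold of 0.95
    print(assign_populations(example, 0.30))  # a deliberately low threshold
```

With the default threshold, the example individual is assigned to no population; with a threshold of 0.30 they are assigned to both EUR and AMR, and EUR would be used for the report.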

***

@@ -253,13 +253,13 @@ This step calculates scores in the target sample, based on scoring files from th

### Individual-level

This step creates an .html report summarising the pipeline outputs for each individual in the target sample. It simply reads in pipeline outputs, and then tabulates and plots them. The only analysis it performs is the conversion of polygenic scores onto the absolute scale. It uses a [previously published method](https://pubmed.ncbi.nlm.nih.gov/34983942/). The estimate of the PGS R2 come from the lassosum pseudovalidation analysis, and the distribution in the general population is provided by the user in the prev, mean and sd columns of the gwas_list. Note: It does not convert PGS from externally derived polygenic scores onto the absolutes scale.
This step creates an .html report summarising the pipeline outputs for each individual in the target sample. It simply reads in pipeline outputs, and then tabulates and plots them. The only analysis it performs is the conversion of polygenic scores onto the absolute scale. It uses a [previously published method](https://pubmed.ncbi.nlm.nih.gov/34983942/). The estimate of the PGS R2 come from the lassosum pseudovalidation analysis, and the distribution in the general population is provided by the user in the prev, mean and sd columns of the gwas_list. Note: It does not convert PGS from externally derived polygenic scores onto the absolutes scale. An example of the individual-level report derived using the test data can be found <a href="example_plink1-1_EUR.1_EUR-report.html" target="_blank">here</a>.

***

### Sample-level

This step creates an .html report summarising the pipeline outputs for each target sample. It simply reads in pipeline outputs, and then tabulates and plots them.
This step creates an .html report summarising the pipeline outputs for each target sample. It simply reads in pipeline outputs, and then tabulates and plots them. An example of the sample-level report derived using the test data can be found <a href="example_plink1-report.html" target="_blank">here</a>.

***

38 changes: 26 additions & 12 deletions docs/pipeline_technical.html
@@ -936,14 +936,16 @@ <h4>PRS-CS</h4>
negating the need for an external dataset. In GenoPred, PRS-CS is run
using the script <a
href="https://github.com/opain/GenoPred/blob/master/Scripts/pgs_methods/prscs.R">pgs_methods/prscs.R</a>.
GenoPred specifies four phi parameters (1e-6, 1e-4, 1e-2, 1) and the
auto model. By default, GenoPred uses the PRS-CS provided 1KG-derived LD
matrix data, matching the population of the GWAS sample. The user can
select the UKB-derived LD matrix data to be used using the
<code>prscs_ldref</code> parameter in the <code>configfile</code>. 1KG
is used by default as PGS based on Yengo et al. sumstats performed
significantly better in the OpenSNP target sample, when using the 1KG
reference data (this may differ for other GWAS).</p>
By default, GenoPred specifies four phi parameters (1e-6, 1e-4, 1e-2, 1)
and the auto model, but the user can modify this behaviour using the
prscs_phi parameter in the configfile. By default, GenoPred uses the
PRS-CS provided 1KG-derived LD matrix data, matching the population of
the GWAS sample. The user can select the UKB-derived LD matrix data to
be used using the <code>prscs_ldref</code> parameter in the
<code>configfile</code>. 1KG is used by default as PGS based on Yengo et
al. sumstats performed significantly better in the OpenSNP target
sample, when using the 1KG reference data (this may differ for other
GWAS).</p>
<hr />
</div>
<div id="ptclump" class="section level4">
@@ -1081,8 +1083,16 @@ <h2>Ancestry Inference</h2>
target dataset, and the reference-derived elastic net model is used to
predict population membership in target. By default, target individuals
are assigned to a population if the predicted probability was &gt;0.95,
but the user can modify this threshold using the
<code>ancestry_prob_thresh</code> parameter in the config file.</p>
but the user can modify this threshold using the ancestry_prob_thresh
parameter in the config file. If an individual does not have a predicted
probability greater than the ancestry_prob_thresh parameter, then they
will be excluded from downstream polygenic scoring. If the
ancestry_prob_thresh parameter is low, then an individual may be
assigned to multiple reference populations, and they will have polygenic
scores that have been standardised according to each assigned reference
population. In this case, the individual-level report created by
GenoPred will present polygenic scores standardised according to the
reference population with the highest predicted probability.</p>
<hr />
</div>
<div id="within-target-qc" class="section level2">
@@ -1144,14 +1154,18 @@ <h3>Individual-level</h3>
pseudovalidation analysis, and the distribution in the general
population is provided by the user in the prev, mean and sd columns of
the gwas_list. Note: It does not convert PGS from externally derived
polygenic scores onto the absolutes scale.</p>
polygenic scores onto the absolutes scale. An example of the
individual-level report derived using the test data can be found
<a href="example_plink1-1_EUR.1_EUR-report.html" target="_blank">here</a>.</p>
<hr />
</div>
<div id="sample-level" class="section level3">
<h3>Sample-level</h3>
<p>This step creates an .html report summarising the pipeline outputs
for each target sample. It simply reads in pipeline outputs, and then
tabulates and plots them.</p>
tabulates and plots them. An example of the sample-level report derived
using the test data can be found
<a href="example_plink1-report.html" target="_blank">here</a>.</p>
<hr />
</div>
</div>