Update README.md

getian107 authored Aug 10, 2023 (1 parent 0544181, commit 80ace44)

Showing 1 changed file, README.md, with 27 additions and 7 deletions.

An application of the "meta" and "auto" version of PRS-CSx is described in:

T Ge et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. *Genome Medicine*, 14:70, 2022.


## Version History

**Aug 10, 2023**: Added BETA/OR + SE as a new input format (see the format of GWAS summary statistics below), which is now the recommended input. When BETA/OR + P is used as the input, p-values smaller than 1e-323 are truncated at 1e-323, which may reduce prediction accuracy.

**July 29, 2021**: Changed default MCMC parameters.


- VALIDATION_BIM_PREFIX (required): Full path and the prefix of the bim file for the target (validation/testing) dataset. This file is used to provide a list of SNPs that are available in the target dataset.

- SUM_STATS_FILE (required): Full path and file name of the GWAS summary statistics. Multiple GWAS summary statistics files are allowed and should be separated by commas. Each summary statistics file must include either BETA/OR + SE or BETA/OR + P. When BETA/OR + SE is used as the input, the file must have the following format (including the header line):

```
SNP A1 A2 BETA SE
rs4970383 C A -0.0064 0.0090
rs4475691 C T -0.0145 0.0094
rs13302982 A G -0.0232 0.0199
...
```
Or:
```
SNP A1 A2 OR SE
rs4970383 A C 0.9825 0.0314
rs4475691 T C 0.9436 0.0319
rs13302982 A G 1.1337 0.0543
...
```
where SNP is the rs ID, A1 is the effect allele, A2 is the alternative allele, BETA/OR is the effect size/odds ratio of the A1 allele, and SE is the standard error of the effect. Note that when OR is used, SE is the standard error of log(OR).
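
The sketch below shows one way a BETA/OR + SE row could be turned into a standardized effect. It assumes the common approximation that the standardized marginal effect is the z-score (BETA/SE, or log(OR)/SE) divided by the square root of the GWAS sample size. The function name `standardized_effect_from_se` is hypothetical, and this is not the PRS-CSx implementation.

```python
import math


def standardized_effect_from_se(effect: float, se: float, n: int, is_or: bool = False) -> float:
    """Hypothetical sketch: BETA (or OR) plus SE -> approximate standardized effect.

    Assumption (not taken from the PRS-CSx source): beta_std ~= z / sqrt(n), z = beta / se.
    """
    if se <= 0:
        raise ValueError("SE must be positive")
    if n <= 0:
        raise ValueError("sample size must be positive")
    # An odds ratio is moved to the log scale, because SE refers to log(OR).
    beta = math.log(effect) if is_or else effect
    z = beta / se
    return z / math.sqrt(n)


if __name__ == "__main__":
    # First rows of the two example files above, with a made-up sample size of 10,000.
    print(standardized_effect_from_se(-0.0064, 0.0090, 10_000))
    print(standardized_effect_from_se(0.9825, 0.0314, 10_000, is_or=True))
```

With the first row of the BETA + SE example and an assumed sample size of 10,000, this gives roughly -0.0071.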

When using BETA/OR + P as the input, the file must have the following format (including the header line):

```
SNP A1 A2 BETA P
rs4970383 C A -0.0064 0.4778
rs4475691 C T -0.0145 0.1245
rs13302982 A G -0.0232 0.2429
...
```
Or:
```
SNP A1 A2 OR P
...
rs13302982 A G 1.1337 0.0209
...
```
where SNP is the rs ID, A1 is the effect allele, A2 is the alternative allele, BETA/OR is the effect size/odds ratio of the A1 allele, and P is the p-value of the effect. Here, a standardized effect size is calculated from the p-value, while BETA/OR is only used to determine the direction of an association. Therefore, if z-scores or even +1/-1 indicating effect directions are provided in the BETA column, the algorithm should still work properly.
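
Here is a minimal sketch of the p-value route described above. The two-sided p-value is mapped back to an absolute z-score with the inverse standard normal CDF. The sign comes from BETA (or from log(OR)), so only the direction of BETA/OR matters. The floor of 1e-323 mirrors the truncation noted in the version history. The function name and the division by the square root of the sample size are assumptions for illustration, not the PRS-CSx source.

```python
import math
from statistics import NormalDist

P_FLOOR = 1e-323  # smaller p-values are truncated to this value


def standardized_effect_from_p(effect: float, p: float, n: int, is_or: bool = False) -> float:
    """Hypothetical sketch: signed standardized effect from BETA/OR + P."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if n <= 0:
        raise ValueError("sample size must be positive")
    # Only the direction of the association is read from BETA/OR.
    beta = math.log(effect) if is_or else effect
    sign = (beta > 0) - (beta < 0)  # +1, -1, or 0
    # Truncate tiny (or underflowed) p-values so the inverse CDF stays finite.
    p = max(p, P_FLOOR)
    # Two-sided p-value -> |z| via the lower-tail quantile of N(0, 1).
    abs_z = abs(NormalDist().inv_cdf(p / 2.0))
    # Assumed scaling: standardized effect ~= z / sqrt(n).
    return sign * abs_z / math.sqrt(n)
```

Every p-value at or below the floor maps to the same |z| (about 38.5), so extremely strong associations can no longer be told apart. This is consistent with the version-history note that the truncation may reduce prediction accuracy.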

- GWAS_SAMPLE_SIZE (required): Sample sizes of the GWAS, in the same order as the GWAS summary statistics files, separated by commas.
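
Sample sizes are matched to the summary statistics files by position, so it can help to check that the two comma-separated lists line up before a run. The helper below is a hypothetical sketch of such a check; the function name and example file names are made up. It does not reproduce how PRS-CSx itself parses its arguments.

```python
from __future__ import annotations


def pair_sumstats_with_sizes(sst_files: str, sample_sizes: str) -> list[tuple[str, int]]:
    """Hypothetical helper: split two comma-separated lists and pair them by position."""
    files = [f.strip() for f in sst_files.split(",") if f.strip()]
    sizes = [int(s) for s in sample_sizes.split(",") if s.strip()]
    if len(files) != len(sizes):
        raise ValueError(
            f"{len(files)} summary statistics file(s) but {len(sizes)} sample size(s)"
        )
    if any(n <= 0 for n in sizes):
        raise ValueError("sample sizes must be positive integers")
    return list(zip(files, sizes))


if __name__ == "__main__":
    # Made-up file names and sample sizes, for illustration only.
    print(pair_sumstats_with_sizes("EUR_sumstats.txt,EAS_sumstats.txt", "200000,50000"))
```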


For each input GWAS, PRS-CSx writes posterior SNP effect size estimates for each chromosome to the user-specified directory. The output file contains chromosome, rs ID, base position, A1, A2 and posterior effect size estimate for each SNP. If `--meta=True`, meta-analyzed posterior effect sizes will also be written to the output directory. An individual-level polygenic score can be produced by concatenating output files from all chromosomes and then using `PLINK`'s `--score` command (https://www.cog-genomics.org/plink/1.9/score). If polygenic scores are generated by chromosome, use the 'sum' modifier so that they can be combined into a genome-wide score.
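
The two post-processing steps in this paragraph could be scripted roughly as follows. The sketch makes three assumptions:

- The output files are whitespace-delimited, with the columns in the order listed above and no header line.
- The PLINK 1.9 `.profile` files are generated with the `sum` modifier.
- Those files report the per-chromosome score in a `SCORESUM` column.

The function names and example data are made up, so check the layout against your own files.

```python
import os
import tempfile
from typing import Dict, Iterable, Tuple


def concatenate_posterior_files(chrom_files: Iterable[str], merged_path: str) -> int:
    """Append per-chromosome posterior effect files into one file; return rows written."""
    n_rows = 0
    with open(merged_path, "w") as out:
        for path in chrom_files:
            with open(path) as handle:
                for line in handle:
                    if line.strip():  # skip blank lines
                        out.write(line.rstrip("\n") + "\n")
                        n_rows += 1
    return n_rows


def sum_chromosome_scores(profile_files: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Add up the SCORESUM column per (FID, IID) across per-chromosome .profile files."""
    totals: Dict[Tuple[str, str], float] = {}
    for path in profile_files:
        with open(path) as handle:
            rows = [line.split() for line in handle if line.strip()]
        header, body = rows[0], rows[1:]
        i_fid, i_iid = header.index("FID"), header.index("IID")
        i_score = header.index("SCORESUM")  # assumed column name when 'sum' is used
        for row in body:
            key = (row[i_fid], row[i_iid])
            totals[key] = totals.get(key, 0.0) + float(row[i_score])
    return totals


if __name__ == "__main__":
    # Tiny made-up inputs written to a temporary directory.
    tmp = tempfile.mkdtemp()
    chr1 = os.path.join(tmp, "chr1.txt")
    chr2 = os.path.join(tmp, "chr2.txt")
    with open(chr1, "w") as f:
        f.write("1\trs0001\t1000\tA\tG\t0.0012\n")
    with open(chr2, "w") as f:
        f.write("2\trs0002\t2000\tC\tT\t-0.0008\n")
    print(concatenate_posterior_files([chr1, chr2], os.path.join(tmp, "merged.txt")))
```

With that column order, the concatenated file would be passed to PLINK 1.9 as, for example, `--score merged.txt 2 4 6 sum`. This takes the rs ID from column 2, A1 from column 4 and the posterior effect size from column 6.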

We recommend calculating one polygenic score for each discovery population using the population-specific posterior SNP effect size estimates, and then learning a linear combination of these polygenic scores that most accurately predicts the trait in the validation dataset. The predictive performance of the method can then be assessed in an independent testing dataset. This uses the optimal global shrinkage parameter phi (if grid search is used) and the linear-combination weights learned in the validation dataset. Separate evaluation of each population-specific polygenic score is usually NOT the intended use of PRS-CSx. We recommend standardizing the polygenic scores (i.e., converting the scores to zero mean and unit variance) in both the validation and testing datasets before learning/applying the linear combination.
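
The workflow recommended above is sketched below in dependency-free Python. It has three steps:

1. Standardize each population-specific score within each dataset.
2. Learn the combination weights in the validation data.
3. Apply those weights unchanged in the testing data.

Ordinary least squares is an assumption that suits a quantitative trait; for a binary trait, a logistic regression would be the more usual choice. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def standardize(scores: Sequence[float]) -> List[float]:
    """Rescale one score column to zero mean and unit (population) variance."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if sd == 0:
        raise ValueError("score column has zero variance")
    return [(s - mean) / sd for s in scores]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear scores?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_combination(prs_columns: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares weights [intercept, w_1, ..., w_k] on standardized validation scores."""
    cols = [standardize(c) for c in prs_columns]
    design = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve(xtx, xty)


def apply_combination(prs_columns: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Standardize scores within this dataset, then apply weights learned in validation."""
    cols = [standardize(c) for c in prs_columns]
    n = len(cols[0])
    return [weights[0] + sum(w * c[i] for w, c in zip(weights[1:], cols)) for i in range(n)]
```

Note that `apply_combination` standardizes the testing scores with the testing dataset's own mean and standard deviation, following the recommendation to standardize in both datasets. The weights themselves come from the validation fit.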


## Computational Efficiency