Commit
Merge pull request #22 from wvictor14/devel
add stop controls and fix check errors
wvictor14 authored Sep 8, 2023
2 parents e16f778 + 7cc6828 commit 0e6603d
Showing 13 changed files with 148 additions and 182 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
@@ -14,4 +14,4 @@
^Meta$
^.*\.pptx$
^.*.Rmd.orig$
^precompile.R$
^precompile.R$
48 changes: 0 additions & 48 deletions .github/workflows/pkgdown.yaml

This file was deleted.

5 changes: 4 additions & 1 deletion .gitignore
@@ -12,5 +12,8 @@ doc
Meta
*hexsticker.pptx
pkgdown/
*.rds
.vscode/settings.json
data-raw/mod.rds
data-raw/x_test.rds
data-raw/x_train.rds
data-raw/y_train.rds
12 changes: 5 additions & 7 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: planet
Title: Placental DNA methylation analysis tools
Version: 1.7.0
Version: 1.9.3
Authors@R:
c(person("Victor", "Yuan", email = "[email protected]",
role = c("aut", "cre")),
@@ -10,20 +10,18 @@ URL:
https://victor.rbind.io/planet, http://github.com/wvictor14/planet
BugReports:
http://github.com/wvictor14/planet/issues
Description: This package contains R functions to infer additional biological
variables to supplemental DNA methylation analysis of placental data. This
Description: This package contains R functions to predict biological variables
from placental DNA methylation data generated from Infinium arrays. This
includes inferring ethnicity/ancestry, gestational age, and cell composition
from placental DNA methylation array (450k/850k) data. The package comes
with an example processed placental dataset.
from placental DNA methylation array (450k/850k) data.
Depends:
R (>= 4.0)
R (>= 4.3)
Imports:
methods,
tibble,
magrittr,
dplyr
Suggests:
badger,
ggplot2,
testthat,
tidyr,
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
# planet 1.9.2

# planet 1.9.1

# planet 0.99.4 (03-10-2021)

* Combined vignettes into a single one
192 changes: 102 additions & 90 deletions R/predictEthnicity.R
@@ -19,6 +19,7 @@
#' contain all 1860 predictors and be normalized with NOOB and BMIQ.
#' @param threshold A probability threshold ranging from (0, 1) to call samples
#' 'ambiguous'. Defaults to 0.75.
#' @param force Run even if predictors are missing. Default is `FALSE`.
#'
#' @return a [tibble][tibble::tibble-package]
#' @examples
@@ -32,65 +33,76 @@
#' @export pl_infer_ethnicity
#' @aliases pl_infer_ethnicity

predictEthnicity <- function(betas, threshold = 0.75) {
data(ethnicityCpGs, envir=environment())
pf <- intersect(rownames(betas), ethnicityCpGs)
if (length(pf) < length(ethnicityCpGs)) {
warning(paste(
"Only", length(pf), "out of",
length(ethnicityCpGs), "present."
))
} else {
message(paste(length(pf), "of 1860 predictors present."))
}

# subset down to 1860 final features
betas <- t(betas[pf, ])

# This code is modified from glmnet v3.0.2, GPL-2 license
# modifications include reducing the number of features from the original
# training set, to only where coefficients != 0 (1860 features)
# These modifications were made to significantly reduce memory size of the
# internal object `nbeta`
# see https://glmnet.stanford.edu/ for original glmnet package
predictEthnicity <- function(betas, threshold = 0.75, force = FALSE) {
data(ethnicityCpGs, envir=environment())
pf <- intersect(ethnicityCpGs, rownames(betas))

if (!force) {

npred <- nrow(betas) # number of samples
dn <- list(names(nbeta), "1", dimnames(betas)[[1]])
dp <- array(0, c(nclass, nlambda, npred), dimnames = dn) # set up results
if (any(is.na(betas[pf,]))) {
stop(
paste(
"NAs present in predictor data."
)
)
}

# cross product with coefficients
for (i in seq(nclass)) {
fitk <- methods::cbind2(1, betas) %*%
matrix(nbeta[[i]][c("(Intercept)", colnames(betas)), ])
dp[i, , ] <- dp[i, , ] + t(as.matrix(fitk))
if (!all(ethnicityCpGs %in% rownames(betas))) {
stop(paste(
"Missing", length(setdiff(ethnicityCpGs, rownames(betas))),
"out of", length(ethnicityCpGs), "predictors."
))
}

# probabilities
pp <- exp(dp)
psum <- apply(pp, c(2, 3), sum)
probs <- data.frame(aperm(
pp / rep(psum, rep(nclass, nlambda * npred)),
c(3, 1, 2)
))
colnames(probs) <- paste0("Prob_", dn[[1]])

# classification
link <- aperm(dp, c(3, 1, 2))
dpp <- aperm(dp, c(3, 1, 2))
preds <- data.frame(apply(dpp, 3, glmnet_softmax))
colnames(preds) <- "Predicted_ethnicity_nothresh"

# combine and apply thresholding
p <- cbind(preds, probs)
p$Highest_Prob <- apply(p[, 2:4], 1, max)
p$Predicted_ethnicity <- ifelse(
p$Highest_Prob < threshold, "Ambiguous",
as.character(p$Predicted_ethnicity_nothresh)
)
p$Sample_ID <- rownames(p)
p <- p[, c(7, 1, 6, 2:5)]

return(tibble::as_tibble(p))
}
message(paste(length(pf), "of 1860 predictors present."))

# subset down to 1860 final features
betas <- t(betas[pf, ])

# This code is modified from glmnet v3.0.2, GPL-2 license
# modifications include reducing the number of features from the original
# training set, to only where coefficients != 0 (1860 features)
# These modifications were made to significantly reduce memory size of the
# internal object `nbeta`
# see https://glmnet.stanford.edu/ for original glmnet package

npred <- nrow(betas) # number of samples
dn <- list(names(nbeta), "1", dimnames(betas)[[1]])
dp <- array(0, c(nclass, nlambda, npred), dimnames = dn) # set up results

# cross product with coefficients
for (i in seq(nclass)) {
fitk <- methods::cbind2(1, betas) %*%
matrix(nbeta[[i]][c("(Intercept)", colnames(betas)), ])
dp[i, , ] <- dp[i, , ] + t(as.matrix(fitk))
}

# probabilities
pp <- exp(dp)
psum <- apply(pp, c(2, 3), sum)
probs <- data.frame(aperm(
pp / rep(psum, rep(nclass, nlambda * npred)),
c(3, 1, 2)
))
colnames(probs) <- paste0("Prob_", dn[[1]])

# classification
link <- aperm(dp, c(3, 1, 2))
dpp <- aperm(dp, c(3, 1, 2))
preds <- data.frame(apply(dpp, 3, glmnet_softmax))
colnames(preds) <- "Predicted_ethnicity_nothresh"

# combine and apply thresholding
p <- cbind(preds, probs)
p$Highest_Prob <- apply(p[, 2:4], 1, max)
p$Predicted_ethnicity <- ifelse(
p$Highest_Prob < threshold, "Ambiguous",
as.character(p$Predicted_ethnicity_nothresh)
)
p$Sample_ID <- rownames(p)
p <- p[, c(7, 1, 6, 2:5)]

return(tibble::as_tibble(p))
}

# This code is copied directly from glmnet v3.0.2, GPL-2 license
@@ -100,43 +112,43 @@ predictEthnicity <- function(betas, threshold = 0.75) {
# Balasubramanian Narasimhan [aut], Kenneth Tay [aut], Noah Simon [aut],
# Junyang Qian [ctb]
glmnet_softmax <- function(x, ignore_labels = FALSE) {
d <- dim(x)
dd <- dimnames(x)[[2]]
if (is.null(dd) || !length(dd)) {
ignore_labels <- TRUE
d <- dim(x)
dd <- dimnames(x)[[2]]
if (is.null(dd) || !length(dd)) {
ignore_labels <- TRUE
}

nas <- apply(is.na(x), 1, any)
if (any(nas)) {
pclass <- rep(NA, d[1])
if (sum(nas) < d[1]) {
pclass2 <- glmnet_softmax(x[!nas, ], ignore_labels)
pclass[!nas] <- pclass2
if (is.factor(pclass2)) {
pclass <- factor(
pclass,
levels = seq(d[2]),
labels = levels(pclass2)
)
}
}

nas <- apply(is.na(x), 1, any)
if (any(nas)) {
pclass <- rep(NA, d[1])
if (sum(nas) < d[1]) {
pclass2 <- glmnet_softmax(x[!nas, ], ignore_labels)
pclass[!nas] <- pclass2
if (is.factor(pclass2)) {
pclass <- factor(
pclass,
levels = seq(d[2]),
labels = levels(pclass2)
)
}
}
} else {
maxdist <- x[, 1]
pclass <- rep(1, d[1])
for (i in seq(2, d[2])) {
l <- x[, i] > maxdist
pclass[l] <- i
maxdist[l] <- x[l, i]
}
dd <- dimnames(x)[[2]]
if (!ignore_labels) {
pclass <- factor(pclass, levels = seq(d[2]), labels = dd)
}
} else {
maxdist <- x[, 1]
pclass <- rep(1, d[1])
for (i in seq(2, d[2])) {
l <- x[, i] > maxdist
pclass[l] <- i
maxdist[l] <- x[l, i]
}
dd <- dimnames(x)[[2]]
if (!ignore_labels) {
pclass <- factor(pclass, levels = seq(d[2]), labels = dd)
}
pclass
}
pclass
}

pl_infer_ethnicity <- function(betas, threshold = 0.75) {
.Deprecated("predictEthnicity")
predictEthnicity(betas = betas, threshold = threshold)
.Deprecated("predictEthnicity")
predictEthnicity(betas = betas, threshold = threshold)
}
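The rewritten prediction path above reduces to a softmax over per-class linear scores (`glmnet_softmax` picks the argmax) followed by an ambiguity threshold on the highest probability. A minimal Python sketch of that logic, assuming a rows-are-samples score matrix; function and label names here are illustrative, not part of the package:

```python
import numpy as np

def classify_with_threshold(scores, labels, threshold=0.75):
    """Softmax the per-class linear scores, take the argmax class,
    and relabel low-confidence calls as 'Ambiguous'."""
    # numerically stable softmax, row-wise
    exp_s = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = exp_s / exp_s.sum(axis=1, keepdims=True)
    idx = probs.argmax(axis=1)
    highest = probs.max(axis=1)
    calls = [labels[i] if p >= threshold else "Ambiguous"
             for i, p in zip(idx, highest)]
    return probs, calls

scores = np.array([[0.1, 0.2, 3.0],    # one clearly dominant class
                   [0.9, 1.0, 1.1]])   # near-uniform scores
probs, calls = classify_with_threshold(
    scores, ["African", "Asian", "Caucasian"]
)
```

With the default 0.75 threshold the first sample is called by its dominant class while the near-uniform second sample comes back "Ambiguous", mirroring the `Highest_Prob < threshold` relabeling in the diff.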
Binary file modified R/sysdata.rda
Binary file not shown.
3 changes: 2 additions & 1 deletion README.Rmd
@@ -19,7 +19,8 @@ knitr::opts_chunk$set(
# planet <img src="man/figures/logo.png" align="right" height = "139" />

<!-- badges: start -->
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4321633.svg)](https://doi.org/10.5281/zenodo.4321633) `r badger::badge_last_commit("wvictor14/planet")`
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4321633.svg)](https://doi.org/10.5281/zenodo.4321633) `r badger::badge_last_commit("GuangchuangYu/badger")`
[![R build status](https://github.com/wvictor14/planet/workflows/R-CMD-check/badge.svg)](https://github.com/wvictor14/planet/actions)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://www.tidyverse.org/lifecycle/#stable)
[![R-CMD-check](https://github.com/wvictor14/planet/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/wvictor14/planet/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
Expand Down
27 changes: 13 additions & 14 deletions README.md
@@ -4,10 +4,11 @@
<!-- badges: start -->

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4321633.svg)](https://doi.org/10.5281/zenodo.4321633)
[![](https://img.shields.io/github/last-commit/wvictor14/planet.svg)](https://github.com/wvictor14/planet/commits/master)
[![](https://img.shields.io/github/last-commit/GuangchuangYu/badger.svg)](https://github.com/GuangchuangYu/badger/commits/master)
[![R build
status](https://github.com/wvictor14/planet/workflows/R-CMD-check/badge.svg)](https://github.com/wvictor14/planet/actions)
[![Lifecycle:
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://www.tidyverse.org/lifecycle/#stable)
[![R-CMD-check](https://github.com/wvictor14/planet/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/wvictor14/planet/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

`planet` is an R package for inferring **ethnicity** (1), **gestational
@@ -55,18 +56,16 @@ data(plPhenoData) # sample information
predictEthnicity(plBetas) %>%
head()
#> 1860 of 1860 predictors present.
#> # A tibble: 6 × 7
#> Sample_ID Predicted_ethnicity_nothr…¹ Predi…² Prob_…³ Prob_…⁴ Prob_…⁵ Highe…⁶
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 GSM1944936 Caucasian Caucas… 3.31e-3 1.64e-2 0.980 0.980
#> 2 GSM1944939 Caucasian Caucas… 7.72e-4 5.14e-4 0.999 0.999
#> 3 GSM1944942 Caucasian Caucas… 8.06e-4 6.99e-4 0.998 0.998
#> 4 GSM1944944 Caucasian Caucas… 8.83e-4 7.92e-4 0.998 0.998
#> 5 GSM1944946 Caucasian Caucas… 8.85e-4 1.30e-3 0.998 0.998
#> 6 GSM1944948 Caucasian Caucas… 8.52e-4 9.73e-4 0.998 0.998
#> # … with abbreviated variable names ¹​Predicted_ethnicity_nothresh,
#> # ²​Predicted_ethnicity, ³​Prob_African, ⁴​Prob_Asian, ⁵​Prob_Caucasian,
#> # ⁶​Highest_Prob
#> # A tibble: 6 x 7
#> Sample_ID Predicted_ethni~ Predicted_ethni~ Prob_African Prob_Asian
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 GSM19449~ Caucasian Caucasian 0.00331 0.0164
#> 2 GSM19449~ Caucasian Caucasian 0.000772 0.000514
#> 3 GSM19449~ Caucasian Caucasian 0.000806 0.000699
#> 4 GSM19449~ Caucasian Caucasian 0.000883 0.000792
#> 5 GSM19449~ Caucasian Caucasian 0.000885 0.00130
#> 6 GSM19449~ Caucasian Caucasian 0.000852 0.000973
#> # ... with 2 more variables: Prob_Caucasian <dbl>, Highest_Prob <dbl>
```
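The commit's headline change is the stop controls: unless `force = TRUE`, `predictEthnicity` now errors when predictors are missing or contain NAs rather than warning and proceeding. A hedged Python sketch of that guard pattern, under the assumption that it just checks required names against the available rows (all names illustrative):

```python
def check_predictors(available, required, force=False):
    """Error on missing predictors unless force=True, mirroring
    the new stop controls; returns the predictors found."""
    available_set = set(available)
    present = [p for p in required if p in available_set]
    missing = len(required) - len(present)
    if missing > 0 and not force:
        raise ValueError(
            f"Missing {missing} out of {len(required)} predictors."
        )
    return present
```

With `force=True` the function returns whatever subset is present, roughly like the old warn-and-continue path; with the default it fails fast, which is what lets R CMD check surface bad inputs early.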

#### Predict Gestational Age
8 changes: 0 additions & 8 deletions data-raw/internal.R

This file was deleted.

2 changes: 1 addition & 1 deletion man/planet-package.Rd

