diff --git a/README.md b/README.md
index 783df69..a08a24a 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,89 @@
# stability-analyses
-code for simulation studies and experimental microbiome data applications in Stability manuscript
+This repository contains the code to reproduce all simulation studies and experimental microbiome data applications in the Stability manuscript.
+
+I. General code:
+
+code_method folder: contains general method code shared by the simulations and data applications (continuous and binary outcomes)
+
+ getStability.R: function to calculate Stability Index
+
+ cv_method.R: code for 4 selected feature selection methods with user-defined parameter grids and cross-validations for parameter tuning when applied to continuous outcomes
+
+ cv_method_binary_update.R: code for 4 selected feature selection methods with user-defined parameter grids and cross-validations for parameter tuning when applied to binary outcomes
+
+ stab_data_applications.R: function to perform hypothesis testing using bootstrap for continuous outcomes
+
+ stab_data_applications_binary.R: function to perform hypothesis testing using bootstrap for binary outcomes
+
+ bootstrap_test_compLasso_rf.R: general functions for comparing feature selection methods using hypothesis testing based on bootstrap when applied to continuous outcomes
+
+ bootstrap_test_compLasso_rf_binary.R: general functions for comparing feature selection methods using hypothesis testing based on bootstrap when applied to binary outcomes
+
+ source code for compositional lasso (continuous outcome) is available at: https://www.math.pku.edu.cn/teachers/linw/software.html
+ source code for compositional lasso (binary outcome) is available at: https://github.com/UVic-omics/Microbiome-Variable-Selection
+
+
+II. Simulation part (within simulations folder):
+
+sim_data_generation folder: contains code to generate simulated data
+
+ sim_dat_ind_toeplitz: code to generate simulated data with Independent and Toeplitz correlation designs
+ sim_dat_block.R: code to generate simulated data with Block correlation design
+ run_sim_data.sh: bash commands for running simulation data generation code on HPC
+
+code_sim_cts folder: contains code to reproduce simulation results for continuous outcomes
+
+ cv_sim_apply.R: general functions for applying the selected feature selection methods to simulated data with continuous outcomes
+
+ 1. compute Stability and MSE for different simulation scenarios
+ ind_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Independent design and continuous outcomes
+ toe_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Toeplitz design and continuous outcomes
+ block_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Block design and continuous outcomes
+ CL_sim_apply.R: code for obtaining results for compositional lasso in all simulation correlation designs with continuous outcomes
+
+ 2. hypothesis testing with bootstrap for selected simulation scenarios
+ boot_CL_testing.R: code for calculating bootstrapped confidence interval for compositional lasso method in simulated data with continuous outcomes
+ boot_RF_testing.R: code for calculating bootstrapped confidence interval for random forests method in simulated data with continuous outcomes
+
+ 3. bash commands
+ run_sim_cts.sh: bash commands for running simulation code for continuous outcomes on HPC
+
+
+code_sim_bin folder: contains code to reproduce simulation results for binary outcomes
+
+ cv_sim_apply_binary_update.R: general functions for applying the selected feature selection methods to simulated data with binary outcomes
+
+ 1. compute Stability and AUC for different simulation scenarios
+ ind_results_binary_update.R: code for comparing all 4 methods in simulated data with Independent design and binary outcomes
+ toe_results_binary_update.R: code for comparing all 4 methods in simulated data with Toeplitz design and binary outcomes
+ block_results_binary_update.R: code for comparing all 4 methods in simulated data with Block design and binary outcomes
+
+ 2. hypothesis testing with bootstrap for selected simulation scenarios
+ boot_sim_binary.R: code for calculating bootstrapped confidence interval for compositional lasso and random forests methods in simulated data with binary outcomes
+
+ 3. bash commands
+ run_sim_bin.sh: bash commands for running simulation code for binary outcomes on HPC
+
+notebooks_sim_cts folder: contains notebooks (R) to summarize simulation results for continuous outcomes
+
+notebooks_sim_bin folder: contains notebooks (R) to summarize simulation results for binary outcomes
+
+results_summary_cts folder: contains the tables output by the notebooks in the notebooks_sim_cts folder
+
+results_summary_bin folder: contains the tables output by the notebooks in the notebooks_sim_bin folder
+
+figures_combined folder: contains figures generated for both continuous and binary outcomes by notebook 6_make_figures_combined in the notebooks_sim_bin folder
+
+
+III. Application part (within data_application folder):
+
+ code_cts folder: contains code for real data applications to BMI & soil datasets for continuous outcomes
+
+ code_bin folder: contains code for real data applications to BMI & soil datasets for binary outcomes
+
+ notebooks_applications folder: contains notebooks (R) to summarize microbiome application results for continuous and binary outcomes
+
+ 88soils folder: contains data and application results for the soil dataset
+
+ BMI folder: contains data and application results for the BMI dataset
+
diff --git a/code_method/.DS_Store b/code_method/.DS_Store
deleted file mode 100755
index 495cac6..0000000
Binary files a/code_method/.DS_Store and /dev/null differ
diff --git a/code_method/bootstrap_test_compLasso_rf.R b/code_method/bootstrap_test_compLasso_rf.R
new file mode 100755
index 0000000..2ae9005
--- /dev/null
+++ b/code_method/bootstrap_test_compLasso_rf.R
@@ -0,0 +1,126 @@
+##########################################################
+### hypothesis testing with bootstraps ###################
+##########################################################
+
+source('cv_method.R')
+source('getStability.R')
+
+## set up parallel computing
+library(foreach)
+library(doParallel)
+numCores <- detectCores() - 2
+registerDoParallel(numCores) # use multicore, set to the number of our cores
+
+boot_stab_sim = function(num_boot=100, sim_file, method, seednum=31, ratio.training=0.8, fold.cv=10,
+ family='gaussian', lambda.grid=exp(seq(-4, -2, 0.2)), alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05){
+
+ # load simulated data
+ load(sim_file, dat <- new.env())
+
+ idx.start = 1; idx.stop = 100 # 100 repetitions for each simulated scenario
+ rou = dat$sim_array[[1]]$rou # rou, n, p are same across all repetitions
+ n = dat$sim_array[[1]]$n
+ p = dat$sim_array[[1]]$p
+
+ ## get a vector of stability index from bootstrapped data
+ stab_index = rep(0, num_boot)
+ b = 0
+ for (i in idx.start:idx.stop){
+ b = b + 1
+ print(paste('index', i, sep=':'))
+ sub = dat$sim_array[[i]]
+
+ # bootstrap with parallelization
+ selections = foreach (i=1:num_boot) %dopar% {
+ N = length(sub$Y) # number of samples
+ boot_ids = sample(N, size=N, replace=TRUE)
+ boot_Z = sub$Z[boot_ids, ]
+ boot_Y = sub$Y[boot_ids, ]
+
+ ## select features with compositional lasso or random forests
+ if (method == 'compLasso'){
+ result.lin = cons_lasso_cv(y=boot_Y, datx=boot_Z, seednum=i, ratio.training=ratio.training)
+ select.lin = result.lin$coef.chosen # indices already refer to features (no intercept)
+
+ } else if (method == 'RF'){
+ result.rf = randomForest_cv(y=boot_Y, datx=boot_Z, seednum=i, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr)
+ select.rf = result.rf$coef.chosen
+ }
+ }
+
+ # calculate stability index from bootstrapped data
+ stability_table = matrix(rep(0, num_boot * p), ncol=p)
+ for (j in 1:num_boot){
+ stability_table[j, selections[[j]]] = 1
+ }
+
+ stab_index[b] = round(getStability(stability_table)$stability, 2)
+ }
+
+ results=list(rou=rou, n=n, p=p, num_boot=num_boot, method=method, stab_index=stab_index)
+
+}
+
+
+########################################################
+## double bootstrap applied to real data application ###
+########################################################
+boot_stab_data = function(num_boot=100, data_file, method, seednum=31, ratio.training=0.8, fold.cv=10,
+ family='gaussian', lambda.grid=exp(seq(-4, -2, 0.2)), alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05){
+
+ # load real data (expects objects 'taxa' and 'y' in the .RData file)
+ set.seed(seednum)
+ load(data_file)
+ p = dim(taxa)[2]
+
+ stab_index = rep(0, num_boot)
+ MSE_list = list()
+ # first loop of bootstrap to generate num_boot bootstrapped datasets
+ for (k in 1:num_boot){
+ print(paste('num_boot', k, sep=':'))
+ N = length(y) # number of samples
+ sample_ids = seq(1, N, 1)
+
+ # bootstrapped samples
+ boot_ids = sample(sample_ids, size=N, replace=TRUE)
+ boot_taxa = taxa[boot_ids, ]
+ boot_mf = y[boot_ids]
+
+ ## second loop of bootstrap to perform variable selection
+ results = foreach (i=1:num_boot) %dopar% {
+ boot_ids_second = sample(N, size=N, replace=TRUE)
+ boot_Z = boot_taxa[boot_ids_second, ]
+ boot_Y = boot_mf[boot_ids_second]
+
+ ## select features with compositional lasso or random forests
+ if (method == 'compLasso'){
+ result.lin = cons_lasso_cv(y=boot_Y, datx=boot_Z, seednum=i, ratio.training=ratio.training)
+ output.lin = c(result.lin$MSE, result.lin$coef.chosen)
+
+ } else if (method == 'RF'){
+ result.rf = randomForest_cv(y=boot_Y, datx=boot_Z, seednum=i, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr)
+ output.rf = c(result.rf$MSE, result.rf$coef.chosen)
+ }
+ }
+
+ # reformat results (stability & MSE)
+ stability_table = matrix(rep(0, num_boot * p), ncol=p)
+ results_mse = results_chosen = list()
+ for (b in 1:num_boot){
+ results_mse[b] = results[[b]][1]
+ results_chosen[[b]] = results[[b]][-1]
+ stability_table[b, results_chosen[[b]]] = 1
+ }
+
+ stab_index[k] = round(getStability(stability_table)$stability, 2)
+ MSE_list[[k]] = results_mse
+ }
+
+
+ results=list(num_boot=num_boot, method=method, stab_index=stab_index, MSE_list=MSE_list)
+
+}
diff --git a/code_method/bootstrap_test_compLasso_rf_binary.R b/code_method/bootstrap_test_compLasso_rf_binary.R
new file mode 100644
index 0000000..91d3664
--- /dev/null
+++ b/code_method/bootstrap_test_compLasso_rf_binary.R
@@ -0,0 +1,137 @@
+##########################################################
+### estimate correlation between stability index #########
+##########################################################
+
+source('cv_method_binary_update.R')
+source('getStability.R')
+
+## set up parallel computing
+library(foreach)
+library(doParallel)
+numCores <- detectCores() - 2 # 6 cores
+registerDoParallel(numCores) # use multicore, set to the number of our cores
+
+########################################################
+## double bootstrap applied to simulations ###
+########################################################
+boot_stab_sim = function(num_boot=100, sim_file, method, seednum=31,
+ ratio.training=0.8, fold.cv=10,
+ family='binomial', lambda.grid=exp(seq(-4, -2, 0.2)),
+ alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05,
+ lambda.coda=seq(0.1, 0.2, 0.01), data.split=FALSE){
+
+ # load simulated data
+ load(sim_file, dat <- new.env())
+
+ idx.start = 1; idx.stop = 100 # 100 repetitions for each simulated scenario
+ rou = dat$sim_array[[1]]$rou # rou, n, p are same across all repetitions
+ n = dat$sim_array[[1]]$n
+ p = dat$sim_array[[1]]$p
+
+ ## get a vector of stability index from bootstrapped data
+ stab_index = rep(0, num_boot)
+ b = 0
+ for (i in idx.start:idx.stop){
+ b = b + 1
+ print(paste('index', i, sep=':'))
+ sub = dat$sim_array[[i]]
+ y_binary = as.factor(ifelse(sub$Y >= median(sub$Y), 1, 0))
+
+ # bootstrap with parallelization
+ selections = foreach (i=1:num_boot) %dopar% {
+ N = length(y_binary) # number of samples
+ boot_ids = sample(N, size=N, replace=TRUE)
+ boot_Z = sub$Z[boot_ids, ]
+ boot_Y = y_binary[boot_ids]
+ boot_X = sub$X[boot_ids, ] # for generalized compositional lasso
+
+ ## select features
+ if (method == 'GenCompLasso'){ # generalized compositional lasso uses X instead of Z
+ result.lin = gen_cons_lasso_cv(y=boot_Y, datx=boot_X, seednum=i, ratio.training=ratio.training,
+ lambda.coda=lambda.coda, data.split=data.split)
+ select.lin = result.lin$coef.chosen
+
+ } else if (method == 'RF'){
+ result.rf = randomForest_cv(y=boot_Y, datx=boot_Z, seednum=i, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr)
+ select.rf = result.rf$coef.chosen
+ }
+ }
+
+ # calculate stability index from bootstrapped data
+ stability_table = matrix(rep(0, num_boot * p), ncol=p)
+ for (j in 1:num_boot){
+ stability_table[j, selections[[j]]] = 1
+ }
+
+ stab_index[b] = round(getStability(stability_table)$stability, 2)
+ }
+
+ results=list(rou=rou, n=n, p=p, num_boot=num_boot, method=method, stab_index=stab_index)
+
+}
+
+
+########################################################
+## double bootstrap applied to real data application ###
+########################################################
+boot_stab_data = function(num_boot=100, data_file, method, seednum=31,
+ ratio.training=0.8, fold.cv=10,
+ family='binomial', lambda.grid=exp(seq(-4, -2, 0.2)),
+ alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05,
+ lambda.coda=seq(0.1, 0.2, 0.01), data.split=FALSE){
+
+ # load real data (expects objects 'taxa' and 'y' in the .RData file)
+ set.seed(seednum)
+ load(data_file)
+ p = dim(taxa)[2]
+
+ stab_index = rep(0, num_boot)
+ ROC_list = list()
+ # first loop of bootstrap to generate num_boot bootstrapped datasets
+ for (k in 1:num_boot){
+ print(paste('num_boot', k, sep=':'))
+ N = length(y) # number of samples
+ sample_ids = seq(1, N, 1)
+
+ # bootstrapped samples
+ boot_ids = sample(sample_ids, size=N, replace=TRUE)
+ boot_taxa = taxa[boot_ids, ]
+ boot_mf = y[boot_ids]
+
+ ## second loop of bootstrap to perform variable selection
+ results = foreach (i=1:num_boot) %dopar% {
+ boot_ids_second = sample(N, size=N, replace=TRUE)
+ boot_Z = boot_taxa[boot_ids_second, ]
+ boot_Y = boot_mf[boot_ids_second]
+
+ ## select features with generalized compositional lasso or random forests
+ if (method == 'GenCompLasso'){ # use X instead of Z (log-transformed)
+ result.lin = gen_cons_lasso_cv(y=boot_Y, datx=exp(boot_Z), seednum=i, ratio.training=ratio.training)
+ output.lin = c(result.lin$ROC, result.lin$coef.chosen)
+ } else if (method == 'RF'){
+ result.rf = randomForest_cv(y=boot_Y, datx=boot_Z, seednum=i, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr)
+ output.rf = c(result.rf$ROC, result.rf$coef.chosen)
+ }
+ }
+
+ # reformat results (stability & ROC)
+ stability_table = matrix(rep(0, num_boot * p), ncol=p)
+ results_mse = results_chosen = list()
+ for (b in 1:num_boot){
+ results_mse[b] = results[[b]][1]
+ results_chosen[[b]] = results[[b]][-1]
+ stability_table[b, results_chosen[[b]]] = 1
+ }
+
+ stab_index[k] = round(getStability(stability_table)$stability, 2)
+ ROC_list[[k]] = results_mse
+ }
+
+
+ results=list(num_boot=num_boot, method=method, stab_index=stab_index, ROC_list=ROC_list)
+
+}
diff --git a/code_method/code_method/.DS_Store b/code_method/code_method/.DS_Store
deleted file mode 100755
index aff38d1..0000000
Binary files a/code_method/code_method/.DS_Store and /dev/null differ
diff --git a/code_method/cv_method.R b/code_method/cv_method.R
new file mode 100755
index 0000000..820c707
--- /dev/null
+++ b/code_method/cv_method.R
@@ -0,0 +1,364 @@
+###########################################
+#### methods for comparisons ##############
+###########################################
+library(glmnet)
+library(caret)
+library(ranger) # faster random forest
+
+# general reference with caret on glmnet & random forest
+## http://rstudio-pubs-static.s3.amazonaws.com/251240_12a8ecea8e144fada41120ddcf52b116.html
+
+# important note
+## when the model is trained on the training set, the tuning parameters are fixed in the final model, so the coefficients are already chosen
+## use the test set to estimate the MSE
+## since it is unclear how caret handles fitting and prediction, use caret only for parameter tuning in elastic net
+
+###########################################
+#### Lasso (tune lambda) ##################
+###########################################
+#' @title lasso_cv
+#' @description Does a K-fold cross-validation for Lasso.
+#' @param datx The input data matrix.
+#' @param y The response variable.
+#' @param seednum The seed number (default=31).
+#' @param family The family of linear regression models (default="gaussian").
+#' @param ratio.training The ratio of the whole data assigned for model training (default=0.8).
+#' @param fold.cv The number of folds for cross-validation (default=10).
+#' @param lambda.grid The tuning range for the regularization parameter "lambda" of "lasso".
+#' @return A list of Lasso model output including MSE, and selected features.
+
+lasso_cv = function(datx, y, seednum=31, family='gaussian', ratio.training=0.8, fold.cv=10,
+ lambda.grid, lambda.choice='lambda.1se'){
+ # seednum: the seed number
+ # ratio.training: the ratio of training set (Pareto principle training:test=8:2)
+ # lambda.grid: possible candidate values for tuning parameter Lambda
+ # fold.cv: n-fold cross validation
+ # lambda.choice: 'lambda.min' or 'lambda.1se'
+
+ set.seed(seednum)
+
+ # split data into training and test sets
+ nn <- length(y)
+ trn <- sample(1:nn, ratio.training*nn)
+ x.train <- datx[trn, ]
+ x.test <- datx[-trn, ]
+ y.train <- y[trn]
+ y.test <- y[-trn]
+
+ # use cross-validation on training data & predict on test data
+ cv.fit <- cv.glmnet(x.train, y.train, family=family, alpha=1, lambda=lambda.grid, nfolds=fold.cv)
+ pred.fit <- predict(cv.fit, s=lambda.choice, newx=x.test, type='response')
+
+ # covariates chosen
+ coef.fit <- predict(cv.fit, s=lambda.choice, newx=x.test, type='coefficients')
+ coef.chosen = which(coef.fit != 0)
+ coef.chosen = coef.chosen - 1 # shift by 1: the first coefficient is the intercept (which then maps to index 0)
+
+ # evaluate results
+ MSE <- mean((y.test - pred.fit)^2)
+
+
+ result = list(MSE=MSE, coef.chosen=coef.chosen)
+ return(result)
+}
+
+###########################################
+#### Elastic Net (tune lambda and alpha) ##
+###########################################
+# reference 1 (tune both parameters with caret)
+## https://stats.stackexchange.com/questions/268885/tune-alpha-and-lambda-parameters-of-elastic-nets-in-an-optimal-way
+# reference 2 (extract final model and prediction with caret)
+## https://topepo.github.io/caret/model-training-and-tuning.html
+
+#' @title elnet_cv
+#' @description Does a K-fold cross-validation for elastic-net.
+#' @param datx The input data matrix.
+#' @param y The response variable.
+#' @param seednum The seed number (default=31).
+#' @param family The family of linear regression models (default="gaussian").
+#' @param ratio.training The ratio of the whole data assigned for model training (default=0.8).
+#' @param fold.cv The number of folds for cross-validation (default=10).
+#' @param lambda.grid The tuning range for the regularization parameter "lambda" of elastic net.
+#' @param alpha.grid The tuning range for the elastic-net mixing parameter "alpha".
+#' @return A list of elnet model output including MSE, and selected features.
+
+elnet_cv = function(datx, y, seednum=31, alpha.grid, lambda.grid, family='gaussian',
+ ratio.training=0.8, fold.cv=10){
+ # seednum: the seed number
+ # ratio.training: the ratio of training set (Pareto principle training:test=8:2)
+ # alpha.grid: possible candidate values for tuning parameter alpha
+ # lambda.grid: possible candidate values for tuning parameter Lambda
+ # fold.cv: n-fold cross validation
+
+ set.seed(seednum)
+
+ # split data into training and test sets
+ nn <- length(y)
+ trn <- sample(1:nn, ratio.training*nn)
+ x.train <- datx[trn, ]
+ x.test <- datx[-trn, ]
+ y.train <- y[trn]
+ y.test <- y[-trn]
+
+ data.train = as.data.frame(cbind(y.train, x.train))
+ colnames(data.train) = c('y', paste('V', seq(1, dim(datx)[2]), sep=''))
+
+ # tune parameters with CV on training data with caret
+ trnCtrl <- trainControl(method = "cv", number = fold.cv)
+ srchGrid <- expand.grid(.alpha = alpha.grid, .lambda = lambda.grid)
+
+ my_train <- train(y ~., data.train,
+ method = "glmnet",
+ tuneGrid = srchGrid,
+ trControl = trnCtrl)
+
+ # refit on training data with the best tuning parameters using glmnet, then predict on test data
+ fit = glmnet(x.train, y.train, alpha=my_train$bestTune$alpha, lambda=my_train$bestTune$lambda)
+ pred.fit <- predict(fit, x.test, type='response')
+
+ # covariates chosen
+ coef.fit <- predict(fit, x.test, type='coefficients')
+ coef.chosen = which(coef.fit != 0)
+ coef.chosen = coef.chosen - 1 # shift by 1: the first coefficient is the intercept (which then maps to index 0)
+
+ # evaluate results
+ MSE <- mean((y.test - pred.fit)^2)
+
+
+ result = list(MSE=MSE, coef.chosen=coef.chosen)
+ return(result)
+}
+
+
+###############################################################
+#### Random Forests (tune # variables at each random split ####
+###############################################################
+# reference on regression random forest
+## https://uc-r.github.io/random_forests
+# OOB error is different from test error (see above website)
+
+#' @title randomForest_cv
+#' @description Does a K-fold cross-validation for random forests.
+#' @param datx The input data matrix.
+#' @param y The response variable.
+#' @param sim_file The path of a .RData file includes the simulation data pre-generated.
+#' @param seednum The seed number (default=31).
+#' @param fold.cv The number of folds for cross-validation (default=5).
+#' @param ratio.training The ratio of the whole data assigned for model training (default=0.8).
+#' @param mtry.grid The tuning range for the hyperparameter "mtry", that is, number of variables to possibly split at in each node.
+#' @param num_trees The number of trees to grow in random forests.
+#' @param pval_thr The threshold for the estimated p value of RF importance scores.
+#' @param method.perm The permutation method for estimating the p value of RF importance scores.
+#' @return A list of RF model output including the "mtry" grid, selected features, MSE, OOB, and p-values of selected features.
+
+randomForest_cv = function(datx, y, seednum=31, fold.cv=5, ratio.training=0.8, mtry.grid=10, num_trees=500,
+ pval_thr=0.05, method.perm='altmann'){
+ # mtry: number of variables to randomly sample at each split
+ # num_trees: number of trees to grow in random forests
+ # pval_thr: threshold for permutation test
+ # note that the permutation p-value can use the 'altmann' method for all types of data; 'janitza' is for high-dimensional data only
+ # ref on permutation methods: http://finzi.psych.upenn.edu/library/ranger/html/importance_pvalues.html
+
+ set.seed(seednum)
+
+ # split data into training and test sets
+ data = as.data.frame(cbind(y, datx))
+ colnames(data) = c('y', paste('V', seq(1, dim(datx)[2]), sep=''))
+ inTraining = createDataPartition(data$y, p = ratio.training, list=FALSE)
+ train <- data[inTraining, ]
+ test <- data[-inTraining, ]
+
+ # tune mtry using the out-of-bag (OOB) error on the training data
+ hyper.grid <- expand.grid(mtry = mtry.grid, OOB_RMSE = 0)
+ for (i in 1:nrow(hyper.grid)){
+ model = ranger::ranger(y ~., data = train,
+ num.trees=num_trees, mtry=hyper.grid$mtry[i],
+ seed=seednum, importance = 'permutation')
+ hyper.grid$OOB_RMSE[i] = sqrt(model$prediction.error)
+ }
+ OOB = min(hyper.grid$OOB_RMSE) # out of bag error
+ position = which.min(hyper.grid$OOB_RMSE)
+
+ # permutation test on tuned random forest model to obtain chosen features
+ if (method.perm == 'altmann'){ # for all data types
+ rf.model <- ranger::ranger(y ~., data=test, num.trees = num_trees,
+ mtry = hyper.grid$mtry[position], importance = 'permutation')
+ table = as.data.frame(importance_pvalues(rf.model, method = "altmann",
+ formula = y ~ ., data = test))
+ } else if (method.perm == 'janitza'){ # for high dimensional data only
+ rf.model <- ranger::ranger(y ~., data=test, num.trees = num_trees,
+ mtry = hyper.grid$mtry[position], importance = 'impurity_corrected')
+ table = as.data.frame(importance_pvalues(rf.model, method = "janitza",
+ formula = y ~ ., data = test))
+ }
+
+ coef.chosen = which(table$pvalue < pval_thr)
+
+ # if nothing has been selected
+ if (identical(coef.chosen, integer(0))){
+ coef.chosen = 0
+ }
+
+ # obtain additional prediction error to make comparable to other methods
+ pred_rf = predict(rf.model, test)
+ MSE <- mean((test$y - pred_rf$predictions)^2)
+
+ result = list(mtry=mtry.grid[position], coef.chosen=coef.chosen, MSE=MSE, OOB=OOB, p.value=table$pvalue)
+ return(result)
+
+}
+
+
+###############################################################
+#### Compositional Lasso by Lin et al 2014 ####################
+###############################################################
+dyn.load("../../code_Lin/cvs/cdmm.dll")
+source("../../code_Lin/cvs/cdmm.R")
+
+#' @title cons_lasso_cv
+#' @description Does a K-fold cross-validation for compositional lasso.
+#' @param datx The input data matrix.
+#' @param y The response variable.
+#' @param seednum The seed number (default=31).
+#' @param ratio.training The ratio of the whole data assigned for model training (default=0.8).
+#' @return A list of compLasso model output including MSE, and selected features.
+
+cons_lasso_cv = function(datx, y, seednum, ratio.training=0.8){
+ set.seed(seednum)
+ z = datx
+ n = length(y)
+
+ itrn = sample(n, ratio.training*n)
+ itst = setdiff(1:n, itrn)
+ ans <- cv.cdmm(y[itrn], z[itrn, ], refit=TRUE) # proposed method (default: 10 fold CV)
+ bet <- ans$bet; int <- ans$int
+ pe <- mean((y[itst] - int - z[itst, ] %*% bet)^2)
+
+ coef.chosen = which(bet != 0) # the first beta refer to 1st feature
+ MSE = pe
+
+ result = list(MSE=MSE, coef.chosen=coef.chosen)
+ return(result)
+}
+
+###########################################################
+#### Lasso (double cv -- no validation set) ###############
+###########################################################
+lasso_double_cv = function(datx, y, seednum=31, family='gaussian', fold.cv=10, lambda.grid){
+ set.seed(seednum)
+
+ ## double loop for cross-validation
+ flds <- caret::createFolds(y, k = fold.cv, list = TRUE, returnTrain = FALSE)
+
+ ## outer loop for estimating MSE
+ MSE = STAB = matrix(rep(0, fold.cv * length(lambda.grid)), nrow=fold.cv)
+ rownames(MSE) = rownames(STAB) = paste('fold', seq(1:fold.cv), sep='')
+ colnames(MSE) = colnames(STAB) = paste('lambda', seq(1:length(lambda.grid)), sep='')
+ for (b in 1:fold.cv){
+ idx.test = flds[[b]]
+ x.train = datx[-idx.test, ]
+ y.train = y[-idx.test]
+ x.test = datx[idx.test, ]
+ y.test = y[idx.test]
+
+ fit = glmnet(x.train, y.train, family=family, alpha=1, lambda=lambda.grid)
+ pred.fit <- predict(fit, newx=x.test, s=lambda.grid, type='response')
+ MSE[b, ] <- colMeans((y.test - pred.fit)^2)
+
+ ## inner loop for estimating stability
+ flds.inn <- createFolds(y.train, k = fold.cv, list = TRUE, returnTrain = FALSE)
+ stab.table.inn = list()
+ for (bb in 1:fold.cv){
+ idx.test.inn = flds.inn[[bb]]
+ stab.x.train = x.train[-idx.test.inn, ]
+ stab.y.train = y.train[-idx.test.inn]
+ stab.x.test = x.train[idx.test.inn, ]
+ stab.y.test = y.train[idx.test.inn]
+
+ fit.inn = glmnet(stab.x.train, stab.y.train, family=family, alpha=1, lambda=lambda.grid)
+ coef.fit.inn <- predict(fit.inn, newx=stab.x.test, s=lambda.grid, type='coefficients')
+ coef.chosen.table = as.matrix(coef.fit.inn)
+ coef.chosen.table = coef.chosen.table[-1, ] # no need of intercept
+ coef.chosen.table[coef.chosen.table != 0] = 1 # convert to binary table
+ coef.chosen.table[coef.chosen.table == 0] = 0
+ colnames(coef.chosen.table) = paste('lambda', seq(1:length(lambda.grid)), sep='')
+ stab.table.inn[[bb]] = coef.chosen.table
+ }
+
+ # calculate stability
+ table.tmp = matrix(rep(0, dim(datx)[2] * fold.cv), nrow=fold.cv)
+ for (i in 1:length(lambda.grid)){
+ for (bb in 1:fold.cv){
+ table.tmp[bb, ] = stab.table.inn[[bb]][, i]
+ }
+ STAB[b, i] = round(getStability(table.tmp)$stability, 2)
+ }
+ }
+
+ result = list(lambda.grid=lambda.grid, MSE.list=MSE, STAB.list=STAB, MSE.value=colMeans(MSE), STAB.value=colMeans(STAB))
+ return(result)
+}
+
+
+###########################################################
+#### RF (double cv -- no validation set) ###############
+###########################################################
+randomForest_double_cv = function(datx, y, seednum=31, fold.cv=5, mtry.grid=10, num_trees=500, pval_thr=0.05){
+ set.seed(seednum)
+
+ ## double loop for cross-validation
+ data = as.data.frame(cbind(y, datx))
+ colnames(data) = c('y', paste('V', seq(1, dim(datx)[2]), sep=''))
+ flds <- createFolds(data$y, k = fold.cv, list = TRUE, returnTrain = FALSE)
+
+ ## outer loop for estimating MSE
+ MSE = STAB = matrix(rep(0, fold.cv * length(mtry.grid)), nrow=fold.cv)
+ rownames(MSE) = rownames(STAB) = paste('fold', seq(1:fold.cv), sep='')
+ colnames(MSE) = colnames(STAB) = paste('mtry', seq(1:length(mtry.grid)), sep='')
+ for (b in 1:fold.cv){
+ idx.test = flds[[b]]
+ train <- data[-idx.test, ]
+ test <- data[idx.test, ]
+
+ for (m in seq_along(mtry.grid)){
+ fit = ranger(y ~., data = train, num.trees=num_trees, mtry=mtry.grid[m], seed=seednum, importance = 'permutation')
+ pred.fit <- predict(fit, test)
+ MSE[b, m] <- mean((test$y - pred.fit$predictions)^2) # store one MSE per mtry value instead of overwriting the whole row
+ }
+
+ ## inner loop for estimating stability
+ flds.inn <- createFolds(train$y, k = fold.cv, list = TRUE, returnTrain = FALSE)
+ stab.table.inn = list()
+ for (bb in 1:fold.cv){
+ idx.test.inn = flds.inn[[bb]]
+ stab.train <- train[-idx.test.inn, ]
+ stab.test <- train[idx.test.inn, ]
+
+ coef.chosen.table = matrix(rep(0, dim(datx)[2] * length(mtry.grid)), ncol=length(mtry.grid))
+ colnames(coef.chosen.table) = paste('mtry', seq(1:length(mtry.grid)), sep='')
+ idx = 0
+ for (mtry in mtry.grid){
+ idx = idx + 1
+ fit.inn = ranger(y ~., data = stab.train, num.trees=num_trees, mtry=mtry, seed=seednum, importance = 'permutation')
+ table = as.data.frame(importance_pvalues(fit.inn, method = "altmann", formula = y ~ ., data = stab.train))
+ coef.chosen = which(table$pvalue < pval_thr)
+ coef.chosen.table[coef.chosen, idx] = 1
+ stab.table.inn[[bb]] = coef.chosen.table
+ }
+
+ }
+
+ # calculate stability
+ table.tmp = matrix(rep(0, dim(datx)[2] * fold.cv), nrow=fold.cv)
+ for (i in 1:length(mtry.grid)){
+ for (bb in 1:fold.cv){
+ table.tmp[bb, ] = stab.table.inn[[bb]][, i]
+ }
+ STAB[b, i] = round(getStability(table.tmp)$stability, 2)
+ }
+ }
+
+ result = list(mtry.grid=mtry.grid, MSE.list=MSE, STAB.list=STAB, MSE.value=colMeans(MSE), STAB.value=colMeans(STAB))
+ return(result)
+
+}
diff --git a/code_method/cv_method_binary_update.R b/code_method/cv_method_binary_update.R
new file mode 100644
index 0000000..8da826f
--- /dev/null
+++ b/code_method/cv_method_binary_update.R
@@ -0,0 +1,247 @@
+###########################################
+#### methods for binary outcome #############
+###########################################
+library(glmnet)
+library(caret)
+library(ranger) # faster random forest
+library(pROC) # calculate ROC (need to load this library to avoid error)
+
+# general reference with caret on glmnet & random forest
+## http://rstudio-pubs-static.s3.amazonaws.com/251240_12a8ecea8e144fada41120ddcf52b116.html
+
+# important note
+## when the model is trained on the training set, the tuning parameters are fixed in the final model, so the coefficients are already chosen
+## use the test set to estimate the AUC
+## since it is unclear how caret handles fitting and prediction, use caret only for parameter tuning in elastic net
+
+###########################################
+#### Lasso (tune lambda) ##################
+###########################################
+lasso_cv = function(datx, y, seednum=31, family='binomial', ratio.training=0.8, fold.cv=10,
+ lambda.grid, lambda.choice='lambda.1se'){
+ # seednum: the seed number
+ # ratio.training: the ratio of training set (parento principle training:test=8:2)
+ # lambda.grid: possible candidate values for tuning parameter Lambda
+ # fold.cv: n-fold cross validation
+ # lambda.choice: 'lambda.min' or 'lambda.1se'
+
+ set.seed(seednum)
+
+ # split data into training and test sets
+ nn <- length(y)
+ trn <- sample(1:nn, ratio.training*nn)
+ x.train <- datx[trn, ]
+ x.test <- datx[-trn, ]
+ y.train <- y[trn]
+ y.test <- y[-trn]
+
+ # tune lambda with cross-validation on training data & predict on test data
+ cv.fit <- cv.glmnet(x.train, y.train, family=family, alpha=1, lambda=lambda.grid, nfolds=fold.cv)
+ pred.fit <- predict(cv.fit, s=lambda.choice, newx=x.test, type='class') # predict class for binomial
+
+ # covariates chosen
+ coef.fit <- predict(cv.fit, s=lambda.choice, newx=x.test, type='coefficients')
+ coef.chosen = which(coef.fit != 0)
+ coef.chosen = coef.chosen - 1 # shift indices by 1 because index 1 is the intercept (a nonzero intercept maps to 0)
+
+ pred.class <- as.numeric(pred.fit)
+ ROC <- pROC::roc(y.test, pred.class)$auc # this is the AUC value; the name ROC is kept for downstream code
+
+
+ result = list(ROC=ROC, coef.chosen=coef.chosen)
+ return(result)
+}
+
+###########################################
+#### Elastic Net (tune lambda and alpha) ##
+###########################################
+# reference 1 (tune both parameters with caret)
+## https://stats.stackexchange.com/questions/268885/tune-alpha-and-lambda-parameters-of-elastic-nets-in-an-optimal-way
+# reference 2 (extract final model and prediction with caret)
+## https://topepo.github.io/caret/model-training-and-tuning.html
+
+
+elnet_cv = function(datx, y, seednum=31, alpha.grid, lambda.grid, family='binomial',
+ ratio.training=0.8, fold.cv=10){
+ # seednum: the seed number
+ # ratio.training: the ratio of training set (Pareto principle, training:test = 8:2)
+ # alpha.grid: candidate values for the tuning parameter alpha
+ # lambda.grid: candidate values for the tuning parameter lambda
+ # fold.cv: number of folds for cross-validation
+
+ set.seed(seednum)
+
+ # split data into training and test sets
+ nn <- length(y)
+ trn <- sample(1:nn, ratio.training*nn)
+ x.train <- datx[trn, ]
+ x.test <- datx[-trn, ]
+ y.train <- y[trn]
+ y.test <- y[-trn]
+
+ #data.train = as.data.frame(cbind(y.train, x.train))
+ data.train = data.frame(y.train, x.train) # avoid conversion of data types with binary response
+ colnames(data.train) = c('y', paste('V', seq(1, dim(datx)[2]), sep=''))
+
+ # tune parameters with CV on training data with caret
+ trnCtrl <- trainControl(method = "cv", number = fold.cv)
+ srchGrid <- expand.grid(.alpha = alpha.grid, .lambda = lambda.grid)
+
+ data.train$y = as.factor(data.train$y)
+ my_train <- train(y ~., data.train,
+ method = "glmnet",
+ tuneGrid = srchGrid,
+ trControl = trnCtrl)
+
+ # refit on training data with the best tuning parameters using glmnet, then predict on test data
+ fit = glmnet(x.train, y.train, alpha=my_train$bestTune$alpha,
+ lambda=my_train$bestTune$lambda, family=family)
+ pred.fit <- predict(fit, x.test, type='class') # predict class for binomial
+
+ # covariates chosen
+ coef.fit <- predict(fit, x.test, type='coefficients')
+ coef.chosen = which(coef.fit != 0)
+ coef.chosen = coef.chosen - 1 # shift indices by 1 because index 1 is the intercept (a nonzero intercept maps to 0)
+
+ # evaluate results
+ pred.class <- as.numeric(unlist(pred.fit))
+ ROC <- pROC::roc(y.test, pred.class)$auc
+
+ result = list(ROC=ROC, coef.chosen=coef.chosen)
+ #result = list(pred.fit=pred.fit, y.test=y.test)
+ return(result)
+}
+
+
+###############################################################
+#### Random Forests (tune # variables at each random split) ###
+###############################################################
+# reference on regression random forest
+## https://uc-r.github.io/random_forests
+# OOB error is different from test error (see above website)
+
+
+randomForest_cv = function(datx, y, seednum=31, fold.cv=5, ratio.training=0.8, mtry.grid=10, num_trees=500,
+ pval_thr=0.05, method.perm='altmann'){
+ # mtry.grid: candidate numbers of variables to randomly sample at each split
+ # num_trees: number of trees to grow in random forests
+ # pval_thr: p-value threshold for the permutation test
+ # note that permutation p-values can use the 'altmann' method for all types of data; 'janitza' is for high-dimensional data only
+ # ref on permutation methods: http://finzi.psych.upenn.edu/library/ranger/html/importance_pvalues.html
+
+ set.seed(seednum)
+
+ # split data into training and test sets
+ #data = as.data.frame(cbind(y, datx))
+ data = data.frame(y, datx) # avoid conversion of data types with binary response
+ colnames(data) = c('y', paste('V', seq(1, dim(datx)[2]), sep=''))
+ inTraining = createDataPartition(data$y, p = ratio.training, list=FALSE)
+ train <- data[inTraining, ]
+ test <- data[-inTraining, ]
+
+ # tune mtry using the out-of-bag (OOB) prediction error on the training set
+ hyper.grid <- expand.grid(mtry = mtry.grid, OOB_RMSE = 0)
+ for (i in 1:nrow(hyper.grid)){
+ model = ranger(y ~., data = train,
+ num.trees=num_trees, mtry=hyper.grid$mtry[i],
+ seed=seednum, importance = 'permutation')
+ hyper.grid$OOB_RMSE[i] = sqrt(model$prediction.error)
+ }
+ OOB = min(hyper.grid$OOB_RMSE) # out of bag error
+ position = which.min(hyper.grid$OOB_RMSE)
+
+ # permutation test on tuned random forest model to obtain chosen features
+ if (method.perm == 'altmann'){ # for all data types
+ rf.model <- ranger(y ~., data=test, num.trees = num_trees,
+ mtry = hyper.grid$mtry[position], importance = 'permutation')
+ table = as.data.frame(importance_pvalues(rf.model, method = "altmann",
+ formula = y ~ ., data = test))
+ } else if (method.perm == 'janitza'){ # for high dimensional data only
+ rf.model <- ranger(y ~., data=test, num.trees = num_trees,
+ mtry = hyper.grid$mtry[position], importance = 'impurity_corrected')
+ table = as.data.frame(importance_pvalues(rf.model, method = "janitza",
+ formula = y ~ ., data = test))
+ }
+
+ coef.chosen = which(table$pvalue < pval_thr)
+
+ # if nothing was selected
+ if (identical(coef.chosen, integer(0))){
+ coef.chosen = 0
+ }
+
+ # predicted class: https://www.rdocumentation.org/packages/ranger/versions/0.12.1/topics/predict.ranger
+ pred_rf = predict(rf.model, test)
+ pred.class <- as.numeric(pred_rf$predictions)
+ ROC <- pROC::roc(test$y, pred.class)$auc
+
+
+ result = list(mtry=mtry.grid[position], coef.chosen=coef.chosen, ROC=ROC, OOB=OOB, p.value=table$pvalue)
+ return(result)
+
+}
+
+
+# # ###############################################################
+# # #### Generalized Compositional Lasso by Lu et al 2019 #########
+# # ###############################################################
+dir_coda = '../code_coda/Microbiome-Variable-Selection-master/Microbiome_variable_selection_tutorial/'
+
+source(file = paste0(dir_coda, 'CoDA-Penalized-Regression/R/functions_coda_penalized_regression.R'))
+
+source(file = paste0(dir_coda, 'functions.R'))
+
+gen_cons_lasso_cv = function(datx, y, seednum=31, data.split=FALSE,
+ ratio.training=0.8, lambda.coda=seq(0.1, 0.2, 0.01)){
+ # note that generalized compositional lasso performs the transformation of X internally,
+ # thus use sub$X instead of sub$Z for the generalized compositional lasso method
+
+ set.seed(seednum)
+ colnames(datx) = paste('V', seq(1, dim(datx)[2]), sep='')
+
+ if (data.split == TRUE){
+ # split data into training and test sets
+ nn <- length(y)
+ trn <- sample(1:nn, ratio.training*nn)
+ x.train <- datx[trn, ]
+ x.test <- datx[-trn, ]
+ y.train <- y[trn]
+ y.test <- y[-trn]
+
+ lambda_table <- lambdaRange_codalasso(X = x.train, y = y.train,
+ lambdaSeq = lambda.coda)
+ lambda_optimal <- lambda_table[which.max(lambda_table$prop.explained.dev),
+ 'lambda']
+ codalasso_sim <- coda_logistic_lasso(X = x.test, y = y.test,
+ lambda = lambda_optimal)
+ ROC <- pROC::roc(y.test, codalasso_sim$`predicted class`)$auc
+ sim.results_codalasso <- coda_lasso_wrapper(result = codalasso_sim,
+ X = x.test)
+ coef.chosen = tidyr::extract_numeric(sim.results_codalasso$varSelect)
+ }else{
+ lambda_table <- lambdaRange_codalasso(X = datx, y = y,
+ lambdaSeq = lambda.coda)
+
+ # choose the lambda that explains the most deviance (if several, choose the smaller lambda)
+ lambda_optimal <- lambda_table[which.max(lambda_table$prop.explained.dev),
+ 'lambda']
+ codalasso_sim <- coda_logistic_lasso(X = datx, y = y,
+ lambda = lambda_optimal)
+ ROC <- pROC::roc(y, codalasso_sim$`predicted class`)$auc
+ sim.results_codalasso <- coda_lasso_wrapper(result = codalasso_sim,
+ X = datx)
+ coef.chosen = tidyr::extract_numeric(sim.results_codalasso$varSelect)
+ }
+
+ result = list(ROC=ROC, coef.chosen=coef.chosen)
+ return(result)
+}
+
+
+
+
+
+
+
+
+
diff --git a/code_method/getStability.R b/code_method/getStability.R
new file mode 100755
index 0000000..913a417
--- /dev/null
+++ b/code_method/getStability.R
@@ -0,0 +1,95 @@
+#' ## source code: https://github.com/nogueirs/JMLR2018/blob/master/R/getStability.R
+#' @title getStability
+#' @description Nogueira's stability measure from a binary matrix representing
+#' feature selection results in M bootstrapped datasets using a machine-learning method.
+#' @param X A binary matrix X of size M * p, where a row represents a feature set
+#' (for one of M data sets) and a column represents the selection of a given
+#' feature over all the M data sets.
+#' @param alpha The level of significance (e.g. if alpha=0.05, we will get 95%
+#' confidence intervals). It's an optional argument and is set to 5% by default.
+#' @details
+#' @return A list of stability measures is returned. The list includes the stability,
+#' the variance of the stability, and the upper/lower bounds of the (1-alpha)
+#' confidence interval.
+#' @examples
+#' ## extreme cases example
+#' d = 2 # number of features
+#' M = 10 # number of bootstrap replicates
+#'
+#' ## case 1: when stability index undefined -- Z all zeros or all ones (Nogueira2018 p.13)
+#' Z_all_missed = matrix(rep(0, M*d), nrow=M) # since K_bar = 0, SI is undefined
+#' getStability(Z_all_missed)$stability
+#'
+#' Z_all_selected = matrix(rep(1, M*d), nrow=M) # since K_bar = d = 2, SI is undefined
+#' getStability(Z_all_selected)$stability
+#'
+#' ## case 2: when stability index reaches maximum 1 -- each column of Z either all ones or all zeros (but Z not only zeros or only ones) (Nogueira2018 p.13)
+#' # this was the case when we got the wrong SI almost 1 for soil datasets: since sampled dataset the same across all
+#' d = 10
+#' for (i in 1: (d-1)){
+#' d_ones = sample(seq(1, d, 1), i)
+#' Z_tmp = matrix(rep(0, M*d), nrow=M)
+#' Z_tmp[, d_ones] = 1 # since Sf = 0 for all features, thus SI = 1
+#' SI = getStability(Z_tmp)$stability
+#' print(i)
+#' print(paste('SI:', SI, sep=''))
+#' }
+#'
+#' ## case 3: when stability index near minimum 0 (appendix D: - 1/(M-1), but as M goes to infinity, minimum asymptotically 0)
+#' ## this occurs when every feature (column) receives the same number of 0s and 1s
+#' d_alt = rep(c(0, 1), M/2)
+#' Z_alt = matrix(rep(d_alt, d), ncol=d)
+#' getStability(Z_alt)$stability # -0.1111111, which is - 1/(M-1)
+#'
+#' d_alt_2 = c(rep(0, M/2), rep(1, M/2))
+#' Z_alt_2 = matrix(rep(d_alt_2, d), ncol=d)
+#' getStability(Z_alt_2)$stability # -0.1111111, which is - 1/(M-1)
+#'
+#' # when M goes to infinity
+#' M = 10000
+#' d_alt = rep(c(0, 1), M/2)
+#' Z_alt = matrix(rep(d_alt, d), ncol=d)
+#' getStability(Z_alt)$stability # -0.00010001, very close to 0
+#' @export
+getStability <- function(X, alpha=0.05) {
+## the input X is a binary matrix of size M*d where:
+## M is the number of bootstrap replicates
+## d is the total number of features
+## alpha is the level of significance (e.g. if alpha=0.05, we will get 95% confidence intervals)
+## it's an optional argument and is set to 5% by default
+### first we compute the stability
+
+M<-nrow(X)
+d<-ncol(X)
+hatPF<-colMeans(X) # selection probability of each feature
+kbar<-sum(hatPF)
+v_rand=(kbar/d)*(1-kbar/d) # kbar is the sum of selection probability on all features; v_rand is like the variance of bernoulli dist
+stability<-1-(M/(M-1))*mean(hatPF*(1-hatPF))/v_rand ## this is the stability estimate
+
+## then we compute the variance of the estimate
+ki<-rowSums(X)
+phi_i<-rep(0,M)
+for(i in 1:M){
+ phi_i[i]<-(1/v_rand)*((1/d)*sum(X[i,]*hatPF)-(ki[i]*kbar)/d^2-(stability/2)*((2*kbar*ki[i])/d^2-ki[i]/d-kbar/d+1))
+}
+phi_bar=mean(phi_i)
+var_stab=(4/M^2)*sum((phi_i-phi_bar)^2) ## this is the variance of the stability estimate
+
+## then we calculate lower and upper limits of the confidence intervals
+z<-qnorm(1-alpha/2) # this is the standard normal cumulative inverse at a level 1-alpha/2
+upper<-stability+z*sqrt(var_stab) ## the upper bound of the (1-alpha) confidence interval
+lower<-stability-z*sqrt(var_stab) ## the lower bound of the (1-alpha) confidence interval
+
+return(list("stability"=stability,"variance"=var_stab,"lower"=lower,"upper"=upper))
+
+}
+
+
+
+
+
+
+
+
+
+
diff --git a/code_method/stab_data_applications.R b/code_method/stab_data_applications.R
new file mode 100755
index 0000000..403a3d8
--- /dev/null
+++ b/code_method/stab_data_applications.R
@@ -0,0 +1,73 @@
+##########################################################
+### estimate correlation between stability index #########
+##########################################################
+
+#source('cv_method.R')
+#source('getStability.R')
+
+## set up parallel computing
+library(foreach)
+library(doParallel)
+#numCores <- detectCores() - 2 # 6 cores
+numCores <- max(1, detectCores() - 4) # for old computer; never fewer than 1 core
+registerDoParallel(numCores) # use multicore, set to the number of our cores
+
+boot_stab = function(num_boot=100, dat_file, method, ratio.training=0.8, fold.cv=10,
+ family='gaussian', lambda.grid=exp(seq(-4, -2, 0.2)), alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05, method.perm='altmann'){
+
+ # load data
+ load(dat_file)
+
+ # bootstrap with parallelization
+ results = foreach (i=1:num_boot) %dopar% {
+ print(paste('bootnum', i, sep=":"))
+ set.seed(i) # ensure each bootstrapped dataset is the same across methods
+ N = length(y) # number of samples
+ sample_ids = seq(1, N, 1)
+
+ # bootstrapped samples
+ boot_ids = sample(sample_ids, size=N, replace=TRUE)
+ boot_taxa = taxa[boot_ids, ]
+ boot_mf = y[boot_ids]
+
+ ## select features with the chosen method (the value of the matched branch is returned by foreach)
+ if (method == 'compLasso'){
+ result.lin = cons_lasso_cv(y=boot_mf, datx=boot_taxa, seednum=i, ratio.training=ratio.training)
+ output.lin = c(result.lin$MSE, result.lin$coef.chosen)
+
+ } else if (method == 'RF'){
+ result.rf = randomForest_cv(y=boot_mf, datx=boot_taxa, seednum=i, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr, method.perm=method.perm)
+ output.rf = c(result.rf$MSE, result.rf$coef.chosen)
+
+ } else if (method == 'lasso'){
+ result.lasso = lasso_cv(y=boot_mf, datx=boot_taxa, seednum=i,family=family, lambda.choice='lambda.1se',
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ output.lasso = c(result.lasso$MSE, result.lasso$coef.chosen)
+
+ } else if (method == 'elnet'){
+ result.elnet = elnet_cv(y=boot_mf, datx=boot_taxa, seednum=i,family=family, alpha.grid=alpha.grid,
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ output.elnet = c(result.elnet$MSE, result.elnet$coef.chosen)
+ }
+ }
+
+ # reformat results
+ p = dim(taxa)[2]
+ stability_table = matrix(rep(0, num_boot * p), ncol=p)
+ results_mse = results_chosen = list()
+ for (b in 1:num_boot){
+ results_mse[b] = results[[b]][1]
+ results_chosen[[b]] = results[[b]][-1]
+ stability_table[b, results_chosen[[b]]] = 1
+ }
+
+ stab_index = round(getStability(stability_table)$stability, 2)
+ MSE_mean = round(mean(unlist(results_mse), na.rm=TRUE),2)
+ MSE_se = round(FSA::se(unlist(results_mse), na.rm=TRUE),2)
+
+ results_list=list(num_boot=num_boot, method=method, stab_index=stab_index, stab_table=stability_table,
+ results_chosen=results_chosen, lists_mse=results_mse, MSE_mean=MSE_mean, MSE_se=MSE_se)
+ return(results_list)
+}
diff --git a/code_method/stab_data_applications_binary.R b/code_method/stab_data_applications_binary.R
new file mode 100644
index 0000000..cc5f907
--- /dev/null
+++ b/code_method/stab_data_applications_binary.R
@@ -0,0 +1,74 @@
+##########################################################
+### estimate correlation between stability index #########
+##########################################################
+
+#source('cv_method_binary_update.R')
+#source('getStability.R')
+
+## set up parallel computing
+library(foreach)
+library(doParallel)
+#numCores <- detectCores() - 2 # 6 cores
+numCores <- max(1, detectCores() - 4) # for old computer; never fewer than 1 core
+registerDoParallel(numCores) # use multicore, set to the number of our cores
+
+boot_stab = function(num_boot=100, dat_file, method, ratio.training=0.8, fold.cv=10,
+ family='binomial', lambda.grid=exp(seq(-4, -2, 0.2)), alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05, method.perm='altmann',
+ lambda.coda=seq(0.1, 0.2, 0.01), data.split=FALSE){
+
+ # load data
+ load(dat_file)
+
+ # bootstrap with parallelization
+ results = foreach (i=1:num_boot) %dopar% {
+ print(paste('bootnum', i, sep=":"))
+ set.seed(i) # ensure each bootstrapped dataset is the same across methods
+ N = length(y) # number of samples
+ sample_ids = seq(1, N, 1)
+
+ # bootstrapped samples
+ boot_ids = sample(sample_ids, size=N, replace=TRUE)
+ boot_taxa = taxa[boot_ids, ] # log-transformed taxa relative abundance
+ boot_mf = y[boot_ids]
+
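+ ## select features with the chosen method (the value of the matched branch is returned by foreach)
+ if (method == 'GenCompLasso'){ # use X (relative abundance) instead of Z (log-transformed), hence exp()
+ result.lin = gen_cons_lasso_cv(y=boot_mf, datx=exp(boot_taxa), seednum=i, ratio.training=ratio.training, lambda.coda=lambda.coda, data.split=data.split)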
+ output.lin = c(result.lin$ROC, result.lin$coef.chosen)
+
+ } else if (method == 'RF'){
+ result.rf = randomForest_cv(y=boot_mf, datx=boot_taxa, seednum=i, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr, method.perm=method.perm)
+ output.rf = c(result.rf$ROC, result.rf$coef.chosen)
+
+ } else if (method == 'lasso'){
+ result.lasso = lasso_cv(y=boot_mf, datx=boot_taxa, seednum=i,family=family, lambda.choice='lambda.1se',
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ output.lasso = c(result.lasso$ROC, result.lasso$coef.chosen)
+
+ } else if (method == 'elnet'){
+ result.elnet = elnet_cv(y=boot_mf, datx=boot_taxa, seednum=i,family=family, alpha.grid=alpha.grid,
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ output.elnet = c(result.elnet$ROC, result.elnet$coef.chosen)
+ }
+ }
+
+ # reformat results
+ p = dim(taxa)[2]
+ stability_table = matrix(rep(0, num_boot * p), ncol=p)
+ results_ROC = results_chosen = list()
+ for (b in 1:num_boot){
+ results_ROC[b] = results[[b]][1]
+ results_chosen[[b]] = results[[b]][-1]
+ stability_table[b, results_chosen[[b]]] = 1
+ }
+
+ stab_index = round(getStability(stability_table)$stability, 2)
+ ROC_mean = round(mean(unlist(results_ROC), na.rm=TRUE),2)
+ ROC_se = round(FSA::se(unlist(results_ROC), na.rm=TRUE),2)
+
+ results_list=list(num_boot=num_boot, method=method, stab_index=stab_index, stab_table=stability_table,
+ results_chosen=results_chosen, lists_ROC=results_ROC, ROC_mean=ROC_mean, ROC_se=ROC_se)
+ return(results_list)
+}
diff --git a/data_application/.DS_Store b/data_application/.DS_Store
deleted file mode 100755
index 2461f94..0000000
Binary files a/data_application/.DS_Store and /dev/null differ
diff --git a/data_application/88soils/.DS_Store b/data_application/88soils/.DS_Store
deleted file mode 100755
index 7a35105..0000000
Binary files a/data_application/88soils/.DS_Store and /dev/null differ
diff --git a/data_application/88soils/results_bin/soils_binary_ph_GenCompLasso.RData b/data_application/88soils/results_bin/soils_binary_ph_GenCompLasso.RData
new file mode 100644
index 0000000..1e40325
Binary files /dev/null and b/data_application/88soils/results_bin/soils_binary_ph_GenCompLasso.RData differ
diff --git a/data_application/88soils/results_bin/soils_binary_ph_boot_compLasso.RData b/data_application/88soils/results_bin/soils_binary_ph_boot_compLasso.RData
new file mode 100644
index 0000000..f8c44cc
Binary files /dev/null and b/data_application/88soils/results_bin/soils_binary_ph_boot_compLasso.RData differ
diff --git a/data_application/88soils/results_bin/soils_binary_ph_boot_rf.RData b/data_application/88soils/results_bin/soils_binary_ph_boot_rf.RData
new file mode 100644
index 0000000..810a02b
Binary files /dev/null and b/data_application/88soils/results_bin/soils_binary_ph_boot_rf.RData differ
diff --git a/data_application/88soils/results_bin/soils_binary_ph_elnet.RData b/data_application/88soils/results_bin/soils_binary_ph_elnet.RData
new file mode 100644
index 0000000..4b94de5
Binary files /dev/null and b/data_application/88soils/results_bin/soils_binary_ph_elnet.RData differ
diff --git a/data_application/88soils/results_bin/soils_binary_ph_lasso.RData b/data_application/88soils/results_bin/soils_binary_ph_lasso.RData
new file mode 100644
index 0000000..351b7fe
Binary files /dev/null and b/data_application/88soils/results_bin/soils_binary_ph_lasso.RData differ
diff --git a/data_application/88soils/results_bin/soils_binary_ph_rf.RData b/data_application/88soils/results_bin/soils_binary_ph_rf.RData
new file mode 100644
index 0000000..f205d56
Binary files /dev/null and b/data_application/88soils/results_bin/soils_binary_ph_rf.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph.RData b/data_application/88soils/results_cts/soils_ph.RData
new file mode 100755
index 0000000..b7933e1
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph_boot_compLasso.RData b/data_application/88soils/results_cts/soils_ph_boot_compLasso.RData
new file mode 100755
index 0000000..340c5a2
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph_boot_compLasso.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph_boot_rf.RData b/data_application/88soils/results_cts/soils_ph_boot_rf.RData
new file mode 100755
index 0000000..d43024a
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph_boot_rf.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph_compLasso.RData b/data_application/88soils/results_cts/soils_ph_compLasso.RData
new file mode 100755
index 0000000..e591d05
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph_compLasso.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph_elnet.RData b/data_application/88soils/results_cts/soils_ph_elnet.RData
new file mode 100755
index 0000000..4aaca38
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph_elnet.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph_lasso.RData b/data_application/88soils/results_cts/soils_ph_lasso.RData
new file mode 100755
index 0000000..370aa5f
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph_lasso.RData differ
diff --git a/data_application/88soils/results_cts/soils_ph_rf.RData b/data_application/88soils/results_cts/soils_ph_rf.RData
new file mode 100755
index 0000000..5730257
Binary files /dev/null and b/data_application/88soils/results_cts/soils_ph_rf.RData differ
diff --git a/data_application/BMI/.DS_Store b/data_application/BMI/.DS_Store
deleted file mode 100755
index 3541a03..0000000
Binary files a/data_application/BMI/.DS_Store and /dev/null differ
diff --git a/data_application/BMI/filter_onepercent/.DS_Store b/data_application/BMI/filter_onepercent/.DS_Store
deleted file mode 100755
index 5008ddf..0000000
Binary files a/data_application/BMI/filter_onepercent/.DS_Store and /dev/null differ
diff --git a/data_application/BMI/results_bin/BMI_binary_GenCompLasso.RData b/data_application/BMI/results_bin/BMI_binary_GenCompLasso.RData
new file mode 100644
index 0000000..77b9b13
Binary files /dev/null and b/data_application/BMI/results_bin/BMI_binary_GenCompLasso.RData differ
diff --git a/data_application/BMI/results_bin/BMI_binary_boot_compLasso.RData b/data_application/BMI/results_bin/BMI_binary_boot_compLasso.RData
new file mode 100644
index 0000000..e033e7b
Binary files /dev/null and b/data_application/BMI/results_bin/BMI_binary_boot_compLasso.RData differ
diff --git a/data_application/BMI/results_bin/BMI_binary_boot_rf.RData b/data_application/BMI/results_bin/BMI_binary_boot_rf.RData
new file mode 100644
index 0000000..dd7de59
Binary files /dev/null and b/data_application/BMI/results_bin/BMI_binary_boot_rf.RData differ
diff --git a/data_application/BMI/results_bin/BMI_binary_elnet.RData b/data_application/BMI/results_bin/BMI_binary_elnet.RData
new file mode 100644
index 0000000..bc5face
Binary files /dev/null and b/data_application/BMI/results_bin/BMI_binary_elnet.RData differ
diff --git a/data_application/BMI/results_bin/BMI_binary_lasso.RData b/data_application/BMI/results_bin/BMI_binary_lasso.RData
new file mode 100644
index 0000000..dce955d
Binary files /dev/null and b/data_application/BMI/results_bin/BMI_binary_lasso.RData differ
diff --git a/data_application/BMI/results_bin/BMI_binary_rf.RData b/data_application/BMI/results_bin/BMI_binary_rf.RData
new file mode 100644
index 0000000..d85608c
Binary files /dev/null and b/data_application/BMI/results_bin/BMI_binary_rf.RData differ
diff --git a/data_application/BMI/results_cts/BMI_boot_compLasso.RData b/data_application/BMI/results_cts/BMI_boot_compLasso.RData
new file mode 100755
index 0000000..af312d4
Binary files /dev/null and b/data_application/BMI/results_cts/BMI_boot_compLasso.RData differ
diff --git a/data_application/BMI/results_cts/BMI_boot_rf.RData b/data_application/BMI/results_cts/BMI_boot_rf.RData
new file mode 100755
index 0000000..354dd1f
Binary files /dev/null and b/data_application/BMI/results_cts/BMI_boot_rf.RData differ
diff --git a/data_application/BMI/results_cts/BMI_compLasso.RData b/data_application/BMI/results_cts/BMI_compLasso.RData
new file mode 100755
index 0000000..881a1d8
Binary files /dev/null and b/data_application/BMI/results_cts/BMI_compLasso.RData differ
diff --git a/data_application/BMI/results_cts/BMI_elnet.RData b/data_application/BMI/results_cts/BMI_elnet.RData
new file mode 100755
index 0000000..c22cb89
Binary files /dev/null and b/data_application/BMI/results_cts/BMI_elnet.RData differ
diff --git a/data_application/BMI/results_cts/BMI_lasso.RData b/data_application/BMI/results_cts/BMI_lasso.RData
new file mode 100755
index 0000000..8cc6e87
Binary files /dev/null and b/data_application/BMI/results_cts/BMI_lasso.RData differ
diff --git a/data_application/BMI/results_cts/BMI_rf.RData b/data_application/BMI/results_cts/BMI_rf.RData
new file mode 100755
index 0000000..8e57fa9
Binary files /dev/null and b/data_application/BMI/results_cts/BMI_rf.RData differ
diff --git a/data_application/code_applications/.DS_Store b/data_application/code_applications/.DS_Store
deleted file mode 100755
index 5008ddf..0000000
Binary files a/data_application/code_applications/.DS_Store and /dev/null differ
diff --git a/data_application/code_applications/code_bin/88soils_stab_application_binary.R b/data_application/code_applications/code_bin/88soils_stab_application_binary.R
new file mode 100644
index 0000000..db0659c
--- /dev/null
+++ b/data_application/code_applications/code_bin/88soils_stab_application_binary.R
@@ -0,0 +1,111 @@
+#########################################################################################################
+### This is to estimate stability & AUC using bootstrap on the 88 soils dataset (binary outcome) ########
+#########################################################################################################
+
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../../code_method/cv_method_binary_update.R')
+source('../../../code_method/getStability.R')
+source('../../../code_method/stab_data_applications_binary.R') # boot_stab()
+source('../../../code_method/bootstrap_test_compLasso_rf_binary.R') # boot_stab_data()
+
+#####################################
+##### data preparation ##############
+#####################################
+load('../../88soils/88soils_genus_table.RData') # if any error, re-process the data in HPC
+count = soil_otu
+
+## filter 1% + add pseudo count
+x <- count[, colMeans(count > 0) >= 0.01]
+x[x == 0] <- 0.5
+x <- x/rowSums(x) # relative abundance
+taxa <- log(x)
+print(paste('number of features:', dim(taxa)[2], sep=':'))
+
+# # metadata
+mf <- read.csv("../../88soils/88soils_modified_metadata.txt", sep='\t', row.names=1)
+y <- mf$ph[match(rownames(count), rownames(mf))]
+y <- as.factor(ifelse(y >= median(y), 1, 0)) # transform to be binary
+print(paste('number of samples:', length(y), sep=':'))
+
+# save processed data
+save(y, taxa, file='../../88soils/soils_ph_binary.RData')
+
+
+# #------------------------
+# # compare methods
+# #------------------------
+i <- as.numeric(Sys.getenv("PBS_ARRAYID"))
+
+if (i == 1){
+ ####################################
+ ## Generalized compositional lasso #############
+ # ####################################
+ print('GenCompLasso')
+ out_GenCompLasso = boot_stab(num_boot = 100, method = 'GenCompLasso', lambda.grid = NULL,
+ dat_file = '../../88soils/soils_ph_binary.RData')
+
+ save(out_GenCompLasso, file=paste0(dir, '/soils_binary_ph_GenCompLasso.RData'))
+}else if (i == 2){
+ #############################################################
+ # Lasso with glmnet default lambda sequence #############
+ #############################################################
+ print('Lasso')
+ out_lasso = boot_stab(num_boot = 100, method = 'lasso', lambda.grid = NULL,
+ dat_file = '../../88soils/soils_ph_binary.RData')
+
+ save(out_lasso, file=paste0(dir, '/soils_binary_ph_lasso.RData'))
+}else if (i == 3){
+ # ##############################################################
+ # ## Elastic Net with self defined lambda (NULL not allowed) #############
+ # ##############################################################
+ print('Elnet')
+ out_elnet = boot_stab(num_boot = 100, method = 'elnet',
+ dat_file = '../../88soils/soils_ph_binary.RData')
+
+ save(out_elnet, file=paste0(dir, '/soils_binary_ph_elnet.RData'))
+}else if (i == 4){
+ ##############################################################
+ ## Random Forest with Altmann feature selection ############
+ ##############################################################
+ print('RF')
+ out_rf = boot_stab(num_boot = 100, method = 'RF', method.perm='altmann', mtry.grid=seq(30, 60, 5),
+ dat_file = '../../88soils/soils_ph_binary.RData')
+
+ save(out_rf, file=paste0(dir, '/soils_binary_ph_rf.RData'))
+}else if (i == 5){
+ ########################################################################
+ ############### double bootstrap: random forest ##################
+ # ######################################################################
+ print('double boot RF')
+ boot_rf = boot_stab_data(num_boot=100, method = 'RF',
+ data_file='../../88soils/soils_ph_binary.RData')
+ save(boot_rf, file=paste0(dir, '/soils_binary_ph_boot_rf.RData'))
+}else if (i == 6){
+ ########################################################################
+ ############### double bootstrap: generalized compositional lasso ######
+ # ######################################################################
+ print('double boot compLasso')
+ boot_compLasso = boot_stab_data(num_boot=100, method = 'GenCompLasso',
+ data_file='../../88soils/soils_ph_binary.RData')
+ save(boot_compLasso, file=paste0(dir, '/soils_binary_ph_boot_compLasso.RData'))
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/data_application/code_applications/code_bin/BMI_stab_application_binary.R b/data_application/code_applications/code_bin/BMI_stab_application_binary.R
new file mode 100644
index 0000000..7f2560c
--- /dev/null
+++ b/data_application/code_applications/code_bin/BMI_stab_application_binary.R
@@ -0,0 +1,99 @@
+#########################################################################################################
+### This is to estimate stability & MSE using bootstrap on BMI_Lin_2014 dataset #################
+#########################################################################################################
+
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../../code_method/cv_method_binary_update.R')
+source('../../../code_method/getStability.R')
+source('../../../code_method/stab_data_applications_binary.R') # boot_stab()
+source('../../../code_method/bootstrap_test_compLasso_rf_binary.R') # boot_stab_data()
+
+#####################################
+##### data preparation ##############
+#####################################
+count <- as.matrix(read.table("../../code_Lin/cvs/data/combo_count_tab.txt"))
+
+# keep genus-level taxa present in at least 1% of samples, then replace zeros with a pseudo-count of 0.5
+depth <- sapply(strsplit(colnames(count), "\\."), length)
+x <- count[, depth == 6 & colMeans(count > 0) >= 0.01]
+x[x == 0] <- 0.5
+x <- x/rowSums(x) # relative abundance
+taxa <- log(x)
+print(paste('number of features:', ncol(taxa)))
+
+# metadata
+demo <- read.delim("../../code_Lin/cvs/data/demographic.txt")
+y <- demo$bmi[match(rownames(count), demo$pid)]
+y <- as.factor(ifelse(y >= median(y), 1, 0)) # transform to be binary
+print(paste('number of samples:', length(y)))
+
+# save processed data
+save(y, taxa, file='../../BMI/BMI_Lin_2014_binary.RData')
+
+#------------------------
+# compare methods
+#------------------------
+i <- as.numeric(Sys.getenv("PBS_ARRAYID"))
+
+if (i == 1){
+ # ####################################
+ # ## Generalized compositional lasso #############
+ # ####################################
+ print('GencompLasso')
+ out_GenCompLasso = boot_stab(num_boot = 100, method = 'GenCompLasso',
+ dat_file = '../../BMI/BMI_Lin_2014_binary.RData')
+
+ save(out_GenCompLasso, file=paste0(dir, '/BMI_binary_GenCompLasso.RData'))
+}else if (i == 2){
+
+ ##############################################################
+ ## Lasso with glmnet default lambda sequence #############
+ ##############################################################
+ print('Lasso')
+ out_lasso = boot_stab(num_boot = 100, method = 'lasso', lambda.grid = NULL,
+ dat_file = '../../BMI/BMI_Lin_2014_binary.RData')
+
+ save(out_lasso, file=paste0(dir, '/BMI_binary_lasso.RData'))
+}else if (i == 3){
+ # ##############################################################
+ # ## Elastic Net with self defined lambda (NULL not allowed) #############
+ # ##############################################################
+ print('Elnet')
+ out_elnet = boot_stab(num_boot = 100, method = 'elnet',
+ dat_file = '../../BMI/BMI_Lin_2014_binary.RData')
+
+ save(out_elnet, file=paste0(dir, '/BMI_binary_elnet.RData'))
+}else if (i == 4){
+ # ##############################################################
+ # ## Random Forest with Altmann feature selection #############
+ # ##############################################################
+ print('RF')
+ out_rf = boot_stab(num_boot = 100, method = 'RF', method.perm='altmann',
+ dat_file = '../../BMI/BMI_Lin_2014_binary.RData')
+
+ save(out_rf, file=paste0(dir, '/BMI_binary_rf.RData'))
+}else if (i == 5){
+ ########################################################################
+ ############### double bootstrap: random forest ##################
+ # ######################################################################
+ print('double boot RF')
+ boot_rf = boot_stab_data(num_boot=100, method = 'RF',
+ data_file='../../BMI/BMI_Lin_2014_binary.RData')
+ save(boot_rf, file=paste0(dir, '/BMI_binary_boot_rf.RData'))
+}else if (i == 6){
+ ########################################################################
+ ############### double bootstrap: generalized compositional lasso ######
+ # ######################################################################
+ print('double boot compLasso')
+ boot_compLasso = boot_stab_data(num_boot=100, method = 'GenCompLasso',
+ data_file='../../BMI/BMI_Lin_2014_binary.RData')
+ save(boot_compLasso, file=paste0(dir, '/BMI_binary_boot_compLasso.RData'))
+}
+
+
+
+
+
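Reviewer sketch (not part of the patch): the data-preparation steps above (prevalence filter, pseudo-count, relative abundance, log transform, median split) can be checked on a toy matrix. The helper `prep_taxa()`, the toy counts and the toy BMI values are assumptions made for illustration only.

```r
# Hypothetical helper mirroring the preprocessing in the application scripts.
prep_taxa <- function(count, min_prev = 0.01, pseudo = 0.5) {
  keep <- colMeans(count > 0) >= min_prev   # keep taxa seen in >= 1% of samples
  x <- count[, keep, drop = FALSE]
  x[x == 0] <- pseudo                       # replace zeros with the pseudo-count
  x <- x / rowSums(x)                       # relative abundance: each row sums to 1
  log(x)                                    # log-transformed composition
}

# Toy count table: 3 samples x 3 taxa; taxon g2 is absent everywhere.
count <- matrix(c(10, 0, 5,
                   0, 0, 8,
                   3, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:3), paste0("g", 1:3)))
taxa <- prep_taxa(count)
print(dim(taxa))

# Median split as in the binary scripts: values equal to the median go to class 1.
bmi <- c(22.1, 30.4, 27.8)
y_bin <- as.factor(ifelse(bmi >= median(bmi), 1, 0))
print(table(y_bin))
```

With an odd number of samples the median split is unbalanced by one, because the median observation itself is assigned to class 1.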
diff --git a/data_application/code_applications/code_cts/88soils_stab_application.R b/data_application/code_applications/code_cts/88soils_stab_application.R
new file mode 100755
index 0000000..d1379bb
--- /dev/null
+++ b/data_application/code_applications/code_cts/88soils_stab_application.R
@@ -0,0 +1,102 @@
+#########################################################################################################
+### This is to estimate stability & MSE using bootstrap on the 88 soils dataset #################
+#########################################################################################################
+
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../../code_method/cv_method.R')
+source('../../../code_method/getStability.R')
+source('../../../code_method/stab_data_applications.R')
+source('../../../code_method/bootstrap_test_compLasso_rf.R')
+
+#####################################
+##### data preparation ##############
+#####################################
+soil_otu = as.matrix(read.csv("../../88soils/88soils_genus_table.txt", sep='\t', row.names=1))
+save(soil_otu, file='../../88soils/88soils_genus_table.RData')
+
+load('../../88soils/88soils_genus_table.RData')
+count = soil_otu
+
+## keep taxa present in at least 1% of samples, then replace zeros with a pseudo-count of 0.5
+x <- count[, colMeans(count > 0) >= 0.01]
+x[x == 0] <- 0.5
+x <- x/rowSums(x) # relative abundance
+taxa <- log(x)
+print(paste('number of features:', ncol(taxa)))
+
+# # metadata
+mf <- read.csv("../../88soils/88soils_modified_metadata.txt", sep='\t', row.names=1)
+y <- mf$ph[match(rownames(count), rownames(mf))]
+print(paste('number of samples:', length(y)))
+
+# save processed data
+save(y, taxa, file='../../88soils/soils_ph.RData')
+
+
+
+####################################
+## compositional lasso #############
+# ####################################
+print('compLasso')
+out_compLasso = boot_stab(num_boot = 100, method = 'compLasso',
+ dat_file = '../../88soils/soils_ph.RData')
+
+save(out_compLasso, file=paste0(dir, '/soils_ph_compLasso.RData'))
+
+#############################################################
+# Lasso with glmnet default lambda sequence #############
+#############################################################
+print('Lasso')
+out_lasso = boot_stab(num_boot = 100, method = 'lasso', lambda.grid = NULL,
+ dat_file = '../../88soils/soils_ph.RData')
+
+save(out_lasso, file=paste0(dir, '/soils_ph_lasso.RData'))
+
+
+# ##############################################################
+# ## Elastic Net with self defined lambda (NULL not allowed) #############
+# ##############################################################
+print('Elnet')
+out_elnet = boot_stab(num_boot = 100, method = 'elnet',
+ dat_file = '../../88soils/soils_ph.RData')
+
+save(out_elnet, file=paste0(dir, '/soils_ph_elnet.RData'))
+
+
+##############################################################
+## Random Forest with Altmann feature selection #############
+##############################################################
+print('RF')
+out_rf = boot_stab(num_boot = 100, method = 'RF', method.perm='altmann', mtry.grid=seq(30, 60, 5),
+ dat_file = '../../88soils/soils_ph.RData')
+
+save(out_rf, file=paste0(dir, '/soils_ph_rf.RData'))
+
+########################################################################
+############### double bootstrap: random forest ##################
+# ######################################################################
+print('double boot RF')
+boot_rf = boot_stab_data(num_boot=100, method = 'RF',
+ data_file='../../88soils/soils_ph.RData')
+save(boot_rf, file=paste0(dir, '/soils_ph_boot_rf.RData'))
+
+########################################################################
+############### double bootstrap: compositional lasso ##################
+# ######################################################################
+print('double boot compLasso')
+boot_compLasso = boot_stab_data(num_boot=100, method = 'compLasso',
+ data_file='../../88soils/soils_ph.RData')
+save(boot_compLasso, file=paste0(dir, '/soils_ph_boot_compLasso.RData'))
+
+
+
+
+
+
+
+
+
+
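Reviewer sketch (not part of the patch): the outcome vector is aligned to the count table with `match()`, which returns `NA` for any sample missing from the metadata, so `y` can silently contain `NA`. A small check, using made-up sample IDs and pH values, could be:

```r
# Toy sample IDs from a count table and a metadata frame that lacks sample "s2".
count_ids <- c("s1", "s2", "s3")
mf <- data.frame(ph = c(6.5, 4.2), row.names = c("s3", "s1"))

# Same alignment pattern as in the script: position of each count row in the metadata.
idx <- match(count_ids, rownames(mf))
y <- mf$ph[idx]

# Report samples without metadata instead of passing NA outcomes downstream.
missing_ids <- count_ids[is.na(idx)]
if (length(missing_ids) > 0) {
  message("samples without metadata: ", paste(missing_ids, collapse = ", "))
}
print(y)
```

Whether such samples should be dropped or treated as an error depends on the dataset; the sketch only makes the mismatch visible.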
diff --git a/data_application/code_applications/code_cts/BMI_stab_application.R b/data_application/code_applications/code_cts/BMI_stab_application.R
new file mode 100755
index 0000000..5514c31
--- /dev/null
+++ b/data_application/code_applications/code_cts/BMI_stab_application.R
@@ -0,0 +1,105 @@
+#########################################################################################################
+### This is to estimate stability & MSE using bootstrap on BMI_Lin_2014 dataset #################
+#########################################################################################################
+
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../../code_method/cv_method.R')
+source('../../../code_method/getStability.R')
+source('../../../code_method/stab_data_applications.R')
+source('../../../code_method/bootstrap_test_compLasso_rf.R')
+
+#####################################
+##### data preparation ##############
+#####################################
+count <- as.matrix(read.table("../../code_Lin/cvs/data/combo_count_tab.txt"))
+
+# keep genus-level taxa present in at least 1% of samples, then replace zeros with a pseudo-count of 0.5
+depth <- sapply(strsplit(colnames(count), "\\."), length)
+x <- count[, depth == 6 & colMeans(count > 0) >= 0.01]
+x[x == 0] <- 0.5
+x <- x/rowSums(x) # relative abundance
+taxa <- log(x)
+print(paste('number of features:', ncol(taxa)))
+
+# metadata
+demo <- read.delim("../../code_Lin/cvs/data/demographic.txt")
+y <- demo$bmi[match(rownames(count), demo$pid)]
+print(paste('number of samples:', length(y)))
+
+# save processed data
+save(y, taxa, file='../../BMI/BMI_Lin_2014.RData')
+
+# ####################################
+# ## compositional lasso #############
+# ####################################
+print('compLasso')
+out_compLasso = boot_stab(num_boot = 100, method = 'compLasso',
+ dat_file = '../../BMI/BMI_Lin_2014.RData')
+
+save(out_compLasso, file=paste0(dir, '/BMI_compLasso.RData'))
+
+##############################################################
+## Lasso with glmnet default lambda sequence #############
+##############################################################
+print('Lasso')
+out_lasso = boot_stab(num_boot = 100, method = 'lasso', lambda.grid = NULL,
+ dat_file = '../../BMI/BMI_Lin_2014.RData')
+
+save(out_lasso, file=paste0(dir, '/BMI_lasso.RData'))
+
+
+# ##############################################################
+# ## Elastic Net with self defined lambda (NULL not allowed) #############
+# ##############################################################
+print('Elnet')
+out_elnet = boot_stab(num_boot = 100, method = 'elnet',
+ dat_file = '../../BMI/BMI_Lin_2014.RData')
+
+save(out_elnet, file=paste0(dir, '/BMI_elnet.RData'))
+
+
+# ##############################################################
+# ## Random Forest with Altmann feature selection #############
+# ##############################################################
+print('RF')
+out_rf = boot_stab(num_boot = 100, method = 'RF', method.perm='altmann',
+ dat_file = '../../BMI/BMI_Lin_2014.RData')
+
+save(out_rf, file=paste0(dir, '/BMI_rf.RData'))
+
+
+#############################################################
+# Random Forest with Janitza feature selection #############
+#############################################################
+print("RF_JNT")
+out_rf_jnt = boot_stab(num_boot = 100, method = 'RF', method.perm='janitza',
+ dat_file = '../../BMI/BMI_Lin_2014.RData')
+
+save(out_rf_jnt, file=paste0(dir, '/BMI_rf_jnt.RData'))
+
+
+########################################################################
+############### double bootstrap: random forest ##################
+# ######################################################################
+print('double boot RF')
+boot_rf = boot_stab_data(num_boot=100, method = 'RF',
+ data_file='../../BMI/BMI_Lin_2014.RData')
+save(boot_rf, file=paste0(dir, '/BMI_boot_rf.RData'))
+
+########################################################################
+############### double bootstrap: compositional lasso ##################
+# ######################################################################
+print('double boot compLasso')
+boot_compLasso = boot_stab_data(num_boot=100, method = 'compLasso',
+ data_file='../../BMI/BMI_Lin_2014.RData')
+save(boot_compLasso, file=paste0(dir, '/BMI_boot_compLasso.RData'))
+
+
+
+
+
+
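Reviewer sketch (not part of the patch): output paths and log labels in these scripts are built by string concatenation. The snippet below, which assumes a hypothetical `results` directory, shows how base R treats the `sep` argument in `paste0()` and `paste()`, and how `file.path()` builds the same path.

```r
out_dir <- "results"  # hypothetical output directory, standing in for args[1]

# paste0() has no `sep` parameter: sep = "" is just one more (empty) string to append.
with_sep    <- paste0(out_dir, "/BMI_lasso.RData", sep = "")
without_sep <- paste0(out_dir, "/BMI_lasso.RData")

# file.path() joins components with "/" and avoids hand-written separators.
via_file_path <- file.path(out_dir, "BMI_lasso.RData")

# paste() does use `sep`, so a label that already ends in ":" gets a second colon.
label_sep   <- paste("number of features:", 87, sep = ":")
label_plain <- paste("number of features:", 87)

print(c(with_sep, without_sep, via_file_path))
print(c(label_sep, label_plain))
```

The `sep = ""` in `paste0()` is harmless but redundant, while `sep = ":"` in `paste()` changes the printed label.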
diff --git a/data_application/code_applications/code_cts/run_data_applications.sh b/data_application/code_applications/code_cts/run_data_applications.sh
new file mode 100755
index 0000000..e578605
--- /dev/null
+++ b/data_application/code_applications/code_cts/run_data_applications.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+#PBS -N data_applications
+#PBS -l walltime=50:00:00
+#PBS -l nodes=1:ppn=8
+#PBS -l mem=10gb
+#PBS -V
+#PBS -j oe
+#PBS -d .
+
+set -e
+cpus=$PBS_NUM_PPN
+
+export TMPDIR=/panfs/panfs1.ucsd.edu/panscratch/$USER/Stability_2020/data_applications
+mkdir -p "$TMPDIR"
+#tmp=$(mktemp -d --tmpdir)
+#export TMPDIR=$tmp
+#trap "rm -r $tmp; unset TMPDIR" EXIT
+
+# do something
+source activate r-c-env
+Rscript BMI_stab_application.R $TMPDIR
+Rscript 88soils_stab_application.R $TMPDIR
+source deactivate r-c-env
+
+#mv $tmp/outdir ./outdir
diff --git a/data_application/notebooks_application/.DS_Store b/data_application/notebooks_application/.DS_Store
deleted file mode 100755
index 5008ddf..0000000
Binary files a/data_application/notebooks_application/.DS_Store and /dev/null differ
diff --git a/data_application/notebooks_application/.ipynb_checkpoints/1.1. BMI_DataPreparation-checkpoint.ipynb b/data_application/notebooks_application/.ipynb_checkpoints/1.1. BMI_DataPreparation-checkpoint.ipynb
new file mode 100755
index 0000000..2a6fe2d
--- /dev/null
+++ b/data_application/notebooks_application/.ipynb_checkpoints/1.1. BMI_DataPreparation-checkpoint.ipynb
@@ -0,0 +1,875 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### BMI dataset in Lin et al, 2014 \n",
+ "##### use unrarefied count table and retain microbes only at genus level (+ present in at least one sample)\n",
+ "#### add 0.5 as pseudo count to zero counts"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>98</li>\n",
+ "\t<li>263</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 98\n",
+ "\\item 263\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 98\n",
+ "2. 263\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 98 263"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
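+ "<table>\n",
+ "<thead><tr><th></th><th>Bacteria.Actinobacteria</th><th>Bacteria.Bacteroidetes</th><th>Bacteria.Cyanobacteria</th><th>Bacteria.Firmicutes</th><th>Bacteria.Fusobacteria</th><th>Bacteria.Lentisphaerae</th><th>Bacteria.OD1</th><th>Bacteria.Proteobacteria</th><th>Bacteria.Spirochaetes</th><th>Bacteria.Synergistetes</th><th>...</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas</th><th>Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira</th><th>Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema</th><th>Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus</th><th>Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter</th><th>Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><th>3001</th><td>1</td><td>5067</td><td>0</td><td>4153</td><td>0</td><td>0</td><td>0</td><td>534</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3003</th><td>0</td><td>4659</td><td>0</td><td>2177</td><td>0</td><td>0</td><td>0</td><td>105</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3004</th><td>0</td><td>4342</td><td>0</td><td>3008</td><td>0</td><td>2</td><td>0</td><td>134</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3006</th><td>8</td><td>2910</td><td>0</td><td>4147</td><td>0</td><td>0</td><td>0</td><td>459</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3007</th><td>5</td><td>5630</td><td>0</td><td>4705</td><td>0</td><td>0</td><td>0</td><td>214</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3008</th><td>5</td><td>1868</td><td>0</td><td>1619</td><td>0</td><td>0</td><td>0</td><td>17</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"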
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & Bacteria.Actinobacteria & Bacteria.Bacteroidetes & Bacteria.Cyanobacteria & Bacteria.Firmicutes & Bacteria.Fusobacteria & Bacteria.Lentisphaerae & Bacteria.OD1 & Bacteria.Proteobacteria & Bacteria.Spirochaetes & Bacteria.Synergistetes & ... & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas & Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas & Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira & Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter & Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\\\\\n",
+ "\\hline\n",
+ "\t3001 & 1 & 5067 & 0 & 4153 & 0 & 0 & 0 & 534 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3003 & 0 & 4659 & 0 & 2177 & 0 & 0 & 0 & 105 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3004 & 0 & 4342 & 0 & 3008 & 0 & 2 & 0 & 134 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3006 & 8 & 2910 & 0 & 4147 & 0 & 0 & 0 & 459 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3007 & 5 & 5630 & 0 & 4705 & 0 & 0 & 0 & 214 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3008 & 5 & 1868 & 0 & 1619 & 0 & 0 & 0 & 17 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | Bacteria.Actinobacteria | Bacteria.Bacteroidetes | Bacteria.Cyanobacteria | Bacteria.Firmicutes | Bacteria.Fusobacteria | Bacteria.Lentisphaerae | Bacteria.OD1 | Bacteria.Proteobacteria | Bacteria.Spirochaetes | Bacteria.Synergistetes | ... | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira | Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3001 | 1 | 5067 | 0 | 4153 | 0 | 0 | 0 | 534 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3003 | 0 | 4659 | 0 | 2177 | 0 | 0 | 0 | 105 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3004 | 0 | 4342 | 0 | 3008 | 0 | 2 | 0 | 134 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3006 | 8 | 2910 | 0 | 4147 | 0 | 0 | 0 | 459 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3007 | 5 | 5630 | 0 | 4705 | 0 | 0 | 0 | 214 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3008 | 5 | 1868 | 0 | 1619 | 0 | 0 | 0 | 17 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " Bacteria.Actinobacteria Bacteria.Bacteroidetes Bacteria.Cyanobacteria\n",
+ "3001 1 5067 0 \n",
+ "3003 0 4659 0 \n",
+ "3004 0 4342 0 \n",
+ "3006 8 2910 0 \n",
+ "3007 5 5630 0 \n",
+ "3008 5 1868 0 \n",
+ " Bacteria.Firmicutes Bacteria.Fusobacteria Bacteria.Lentisphaerae\n",
+ "3001 4153 0 0 \n",
+ "3003 2177 0 0 \n",
+ "3004 3008 0 2 \n",
+ "3006 4147 0 0 \n",
+ "3007 4705 0 0 \n",
+ "3008 1619 0 0 \n",
+ " Bacteria.OD1 Bacteria.Proteobacteria Bacteria.Spirochaetes\n",
+ "3001 0 534 0 \n",
+ "3003 0 105 0 \n",
+ "3004 0 134 0 \n",
+ "3006 0 459 0 \n",
+ "3007 0 214 0 \n",
+ "3008 0 17 0 \n",
+ " Bacteria.Synergistetes ...\n",
+ "3001 0 ...\n",
+ "3003 0 ...\n",
+ "3004 0 ...\n",
+ "3006 0 ...\n",
+ "3007 0 ...\n",
+ "3008 0 ...\n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# unrarefied microbe count table\n",
+ "count <- as.matrix(read.table(\"../../code_Lin/cvs/data/combo_count_tab.txt\")) \n",
+ "dim(count)\n",
+ "head(count)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>98</li>\n",
+ "\t<li>87</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 98\n",
+ "\\item 87\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 98\n",
+ "2. 87\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 98 87"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<table>\n",
+ "<thead><tr><th></th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter</th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium</th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella</th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella</th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter</th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella</th><th>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia</th><th>Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides</th><th>Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella</th><th>Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas</th><th>...</th><th>Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter</th><th>Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria</th><th>Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio</th><th>Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas</th><th>Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas</th><th>Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus</th><th>Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter</th><th>Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><th>3001</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2878</td><td>0</td><td>69</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3003</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>578</td><td>180</td><td>0</td><td>...</td><td>2</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3004</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3503</td><td>143</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3006</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2162</td><td>1</td><td>0</td><td>...</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3007</th><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>4453</td><td>0</td><td>0</td><td>...</td><td>9</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "\t<tr><th>3008</th><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>950</td><td>237</td><td>19</td><td>...</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas & ... & Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter & Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria & Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio & Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter & Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\\\\\n",
+ "\\hline\n",
+ "\t3001 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2878 & 0 & 69 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3003 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 578 & 180 & 0 & ... & 2 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3004 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3503 & 143 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3006 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2162 & 1 & 0 & ... & 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3007 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 4453 & 0 & 0 & ... & 9 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3008 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 950 & 237 & 19 & ... & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas | ... | Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter | Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria | Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio | Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2878 | 0 | 69 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 578 | 180 | 0 | ... | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3503 | 143 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3006 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2162 | 1 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3007 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 4453 | 0 | 0 | ... | 9 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3008 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 950 | 237 | 19 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 2 \n",
+ "3008 1 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides\n",
+ "3001 2878 \n",
+ "3003 578 \n",
+ "3004 3503 \n",
+ "3006 2162 \n",
+ "3007 4453 \n",
+ "3008 950 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella\n",
+ "3001 0 \n",
+ "3003 180 \n",
+ "3004 143 \n",
+ "3006 1 \n",
+ "3007 0 \n",
+ "3008 237 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas\n",
+ "3001 69 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 19 \n",
+ " ...\n",
+ "3001 ...\n",
+ "3003 ...\n",
+ "3004 ...\n",
+ "3006 ...\n",
+ "3007 ...\n",
+ "3008 ...\n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter\n",
+ "3001 0 \n",
+ "3003 2 \n",
+ "3004 0 \n",
+ "3006 3 \n",
+ "3007 9 \n",
+ "3008 4 \n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter\n",
+ "3001 0 \n",
+ "3003 1 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 1 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# retain only genus-level taxa that are present in at least one sample\n",
+ "depth <- sapply(strsplit(colnames(count), \"\\\\.\"), length)\n",
+ "x <- count[, depth == 6 & colSums(count != 0) >= 1] # 98 samples x 87 genera\n",
+ "dim(x)\n",
+ "head(x)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 98\n",
+ "\t- 87\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 98\n",
+ "\\item 87\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 98\n",
+ "2. 87\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 98 87"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas | ... | Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter | Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria | Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio | Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "\n",
+ "\t3001 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -0.8429202 | -9.500918 | -4.573665 | ... | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 |\n",
+ "\t3003 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -2.2258386 | -3.392456 | -9.278560 | ... | -7.892265 | -9.278560 | -9.278560 | -8.585412 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 |\n",
+ "\t3004 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -0.4151243 | -3.613655 | -9.269646 | ... | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 |\n",
+ "\t3006 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -0.7426639 | -8.421453 | -9.114600 | ... | -7.322841 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 |\n",
+ "\t3007 | -9.643356 | -9.643356 | -9.643356 | -8.257061 | -9.643356 | -9.643356 | -9.643356 | -0.5488753 | -9.643356 | -9.643356 | ... | -6.752984 | -9.643356 | -9.643356 | -8.950209 | -9.643356 | -9.643356 | -9.643356 | -9.643356 | -9.643356 | -9.643356 |\n",
+ "\t3008 | -8.387085 | -8.387085 | -8.387085 | -7.693937 | -8.387085 | -8.387085 | -8.387085 | -0.8374753 | -2.225877 | -4.749498 | ... | -6.307643 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas & ... & Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter & Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria & Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio & Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter & Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\\\\\n",
+ "\\hline\n",
+ "\t3001 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -0.8429202 & -9.500918 & -4.573665 & ... & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 \\\\\n",
+ "\t3003 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -2.2258386 & -3.392456 & -9.278560 & ... & -7.892265 & -9.278560 & -9.278560 & -8.585412 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 \\\\\n",
+ "\t3004 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -0.4151243 & -3.613655 & -9.269646 & ... & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 \\\\\n",
+ "\t3006 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -0.7426639 & -8.421453 & -9.114600 & ... & -7.322841 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 \\\\\n",
+ "\t3007 & -9.643356 & -9.643356 & -9.643356 & -8.257061 & -9.643356 & -9.643356 & -9.643356 & -0.5488753 & -9.643356 & -9.643356 & ... & -6.752984 & -9.643356 & -9.643356 & -8.950209 & -9.643356 & -9.643356 & -9.643356 & -9.643356 & -9.643356 & -9.643356 \\\\\n",
+ "\t3008 & -8.387085 & -8.387085 & -8.387085 & -7.693937 & -8.387085 & -8.387085 & -8.387085 & -0.8374753 & -2.225877 & -4.749498 & ... & -6.307643 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas | ... | Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter | Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria | Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio | Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3001 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -0.8429202 | -9.500918 | -4.573665 | ... | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 |\n",
+ "| 3003 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -2.2258386 | -3.392456 | -9.278560 | ... | -7.892265 | -9.278560 | -9.278560 | -8.585412 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 |\n",
+ "| 3004 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -0.4151243 | -3.613655 | -9.269646 | ... | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 |\n",
+ "| 3006 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -0.7426639 | -8.421453 | -9.114600 | ... | -7.322841 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 |\n",
+ "| 3007 | -9.643356 | -9.643356 | -9.643356 | -8.257061 | -9.643356 | -9.643356 | -9.643356 | -0.5488753 | -9.643356 | -9.643356 | ... | -6.752984 | -9.643356 | -9.643356 | -8.950209 | -9.643356 | -9.643356 | -9.643356 | -9.643356 | -9.643356 | -9.643356 |\n",
+ "| 3008 | -8.387085 | -8.387085 | -8.387085 | -7.693937 | -8.387085 | -8.387085 | -8.387085 | -0.8374753 | -2.225877 | -4.749498 | ... | -6.307643 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -8.257061 \n",
+ "3008 -7.693937 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides\n",
+ "3001 -0.8429202 \n",
+ "3003 -2.2258386 \n",
+ "3004 -0.4151243 \n",
+ "3006 -0.7426639 \n",
+ "3007 -0.5488753 \n",
+ "3008 -0.8374753 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella\n",
+ "3001 -9.500918 \n",
+ "3003 -3.392456 \n",
+ "3004 -3.613655 \n",
+ "3006 -8.421453 \n",
+ "3007 -9.643356 \n",
+ "3008 -2.225877 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas\n",
+ "3001 -4.573665 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -4.749498 \n",
+ " ...\n",
+ "3001 ...\n",
+ "3003 ...\n",
+ "3004 ...\n",
+ "3006 ...\n",
+ "3007 ...\n",
+ "3008 ...\n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -7.892265 \n",
+ "3004 -9.269646 \n",
+ "3006 -7.322841 \n",
+ "3007 -6.752984 \n",
+ "3008 -6.307643 \n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -8.585412 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -8.950209 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# replace zero counts with a pseudocount of 0.5\n",
+ "x[x == 0] <- 0.5\n",
+ "x <- x/rowSums(x) # relative abundance\n",
+ "taxa <- log(x)\n",
+ "dim(taxa)\n",
+ "head(taxa)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "pid | visitdate | birthdate | sex1m2f | heightcm | weightkg | vdate | bdate | age | heightm | bmi | zbmius | zbmicatus | bmicat1norm2ow3ob |\n",
+ "\n",
+ "\t3029 | 12-Apr-10 | 1-May-86 | 1 | 172.43 | 83.0 | 18364 | 9617 | 23.94798 | 1.7243 | 27.91595 | NA | | 2 |\n",
+ "\t3030 | 12-Apr-10 | 22-May-87 | 1 | 178.87 | 70.3 | 18364 | 10003 | 22.89117 | 1.7887 | 21.97254 | NA | | 1 |\n",
+ "\t3031 | 20-Apr-10 | 1-Dec-82 | 2 | 157.60 | 52.0 | 18372 | 8370 | 27.38398 | 1.5760 | 20.93586 | NA | | 1 |\n",
+ "\t3032 | 22-Apr-10 | 9-Feb-86 | 1 | 188.10 | 89.6 | 18374 | 9536 | 24.19712 | 1.8810 | 25.32389 | NA | | 2 |\n",
+ "\t3033 | 22-Apr-10 | 9-Apr-86 | 2 | 170.03 | 65.2 | 18374 | 9595 | 24.03559 | 1.7003 | 22.55259 | NA | | 1 |\n",
+ "\t3034 | 28-Apr-10 | 6-Feb-86 | 2 | 162.16 | 59.9 | 18380 | 9533 | 24.22177 | 1.6216 | 22.77925 | NA | | 1 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llllllllllllll}\n",
+ " pid & visitdate & birthdate & sex1m2f & heightcm & weightkg & vdate & bdate & age & heightm & bmi & zbmius & zbmicatus & bmicat1norm2ow3ob\\\\\n",
+ "\\hline\n",
+ "\t 3029 & 12-Apr-10 & 1-May-86 & 1 & 172.43 & 83.0 & 18364 & 9617 & 23.94798 & 1.7243 & 27.91595 & NA & & 2 \\\\\n",
+ "\t 3030 & 12-Apr-10 & 22-May-87 & 1 & 178.87 & 70.3 & 18364 & 10003 & 22.89117 & 1.7887 & 21.97254 & NA & & 1 \\\\\n",
+ "\t 3031 & 20-Apr-10 & 1-Dec-82 & 2 & 157.60 & 52.0 & 18372 & 8370 & 27.38398 & 1.5760 & 20.93586 & NA & & 1 \\\\\n",
+ "\t 3032 & 22-Apr-10 & 9-Feb-86 & 1 & 188.10 & 89.6 & 18374 & 9536 & 24.19712 & 1.8810 & 25.32389 & NA & & 2 \\\\\n",
+ "\t 3033 & 22-Apr-10 & 9-Apr-86 & 2 & 170.03 & 65.2 & 18374 & 9595 & 24.03559 & 1.7003 & 22.55259 & NA & & 1 \\\\\n",
+ "\t 3034 & 28-Apr-10 & 6-Feb-86 & 2 & 162.16 & 59.9 & 18380 & 9533 & 24.22177 & 1.6216 & 22.77925 & NA & & 1 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| pid | visitdate | birthdate | sex1m2f | heightcm | weightkg | vdate | bdate | age | heightm | bmi | zbmius | zbmicatus | bmicat1norm2ow3ob |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3029 | 12-Apr-10 | 1-May-86 | 1 | 172.43 | 83.0 | 18364 | 9617 | 23.94798 | 1.7243 | 27.91595 | NA | | 2 |\n",
+ "| 3030 | 12-Apr-10 | 22-May-87 | 1 | 178.87 | 70.3 | 18364 | 10003 | 22.89117 | 1.7887 | 21.97254 | NA | | 1 |\n",
+ "| 3031 | 20-Apr-10 | 1-Dec-82 | 2 | 157.60 | 52.0 | 18372 | 8370 | 27.38398 | 1.5760 | 20.93586 | NA | | 1 |\n",
+ "| 3032 | 22-Apr-10 | 9-Feb-86 | 1 | 188.10 | 89.6 | 18374 | 9536 | 24.19712 | 1.8810 | 25.32389 | NA | | 2 |\n",
+ "| 3033 | 22-Apr-10 | 9-Apr-86 | 2 | 170.03 | 65.2 | 18374 | 9595 | 24.03559 | 1.7003 | 22.55259 | NA | | 1 |\n",
+ "| 3034 | 28-Apr-10 | 6-Feb-86 | 2 | 162.16 | 59.9 | 18380 | 9533 | 24.22177 | 1.6216 | 22.77925 | NA | | 1 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " pid visitdate birthdate sex1m2f heightcm weightkg vdate bdate age \n",
+ "1 3029 12-Apr-10 1-May-86 1 172.43 83.0 18364 9617 23.94798\n",
+ "2 3030 12-Apr-10 22-May-87 1 178.87 70.3 18364 10003 22.89117\n",
+ "3 3031 20-Apr-10 1-Dec-82 2 157.60 52.0 18372 8370 27.38398\n",
+ "4 3032 22-Apr-10 9-Feb-86 1 188.10 89.6 18374 9536 24.19712\n",
+ "5 3033 22-Apr-10 9-Apr-86 2 170.03 65.2 18374 9595 24.03559\n",
+ "6 3034 28-Apr-10 6-Feb-86 2 162.16 59.9 18380 9533 24.22177\n",
+ " heightm bmi zbmius zbmicatus bmicat1norm2ow3ob\n",
+ "1 1.7243 27.91595 NA 2 \n",
+ "2 1.7887 21.97254 NA 1 \n",
+ "3 1.5760 20.93586 NA 1 \n",
+ "4 1.8810 25.32389 NA 2 \n",
+ "5 1.7003 22.55259 NA 1 \n",
+ "6 1.6216 22.77925 NA 1 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "98"
+ ],
+ "text/latex": [
+ "98"
+ ],
+ "text/markdown": [
+ "98"
+ ],
+ "text/plain": [
+ "[1] 98"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 21.6186\n",
+ "\t- 21.82244\n",
+ "\t- 20.03762\n",
+ "\t- 20.82412\n",
+ "\t- 22.66875\n",
+ "\t- 24.97552\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 21.6186\n",
+ "\\item 21.82244\n",
+ "\\item 20.03762\n",
+ "\\item 20.82412\n",
+ "\\item 22.66875\n",
+ "\\item 24.97552\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 21.6186\n",
+ "2. 21.82244\n",
+ "3. 20.03762\n",
+ "4. 20.82412\n",
+ "5. 22.66875\n",
+ "6. 24.97552\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 21.61860 21.82244 20.03762 20.82412 22.66875 24.97552"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# metadata\n",
+ "demo <- read.delim(\"../../code_Lin/cvs/data/demographic.txt\")\n",
+ "head(demo)\n",
+ "y <- demo$bmi[match(rownames(count), demo$pid)]\n",
+ "length(y)\n",
+ "head(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "'matrix'"
+ ],
+ "text/latex": [
+ "'matrix'"
+ ],
+ "text/markdown": [
+ "'matrix'"
+ ],
+ "text/plain": [
+ "[1] \"matrix\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "'numeric'"
+ ],
+ "text/latex": [
+ "'numeric'"
+ ],
+ "text/markdown": [
+ "'numeric'"
+ ],
+ "text/plain": [
+ "[1] \"numeric\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check datatype\n",
+ "class(taxa); class(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# save processed data\n",
+ "save(y, taxa, file='../BMI/BMI_Lin_2014.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/.ipynb_checkpoints/1.2 BMI_results_cts-checkpoint.ipynb b/data_application/notebooks_application/.ipynb_checkpoints/1.2 BMI_results_cts-checkpoint.ipynb
new file mode 100755
index 0000000..3d0c385
--- /dev/null
+++ b/data_application/notebooks_application/.ipynb_checkpoints/1.2 BMI_results_cts-checkpoint.ipynb
@@ -0,0 +1,402 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### BMI microbiome data application results for continuous outcome"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### method comparisons"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../BMI/results_cts/BMI_compLasso.RData')\n",
+ "load('../BMI/results_cts/BMI_elnet.RData')\n",
+ "load('../BMI/results_cts/BMI_lasso.RData')\n",
+ "load('../BMI/results_cts/BMI_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 0.22\n",
+ "\t- 21.59\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.22\n",
+ "\\item 21.59\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.22\n",
+ "2. 21.59\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.22 21.59"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_compLasso$stab_index, out_compLasso$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 0.14\n",
+ "\t- 24.07\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.14\n",
+ "\\item 24.07\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.14\n",
+ "2. 24.07\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.14 24.07"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_lasso$stab_index, out_lasso$MSE_mean) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 0.23\n",
+ "\t- 25.33\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.23\n",
+ "\\item 25.33\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.23\n",
+ "2. 25.33\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.23 25.33"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_elnet$stab_index, out_elnet$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 0.02\n",
+ "\t- 4.99\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.02\n",
+ "\\item 4.99\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.02\n",
+ "2. 4.99\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.02 4.99"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_rf$stab_index, out_rf$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "dataset | method | mse | stability |\n",
+ "\n",
+ "\tbmi_gut | lasso | 24.07 | 0.14 |\n",
+ "\tbmi_gut | elent | 25.33 | 0.23 |\n",
+ "\tbmi_gut | rf | 4.99 | 0.02 |\n",
+ "\tbmi_gut | compLasso | 21.59 | 0.22 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llll}\n",
+ " dataset & method & mse & stability\\\\\n",
+ "\\hline\n",
+ "\t bmi\\_gut & lasso & 24.07 & 0.14 \\\\\n",
+ "\t bmi\\_gut & elent & 25.33 & 0.23 \\\\\n",
+ "\t bmi\\_gut & rf & 4.99 & 0.02 \\\\\n",
+ "\t bmi\\_gut & compLasso & 21.59 & 0.22 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| dataset | method | mse | stability |\n",
+ "|---|---|---|---|\n",
+ "| bmi_gut | lasso | 24.07 | 0.14 |\n",
+ "| bmi_gut | elent | 25.33 | 0.23 |\n",
+ "| bmi_gut | rf | 4.99 | 0.02 |\n",
+ "| bmi_gut | compLasso | 21.59 | 0.22 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method mse stability\n",
+ "1 bmi_gut lasso 24.07 0.14 \n",
+ "2 bmi_gut elent 25.33 0.23 \n",
+ "3 bmi_gut rf 4.99 0.02 \n",
+ "4 bmi_gut compLasso 21.59 0.22 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# combine and export results\n",
+ "bmi_gut = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(bmi_gut) = c('dataset', 'method', 'mse', 'stability')\n",
+ "bmi_gut$dataset = 'bmi_gut'\n",
+ "bmi_gut$method = c('lasso', 'elent', 'rf', 'compLasso')\n",
+ "bmi_gut$mse = c(out_lasso$MSE_mean, out_elnet$MSE_mean, out_rf$MSE_mean, out_compLasso$MSE_mean)\n",
+ "bmi_gut$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_compLasso$stab_index)\n",
+ "bmi_gut"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### hypothesis testing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../BMI/results_cts/BMI_boot_compLasso.RData')\n",
+ "load('../BMI/results_cts/BMI_boot_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.2659"
+ ],
+ "text/latex": [
+ "0.2659"
+ ],
+ "text/markdown": [
+ "0.2659"
+ ],
+ "text/plain": [
+ "[1] 0.2659"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 2.5%\n",
+ "\t\t- 0.17475\n",
+ "\t- 97.5%\n",
+ "\t\t- 0.34\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.17475\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.34\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.1747597.5%\n",
+ ": 0.34\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "0.17475 0.34000 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "11.8026936858767"
+ ],
+ "text/latex": [
+ "11.8026936858767"
+ ],
+ "text/markdown": [
+ "11.8026936858767"
+ ],
+ "text/plain": [
+ "[1] 11.80269"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -2.09038719853993\n",
+ "\\item[97.5\\textbackslash{}\\%] 41.1831613135692\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -2.0903871985399397.5%\n",
+ ": 41.1831613135692\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-2.090387 41.183161 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_mse = (unlist(boot_compLasso$MSE_list) - unlist(boot_rf$MSE_list)) # use all 100*100 MSEs\n",
+ "mean(diff_mse)\n",
+ "quantile(diff_mse, probs = c(0.025, 0.975)) \n",
+ "# CI contains zero: compLasso is not significantly different from RF based on MSE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/.ipynb_checkpoints/2.1.Soils_biomTableConversion (Python)-checkpoint.ipynb b/data_application/notebooks_application/.ipynb_checkpoints/2.1.Soils_biomTableConversion (Python)-checkpoint.ipynb
new file mode 100755
index 0000000..16a40f2
--- /dev/null
+++ b/data_application/notebooks_application/.ipynb_checkpoints/2.1.Soils_biomTableConversion (Python)-checkpoint.ipynb
@@ -0,0 +1,1571 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ref: the soil dataset used in balance tree\n",
+ "# https://msystems.asm.org/content/2/1/e00162-16"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "import biom\n",
+ "from biom.util import biom_open\n",
+ "import pandas as pd\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def biom2pandas(file_biom, withTaxonomy=False, astype=int):\n",
+ "    \"\"\" Converts a biom file into a Pandas.DataFrame\n",
+ "    Parameters\n",
+ "    ----------\n",
+ "    file_biom : str\n",
+ "        The path to the biom file.\n",
+ "    withTaxonomy : bool\n",
+ "        If True, returns a second Pandas.Series with lineage information for\n",
+ "        each feature, e.g. OTU or deblur-sequence. Default: False\n",
+ "    astype : type\n",
+ "        datatype into which each value of the biom table is cast. Default: int.\n",
+ "        Use e.g. float if biom table contains relative abundances instead of\n",
+ "        raw reads.\n",
+ "    Returns\n",
+ "    -------\n",
+ "    A Pandas.DataFrame holding numerical values from the biom file\n",
+ "    (rows are features, columns are samples).\n",
+ "    If withTaxonomy is True then a second Pandas.Series is returned, holding\n",
+ "    lineage information about each feature.\n",
+ "    Raises\n",
+ "    ------\n",
+ "    IOError\n",
+ "        If file_biom cannot be read.\n",
+ "    ValueError\n",
+ "        If withTaxonomy=True but biom file does not hold taxonomy information.\n",
+ "    \"\"\"\n",
+ "    try:\n",
+ "        table = biom.load_table(file_biom)\n",
+ "        # build samples x features, then transpose to features x samples\n",
+ "        counts = pd.DataFrame(table.matrix_data.T.todense().astype(astype),\n",
+ "                              index=table.ids(axis='sample'),\n",
+ "                              columns=table.ids(axis='observation')).T\n",
+ "        if withTaxonomy:\n",
+ "            try:\n",
+ "                md = table.metadata_to_dataframe('observation')\n",
+ "                levels = [col\n",
+ "                          for col in md.columns\n",
+ "                          if col.startswith('taxonomy_')]\n",
+ "                if levels == []:\n",
+ "                    raise ValueError(('No taxonomy information found in '\n",
+ "                                      'biom file.'))\n",
+ "                else:\n",
+ "                    # join the per-rank columns into one lineage string\n",
+ "                    taxonomy = md.apply(lambda row:\n",
+ "                                        \";\".join([row[l] for l in levels]),\n",
+ "                                        axis=1)\n",
+ "                    return counts, taxonomy\n",
+ "            except KeyError:\n",
+ "                raise ValueError(('Biom file does not have any '\n",
+ "                                  'observation metadata!'))\n",
+ "        else:\n",
+ "            return counts\n",
+ "    except IOError:\n",
+ "        raise IOError('Cannot read file \"%s\"' % file_biom)\n",
+ "\n",
+ "\n",
+ "def pandas2biom(file_biom, table, taxonomy=None, err=sys.stderr):\n",
+ "    \"\"\" Writes a Pandas.DataFrame into a biom file.\n",
+ "    Parameters\n",
+ "    ----------\n",
+ "    file_biom: str\n",
+ "        The filename of the BIOM file to be created.\n",
+ "    table: a Pandas.DataFrame\n",
+ "        The table that should be written as BIOM.\n",
+ "    taxonomy : pandas.Series\n",
+ "        Index is taxons corresponding to table, values are lineage strings like\n",
+ "        'k__Bacteria; p__Actinobacteria'\n",
+ "    err : StringIO\n",
+ "        Stream onto which errors / warnings should be printed.\n",
+ "        Default is sys.stderr\n",
+ "    Raises\n",
+ "    ------\n",
+ "    IOError\n",
+ "        If file_biom cannot be written.\n",
+ "    TODO\n",
+ "    ----\n",
+ "        1) also store taxonomy information\n",
+ "    \"\"\"\n",
+ "    try:\n",
+ "        bt = biom.Table(table.values,\n",
+ "                        observation_ids=table.index,\n",
+ "                        sample_ids=table.columns)\n",
+ "\n",
+ "        # add taxonomy metadata if provided, i.e. is not None\n",
+ "        if taxonomy is not None:\n",
+ "            if not isinstance(taxonomy, pd.Series):\n",
+ "                raise AttributeError('taxonomy must be a pandas.Series!')\n",
+ "            idx_missing_intable = set(table.index) - set(taxonomy.index)\n",
+ "            if len(idx_missing_intable) > 0:\n",
+ "                # map(str, ...) so that integer feature IDs can be joined\n",
+ "                err.write(('Warning: following %i taxa are not in the '\n",
+ "                           'provided taxonomy:\\n%s\\n') % (\n",
+ "                          len(idx_missing_intable),\n",
+ "                          \", \".join(map(str, idx_missing_intable))))\n",
+ "                missing = pd.Series(\n",
+ "                    index=list(idx_missing_intable),\n",
+ "                    name='taxonomy',\n",
+ "                    data='k__missing_lineage_information')\n",
+ "                # pd.concat replaces Series.append, which newer pandas removed\n",
+ "                taxonomy = pd.concat([taxonomy, missing])\n",
+ "            idx_missing_intaxonomy = set(taxonomy.index) - set(table.index)\n",
+ "            if (len(idx_missing_intaxonomy) > 0) and err:\n",
+ "                err.write(('Warning: following %i taxa are not in the '\n",
+ "                           'provided count table, but in taxonomy:\\n%s\\n') % (\n",
+ "                          len(idx_missing_intaxonomy),\n",
+ "                          \", \".join(map(str, idx_missing_intaxonomy))))\n",
+ "\n",
+ "            t = dict()\n",
+ "            # .items() replaces Series.iteritems, which newer pandas removed\n",
+ "            for taxon, linstr in taxonomy.items():\n",
+ "                # fill missing rank annotations with rank__\n",
+ "                orig_lineage = {annot[0].lower(): annot\n",
+ "                                for annot\n",
+ "                                in (map(str.strip, linstr.split(';')))}\n",
+ "                lineage = []\n",
+ "                # NOTE: `settings` is never imported in this notebook, so this\n",
+ "                # loop raises NameError whenever a taxonomy is passed in;\n",
+ "                # settings.RANKS presumably lists the rank names in order.\n",
+ "                for rank in settings.RANKS:\n",
+ "                    rank_char = rank[0].lower()\n",
+ "                    if rank_char in orig_lineage:\n",
+ "                        lineage.append(orig_lineage[rank_char])\n",
+ "                    else:\n",
+ "                        lineage.append(rank_char+'__')\n",
+ "                t[taxon] = {'taxonomy': \";\".join(lineage)}\n",
+ "            bt.add_metadata(t, axis='observation')\n",
+ "\n",
+ "        with biom_open(file_biom, 'w') as f:\n",
+ "            bt.to_hdf5(f, \"example\")\n",
+ "    except IOError:\n",
+ "        raise IOError('Cannot write to file \"%s\"' % file_biom)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Balance_88soils"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(7396, 89)"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom = biom2pandas('../88soils/238_otu_table.biom')\n",
+ "soils_biom.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 103.CA2 103.CO3 103.SR3 103.IE2 103.BP1 103.VC2 103.SA2 \\\n",
+ "1124701 15 14 1 8 13 7 6 \n",
+ "244336 0 0 0 1 0 0 0 \n",
+ "973124 0 0 0 0 0 0 0 \n",
+ "\n",
+ " 103.GB2 103.CO2 103.KP1 ... 103.LQ1 103.HI1 103.RT1 103.HI2 \\\n",
+ "1124701 3 2 2 ... 0 0 0 0 \n",
+ "244336 0 0 0 ... 0 0 0 0 \n",
+ "973124 0 0 1 ... 0 0 0 0 \n",
+ "\n",
+ " 103.DF1 103.CF3 103.AR1 103.TL1 103.HI4 103.BB1 \n",
+ "1124701 0 0 0 0 0 0 \n",
+ "244336 0 0 0 0 0 0 \n",
+ "973124 0 0 0 0 0 0 \n",
+ "\n",
+ "[3 rows x 89 columns]"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(7396, 1)"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa = pd.read_csv('../88soils/88soils_taxonomy.txt', sep='\\t', index_col='Feature ID')\n",
+ "soils_taxa.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Taxon\n",
+ "Feature ID \n",
+ "1000512 k__Bacteria;p__Actinobacteria;c__Thermoleophil...\n",
+ "1000547 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob...\n",
+ "1000654 k__Bacteria;p__Bacteroidetes;c__Sphingobacteri..."
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 0 1 2 \\\n",
+ "Feature ID \n",
+ "1000512 k__Bacteria p__Actinobacteria c__Thermoleophilia \n",
+ "1000547 k__Bacteria p__Firmicutes c__Bacilli \n",
+ "1000654 k__Bacteria p__Bacteroidetes c__Sphingobacteriia \n",
+ "1000757 k__Bacteria p__Proteobacteria c__Alphaproteobacteria \n",
+ "1000876 k__Bacteria p__Actinobacteria c__Actinobacteria \n",
+ "\n",
+ " 3 4 5 \\\n",
+ "Feature ID \n",
+ "1000512 o__Gaiellales f__Gaiellaceae g__ \n",
+ "1000547 o__Lactobacillales f__Streptococcaceae g__Streptococcus \n",
+ "1000654 o__Sphingobacteriales f__Sphingobacteriaceae g__ \n",
+ "1000757 o__Rhizobiales f__Bradyrhizobiaceae g__ \n",
+ "1000876 o__Actinomycetales f__Nocardioidaceae g__Nocardioides \n",
+ "\n",
+ " 6 \n",
+ "Feature ID \n",
+ "1000512 s__ \n",
+ "1000547 s__ \n",
+ "1000654 s__ \n",
+ "1000757 s__ \n",
+ "1000876 s__ "
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "taxa_new = soils_taxa.Taxon.str.split(pat=\";\", expand=True)\n",
+ "taxa_new.head(5)\n",
+ "# ref: https://www.geeksforgeeks.org/python-pandas-split-strings-into-two-list-columns-using-str-split/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Taxon \\\n",
+ "Feature ID \n",
+ "1000512 k__Bacteria;p__Actinobacteria;c__Thermoleophil... \n",
+ "1000547 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob... \n",
+ "1000654 k__Bacteria;p__Bacteroidetes;c__Sphingobacteri... \n",
+ "1000757 k__Bacteria;p__Proteobacteria;c__Alphaproteoba... \n",
+ "1000876 k__Bacteria;p__Actinobacteria;c__Actinobacteri... \n",
+ "\n",
+ " Genus \n",
+ "Feature ID \n",
+ "1000512 g__ \n",
+ "1000547 g__Streptococcus \n",
+ "1000654 g__ \n",
+ "1000757 g__ \n",
+ "1000876 g__Nocardioides "
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa['Genus'] = taxa_new[5]\n",
+ "soils_taxa.head(5)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "g__ 5213\n",
+ "g__Rhodoplanes 144\n",
+ "g__Bacillus 110\n",
+ "g__Candidatus Solibacter 100\n",
+ "g__Flavobacterium 71\n",
+ " ... \n",
+ "g__Rhodocyclus 1\n",
+ "g__Marinobacter 1\n",
+ "g__Afipia 1\n",
+ "g__Candidatus Amoebophilus 1\n",
+ "g__Desulfotomaculum 1\n",
+ "Name: Genus, Length: 335, dtype: int64"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa.Genus.value_counts()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(2183, 2)"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# only keep those with genus assignment\n",
+ "soils_taxa_sub = soils_taxa[soils_taxa.Genus != 'g__']\n",
+ "soils_taxa_sub.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Taxon \\\n",
+ "Feature ID \n",
+ "1000547 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob... \n",
+ "1000876 k__Bacteria;p__Actinobacteria;c__Actinobacteri... \n",
+ "1003206 k__Bacteria;p__Proteobacteria;c__Alphaproteoba... \n",
+ "\n",
+ " Genus \n",
+ "Feature ID \n",
+ "1000547 g__Streptococcus \n",
+ "1000876 g__Nocardioides \n",
+ "1003206 g__Sphingomonas "
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa_sub.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(2183, 91)"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# subset the biom table to the features that have a genus assignment\n",
+ "soils_biom.index = soils_biom.index.astype('int64') \n",
+ "soils_biom_sub = soils_biom.merge(soils_taxa_sub, how='inner', left_index=True, right_index=True)\n",
+ "soils_biom_sub.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 103.CA2 103.CO3 103.SR3 103.IE2 103.BP1 103.VC2 103.SA2 \\\n",
+ "244336 0 0 0 1 0 0 0 \n",
+ "809489 0 0 0 0 0 0 0 \n",
+ "533625 0 0 0 0 0 0 0 \n",
+ "\n",
+ " 103.GB2 103.CO2 103.KP1 ... 103.RT1 103.HI2 103.DF1 103.CF3 \\\n",
+ "244336 0 0 0 ... 0 0 0 0 \n",
+ "809489 1 0 0 ... 0 0 0 0 \n",
+ "533625 0 0 0 ... 0 0 0 0 \n",
+ "\n",
+ " 103.AR1 103.TL1 103.HI4 103.BB1 \\\n",
+ "244336 0 0 0 0 \n",
+ "809489 0 0 0 0 \n",
+ "533625 0 0 0 0 \n",
+ "\n",
+ " Taxon Genus \n",
+ "244336 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacill... g__Paenibacillus \n",
+ "809489 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacill... g__Bacillus \n",
+ "533625 k__Bacteria;p__Proteobacteria;c__Alphaproteoba... g__Novosphingobium \n",
+ "\n",
+ "[3 rows x 91 columns]"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "soils_biom_sub.set_index('Taxon', inplace=True)\n",
+ "soils_biom_sub.drop(['Genus'], axis=1, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(2183, 89)"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 103.CA2 103.CO3 103.SR3 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.IE2 103.BP1 103.VC2 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 1 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.SA2 103.GB2 103.CO2 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 1 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.KP1 ... 103.LQ1 \\\n",
+ "Taxon ... \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 ... 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 ... 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 ... 0 \n",
+ "\n",
+ " 103.HI1 103.RT1 103.HI2 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.DF1 103.CF3 103.AR1 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.TL1 103.HI4 103.BB1 \n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ "[3 rows x 89 columns]"
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(89, 2183)"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# transpose the dataframe \n",
+ "soils_biom_sub_t = soils_biom_sub.T\n",
+ "soils_biom_sub_t.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Paenibacillaceae;g__Paenibacillus;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__muralis \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus Solibacter;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__Rhodoplanes;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardioidaceae;g__Nocardioides;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__Flavobacterium;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptococcaceae;g__Desulfotomaculum;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Koribacteraceae;g__Candidatus Koribacter;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Methylibium;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon ... \\\n",
+ "103.CA2 ... \n",
+ "103.CO3 ... \n",
+ "103.SR3 ... \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Rhodoferax;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae;g__Dokdonella;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Methylophilales;f__Methylophilaceae;g__Methylotenera;s__mobilis \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae;g__Luteimonas;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__Rhodoplanes;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Planococcaceae;g__Solibacillus;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Ramlibacter;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Mycobacteriaceae;g__Mycobacterium;s__ \n",
+ "103.CA2 0 \n",
+ "103.CO3 1 \n",
+ "103.SR3 1 \n",
+ "\n",
+ "[3 rows x 2183 columns]"
+ ]
+ },
+ "execution_count": 45,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub_t.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 89.000000\n",
+ "mean 275.932584\n",
+ "std 115.508244\n",
+ "min 1.000000\n",
+ "25% 213.000000\n",
+ "50% 254.000000\n",
+ "75% 319.000000\n",
+ "max 805.000000\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# total reads per sample (sum across genera); totals differ, so the table is not rarefied\n",
+ "soils_biom_sub_t.sum(axis=1).describe() # per-sample (row) sums"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# export\n",
+ "soils_biom_sub_t.to_csv('../88soils/88soils_genus_table.txt', sep='\\t')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 2183.000000\n",
+ "mean 11.249656\n",
+ "std 33.869668\n",
+ "min 1.000000\n",
+ "25% 1.000000\n",
+ "50% 3.000000\n",
+ "75% 8.000000\n",
+ "max 690.000000\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 48,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# total count per genus (sum across samples); min >= 1 means each genus exists in at least one sample\n",
+ "soils_biom_sub_t.sum(axis=0).describe() # per-genus (column) sums"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/data_application/notebooks_application/.ipynb_checkpoints/2.2.soils_dataPreparation-checkpoint.ipynb b/data_application/notebooks_application/.ipynb_checkpoints/2.2.soils_dataPreparation-checkpoint.ipynb
new file mode 100755
index 0000000..2217a77
--- /dev/null
+++ b/data_application/notebooks_application/.ipynb_checkpoints/2.2.soils_dataPreparation-checkpoint.ipynb
@@ -0,0 +1,1060 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../88soils/88soils_genus_table.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>2183</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "count = soil_otu\n",
+ "dim(count)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... 
| k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |
\n",
+ "\n",
+ "\t103.CA2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t103.CO3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
\n",
+ "\t103.SR3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
\n",
+ "\t103.IE2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 |
\n",
+ "\t103.BP1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
\n",
+ "\t103.VC2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Paenibacillaceae.g\\_\\_Paenibacillus.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Bacillaceae.g\\_\\_Bacillus.s\\_\\_muralis & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Sphingomonadales.f\\_\\_Sphingomonadaceae.g\\_\\_Novosphingobium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Solibacteres.o\\_\\_Solibacterales.f\\_\\_Solibacteraceae.g\\_\\_Candidatus.Solibacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Nocardioidaceae.g\\_\\_Nocardioides.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Bacteroidetes.c\\_\\_Flavobacteriia.o\\_\\_Flavobacteriales.f\\_\\_Flavobacteriaceae.g\\_\\_Flavobacterium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Clostridia.o\\_\\_Clostridiales.f\\_\\_Peptococcaceae.g\\_\\_Desulfotomaculum.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Acidobacteriia.o\\_\\_Acidobacteriales.f\\_\\_Koribacteraceae.g\\_\\_Candidatus.Koribacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Methylibium.s\\_\\_ & ... 
& k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Rhodoferax.s\\_\\_.4 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Dokdonella.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Lactobacillales.f\\_\\_Streptococcaceae.g\\_\\_Streptococcus.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Methylophilales.f\\_\\_Methylophilaceae.g\\_\\_Methylotenera.s\\_\\_mobilis.2 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Streptomycetaceae.g\\_\\_Streptomyces.s\\_\\_.44 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Luteimonas.s\\_\\_.2 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_.142 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Planococcaceae.g\\_\\_Solibacillus.s\\_\\_.3 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Ramlibacter.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Mycobacteriaceae.g\\_\\_Mycobacterium.s\\_\\_.28\\\\\n",
+ "\\hline\n",
+ "\t103.CA2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t103.CO3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\\n",
+ "\t103.SR3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\\\\n",
+ "\t103.IE2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 1 \\\\\n",
+ "\t103.BP1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\\n",
+ "\t103.VC2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... 
| k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 103.CA2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 103.CO3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "| 103.SR3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |\n",
+ "| 103.IE2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 |\n",
+ "| 103.BP1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "| 103.VC2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 1 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 2 \n",
+ " ...\n",
+ "103.CA2 ...\n",
+ "103.CO3 ...\n",
+ "103.SR3 ...\n",
+ "103.IE2 ...\n",
+ "103.BP1 ...\n",
+ "103.VC2 ...\n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 2 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28\n",
+ "103.CA2 0 \n",
+ "103.CO3 1 \n",
+ "103.SR3 1 \n",
+ "103.IE2 1 \n",
+ "103.BP1 1 \n",
+ "103.VC2 1 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(count)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>2183</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 0.5/100]) # 0.5%"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>2183</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 1/100]) # 1%"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>1297</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 1297\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 1297\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 1297"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 2/100]) # 2%"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>576</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 576\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 576\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 576"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 5/100]) # 5%"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>2183</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... 
| k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |
\n",
+ "\n",
+ "\t103.CA2 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | ... | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 |
\n",
+ "\t103.CO3 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | ... | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.199305 |
\n",
+ "\t103.SR3 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | ... | -7.942718 | -7.249570 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.249570 | -7.249570 |
\n",
+ "\t103.IE2 | -7.208230 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | ... | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -6.515083 | -7.901377 | -7.901377 | -7.208230 |
\n",
+ "\t103.BP1 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | ... | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.087991 |
\n",
+ "\t103.VC2 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -6.439350 | ... | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.132498 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Paenibacillaceae.g\\_\\_Paenibacillus.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Bacillaceae.g\\_\\_Bacillus.s\\_\\_muralis & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Sphingomonadales.f\\_\\_Sphingomonadaceae.g\\_\\_Novosphingobium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Solibacteres.o\\_\\_Solibacterales.f\\_\\_Solibacteraceae.g\\_\\_Candidatus.Solibacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Nocardioidaceae.g\\_\\_Nocardioides.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Bacteroidetes.c\\_\\_Flavobacteriia.o\\_\\_Flavobacteriales.f\\_\\_Flavobacteriaceae.g\\_\\_Flavobacterium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Clostridia.o\\_\\_Clostridiales.f\\_\\_Peptococcaceae.g\\_\\_Desulfotomaculum.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Acidobacteriia.o\\_\\_Acidobacteriales.f\\_\\_Koribacteraceae.g\\_\\_Candidatus.Koribacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Methylibium.s\\_\\_ & ... 
& k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Rhodoferax.s\\_\\_.4 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Dokdonella.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Lactobacillales.f\\_\\_Streptococcaceae.g\\_\\_Streptococcus.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Methylophilales.f\\_\\_Methylophilaceae.g\\_\\_Methylotenera.s\\_\\_mobilis.2 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Streptomycetaceae.g\\_\\_Streptomyces.s\\_\\_.44 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Luteimonas.s\\_\\_.2 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_.142 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Planococcaceae.g\\_\\_Solibacillus.s\\_\\_.3 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Ramlibacter.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Mycobacteriaceae.g\\_\\_Mycobacterium.s\\_\\_.28\\\\\n",
+ "\\hline\n",
+ "\t103.CA2 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & ... & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436\\\\\n",
+ "\t103.CO3 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & ... & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.199305\\\\\n",
+ "\t103.SR3 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & ... & -7.942718 & -7.249570 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.249570 & -7.249570\\\\\n",
+ "\t103.IE2 & -7.208230 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & ... & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -6.515083 & -7.901377 & -7.901377 & -7.208230\\\\\n",
+ "\t103.BP1 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & ... & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.087991\\\\\n",
+ "\t103.VC2 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -6.439350 & ... & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.132498\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 103.CA2 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | ... | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 |\n",
+ "| 103.CO3 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | ... | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.199305 |\n",
+ "| 103.SR3 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | ... | -7.942718 | -7.249570 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.249570 | -7.249570 |\n",
+ "| 103.IE2 | -7.208230 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | ... | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -6.515083 | -7.901377 | -7.901377 | -7.208230 |\n",
+ "| 103.BP1 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | ... | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.087991 |\n",
+ "| 103.VC2 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -6.439350 | ... | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.132498 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.208230 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -6.439350 \n",
+ " ...\n",
+ "103.CA2 ...\n",
+ "103.CO3 ...\n",
+ "103.SR3 ...\n",
+ "103.IE2 ...\n",
+ "103.BP1 ...\n",
+ "103.VC2 ...\n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.249570 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -6.515083 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.249570 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.199305 \n",
+ "103.SR3 -7.249570 \n",
+ "103.IE2 -7.208230 \n",
+ "103.BP1 -7.087991 \n",
+ "103.VC2 -7.132498 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# replace zero counts with a pseudocount of 0.5\n",
+ "x = count # preprocessing already done\n",
+ "x[x == 0] <- 0.5\n",
+ "x <- x/rowSums(x) # relative abundance\n",
+ "taxa <- log(x)\n",
+ "dim(taxa)\n",
+ "head(taxa)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>'BarcodeSequence'</li>\n",
+ "\t<li>'LinkerPrimerSequence'</li>\n",
+ "\t<li>'barcode_read_group_tag'</li>\n",
+ "\t<li>'dna_extracted_prep'</li>\n",
+ "\t<li>'experiment_alias'</li>\n",
+ "\t<li>'experiment_center'</li>\n",
+ "\t<li>'experiment_design_description'</li>\n",
+ "\t<li>'experiment_title'</li>\n",
+ "\t<li>'instrument_name'</li>\n",
+ "\t<li>'key_seq'</li>\n",
+ "\t<li>'library_construction_protocol'</li>\n",
+ "\t<li>'linker'</li>\n",
+ "\t<li>'pcr_primers'</li>\n",
+ "\t<li>'physical_specimen_remaining_prep'</li>\n",
+ "\t<li>'platform'</li>\n",
+ "\t<li>'pool_member_name'</li>\n",
+ "\t<li>'pool_proportion'</li>\n",
+ "\t<li>'primer_read_group_tag'</li>\n",
+ "\t<li>'region'</li>\n",
+ "\t<li>'run_alias'</li>\n",
+ "\t<li>'run_center'</li>\n",
+ "\t<li>'run_date'</li>\n",
+ "\t<li>'run_prefix'</li>\n",
+ "\t<li>'samp_size'</li>\n",
+ "\t<li>'sample_center'</li>\n",
+ "\t<li>'sample_type_prep'</li>\n",
+ "\t<li>'sequencing_meth'</li>\n",
+ "\t<li>'study_center'</li>\n",
+ "\t<li>'study_ref'</li>\n",
+ "\t<li>'target_gene'</li>\n",
+ "\t<li>'target_subfragment'</li>\n",
+ "\t<li>'altitude'</li>\n",
+ "\t<li>'annual_season_precpt'</li>\n",
+ "\t<li>'annual_season_temp'</li>\n",
+ "\t<li>'anonymized_name'</li>\n",
+ "\t<li>'assigned_from_geo'</li>\n",
+ "\t<li>'carb_nitro_ratio'</li>\n",
+ "\t<li>'cmin_rate'</li>\n",
+ "\t<li>'collection_date'</li>\n",
+ "\t<li>'common_name'</li>\n",
+ "\t<li>'country'</li>\n",
+ "\t<li>'depth'</li>\n",
+ "\t<li>'dna_extracted'</li>\n",
+ "\t<li>'elevation'</li>\n",
+ "\t<li>'env_biome'</li>\n",
+ "\t<li>'env_feature'</li>\n",
+ "\t<li>'env_matter'</li>\n",
+ "\t<li>'host_subject_id'</li>\n",
+ "\t<li>'latitude'</li>\n",
+ "\t<li>'longitude'</li>\n",
+ "\t<li>'ph'</li>\n",
+ "\t<li>'physical_specimen_remaining'</li>\n",
+ "\t<li>'project_name'</li>\n",
+ "\t<li>'public'</li>\n",
+ "\t<li>'sample_type'</li>\n",
+ "\t<li>'silt_clay'</li>\n",
+ "\t<li>'soil_moisture_deficit'</li>\n",
+ "\t<li>'soil_type'</li>\n",
+ "\t<li>'specific_location'</li>\n",
+ "\t<li>'taxon_id'</li>\n",
+ "\t<li>'texture'</li>\n",
+ "\t<li>'title'</li>\n",
+ "\t<li>'tot_org_carb'</li>\n",
+ "\t<li>'tot_org_nitro'</li>\n",
+ "\t<li>'Description'</li>\n",
+ "\t<li>'ph2'</li>\n",
+ "\t<li>'ph3'</li>\n",
+ "\t<li>'ph4'</li>\n",
+ "\t<li>'ph_rounded'</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 'BarcodeSequence'\n",
+ "\\item 'LinkerPrimerSequence'\n",
+ "\\item 'barcode\\_read\\_group\\_tag'\n",
+ "\\item 'dna\\_extracted\\_prep'\n",
+ "\\item 'experiment\\_alias'\n",
+ "\\item 'experiment\\_center'\n",
+ "\\item 'experiment\\_design\\_description'\n",
+ "\\item 'experiment\\_title'\n",
+ "\\item 'instrument\\_name'\n",
+ "\\item 'key\\_seq'\n",
+ "\\item 'library\\_construction\\_protocol'\n",
+ "\\item 'linker'\n",
+ "\\item 'pcr\\_primers'\n",
+ "\\item 'physical\\_specimen\\_remaining\\_prep'\n",
+ "\\item 'platform'\n",
+ "\\item 'pool\\_member\\_name'\n",
+ "\\item 'pool\\_proportion'\n",
+ "\\item 'primer\\_read\\_group\\_tag'\n",
+ "\\item 'region'\n",
+ "\\item 'run\\_alias'\n",
+ "\\item 'run\\_center'\n",
+ "\\item 'run\\_date'\n",
+ "\\item 'run\\_prefix'\n",
+ "\\item 'samp\\_size'\n",
+ "\\item 'sample\\_center'\n",
+ "\\item 'sample\\_type\\_prep'\n",
+ "\\item 'sequencing\\_meth'\n",
+ "\\item 'study\\_center'\n",
+ "\\item 'study\\_ref'\n",
+ "\\item 'target\\_gene'\n",
+ "\\item 'target\\_subfragment'\n",
+ "\\item 'altitude'\n",
+ "\\item 'annual\\_season\\_precpt'\n",
+ "\\item 'annual\\_season\\_temp'\n",
+ "\\item 'anonymized\\_name'\n",
+ "\\item 'assigned\\_from\\_geo'\n",
+ "\\item 'carb\\_nitro\\_ratio'\n",
+ "\\item 'cmin\\_rate'\n",
+ "\\item 'collection\\_date'\n",
+ "\\item 'common\\_name'\n",
+ "\\item 'country'\n",
+ "\\item 'depth'\n",
+ "\\item 'dna\\_extracted'\n",
+ "\\item 'elevation'\n",
+ "\\item 'env\\_biome'\n",
+ "\\item 'env\\_feature'\n",
+ "\\item 'env\\_matter'\n",
+ "\\item 'host\\_subject\\_id'\n",
+ "\\item 'latitude'\n",
+ "\\item 'longitude'\n",
+ "\\item 'ph'\n",
+ "\\item 'physical\\_specimen\\_remaining'\n",
+ "\\item 'project\\_name'\n",
+ "\\item 'public'\n",
+ "\\item 'sample\\_type'\n",
+ "\\item 'silt\\_clay'\n",
+ "\\item 'soil\\_moisture\\_deficit'\n",
+ "\\item 'soil\\_type'\n",
+ "\\item 'specific\\_location'\n",
+ "\\item 'taxon\\_id'\n",
+ "\\item 'texture'\n",
+ "\\item 'title'\n",
+ "\\item 'tot\\_org\\_carb'\n",
+ "\\item 'tot\\_org\\_nitro'\n",
+ "\\item 'Description'\n",
+ "\\item 'ph2'\n",
+ "\\item 'ph3'\n",
+ "\\item 'ph4'\n",
+ "\\item 'ph\\_rounded'\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 'BarcodeSequence'\n",
+ "2. 'LinkerPrimerSequence'\n",
+ "3. 'barcode_read_group_tag'\n",
+ "4. 'dna_extracted_prep'\n",
+ "5. 'experiment_alias'\n",
+ "6. 'experiment_center'\n",
+ "7. 'experiment_design_description'\n",
+ "8. 'experiment_title'\n",
+ "9. 'instrument_name'\n",
+ "10. 'key_seq'\n",
+ "11. 'library_construction_protocol'\n",
+ "12. 'linker'\n",
+ "13. 'pcr_primers'\n",
+ "14. 'physical_specimen_remaining_prep'\n",
+ "15. 'platform'\n",
+ "16. 'pool_member_name'\n",
+ "17. 'pool_proportion'\n",
+ "18. 'primer_read_group_tag'\n",
+ "19. 'region'\n",
+ "20. 'run_alias'\n",
+ "21. 'run_center'\n",
+ "22. 'run_date'\n",
+ "23. 'run_prefix'\n",
+ "24. 'samp_size'\n",
+ "25. 'sample_center'\n",
+ "26. 'sample_type_prep'\n",
+ "27. 'sequencing_meth'\n",
+ "28. 'study_center'\n",
+ "29. 'study_ref'\n",
+ "30. 'target_gene'\n",
+ "31. 'target_subfragment'\n",
+ "32. 'altitude'\n",
+ "33. 'annual_season_precpt'\n",
+ "34. 'annual_season_temp'\n",
+ "35. 'anonymized_name'\n",
+ "36. 'assigned_from_geo'\n",
+ "37. 'carb_nitro_ratio'\n",
+ "38. 'cmin_rate'\n",
+ "39. 'collection_date'\n",
+ "40. 'common_name'\n",
+ "41. 'country'\n",
+ "42. 'depth'\n",
+ "43. 'dna_extracted'\n",
+ "44. 'elevation'\n",
+ "45. 'env_biome'\n",
+ "46. 'env_feature'\n",
+ "47. 'env_matter'\n",
+ "48. 'host_subject_id'\n",
+ "49. 'latitude'\n",
+ "50. 'longitude'\n",
+ "51. 'ph'\n",
+ "52. 'physical_specimen_remaining'\n",
+ "53. 'project_name'\n",
+ "54. 'public'\n",
+ "55. 'sample_type'\n",
+ "56. 'silt_clay'\n",
+ "57. 'soil_moisture_deficit'\n",
+ "58. 'soil_type'\n",
+ "59. 'specific_location'\n",
+ "60. 'taxon_id'\n",
+ "61. 'texture'\n",
+ "62. 'title'\n",
+ "63. 'tot_org_carb'\n",
+ "64. 'tot_org_nitro'\n",
+ "65. 'Description'\n",
+ "66. 'ph2'\n",
+ "67. 'ph3'\n",
+ "68. 'ph4'\n",
+ "69. 'ph_rounded'\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ " [1] \"BarcodeSequence\" \"LinkerPrimerSequence\" \n",
+ " [3] \"barcode_read_group_tag\" \"dna_extracted_prep\" \n",
+ " [5] \"experiment_alias\" \"experiment_center\" \n",
+ " [7] \"experiment_design_description\" \"experiment_title\" \n",
+ " [9] \"instrument_name\" \"key_seq\" \n",
+ "[11] \"library_construction_protocol\" \"linker\" \n",
+ "[13] \"pcr_primers\" \"physical_specimen_remaining_prep\"\n",
+ "[15] \"platform\" \"pool_member_name\" \n",
+ "[17] \"pool_proportion\" \"primer_read_group_tag\" \n",
+ "[19] \"region\" \"run_alias\" \n",
+ "[21] \"run_center\" \"run_date\" \n",
+ "[23] \"run_prefix\" \"samp_size\" \n",
+ "[25] \"sample_center\" \"sample_type_prep\" \n",
+ "[27] \"sequencing_meth\" \"study_center\" \n",
+ "[29] \"study_ref\" \"target_gene\" \n",
+ "[31] \"target_subfragment\" \"altitude\" \n",
+ "[33] \"annual_season_precpt\" \"annual_season_temp\" \n",
+ "[35] \"anonymized_name\" \"assigned_from_geo\" \n",
+ "[37] \"carb_nitro_ratio\" \"cmin_rate\" \n",
+ "[39] \"collection_date\" \"common_name\" \n",
+ "[41] \"country\" \"depth\" \n",
+ "[43] \"dna_extracted\" \"elevation\" \n",
+ "[45] \"env_biome\" \"env_feature\" \n",
+ "[47] \"env_matter\" \"host_subject_id\" \n",
+ "[49] \"latitude\" \"longitude\" \n",
+ "[51] \"ph\" \"physical_specimen_remaining\" \n",
+ "[53] \"project_name\" \"public\" \n",
+ "[55] \"sample_type\" \"silt_clay\" \n",
+ "[57] \"soil_moisture_deficit\" \"soil_type\" \n",
+ "[59] \"specific_location\" \"taxon_id\" \n",
+ "[61] \"texture\" \"title\" \n",
+ "[63] \"tot_org_carb\" \"tot_org_nitro\" \n",
+ "[65] \"Description\" \"ph2\" \n",
+ "[67] \"ph3\" \"ph4\" \n",
+ "[69] \"ph_rounded\" "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "colnames(demo)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "89"
+ ],
+ "text/latex": [
+ "89"
+ ],
+ "text/markdown": [
+ "89"
+ ],
+ "text/plain": [
+ "[1] 89"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>8.02</li>\n",
+ "\t<li>6.02</li>\n",
+ "\t<li>6.95</li>\n",
+ "\t<li>5.52</li>\n",
+ "\t<li>7.53</li>\n",
+ "\t<li>5.99</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 8.02\n",
+ "\\item 6.02\n",
+ "\\item 6.95\n",
+ "\\item 5.52\n",
+ "\\item 7.53\n",
+ "\\item 5.99\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 8.02\n",
+ "2. 6.02\n",
+ "3. 6.95\n",
+ "4. 5.52\n",
+ "5. 7.53\n",
+ "6. 5.99\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 8.02 6.02 6.95 5.52 7.53 5.99"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# metadata\n",
+ "mf <- read.csv(\"../88soils/88soils_modified_metadata.txt\", sep='\\t', row.names=1)\n",
+ "y <- mf$ph[match(rownames(count), rownames(mf))]\n",
+ "length(y)\n",
+ "head(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "'matrix'"
+ ],
+ "text/latex": [
+ "'matrix'"
+ ],
+ "text/markdown": [
+ "'matrix'"
+ ],
+ "text/plain": [
+ "[1] \"matrix\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "'numeric'"
+ ],
+ "text/latex": [
+ "'numeric'"
+ ],
+ "text/markdown": [
+ "'numeric'"
+ ],
+ "text/plain": [
+ "[1] \"numeric\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check datatype\n",
+ "class(taxa); class(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# save processed data\n",
+ "save(y, taxa, file='../88soils/soils_ph.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/.ipynb_checkpoints/2.3 soils_results_cts-checkpoint.ipynb b/data_application/notebooks_application/.ipynb_checkpoints/2.3 soils_results_cts-checkpoint.ipynb
new file mode 100755
index 0000000..34fdcc6
--- /dev/null
+++ b/data_application/notebooks_application/.ipynb_checkpoints/2.3 soils_results_cts-checkpoint.ipynb
@@ -0,0 +1,402 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Soil microbiome data application results for continuous outcome"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### method comparisons"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../88soils/results_cts/soils_ph_compLasso.RData')\n",
+ "load('../88soils/results_cts/soils_ph_elnet.RData')\n",
+ "load('../88soils/results_cts/soils_ph_lasso.RData')\n",
+ "load('../88soils/results_cts/soils_ph_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.39</li>\n",
+ "\t<li>0.46</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.39\n",
+ "\\item 0.46\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.39\n",
+ "2. 0.46\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.39 0.46"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_compLasso$stab_index, out_compLasso$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.31</li>\n",
+ "\t<li>0.34</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.31\n",
+ "\\item 0.34\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.31\n",
+ "2. 0.34\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.31 0.34"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_lasso$stab_index, out_lasso$MSE_mean) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.16</li>\n",
+ "\t<li>0.23</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.16\n",
+ "\\item 0.23\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.16\n",
+ "2. 0.23\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.16 0.23"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_elnet$stab_index, out_elnet$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.04</li>\n",
+ "\t<li>0.26</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.04\n",
+ "\\item 0.26\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.04\n",
+ "2. 0.26\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.04 0.26"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_rf$stab_index, out_rf$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
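+ "<table>\n",
+ "<thead><tr><th>dataset</th><th>method</th><th>mse</th><th>stability</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>soil_88</td><td>lasso</td><td>0.34</td><td>0.31</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>elent</td><td>0.23</td><td>0.16</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>rf</td><td>0.26</td><td>0.04</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>compLasso</td><td>0.46</td><td>0.39</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"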
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llll}\n",
+ " dataset & method & mse & stability\\\\\n",
+ "\\hline\n",
+ "\t soil\\_88 & lasso & 0.34 & 0.31 \\\\\n",
+ "\t soil\\_88 & elent & 0.23 & 0.16 \\\\\n",
+ "\t soil\\_88 & rf & 0.26 & 0.04 \\\\\n",
+ "\t soil\\_88 & compLasso & 0.46 & 0.39 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| dataset | method | mse | stability |\n",
+ "|---|---|---|---|\n",
+ "| soil_88 | lasso | 0.34 | 0.31 |\n",
+ "| soil_88 | elent | 0.23 | 0.16 |\n",
+ "| soil_88 | rf | 0.26 | 0.04 |\n",
+ "| soil_88 | compLasso | 0.46 | 0.39 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method mse stability\n",
+ "1 soil_88 lasso 0.34 0.31 \n",
+ "2 soil_88 elent 0.23 0.16 \n",
+ "3 soil_88 rf 0.26 0.04 \n",
+ "4 soil_88 compLasso 0.46 0.39 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# combine and export results\n",
+ "soil_88 = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(soil_88) = c('dataset', 'method', 'mse', 'stability')\n",
+ "soil_88$dataset = 'soil_88'\n",
+ "soil_88$method = c('lasso', 'elent', 'rf', 'compLasso')\n",
+ "soil_88$mse = c(out_lasso$MSE_mean, out_elnet$MSE_mean, out_rf$MSE_mean, out_compLasso$MSE_mean)\n",
+ "soil_88$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_compLasso$stab_index)\n",
+ "soil_88"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### hypothesis testing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../88soils/results_cts/soils_ph_boot_compLasso.RData')\n",
+ "load('../88soils/results_cts/soils_ph_boot_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.3595"
+ ],
+ "text/latex": [
+ "0.3595"
+ ],
+ "text/markdown": [
+ "0.3595"
+ ],
+ "text/plain": [
+ "[1] 0.3595"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl>\n",
+ "\t<dt>2.5%</dt>\n",
+ "\t\t<dd>0.28</dd>\n",
+ "\t<dt>97.5%</dt>\n",
+ "\t\t<dd>0.44</dd>\n",
+ "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.28\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.44\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.2897.5%\n",
+ ": 0.44\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ " 0.28 0.44 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.0811150164524993"
+ ],
+ "text/latex": [
+ "0.0811150164524993"
+ ],
+ "text/markdown": [
+ "0.0811150164524993"
+ ],
+ "text/plain": [
+ "[1] 0.08111502"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl>\n",
+ "\t<dt>2.5%</dt>\n",
+ "\t\t<dd>-0.283912044397921</dd>\n",
+ "\t<dt>97.5%</dt>\n",
+ "\t\t<dd>0.945808111396662</dd>\n",
+ "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -0.283912044397921\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.945808111396662\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -0.28391204439792197.5%\n",
+ ": 0.945808111396662\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-0.2839120 0.9458081 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_mse = (unlist(boot_compLasso$MSE_list) - unlist(boot_rf$MSE_list)) # use all 100*100 MSEs\n",
+ "mean(diff_mse)\n",
+ "quantile(diff_mse, probs = c(0.025, 0.975)) \n",
+ "# CI contains zero: compLasso is not significantly different from RF based on MSE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/.ipynb_checkpoints/3_applications_bin_results-checkpoint.ipynb b/data_application/notebooks_application/.ipynb_checkpoints/3_applications_bin_results-checkpoint.ipynb
new file mode 100644
index 0000000..fa46c90
--- /dev/null
+++ b/data_application/notebooks_application/.ipynb_checkpoints/3_applications_bin_results-checkpoint.ipynb
@@ -0,0 +1,505 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Results for real microbiome data applications"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dir = '/panfs/panfs1.ucsd.edu/panscratch/lij014/Stability_2020/data_applications/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### BMI dataset application"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load(paste0(dir, '/BMI_binary_GenCompLasso.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_lasso.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_elnet.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_rf.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
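+ "<table>\n",
+ "<caption>A data.frame: 4 × 4</caption>\n",
+ "<thead>\n",
+ "\t<tr><th>dataset</th><th>method</th><th>ROC</th><th>stability</th></tr>\n",
+ "\t<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th></tr>\n",
+ "</thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>bmi_gut</td><td>lasso</td><td>0.63</td><td>0.14</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>elent</td><td>0.78</td><td>0.19</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>rf</td><td>1.00</td><td>0.01</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>compLasso</td><td>0.85</td><td>0.29</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"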
+ ],
+ "text/latex": [
+ "A data.frame: 4 × 4\n",
+ "\\begin{tabular}{llll}\n",
+ " dataset & method & ROC & stability\\\\\n",
+ " & & & \\\\\n",
+ "\\hline\n",
+ "\t bmi\\_gut & lasso & 0.63 & 0.14\\\\\n",
+ "\t bmi\\_gut & elent & 0.78 & 0.19\\\\\n",
+ "\t bmi\\_gut & rf & 1.00 & 0.01\\\\\n",
+ "\t bmi\\_gut & compLasso & 0.85 & 0.29\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 4 × 4\n",
+ "\n",
+ "| dataset <chr> | method <chr> | ROC <dbl> | stability <dbl> |\n",
+ "|---|---|---|---|\n",
+ "| bmi_gut | lasso | 0.63 | 0.14 |\n",
+ "| bmi_gut | elent | 0.78 | 0.19 |\n",
+ "| bmi_gut | rf | 1.00 | 0.01 |\n",
+ "| bmi_gut | compLasso | 0.85 | 0.29 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method ROC stability\n",
+ "1 bmi_gut lasso 0.63 0.14 \n",
+ "2 bmi_gut elent 0.78 0.19 \n",
+ "3 bmi_gut rf 1.00 0.01 \n",
+ "4 bmi_gut compLasso 0.85 0.29 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "bmi_gut = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(bmi_gut) = c('dataset', 'method', 'ROC', 'stability')\n",
+ "bmi_gut$dataset = 'bmi_gut'\n",
+ "bmi_gut$method = c('lasso', 'elent', 'rf', 'compLasso')\n",
+ "bmi_gut$ROC = c(out_lasso$ROC_mean, out_elnet$ROC_mean, out_rf$ROC_mean, out_GenCompLasso$ROC_mean)\n",
+ "bmi_gut$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_GenCompLasso$stab_index)\n",
+ "bmi_gut"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# load bootstrap results used for hypothesis testing (RF vs. compLasso)\n",
+ "load(paste0(dir, '/BMI_binary_boot_rf.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_boot_compLasso.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.2997"
+ ],
+ "text/latex": [
+ "0.2997"
+ ],
+ "text/markdown": [
+ "0.2997"
+ ],
+ "text/plain": [
+ "[1] 0.2997"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "- 2.5%\n",
+ "- 0.11\n",
+ "- 97.5%\n",
+ "- 0.41525\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.11\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.41525\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.1197.5%\n",
+ ": 0.41525\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "0.11000 0.41525 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "-0.0942798529453605"
+ ],
+ "text/latex": [
+ "-0.0942798529453605"
+ ],
+ "text/markdown": [
+ "-0.0942798529453605"
+ ],
+ "text/plain": [
+ "[1] -0.09427985"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "- 2.5%\n",
+ "- -0.19020132947925\n",
+ "- 97.5%\n",
+ "- -0.020967051775781\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -0.19020132947925\n",
+ "\\item[97.5\\textbackslash{}\\%] -0.020967051775781\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -0.1902013294792597.5%\n",
+ ": -0.020967051775781\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-0.19020133 -0.02096705 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_ROC = (unlist(boot_compLasso$ROC_list) - unlist(boot_rf$ROC_list)) # use all 100*100 ROCs\n",
+ "mean(diff_ROC)\n",
+ "quantile(diff_ROC, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso differs significantly from RF based on ROC,\n",
+ "# although the upper bound is very close to zero (ROC is on a scale similar to Stability here, yet Stability still differentiates the two methods more clearly)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 88 soils dataset application"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load(paste0(dir, '/soils_binary_ph_GenCompLasso.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_lasso.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_elnet.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_rf.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 4 × 4\n",
+ "\n",
+ "\tdataset | method | ROC | stability |\n",
+ "\t<chr> | <chr> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\tsoil_88 | lasso | 0.90 | 0.28 |\n",
+ "\tsoil_88 | elnet | 0.94 | 0.32 |\n",
+ "\tsoil_88 | rf | 1.00 | 0.03 |\n",
+ "\tsoil_88 | compLasso | 0.96 | 0.46 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 4 × 4\n",
+ "\\begin{tabular}{llll}\n",
+ " dataset & method & ROC & stability\\\\\n",
+ " & & & \\\\\n",
+ "\\hline\n",
+ "\t soil\\_88 & lasso & 0.90 & 0.28\\\\\n",
+ "\t soil\\_88 & elnet & 0.94 & 0.32\\\\\n",
+ "\t soil\\_88 & rf & 1.00 & 0.03\\\\\n",
+ "\t soil\\_88 & compLasso & 0.96 & 0.46\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 4 × 4\n",
+ "\n",
+ "| dataset <chr> | method <chr> | ROC <dbl> | stability <dbl> |\n",
+ "|---|---|---|---|\n",
+ "| soil_88 | lasso | 0.90 | 0.28 |\n",
+ "| soil_88 | elnet | 0.94 | 0.32 |\n",
+ "| soil_88 | rf | 1.00 | 0.03 |\n",
+ "| soil_88 | compLasso | 0.96 | 0.46 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method ROC stability\n",
+ "1 soil_88 lasso 0.90 0.28 \n",
+ "2 soil_88 elnet 0.94 0.32 \n",
+ "3 soil_88 rf 1.00 0.03 \n",
+ "4 soil_88 compLasso 0.96 0.46 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "soil_88 = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(soil_88) = c('dataset', 'method', 'ROC', 'stability')\n",
+ "soil_88$dataset = 'soil_88'\n",
+ "soil_88$method = c('lasso', 'elnet', 'rf', 'compLasso')\n",
+ "soil_88$ROC = c(out_lasso$ROC_mean, out_elnet$ROC_mean, out_rf$ROC_mean, out_GenCompLasso$ROC_mean)\n",
+ "soil_88$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_GenCompLasso$stab_index)\n",
+ "soil_88"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# load bootstrap results used for hypothesis testing (RF vs. compLasso)\n",
+ "load(paste0(dir, '/soils_binary_ph_boot_rf.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_boot_compLasso.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.4323"
+ ],
+ "text/latex": [
+ "0.4323"
+ ],
+ "text/markdown": [
+ "0.4323"
+ ],
+ "text/plain": [
+ "[1] 0.4323"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "- 2.5%\n",
+ "- 0.37\n",
+ "- 97.5%\n",
+ "- 0.5\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.37\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.5\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.3797.5%\n",
+ ": 0.5\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ " 0.37 0.50 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "-0.0247120820917325"
+ ],
+ "text/latex": [
+ "-0.0247120820917325"
+ ],
+ "text/markdown": [
+ "-0.0247120820917325"
+ ],
+ "text/plain": [
+ "[1] -0.02471208"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "- 2.5%\n",
+ "- -0.0838574423480084\n",
+ "- 97.5%\n",
+ "- 0\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -0.0838574423480084\n",
+ "\\item[97.5\\textbackslash{}\\%] 0\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -0.083857442348008497.5%\n",
+ ": 0\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-0.08385744 0.00000000 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_ROC = (unlist(boot_compLasso$ROC_list) - unlist(boot_rf$ROC_list)) # use all 100*100 ROCs\n",
+ "mean(diff_ROC)\n",
+ "quantile(diff_ROC, probs = c(0.025, 0.975)) \n",
+ "# CI contains zero (the upper bound is exactly 0): compLasso is not significantly different from RF based on ROC"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Min. 1st Qu. Median Mean 3rd Qu. Max. \n",
+ "-0.24949 -0.04206 -0.02250 -0.02471 0.00000 0.03769 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "summary(diff_ROC)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/1.1. BMI_DataPreparation.ipynb b/data_application/notebooks_application/1.1. BMI_DataPreparation.ipynb
new file mode 100755
index 0000000..2a6fe2d
--- /dev/null
+++ b/data_application/notebooks_application/1.1. BMI_DataPreparation.ipynb
@@ -0,0 +1,875 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### BMI dataset in Lin et al., 2014\n",
+ "##### use unrarefied count table and retain microbes only at genus level (+ present in at least one sample)\n",
+ "#### add 0.5 as pseudo count to zero counts"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 98\n",
+ "\t- 263\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 98\n",
+ "\\item 263\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 98\n",
+ "2. 263\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 98 263"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | Bacteria.Actinobacteria | Bacteria.Bacteroidetes | Bacteria.Cyanobacteria | Bacteria.Firmicutes | Bacteria.Fusobacteria | Bacteria.Lentisphaerae | Bacteria.OD1 | Bacteria.Proteobacteria | Bacteria.Spirochaetes | Bacteria.Synergistetes | ... | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira | Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |
\n",
+ "\n",
+ "\t3001 | 1 | 5067 | 0 | 4153 | 0 | 0 | 0 | 534 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3003 | 0 | 4659 | 0 | 2177 | 0 | 0 | 0 | 105 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3004 | 0 | 4342 | 0 | 3008 | 0 | 2 | 0 | 134 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3006 | 8 | 2910 | 0 | 4147 | 0 | 0 | 0 | 459 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3007 | 5 | 5630 | 0 | 4705 | 0 | 0 | 0 | 214 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3008 | 5 | 1868 | 0 | 1619 | 0 | 0 | 0 | 17 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & Bacteria.Actinobacteria & Bacteria.Bacteroidetes & Bacteria.Cyanobacteria & Bacteria.Firmicutes & Bacteria.Fusobacteria & Bacteria.Lentisphaerae & Bacteria.OD1 & Bacteria.Proteobacteria & Bacteria.Spirochaetes & Bacteria.Synergistetes & ... & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas & Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas & Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira & Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter & Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\\\\\n",
+ "\\hline\n",
+ "\t3001 & 1 & 5067 & 0 & 4153 & 0 & 0 & 0 & 534 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3003 & 0 & 4659 & 0 & 2177 & 0 & 0 & 0 & 105 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3004 & 0 & 4342 & 0 & 3008 & 0 & 2 & 0 & 134 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3006 & 8 & 2910 & 0 & 4147 & 0 & 0 & 0 & 459 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3007 & 5 & 5630 & 0 & 4705 & 0 & 0 & 0 & 214 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3008 & 5 & 1868 & 0 & 1619 & 0 & 0 & 0 & 17 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | Bacteria.Actinobacteria | Bacteria.Bacteroidetes | Bacteria.Cyanobacteria | Bacteria.Firmicutes | Bacteria.Fusobacteria | Bacteria.Lentisphaerae | Bacteria.OD1 | Bacteria.Proteobacteria | Bacteria.Spirochaetes | Bacteria.Synergistetes | ... | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira | Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3001 | 1 | 5067 | 0 | 4153 | 0 | 0 | 0 | 534 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3003 | 0 | 4659 | 0 | 2177 | 0 | 0 | 0 | 105 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3004 | 0 | 4342 | 0 | 3008 | 0 | 2 | 0 | 134 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3006 | 8 | 2910 | 0 | 4147 | 0 | 0 | 0 | 459 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3007 | 5 | 5630 | 0 | 4705 | 0 | 0 | 0 | 214 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3008 | 5 | 1868 | 0 | 1619 | 0 | 0 | 0 | 17 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " Bacteria.Actinobacteria Bacteria.Bacteroidetes Bacteria.Cyanobacteria\n",
+ "3001 1 5067 0 \n",
+ "3003 0 4659 0 \n",
+ "3004 0 4342 0 \n",
+ "3006 8 2910 0 \n",
+ "3007 5 5630 0 \n",
+ "3008 5 1868 0 \n",
+ " Bacteria.Firmicutes Bacteria.Fusobacteria Bacteria.Lentisphaerae\n",
+ "3001 4153 0 0 \n",
+ "3003 2177 0 0 \n",
+ "3004 3008 0 2 \n",
+ "3006 4147 0 0 \n",
+ "3007 4705 0 0 \n",
+ "3008 1619 0 0 \n",
+ " Bacteria.OD1 Bacteria.Proteobacteria Bacteria.Spirochaetes\n",
+ "3001 0 534 0 \n",
+ "3003 0 105 0 \n",
+ "3004 0 134 0 \n",
+ "3006 0 459 0 \n",
+ "3007 0 214 0 \n",
+ "3008 0 17 0 \n",
+ " Bacteria.Synergistetes ...\n",
+ "3001 0 ...\n",
+ "3003 0 ...\n",
+ "3004 0 ...\n",
+ "3006 0 ...\n",
+ "3007 0 ...\n",
+ "3008 0 ...\n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Moraxellaceae.Enhydrobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Vibrionales.Vibrionaceae.Vibrio\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Lysobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Brachyspiraceae.Brachyspira\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Spirochaetes.Spirochaetes.Spirochaetales.Spirochaetaceae.Treponema\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# unrarefied microbe count table\n",
+ "count <- as.matrix(read.table(\"../../code_Lin/cvs/data/combo_count_tab.txt\")) \n",
+ "dim(count)\n",
+ "head(count)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 98\n",
+ "\t- 87\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 98\n",
+ "\\item 87\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 98\n",
+ "2. 87\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 98 87"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas | ... | Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter | Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria | Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio | Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |
\n",
+ "\n",
+ "\t3001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2878 | 0 | 69 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 578 | 180 | 0 | ... | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3503 | 143 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3006 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2162 | 1 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3007 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 4453 | 0 | 0 | ... | 9 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\t3008 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 950 | 237 | 19 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas & ... & Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter & Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria & Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio & Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter & Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\\\\\n",
+ "\\hline\n",
+ "\t3001 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2878 & 0 & 69 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3003 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 578 & 180 & 0 & ... & 2 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3004 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3503 & 143 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3006 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2162 & 1 & 0 & ... & 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3007 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 4453 & 0 & 0 & ... & 9 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t3008 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 950 & 237 & 19 & ... & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas | ... | Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter | Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria | Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio | Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2878 | 0 | 69 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 578 | 180 | 0 | ... | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3503 | 143 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3006 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2162 | 1 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3007 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 4453 | 0 | 0 | ... | 9 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 3008 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 950 | 237 | 19 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 2 \n",
+ "3008 1 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides\n",
+ "3001 2878 \n",
+ "3003 578 \n",
+ "3004 3503 \n",
+ "3006 2162 \n",
+ "3007 4453 \n",
+ "3008 950 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella\n",
+ "3001 0 \n",
+ "3003 180 \n",
+ "3004 143 \n",
+ "3006 1 \n",
+ "3007 0 \n",
+ "3008 237 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas\n",
+ "3001 69 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 19 \n",
+ " ...\n",
+ "3001 ...\n",
+ "3003 ...\n",
+ "3004 ...\n",
+ "3006 ...\n",
+ "3007 ...\n",
+ "3008 ...\n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter\n",
+ "3001 0 \n",
+ "3003 2 \n",
+ "3004 0 \n",
+ "3006 3 \n",
+ "3007 9 \n",
+ "3008 4 \n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter\n",
+ "3001 0 \n",
+ "3003 1 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 1 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 \n",
+ " Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\n",
+ "3001 0 \n",
+ "3003 0 \n",
+ "3004 0 \n",
+ "3006 0 \n",
+ "3007 0 \n",
+ "3008 0 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# retain only genus-level microbes that are present in at least one sample\n",
+ "depth <- sapply(strsplit(colnames(count), \"\\\\.\"), length)\n",
+ "x <- count[, depth == 6 & colSums(count != 0) >= 1] # 98 * 87\n",
+ "dim(x)\n",
+ "head(x)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 98\n",
+ "\t- 87\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 98\n",
+ "\\item 87\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 98\n",
+ "2. 87\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 98 87"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<table>\n",
+ "<thead><tr><th></th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter</th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium</th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella</th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella</th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter</th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella</th><th scope=col>Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia</th><th scope=col>Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides</th><th scope=col>Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella</th><th scope=col>Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas</th><th scope=col>...</th><th scope=col>Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter</th><th scope=col>Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria</th><th scope=col>Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio</th><th scope=col>Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter</th><th scope=col>Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio</th><th scope=col>Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas</th><th scope=col>Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas</th><th scope=col>Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus</th><th scope=col>Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter</th><th scope=col>Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><th scope=row>3001</th><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-0.8429202</td><td>-9.500918</td><td>-4.573665</td><td>...</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td><td>-9.500918</td></tr>\n",
+ "\t<tr><th scope=row>3003</th><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-2.2258386</td><td>-3.392456</td><td>-9.278560</td><td>...</td><td>-7.892265</td><td>-9.278560</td><td>-9.278560</td><td>-8.585412</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td><td>-9.278560</td></tr>\n",
+ "\t<tr><th scope=row>3004</th><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-0.4151243</td><td>-3.613655</td><td>-9.269646</td><td>...</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td><td>-9.269646</td></tr>\n",
+ "\t<tr><th scope=row>3006</th><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-0.7426639</td><td>-8.421453</td><td>-9.114600</td><td>...</td><td>-7.322841</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td><td>-9.114600</td></tr>\n",
+ "\t<tr><th scope=row>3007</th><td>-9.643356</td><td>-9.643356</td><td>-9.643356</td><td>-8.257061</td><td>-9.643356</td><td>-9.643356</td><td>-9.643356</td><td>-0.5488753</td><td>-9.643356</td><td>-9.643356</td><td>...</td><td>-6.752984</td><td>-9.643356</td><td>-9.643356</td><td>-8.950209</td><td>-9.643356</td><td>-9.643356</td><td>-9.643356</td><td>-9.643356</td><td>-9.643356</td><td>-9.643356</td></tr>\n",
+ "\t<tr><th scope=row>3008</th><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-7.693937</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-0.8374753</td><td>-2.225877</td><td>-4.749498</td><td>...</td><td>-6.307643</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td><td>-8.387085</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella & Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella & Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas & ... & Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter & Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria & Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio & Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter & Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio & Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas & Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus & Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter & Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\\\\\n",
+ "\\hline\n",
+ "\t3001 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -0.8429202 & -9.500918 & -4.573665 & ... & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 & -9.500918 \\\\\n",
+ "\t3003 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -2.2258386 & -3.392456 & -9.278560 & ... & -7.892265 & -9.278560 & -9.278560 & -8.585412 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 & -9.278560 \\\\\n",
+ "\t3004 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -0.4151243 & -3.613655 & -9.269646 & ... & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 & -9.269646 \\\\\n",
+ "\t3006 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -0.7426639 & -8.421453 & -9.114600 & ... & -7.322841 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 & -9.114600 \\\\\n",
+ "\t3007 & -9.643356 & -9.643356 & -9.643356 & -8.257061 & -9.643356 & -9.643356 & -9.643356 & -0.5488753 & -9.643356 & -9.643356 & ... & -6.752984 & -9.643356 & -9.643356 & -8.950209 & -9.643356 & -9.643356 & -9.643356 & -9.643356 & -9.643356 & -9.643356 \\\\\n",
+ "\t3008 & -8.387085 & -8.387085 & -8.387085 & -7.693937 & -8.387085 & -8.387085 & -8.387085 & -0.8374753 & -2.225877 & -4.749498 & ... & -6.307643 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 & -8.387085 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella | Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella | Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas | ... | Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter | Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria | Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio | Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter | Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio | Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas | Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus | Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter | Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3001 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -0.8429202 | -9.500918 | -4.573665 | ... | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 | -9.500918 |\n",
+ "| 3003 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -2.2258386 | -3.392456 | -9.278560 | ... | -7.892265 | -9.278560 | -9.278560 | -8.585412 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 | -9.278560 |\n",
+ "| 3004 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -0.4151243 | -3.613655 | -9.269646 | ... | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 | -9.269646 |\n",
+ "| 3006 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -0.7426639 | -8.421453 | -9.114600 | ... | -7.322841 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 | -9.114600 |\n",
+ "| 3007 | -9.643356 | -9.643356 | -9.643356 | -8.257061 | -9.643356 | -9.643356 | -9.643356 | -0.5488753 | -9.643356 | -9.643356 | ... | -6.752984 | -9.643356 | -9.643356 | -8.950209 | -9.643356 | -9.643356 | -9.643356 | -9.643356 | -9.643356 | -9.643356 |\n",
+ "| 3008 | -8.387085 | -8.387085 | -8.387085 | -7.693937 | -8.387085 | -8.387085 | -8.387085 | -0.8374753 | -2.225877 | -4.749498 | ... | -6.307643 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 | -8.387085 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Asaccharobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Atopobium\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Collinsella\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Eggerthella\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -8.257061 \n",
+ "3008 -7.693937 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Gordonibacter\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Olsenella\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Actinobacteria.Actinobacteria.Coriobacteriales.Coriobacteriaceae.Slackia\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Bacteroidaceae.Bacteroides\n",
+ "3001 -0.8429202 \n",
+ "3003 -2.2258386 \n",
+ "3004 -0.4151243 \n",
+ "3006 -0.7426639 \n",
+ "3007 -0.5488753 \n",
+ "3008 -0.8374753 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Barnesiella\n",
+ "3001 -9.500918 \n",
+ "3003 -3.392456 \n",
+ "3004 -3.613655 \n",
+ "3006 -8.421453 \n",
+ "3007 -9.643356 \n",
+ "3008 -2.225877 \n",
+ " Bacteria.Bacteroidetes.Bacteroidia.Bacteroidales.Porphyromonadaceae.Butyricimonas\n",
+ "3001 -4.573665 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -4.749498 \n",
+ " ...\n",
+ "3001 ...\n",
+ "3003 ...\n",
+ "3004 ...\n",
+ "3006 ...\n",
+ "3007 ...\n",
+ "3008 ...\n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Burkholderiales.Oxalobacteraceae.Oxalobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -7.892265 \n",
+ "3004 -9.269646 \n",
+ "3006 -7.322841 \n",
+ "3007 -6.752984 \n",
+ "3008 -6.307643 \n",
+ " Bacteria.Proteobacteria.Betaproteobacteria.Neisseriales.Neisseriaceae.Neisseria\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Deltaproteobacteria.Desulfovibrionales.Desulfovibrionaceae.Desulfovibrio\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Epsilonproteobacteria.Campylobacterales.Campylobacteraceae.Campylobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -8.585412 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -8.950209 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Aeromonadales.Succinivibrionaceae.Succinivibrio\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Pseudomonadales.Pseudomonadaceae.Pseudomonas\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Proteobacteria.Gammaproteobacteria.Xanthomonadales.Xanthomonadaceae.Stenotrophomonas\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Cloacibacillus\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Synergistetes.Synergistia.Synergistales.Synergistaceae.Pyramidobacter\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 \n",
+ " Bacteria.Verrucomicrobia.Verrucomicrobiae.Verrucomicrobiales.Verrucomicrobiaceae.Akkermansia\n",
+ "3001 -9.500918 \n",
+ "3003 -9.278560 \n",
+ "3004 -9.269646 \n",
+ "3006 -9.114600 \n",
+ "3007 -9.643356 \n",
+ "3008 -8.387085 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# replace zero counts with a pseudocount of 0.5\n",
+ "x[x == 0] <- 0.5\n",
+ "x <- x/rowSums(x) # relative abundance\n",
+ "taxa <- log(x)\n",
+ "dim(taxa)\n",
+ "head(taxa)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
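+ "<table>\n",
+ "<thead><tr><th scope=col>pid</th><th scope=col>visitdate</th><th scope=col>birthdate</th><th scope=col>sex1m2f</th><th scope=col>heightcm</th><th scope=col>weightkg</th><th scope=col>vdate</th><th scope=col>bdate</th><th scope=col>age</th><th scope=col>heightm</th><th scope=col>bmi</th><th scope=col>zbmius</th><th scope=col>zbmicatus</th><th scope=col>bmicat1norm2ow3ob</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>3029</td><td>12-Apr-10</td><td>1-May-86</td><td>1</td><td>172.43</td><td>83.0</td><td>18364</td><td>9617</td><td>23.94798</td><td>1.7243</td><td>27.91595</td><td>NA</td><td></td><td>2</td></tr>\n",
+ "\t<tr><td>3030</td><td>12-Apr-10</td><td>22-May-87</td><td>1</td><td>178.87</td><td>70.3</td><td>18364</td><td>10003</td><td>22.89117</td><td>1.7887</td><td>21.97254</td><td>NA</td><td></td><td>1</td></tr>\n",
+ "\t<tr><td>3031</td><td>20-Apr-10</td><td>1-Dec-82</td><td>2</td><td>157.60</td><td>52.0</td><td>18372</td><td>8370</td><td>27.38398</td><td>1.5760</td><td>20.93586</td><td>NA</td><td></td><td>1</td></tr>\n",
+ "\t<tr><td>3032</td><td>22-Apr-10</td><td>9-Feb-86</td><td>1</td><td>188.10</td><td>89.6</td><td>18374</td><td>9536</td><td>24.19712</td><td>1.8810</td><td>25.32389</td><td>NA</td><td></td><td>2</td></tr>\n",
+ "\t<tr><td>3033</td><td>22-Apr-10</td><td>9-Apr-86</td><td>2</td><td>170.03</td><td>65.2</td><td>18374</td><td>9595</td><td>24.03559</td><td>1.7003</td><td>22.55259</td><td>NA</td><td></td><td>1</td></tr>\n",
+ "\t<tr><td>3034</td><td>28-Apr-10</td><td>6-Feb-86</td><td>2</td><td>162.16</td><td>59.9</td><td>18380</td><td>9533</td><td>24.22177</td><td>1.6216</td><td>22.77925</td><td>NA</td><td></td><td>1</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"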
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llllllllllllll}\n",
+ " pid & visitdate & birthdate & sex1m2f & heightcm & weightkg & vdate & bdate & age & heightm & bmi & zbmius & zbmicatus & bmicat1norm2ow3ob\\\\\n",
+ "\\hline\n",
+ "\t 3029 & 12-Apr-10 & 1-May-86 & 1 & 172.43 & 83.0 & 18364 & 9617 & 23.94798 & 1.7243 & 27.91595 & NA & & 2 \\\\\n",
+ "\t 3030 & 12-Apr-10 & 22-May-87 & 1 & 178.87 & 70.3 & 18364 & 10003 & 22.89117 & 1.7887 & 21.97254 & NA & & 1 \\\\\n",
+ "\t 3031 & 20-Apr-10 & 1-Dec-82 & 2 & 157.60 & 52.0 & 18372 & 8370 & 27.38398 & 1.5760 & 20.93586 & NA & & 1 \\\\\n",
+ "\t 3032 & 22-Apr-10 & 9-Feb-86 & 1 & 188.10 & 89.6 & 18374 & 9536 & 24.19712 & 1.8810 & 25.32389 & NA & & 2 \\\\\n",
+ "\t 3033 & 22-Apr-10 & 9-Apr-86 & 2 & 170.03 & 65.2 & 18374 & 9595 & 24.03559 & 1.7003 & 22.55259 & NA & & 1 \\\\\n",
+ "\t 3034 & 28-Apr-10 & 6-Feb-86 & 2 & 162.16 & 59.9 & 18380 & 9533 & 24.22177 & 1.6216 & 22.77925 & NA & & 1 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| pid | visitdate | birthdate | sex1m2f | heightcm | weightkg | vdate | bdate | age | heightm | bmi | zbmius | zbmicatus | bmicat1norm2ow3ob |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 3029 | 12-Apr-10 | 1-May-86 | 1 | 172.43 | 83.0 | 18364 | 9617 | 23.94798 | 1.7243 | 27.91595 | NA | | 2 |\n",
+ "| 3030 | 12-Apr-10 | 22-May-87 | 1 | 178.87 | 70.3 | 18364 | 10003 | 22.89117 | 1.7887 | 21.97254 | NA | | 1 |\n",
+ "| 3031 | 20-Apr-10 | 1-Dec-82 | 2 | 157.60 | 52.0 | 18372 | 8370 | 27.38398 | 1.5760 | 20.93586 | NA | | 1 |\n",
+ "| 3032 | 22-Apr-10 | 9-Feb-86 | 1 | 188.10 | 89.6 | 18374 | 9536 | 24.19712 | 1.8810 | 25.32389 | NA | | 2 |\n",
+ "| 3033 | 22-Apr-10 | 9-Apr-86 | 2 | 170.03 | 65.2 | 18374 | 9595 | 24.03559 | 1.7003 | 22.55259 | NA | | 1 |\n",
+ "| 3034 | 28-Apr-10 | 6-Feb-86 | 2 | 162.16 | 59.9 | 18380 | 9533 | 24.22177 | 1.6216 | 22.77925 | NA | | 1 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " pid visitdate birthdate sex1m2f heightcm weightkg vdate bdate age \n",
+ "1 3029 12-Apr-10 1-May-86 1 172.43 83.0 18364 9617 23.94798\n",
+ "2 3030 12-Apr-10 22-May-87 1 178.87 70.3 18364 10003 22.89117\n",
+ "3 3031 20-Apr-10 1-Dec-82 2 157.60 52.0 18372 8370 27.38398\n",
+ "4 3032 22-Apr-10 9-Feb-86 1 188.10 89.6 18374 9536 24.19712\n",
+ "5 3033 22-Apr-10 9-Apr-86 2 170.03 65.2 18374 9595 24.03559\n",
+ "6 3034 28-Apr-10 6-Feb-86 2 162.16 59.9 18380 9533 24.22177\n",
+ " heightm bmi zbmius zbmicatus bmicat1norm2ow3ob\n",
+ "1 1.7243 27.91595 NA 2 \n",
+ "2 1.7887 21.97254 NA 1 \n",
+ "3 1.5760 20.93586 NA 1 \n",
+ "4 1.8810 25.32389 NA 2 \n",
+ "5 1.7003 22.55259 NA 1 \n",
+ "6 1.6216 22.77925 NA 1 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "98"
+ ],
+ "text/latex": [
+ "98"
+ ],
+ "text/markdown": [
+ "98"
+ ],
+ "text/plain": [
+ "[1] 98"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>21.6186</li>\n",
+ "\t<li>21.82244</li>\n",
+ "\t<li>20.03762</li>\n",
+ "\t<li>20.82412</li>\n",
+ "\t<li>22.66875</li>\n",
+ "\t<li>24.97552</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 21.6186\n",
+ "\\item 21.82244\n",
+ "\\item 20.03762\n",
+ "\\item 20.82412\n",
+ "\\item 22.66875\n",
+ "\\item 24.97552\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 21.6186\n",
+ "2. 21.82244\n",
+ "3. 20.03762\n",
+ "4. 20.82412\n",
+ "5. 22.66875\n",
+ "6. 24.97552\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 21.61860 21.82244 20.03762 20.82412 22.66875 24.97552"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# metadata\n",
+ "demo <- read.delim(\"../../code_Lin/cvs/data/demographic.txt\")\n",
+ "head(demo)\n",
+ "y <- demo$bmi[match(rownames(count), demo$pid)]\n",
+ "length(y)\n",
+ "head(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "'matrix'"
+ ],
+ "text/latex": [
+ "'matrix'"
+ ],
+ "text/markdown": [
+ "'matrix'"
+ ],
+ "text/plain": [
+ "[1] \"matrix\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "'numeric'"
+ ],
+ "text/latex": [
+ "'numeric'"
+ ],
+ "text/markdown": [
+ "'numeric'"
+ ],
+ "text/plain": [
+ "[1] \"numeric\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check datatype\n",
+ "class(taxa); class(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# save processed data\n",
+ "save(y, taxa, file='../BMI/BMI_Lin_2014.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/1.2 BMI_results_cts.ipynb b/data_application/notebooks_application/1.2 BMI_results_cts.ipynb
new file mode 100755
index 0000000..3d0c385
--- /dev/null
+++ b/data_application/notebooks_application/1.2 BMI_results_cts.ipynb
@@ -0,0 +1,402 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### BMI microbiome data application results for continuous outcome"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### method comparisons"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../BMI/results_cts/BMI_compLasso.RData')\n",
+ "load('../BMI/results_cts/BMI_elnet.RData')\n",
+ "load('../BMI/results_cts/BMI_lasso.RData')\n",
+ "load('../BMI/results_cts/BMI_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>0.22</li>\n",
+ "\t<li>21.59</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.22\n",
+ "\\item 21.59\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.22\n",
+ "2. 21.59\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.22 21.59"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_compLasso$stab_index, out_compLasso$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>0.14</li>\n",
+ "\t<li>24.07</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.14\n",
+ "\\item 24.07\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.14\n",
+ "2. 24.07\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.14 24.07"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_lasso$stab_index, out_lasso$MSE_mean) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>0.23</li>\n",
+ "\t<li>25.33</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.23\n",
+ "\\item 25.33\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.23\n",
+ "2. 25.33\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.23 25.33"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_elnet$stab_index, out_elnet$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>0.02</li>\n",
+ "\t<li>4.99</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.02\n",
+ "\\item 4.99\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.02\n",
+ "2. 4.99\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.02 4.99"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_rf$stab_index, out_rf$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
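+ "<table>\n",
+ "<thead><tr><th scope=col>dataset</th><th scope=col>method</th><th scope=col>mse</th><th scope=col>stability</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>bmi_gut</td><td>lasso</td><td>24.07</td><td>0.14</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>elnet</td><td>25.33</td><td>0.23</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>rf</td><td>4.99</td><td>0.02</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>compLasso</td><td>21.59</td><td>0.22</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"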
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llll}\n",
+ " dataset & method & mse & stability\\\\\n",
+ "\\hline\n",
+ "\t bmi\\_gut & lasso & 24.07 & 0.14 \\\\\n",
+ "\t bmi\\_gut & elnet & 25.33 & 0.23 \\\\\n",
+ "\t bmi\\_gut & rf & 4.99 & 0.02 \\\\\n",
+ "\t bmi\\_gut & compLasso & 21.59 & 0.22 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| dataset | method | mse | stability |\n",
+ "|---|---|---|---|\n",
+ "| bmi_gut | lasso | 24.07 | 0.14 |\n",
+ "| bmi_gut | elnet | 25.33 | 0.23 |\n",
+ "| bmi_gut | rf | 4.99 | 0.02 |\n",
+ "| bmi_gut | compLasso | 21.59 | 0.22 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method mse stability\n",
+ "1 bmi_gut lasso 24.07 0.14 \n",
+ "2 bmi_gut elnet 25.33 0.23 \n",
+ "3 bmi_gut rf 4.99 0.02 \n",
+ "4 bmi_gut compLasso 21.59 0.22 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# combine and export results\n",
+ "bmi_gut = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(bmi_gut) = c('dataset', 'method', 'mse', 'stability')\n",
+ "bmi_gut$dataset = 'bmi_gut'\n",
+ "bmi_gut$method = c('lasso', 'elnet', 'rf', 'compLasso')\n",
+ "bmi_gut$mse = c(out_lasso$MSE_mean, out_elnet$MSE_mean, out_rf$MSE_mean, out_compLasso$MSE_mean)\n",
+ "bmi_gut$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_compLasso$stab_index)\n",
+ "bmi_gut"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### hypothesis testing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../BMI/results_cts/BMI_boot_compLasso.RData')\n",
+ "load('../BMI/results_cts/BMI_boot_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.2659"
+ ],
+ "text/latex": [
+ "0.2659"
+ ],
+ "text/markdown": [
+ "0.2659"
+ ],
+ "text/plain": [
+ "[1] 0.2659"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl class=dl-horizontal>\n",
+ "\t<dt>2.5%</dt>\n",
+ "\t\t<dd>0.17475</dd>\n",
+ "\t<dt>97.5%</dt>\n",
+ "\t\t<dd>0.34</dd>\n",
+ "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.17475\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.34\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.1747597.5%\n",
+ ": 0.34\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "0.17475 0.34000 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "11.8026936858767"
+ ],
+ "text/latex": [
+ "11.8026936858767"
+ ],
+ "text/markdown": [
+ "11.8026936858767"
+ ],
+ "text/plain": [
+ "[1] 11.80269"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl class=dl-horizontal>\n",
+ "\t<dt>2.5%</dt>\n",
+ "\t\t<dd>-2.09038719853993</dd>\n",
+ "\t<dt>97.5%</dt>\n",
+ "\t\t<dd>41.1831613135692</dd>\n",
+ "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -2.09038719853993\n",
+ "\\item[97.5\\textbackslash{}\\%] 41.1831613135692\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -2.0903871985399397.5%\n",
+ ": 41.1831613135692\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-2.090387 41.183161 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_mse = (unlist(boot_compLasso$MSE_list) - unlist(boot_rf$MSE_list)) # use all 100*100 MSEs\n",
+ "mean(diff_mse)\n",
+ "quantile(diff_mse, probs = c(0.025, 0.975)) \n",
+ "# CI contains zero: the MSE of compLasso is not significantly different from that of RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/2.1.Soils_biomTableConversion (Python).ipynb b/data_application/notebooks_application/2.1.Soils_biomTableConversion (Python).ipynb
new file mode 100755
index 0000000..16a40f2
--- /dev/null
+++ b/data_application/notebooks_application/2.1.Soils_biomTableConversion (Python).ipynb
@@ -0,0 +1,1571 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ref: the soil dataset used in balance tree\n",
+ "# https://msystems.asm.org/content/2/1/e00162-16"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "import biom\n",
+ "from biom.util import biom_open\n",
+ "import pandas as pd\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Taxonomic ranks used to pad lineage strings in pandas2biom. The original\n",
+ "# code read these from an undefined `settings.RANKS`; the seven standard\n",
+ "# ranks listed here are an assumption.\n",
+ "RANKS = ['Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species']\n",
+ "\n",
+ "\n",
+ "def biom2pandas(file_biom, withTaxonomy=False, astype=int):\n",
+ "    \"\"\" Converts a biom file into a Pandas.DataFrame\n",
+ "    Parameters\n",
+ "    ----------\n",
+ "    file_biom : str\n",
+ "        The path to the biom file.\n",
+ "    withTaxonomy : bool\n",
+ "        If True, returns a second Pandas.Series with lineage information for\n",
+ "        each feature, e.g. OTU or deblur-sequence. Default: False\n",
+ "    astype : type\n",
+ "        Datatype to which each value of the biom table is cast. Default: int.\n",
+ "        Use e.g. float if biom table contains relative abundances instead of\n",
+ "        raw reads.\n",
+ "    Returns\n",
+ "    -------\n",
+ "    A Pandas.DataFrame holding numerical values from the biom file, with\n",
+ "    features as rows and samples as columns.\n",
+ "    If withTaxonomy is True then a second Pandas.Series is returned, holding\n",
+ "    lineage information about each feature.\n",
+ "    Raises\n",
+ "    ------\n",
+ "    IOError\n",
+ "        If file_biom cannot be read.\n",
+ "    ValueError\n",
+ "        If withTaxonomy=True but biom file does not hold taxonomy information.\n",
+ "    \"\"\"\n",
+ "    try:\n",
+ "        table = biom.load_table(file_biom)\n",
+ "        counts = pd.DataFrame(table.matrix_data.T.todense().astype(astype),\n",
+ "                              index=table.ids(axis='sample'),\n",
+ "                              columns=table.ids(axis='observation')).T\n",
+ "        if withTaxonomy:\n",
+ "            try:\n",
+ "                md = table.metadata_to_dataframe('observation')\n",
+ "                levels = [col\n",
+ "                          for col in md.columns\n",
+ "                          if col.startswith('taxonomy_')]\n",
+ "                if levels == []:\n",
+ "                    raise ValueError(('No taxonomy information found in '\n",
+ "                                      'biom file.'))\n",
+ "                else:\n",
+ "                    taxonomy = md.apply(lambda row:\n",
+ "                                        \";\".join([row[l] for l in levels]),\n",
+ "                                        axis=1)\n",
+ "                    return counts, taxonomy\n",
+ "            except KeyError:\n",
+ "                raise ValueError(('Biom file does not have any '\n",
+ "                                  'observation metadata!'))\n",
+ "        else:\n",
+ "            return counts\n",
+ "    except IOError:\n",
+ "        raise IOError('Cannot read file \"%s\"' % file_biom)\n",
+ "\n",
+ "\n",
+ "def pandas2biom(file_biom, table, taxonomy=None, err=sys.stderr):\n",
+ "    \"\"\" Writes a Pandas.DataFrame into a biom file.\n",
+ "    Parameters\n",
+ "    ----------\n",
+ "    file_biom: str\n",
+ "        The filename of the BIOM file to be created.\n",
+ "    table: a Pandas.DataFrame\n",
+ "        The table that should be written as BIOM.\n",
+ "    taxonomy : pandas.Series\n",
+ "        Index is taxons corresponding to table, values are lineage strings like\n",
+ "        'k__Bacteria; p__Actinobacteria'\n",
+ "    err : StringIO\n",
+ "        Stream onto which errors / warnings should be printed.\n",
+ "        Default is sys.stderr\n",
+ "    Raises\n",
+ "    ------\n",
+ "    IOError\n",
+ "        If file_biom cannot be written.\n",
+ "    TODO\n",
+ "    ----\n",
+ "    1) also store taxonomy information\n",
+ "    \"\"\"\n",
+ "    try:\n",
+ "        bt = biom.Table(table.values,\n",
+ "                        observation_ids=table.index,\n",
+ "                        sample_ids=table.columns)\n",
+ "\n",
+ "        # add taxonomy metadata if provided, i.e. is not None\n",
+ "        if taxonomy is not None:\n",
+ "            if not isinstance(taxonomy, pd.Series):\n",
+ "                raise AttributeError('taxonomy must be a pandas.Series!')\n",
+ "            idx_missing_intable = set(table.index) - set(taxonomy.index)\n",
+ "            if len(idx_missing_intable) > 0:\n",
+ "                err.write(('Warning: following %i taxa are not in the '\n",
+ "                           'provided taxonomy:\\n%s\\n') % (\n",
+ "                          len(idx_missing_intable),\n",
+ "                          \", \".join(idx_missing_intable)))\n",
+ "                # a set is not accepted as an index, so convert it to a list\n",
+ "                missing = pd.Series(\n",
+ "                    index=list(idx_missing_intable),\n",
+ "                    name='taxonomy',\n",
+ "                    data='k__missing_lineage_information')\n",
+ "                # Series.append was removed in pandas 2.0; use pd.concat\n",
+ "                taxonomy = pd.concat([taxonomy, missing])\n",
+ "            idx_missing_intaxonomy = set(taxonomy.index) - set(table.index)\n",
+ "            if (len(idx_missing_intaxonomy) > 0) and err:\n",
+ "                err.write(('Warning: following %i taxa are not in the '\n",
+ "                           'provided count table, but in taxonomy:\\n%s\\n') % (\n",
+ "                          len(idx_missing_intaxonomy),\n",
+ "                          \", \".join(idx_missing_intaxonomy)))\n",
+ "\n",
+ "            t = dict()\n",
+ "            # Series.iteritems was removed in pandas 2.0; use items()\n",
+ "            for taxon, linstr in taxonomy.items():\n",
+ "                # fill missing rank annotations with rank__\n",
+ "                orig_lineage = {annot[0].lower(): annot\n",
+ "                                for annot\n",
+ "                                in (map(str.strip, linstr.split(';')))}\n",
+ "                lineage = []\n",
+ "                for rank in RANKS:\n",
+ "                    rank_char = rank[0].lower()\n",
+ "                    if rank_char in orig_lineage:\n",
+ "                        lineage.append(orig_lineage[rank_char])\n",
+ "                    else:\n",
+ "                        lineage.append(rank_char+'__')\n",
+ "                t[taxon] = {'taxonomy': \";\".join(lineage)}\n",
+ "            bt.add_metadata(t, axis='observation')\n",
+ "\n",
+ "        with biom_open(file_biom, 'w') as f:\n",
+ "            bt.to_hdf5(f, \"example\")\n",
+ "    except IOError:\n",
+ "        raise IOError('Cannot write to file \"%s\"' % file_biom)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Balance_88soils"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(7396, 89)"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom = biom2pandas('../88soils/238_otu_table.biom')\n",
+ "soils_biom.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>103.CA2</th><th>103.CO3</th><th>103.SR3</th><th>103.IE2</th><th>103.BP1</th><th>103.VC2</th><th>103.SA2</th><th>103.GB2</th><th>103.CO2</th><th>103.KP1</th><th>...</th><th>103.LQ1</th><th>103.HI1</th><th>103.RT1</th><th>103.HI2</th><th>103.DF1</th><th>103.CF3</th><th>103.AR1</th><th>103.TL1</th><th>103.HI4</th><th>103.BB1</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>1124701</th><td>15</td><td>14</td><td>1</td><td>8</td><td>13</td><td>7</td><td>6</td><td>3</td><td>2</td><td>2</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " <tr><th>244336</th><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " <tr><th>973124</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>3 rows × 89 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " 103.CA2 103.CO3 103.SR3 103.IE2 103.BP1 103.VC2 103.SA2 \\\n",
+ "1124701 15 14 1 8 13 7 6 \n",
+ "244336 0 0 0 1 0 0 0 \n",
+ "973124 0 0 0 0 0 0 0 \n",
+ "\n",
+ " 103.GB2 103.CO2 103.KP1 ... 103.LQ1 103.HI1 103.RT1 103.HI2 \\\n",
+ "1124701 3 2 2 ... 0 0 0 0 \n",
+ "244336 0 0 0 ... 0 0 0 0 \n",
+ "973124 0 0 1 ... 0 0 0 0 \n",
+ "\n",
+ " 103.DF1 103.CF3 103.AR1 103.TL1 103.HI4 103.BB1 \n",
+ "1124701 0 0 0 0 0 0 \n",
+ "244336 0 0 0 0 0 0 \n",
+ "973124 0 0 0 0 0 0 \n",
+ "\n",
+ "[3 rows x 89 columns]"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(7396, 1)"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa = pd.read_csv('../88soils/88soils_taxonomy.txt', sep='\\t', index_col='Feature ID')\n",
+ "soils_taxa.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>Taxon</th></tr>\n",
+ " <tr><th>Feature ID</th><th></th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>1000512</th><td>k__Bacteria;p__Actinobacteria;c__Thermoleophil...</td></tr>\n",
+ " <tr><th>1000547</th><td>k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob...</td></tr>\n",
+ " <tr><th>1000654</th><td>k__Bacteria;p__Bacteroidetes;c__Sphingobacteri...</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Taxon\n",
+ "Feature ID \n",
+ "1000512 k__Bacteria;p__Actinobacteria;c__Thermoleophil...\n",
+ "1000547 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob...\n",
+ "1000654 k__Bacteria;p__Bacteroidetes;c__Sphingobacteri..."
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th></tr>\n",
+ " <tr><th>Feature ID</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>1000512</th><td>k__Bacteria</td><td>p__Actinobacteria</td><td>c__Thermoleophilia</td><td>o__Gaiellales</td><td>f__Gaiellaceae</td><td>g__</td><td>s__</td></tr>\n",
+ " <tr><th>1000547</th><td>k__Bacteria</td><td>p__Firmicutes</td><td>c__Bacilli</td><td>o__Lactobacillales</td><td>f__Streptococcaceae</td><td>g__Streptococcus</td><td>s__</td></tr>\n",
+ " <tr><th>1000654</th><td>k__Bacteria</td><td>p__Bacteroidetes</td><td>c__Sphingobacteriia</td><td>o__Sphingobacteriales</td><td>f__Sphingobacteriaceae</td><td>g__</td><td>s__</td></tr>\n",
+ " <tr><th>1000757</th><td>k__Bacteria</td><td>p__Proteobacteria</td><td>c__Alphaproteobacteria</td><td>o__Rhizobiales</td><td>f__Bradyrhizobiaceae</td><td>g__</td><td>s__</td></tr>\n",
+ " <tr><th>1000876</th><td>k__Bacteria</td><td>p__Actinobacteria</td><td>c__Actinobacteria</td><td>o__Actinomycetales</td><td>f__Nocardioidaceae</td><td>g__Nocardioides</td><td>s__</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " 0 1 2 \\\n",
+ "Feature ID \n",
+ "1000512 k__Bacteria p__Actinobacteria c__Thermoleophilia \n",
+ "1000547 k__Bacteria p__Firmicutes c__Bacilli \n",
+ "1000654 k__Bacteria p__Bacteroidetes c__Sphingobacteriia \n",
+ "1000757 k__Bacteria p__Proteobacteria c__Alphaproteobacteria \n",
+ "1000876 k__Bacteria p__Actinobacteria c__Actinobacteria \n",
+ "\n",
+ " 3 4 5 \\\n",
+ "Feature ID \n",
+ "1000512 o__Gaiellales f__Gaiellaceae g__ \n",
+ "1000547 o__Lactobacillales f__Streptococcaceae g__Streptococcus \n",
+ "1000654 o__Sphingobacteriales f__Sphingobacteriaceae g__ \n",
+ "1000757 o__Rhizobiales f__Bradyrhizobiaceae g__ \n",
+ "1000876 o__Actinomycetales f__Nocardioidaceae g__Nocardioides \n",
+ "\n",
+ " 6 \n",
+ "Feature ID \n",
+ "1000512 s__ \n",
+ "1000547 s__ \n",
+ "1000654 s__ \n",
+ "1000757 s__ \n",
+ "1000876 s__ "
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "taxa_new = soils_taxa.Taxon.str.split(pat=\";\", expand=True)\n",
+ "taxa_new.head(5)\n",
+ "# ref: https://www.geeksforgeeks.org/python-pandas-split-strings-into-two-list-columns-using-str-split/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>Taxon</th><th>Genus</th></tr>\n",
+ " <tr><th>Feature ID</th><th></th><th></th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>1000512</th><td>k__Bacteria;p__Actinobacteria;c__Thermoleophil...</td><td>g__</td></tr>\n",
+ " <tr><th>1000547</th><td>k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob...</td><td>g__Streptococcus</td></tr>\n",
+ " <tr><th>1000654</th><td>k__Bacteria;p__Bacteroidetes;c__Sphingobacteri...</td><td>g__</td></tr>\n",
+ " <tr><th>1000757</th><td>k__Bacteria;p__Proteobacteria;c__Alphaproteoba...</td><td>g__</td></tr>\n",
+ " <tr><th>1000876</th><td>k__Bacteria;p__Actinobacteria;c__Actinobacteri...</td><td>g__Nocardioides</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Taxon \\\n",
+ "Feature ID \n",
+ "1000512 k__Bacteria;p__Actinobacteria;c__Thermoleophil... \n",
+ "1000547 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob... \n",
+ "1000654 k__Bacteria;p__Bacteroidetes;c__Sphingobacteri... \n",
+ "1000757 k__Bacteria;p__Proteobacteria;c__Alphaproteoba... \n",
+ "1000876 k__Bacteria;p__Actinobacteria;c__Actinobacteri... \n",
+ "\n",
+ " Genus \n",
+ "Feature ID \n",
+ "1000512 g__ \n",
+ "1000547 g__Streptococcus \n",
+ "1000654 g__ \n",
+ "1000757 g__ \n",
+ "1000876 g__Nocardioides "
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa['Genus'] = taxa_new[5]\n",
+ "soils_taxa.head(5)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "g__ 5213\n",
+ "g__Rhodoplanes 144\n",
+ "g__Bacillus 110\n",
+ "g__Candidatus Solibacter 100\n",
+ "g__Flavobacterium 71\n",
+ " ... \n",
+ "g__Rhodocyclus 1\n",
+ "g__Marinobacter 1\n",
+ "g__Afipia 1\n",
+ "g__Candidatus Amoebophilus 1\n",
+ "g__Desulfotomaculum 1\n",
+ "Name: Genus, Length: 335, dtype: int64"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa.Genus.value_counts()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(2183, 2)"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# keep only the features with a genus assignment (drop the unassigned 'g__')\n",
+ "soils_taxa_sub = soils_taxa[soils_taxa.Genus != 'g__']\n",
+ "soils_taxa_sub.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>Taxon</th><th>Genus</th></tr>\n",
+ " <tr><th>Feature ID</th><th></th><th></th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>1000547</th><td>k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob...</td><td>g__Streptococcus</td></tr>\n",
+ " <tr><th>1000876</th><td>k__Bacteria;p__Actinobacteria;c__Actinobacteri...</td><td>g__Nocardioides</td></tr>\n",
+ " <tr><th>1003206</th><td>k__Bacteria;p__Proteobacteria;c__Alphaproteoba...</td><td>g__Sphingomonas</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Taxon \\\n",
+ "Feature ID \n",
+ "1000547 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactob... \n",
+ "1000876 k__Bacteria;p__Actinobacteria;c__Actinobacteri... \n",
+ "1003206 k__Bacteria;p__Proteobacteria;c__Alphaproteoba... \n",
+ "\n",
+ " Genus \n",
+ "Feature ID \n",
+ "1000547 g__Streptococcus \n",
+ "1000876 g__Nocardioides \n",
+ "1003206 g__Sphingomonas "
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_taxa_sub.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(2183, 91)"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# cast OTU IDs to int64 to match the taxonomy index, then keep only OTUs with a genus assignment\n",
+ "soils_biom.index = soils_biom.index.astype('int64')\n",
+ "soils_biom_sub = soils_biom.merge(soils_taxa_sub, how='inner', left_index=True, right_index=True)\n",
+ "soils_biom_sub.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>103.CA2</th><th>103.CO3</th><th>103.SR3</th><th>103.IE2</th><th>103.BP1</th><th>103.VC2</th><th>103.SA2</th><th>103.GB2</th><th>103.CO2</th><th>103.KP1</th><th>...</th><th>103.RT1</th><th>103.HI2</th><th>103.DF1</th><th>103.CF3</th><th>103.AR1</th><th>103.TL1</th><th>103.HI4</th><th>103.BB1</th><th>Taxon</th><th>Genus</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>244336</th><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacill...</td><td>g__Paenibacillus</td></tr>\n",
+ " <tr><th>809489</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacill...</td><td>g__Bacillus</td></tr>\n",
+ " <tr><th>533625</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>k__Bacteria;p__Proteobacteria;c__Alphaproteoba...</td><td>g__Novosphingobium</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>3 rows × 91 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " 103.CA2 103.CO3 103.SR3 103.IE2 103.BP1 103.VC2 103.SA2 \\\n",
+ "244336 0 0 0 1 0 0 0 \n",
+ "809489 0 0 0 0 0 0 0 \n",
+ "533625 0 0 0 0 0 0 0 \n",
+ "\n",
+ " 103.GB2 103.CO2 103.KP1 ... 103.RT1 103.HI2 103.DF1 103.CF3 \\\n",
+ "244336 0 0 0 ... 0 0 0 0 \n",
+ "809489 1 0 0 ... 0 0 0 0 \n",
+ "533625 0 0 0 ... 0 0 0 0 \n",
+ "\n",
+ " 103.AR1 103.TL1 103.HI4 103.BB1 \\\n",
+ "244336 0 0 0 0 \n",
+ "809489 0 0 0 0 \n",
+ "533625 0 0 0 0 \n",
+ "\n",
+ " Taxon Genus \n",
+ "244336 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacill... g__Paenibacillus \n",
+ "809489 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacill... g__Bacillus \n",
+ "533625 k__Bacteria;p__Proteobacteria;c__Alphaproteoba... g__Novosphingobium \n",
+ "\n",
+ "[3 rows x 91 columns]"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "soils_biom_sub.set_index('Taxon', inplace=True)\n",
+ "soils_biom_sub.drop(['Genus'], axis=1, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(2183, 89)"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>103.CA2</th><th>103.CO3</th><th>103.SR3</th><th>103.IE2</th><th>103.BP1</th><th>103.VC2</th><th>103.SA2</th><th>103.GB2</th><th>103.CO2</th><th>103.KP1</th><th>...</th><th>103.LQ1</th><th>103.HI1</th><th>103.RT1</th><th>103.HI2</th><th>103.DF1</th><th>103.CF3</th><th>103.AR1</th><th>103.TL1</th><th>103.HI4</th><th>103.BB1</th></tr>\n",
+ " <tr><th>Taxon</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Paenibacillaceae;g__Paenibacillus;s__</th><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " <tr><th>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__muralis</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " <tr><th>k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium;s__</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>3 rows × 89 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " 103.CA2 103.CO3 103.SR3 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.IE2 103.BP1 103.VC2 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 1 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.SA2 103.GB2 103.CO2 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 1 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.KP1 ... 103.LQ1 \\\n",
+ "Taxon ... \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 ... 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 ... 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 ... 0 \n",
+ "\n",
+ " 103.HI1 103.RT1 103.HI2 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.DF1 103.CF3 103.AR1 \\\n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ " 103.TL1 103.HI4 103.BB1 \n",
+ "Taxon \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacilla... 0 0 0 \n",
+ "k__Bacteria;p__Proteobacteria;c__Alphaproteobac... 0 0 0 \n",
+ "\n",
+ "[3 rows x 89 columns]"
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(89, 2183)"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# transpose the dataframe \n",
+ "soils_biom_sub_t = soils_biom_sub.T\n",
+ "soils_biom_sub_t.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th>Taxon</th><th>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Paenibacillaceae;g__Paenibacillus;s__</th><th>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__muralis</th><th>k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium;s__</th><th>k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus Solibacter;s__</th><th>k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__Rhodoplanes;s__</th><th>k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardioidaceae;g__Nocardioides;s__</th><th>k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__Flavobacterium;s__</th><th>k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptococcaceae;g__Desulfotomaculum;s__</th><th>k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Koribacteraceae;g__Candidatus Koribacter;s__</th><th>k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Methylibium;s__</th><th>...</th><th>k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Rhodoferax;s__</th><th>k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae;g__Dokdonella;s__</th><th>k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__</th><th>k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Methylophilales;f__Methylophilaceae;g__Methylotenera;s__mobilis</th><th>k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces;s__</th><th>k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae;g__Luteimonas;s__</th><th>k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__Rhodoplanes;s__</th><th>k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Planococcaceae;g__Solibacillus;s__</th><th>k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Ramlibacter;s__</th><th>k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Mycobacteriaceae;g__Mycobacterium;s__</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>103.CA2</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ " <tr><th>103.CO3</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
+ " <tr><th>103.SR3</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>3 rows × 2183 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Paenibacillaceae;g__Paenibacillus;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__muralis \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Acidobacteria;c__Solibacteres;o__Solibacterales;f__Solibacteraceae;g__Candidatus Solibacter;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__Rhodoplanes;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardioidaceae;g__Nocardioides;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__Flavobacterium;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptococcaceae;g__Desulfotomaculum;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Koribacteraceae;g__Candidatus Koribacter;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Methylibium;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon ... \\\n",
+ "103.CA2 ... \n",
+ "103.CO3 ... \n",
+ "103.SR3 ... \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Rhodoferax;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae;g__Dokdonella;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Methylophilales;f__Methylophilaceae;g__Methylotenera;s__mobilis \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales;f__Xanthomonadaceae;g__Luteimonas;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__Rhodoplanes;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Planococcaceae;g__Solibacillus;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Ramlibacter;s__ \\\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "\n",
+ "Taxon k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Mycobacteriaceae;g__Mycobacterium;s__ \n",
+ "103.CA2 0 \n",
+ "103.CO3 1 \n",
+ "103.SR3 1 \n",
+ "\n",
+ "[3 rows x 2183 columns]"
+ ]
+ },
+ "execution_count": 45,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "soils_biom_sub_t.head(3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 89.000000\n",
+ "mean 275.932584\n",
+ "std 115.508244\n",
+ "min 1.000000\n",
+ "25% 213.000000\n",
+ "50% 254.000000\n",
+ "75% 319.000000\n",
+ "max 805.000000\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# per-sample totals (sum over taxa); unequal totals show the table is not rarefied\n",
+ "soils_biom_sub_t.sum(axis=1).describe() # one total per row (sample)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# export\n",
+ "soils_biom_sub_t.to_csv('../88soils/88soils_genus_table.txt', sep='\\t')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 2183.000000\n",
+ "mean 11.249656\n",
+ "std 33.869668\n",
+ "min 1.000000\n",
+ "25% 1.000000\n",
+ "50% 3.000000\n",
+ "75% 8.000000\n",
+ "max 690.000000\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 48,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# per-taxon totals (sum over samples); min >= 1 confirms every taxon occurs in at least one sample\n",
+ "soils_biom_sub_t.sum(axis=0).describe() # one total per column (taxon)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/data_application/notebooks_application/2.2.soils_dataPreparation.ipynb b/data_application/notebooks_application/2.2.soils_dataPreparation.ipynb
new file mode 100755
index 0000000..2217a77
--- /dev/null
+++ b/data_application/notebooks_application/2.2.soils_dataPreparation.ipynb
@@ -0,0 +1,1060 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../88soils/88soils_genus_table.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol class=list-inline>\n",
+ "\t<li>89</li>\n",
+ "\t<li>2183</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "count = soil_otu\n",
+ "dim(count)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |\n",
+ "\n",
+ "\t103.CA2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "\t103.CO3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "\t103.SR3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |\n",
+ "\t103.IE2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 |\n",
+ "\t103.BP1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "\t103.VC2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Paenibacillaceae.g\\_\\_Paenibacillus.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Bacillaceae.g\\_\\_Bacillus.s\\_\\_muralis & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Sphingomonadales.f\\_\\_Sphingomonadaceae.g\\_\\_Novosphingobium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Solibacteres.o\\_\\_Solibacterales.f\\_\\_Solibacteraceae.g\\_\\_Candidatus.Solibacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Nocardioidaceae.g\\_\\_Nocardioides.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Bacteroidetes.c\\_\\_Flavobacteriia.o\\_\\_Flavobacteriales.f\\_\\_Flavobacteriaceae.g\\_\\_Flavobacterium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Clostridia.o\\_\\_Clostridiales.f\\_\\_Peptococcaceae.g\\_\\_Desulfotomaculum.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Acidobacteriia.o\\_\\_Acidobacteriales.f\\_\\_Koribacteraceae.g\\_\\_Candidatus.Koribacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Methylibium.s\\_\\_ & ... 
& k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Rhodoferax.s\\_\\_.4 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Dokdonella.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Lactobacillales.f\\_\\_Streptococcaceae.g\\_\\_Streptococcus.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Methylophilales.f\\_\\_Methylophilaceae.g\\_\\_Methylotenera.s\\_\\_mobilis.2 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Streptomycetaceae.g\\_\\_Streptomyces.s\\_\\_.44 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Luteimonas.s\\_\\_.2 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_.142 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Planococcaceae.g\\_\\_Solibacillus.s\\_\\_.3 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Ramlibacter.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Mycobacteriaceae.g\\_\\_Mycobacterium.s\\_\\_.28\\\\\n",
+ "\\hline\n",
+ "\t103.CA2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\\\n",
+ "\t103.CO3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\\n",
+ "\t103.SR3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\\\\n",
+ "\t103.IE2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 1 \\\\\n",
+ "\t103.BP1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\\n",
+ "\t103.VC2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & ... & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 103.CA2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| 103.CO3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "| 103.SR3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |\n",
+ "| 103.IE2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 |\n",
+ "| 103.BP1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "| 103.VC2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 1 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 2 \n",
+ " ...\n",
+ "103.CA2 ...\n",
+ "103.CO3 ...\n",
+ "103.SR3 ...\n",
+ "103.IE2 ...\n",
+ "103.BP1 ...\n",
+ "103.VC2 ...\n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 2 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 0 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9\n",
+ "103.CA2 0 \n",
+ "103.CO3 0 \n",
+ "103.SR3 1 \n",
+ "103.IE2 0 \n",
+ "103.BP1 0 \n",
+ "103.VC2 0 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28\n",
+ "103.CA2 0 \n",
+ "103.CO3 1 \n",
+ "103.SR3 1 \n",
+ "103.IE2 1 \n",
+ "103.BP1 1 \n",
+ "103.VC2 1 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(count)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 89\n",
+ "\t- 2183\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 0.5/100]) # keep taxa detected in at least 0.5% of samples"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 89\n",
+ "\t- 2183\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 1/100]) # keep taxa detected in at least 1% of samples"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 89\n",
+ "\t- 1297\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 1297\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 1297\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 1297"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 2/100]) # keep taxa detected in at least 2% of samples"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 89\n",
+ "\t- 576\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 576\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 576\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 576"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dim(count[, colMeans(count > 0) >= 5/100]) # keep taxa detected in at least 5% of samples"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\t- 89\n",
+ "\t- 2183\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 89\n",
+ "\\item 2183\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 89\n",
+ "2. 2183\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 89 2183"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |\n",
+ "\n",
+ "\t103.CA2 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | ... | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 |\n",
+ "\t103.CO3 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | ... | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.199305 |\n",
+ "\t103.SR3 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | ... | -7.942718 | -7.249570 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.249570 | -7.249570 |\n",
+ "\t103.IE2 | -7.208230 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | ... | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -6.515083 | -7.901377 | -7.901377 | -7.208230 |\n",
+ "\t103.BP1 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | ... | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.087991 |\n",
+ "\t103.VC2 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -6.439350 | ... | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.132498 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n",
+ " & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Paenibacillaceae.g\\_\\_Paenibacillus.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Bacillaceae.g\\_\\_Bacillus.s\\_\\_muralis & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Sphingomonadales.f\\_\\_Sphingomonadaceae.g\\_\\_Novosphingobium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Solibacteres.o\\_\\_Solibacterales.f\\_\\_Solibacteraceae.g\\_\\_Candidatus.Solibacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Nocardioidaceae.g\\_\\_Nocardioides.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Bacteroidetes.c\\_\\_Flavobacteriia.o\\_\\_Flavobacteriales.f\\_\\_Flavobacteriaceae.g\\_\\_Flavobacterium.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Clostridia.o\\_\\_Clostridiales.f\\_\\_Peptococcaceae.g\\_\\_Desulfotomaculum.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Acidobacteria.c\\_\\_Acidobacteriia.o\\_\\_Acidobacteriales.f\\_\\_Koribacteraceae.g\\_\\_Candidatus.Koribacter.s\\_\\_ & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Methylibium.s\\_\\_ & ... 
& k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Rhodoferax.s\\_\\_.4 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Dokdonella.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Lactobacillales.f\\_\\_Streptococcaceae.g\\_\\_Streptococcus.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Methylophilales.f\\_\\_Methylophilaceae.g\\_\\_Methylotenera.s\\_\\_mobilis.2 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Streptomycetaceae.g\\_\\_Streptomyces.s\\_\\_.44 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Gammaproteobacteria.o\\_\\_Xanthomonadales.f\\_\\_Xanthomonadaceae.g\\_\\_Luteimonas.s\\_\\_.2 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Alphaproteobacteria.o\\_\\_Rhizobiales.f\\_\\_Hyphomicrobiaceae.g\\_\\_Rhodoplanes.s\\_\\_.142 & k\\_\\_Bacteria.p\\_\\_Firmicutes.c\\_\\_Bacilli.o\\_\\_Bacillales.f\\_\\_Planococcaceae.g\\_\\_Solibacillus.s\\_\\_.3 & k\\_\\_Bacteria.p\\_\\_Proteobacteria.c\\_\\_Betaproteobacteria.o\\_\\_Burkholderiales.f\\_\\_Comamonadaceae.g\\_\\_Ramlibacter.s\\_\\_.9 & k\\_\\_Bacteria.p\\_\\_Actinobacteria.c\\_\\_Actinobacteria.o\\_\\_Actinomycetales.f\\_\\_Mycobacteriaceae.g\\_\\_Mycobacterium.s\\_\\_.28\\\\\n",
+ "\\hline\n",
+ "\t103.CA2 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & ... & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436 & -7.828436\\\\\n",
+ "\t103.CO3 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & ... & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.892452 & -7.199305\\\\\n",
+ "\t103.SR3 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & ... & -7.942718 & -7.249570 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.942718 & -7.249570 & -7.249570\\\\\n",
+ "\t103.IE2 & -7.208230 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & ... & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -7.901377 & -6.515083 & -7.901377 & -7.901377 & -7.208230\\\\\n",
+ "\t103.BP1 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & ... & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.781139 & -7.087991\\\\\n",
+ "\t103.VC2 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -6.439350 & ... & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.825645 & -7.132498\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__ | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__ | k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__ | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__ | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__ | k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__ | k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__ | k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__ | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__ | ... | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44 | k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2 | k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142 | k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3 | k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9 | k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28 |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 103.CA2 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | ... | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 | -7.828436 |\n",
+ "| 103.CO3 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | ... | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.892452 | -7.199305 |\n",
+ "| 103.SR3 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | ... | -7.942718 | -7.249570 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.942718 | -7.249570 | -7.249570 |\n",
+ "| 103.IE2 | -7.208230 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | ... | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -7.901377 | -6.515083 | -7.901377 | -7.901377 | -7.208230 |\n",
+ "| 103.BP1 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | ... | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.781139 | -7.087991 |\n",
+ "| 103.VC2 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -6.439350 | ... | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.825645 | -7.132498 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Paenibacillaceae.g__Paenibacillus.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.208230 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Bacillaceae.g__Bacillus.s__muralis\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae.g__Novosphingobium.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Acidobacteria.c__Solibacteres.o__Solibacterales.f__Solibacteraceae.g__Candidatus.Solibacter.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Nocardioidaceae.g__Nocardioides.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Bacteroidetes.c__Flavobacteriia.o__Flavobacteriales.f__Flavobacteriaceae.g__Flavobacterium.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Peptococcaceae.g__Desulfotomaculum.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Acidobacteria.c__Acidobacteriia.o__Acidobacteriales.f__Koribacteraceae.g__Candidatus.Koribacter.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Methylibium.s__\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -6.439350 \n",
+ " ...\n",
+ "103.CA2 ...\n",
+ "103.CO3 ...\n",
+ "103.SR3 ...\n",
+ "103.IE2 ...\n",
+ "103.BP1 ...\n",
+ "103.VC2 ...\n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Rhodoferax.s__.4\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Dokdonella.s__.9\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.249570 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Lactobacillales.f__Streptococcaceae.g__Streptococcus.s__.9\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Methylophilales.f__Methylophilaceae.g__Methylotenera.s__mobilis.2\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Streptomycetaceae.g__Streptomyces.s__.44\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Xanthomonadales.f__Xanthomonadaceae.g__Luteimonas.s__.2\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Rhizobiales.f__Hyphomicrobiaceae.g__Rhodoplanes.s__.142\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -6.515083 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Firmicutes.c__Bacilli.o__Bacillales.f__Planococcaceae.g__Solibacillus.s__.3\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.942718 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Proteobacteria.c__Betaproteobacteria.o__Burkholderiales.f__Comamonadaceae.g__Ramlibacter.s__.9\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.892452 \n",
+ "103.SR3 -7.249570 \n",
+ "103.IE2 -7.901377 \n",
+ "103.BP1 -7.781139 \n",
+ "103.VC2 -7.825645 \n",
+ " k__Bacteria.p__Actinobacteria.c__Actinobacteria.o__Actinomycetales.f__Mycobacteriaceae.g__Mycobacterium.s__.28\n",
+ "103.CA2 -7.828436 \n",
+ "103.CO3 -7.199305 \n",
+ "103.SR3 -7.249570 \n",
+ "103.IE2 -7.208230 \n",
+ "103.BP1 -7.087991 \n",
+ "103.VC2 -7.132498 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# add pseudocount of 0.5\n",
+ "x = count # preprocessing already done\n",
+ "x[x == 0] <- 0.5\n",
+ "x <- x/rowSums(x) # relative abundance\n",
+ "taxa <- log(x)\n",
+ "dim(taxa)\n",
+ "head(taxa)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>'BarcodeSequence'</li>\n",
+ "\t<li>'LinkerPrimerSequence'</li>\n",
+ "\t<li>'barcode_read_group_tag'</li>\n",
+ "\t<li>'dna_extracted_prep'</li>\n",
+ "\t<li>'experiment_alias'</li>\n",
+ "\t<li>'experiment_center'</li>\n",
+ "\t<li>'experiment_design_description'</li>\n",
+ "\t<li>'experiment_title'</li>\n",
+ "\t<li>'instrument_name'</li>\n",
+ "\t<li>'key_seq'</li>\n",
+ "\t<li>'library_construction_protocol'</li>\n",
+ "\t<li>'linker'</li>\n",
+ "\t<li>'pcr_primers'</li>\n",
+ "\t<li>'physical_specimen_remaining_prep'</li>\n",
+ "\t<li>'platform'</li>\n",
+ "\t<li>'pool_member_name'</li>\n",
+ "\t<li>'pool_proportion'</li>\n",
+ "\t<li>'primer_read_group_tag'</li>\n",
+ "\t<li>'region'</li>\n",
+ "\t<li>'run_alias'</li>\n",
+ "\t<li>'run_center'</li>\n",
+ "\t<li>'run_date'</li>\n",
+ "\t<li>'run_prefix'</li>\n",
+ "\t<li>'samp_size'</li>\n",
+ "\t<li>'sample_center'</li>\n",
+ "\t<li>'sample_type_prep'</li>\n",
+ "\t<li>'sequencing_meth'</li>\n",
+ "\t<li>'study_center'</li>\n",
+ "\t<li>'study_ref'</li>\n",
+ "\t<li>'target_gene'</li>\n",
+ "\t<li>'target_subfragment'</li>\n",
+ "\t<li>'altitude'</li>\n",
+ "\t<li>'annual_season_precpt'</li>\n",
+ "\t<li>'annual_season_temp'</li>\n",
+ "\t<li>'anonymized_name'</li>\n",
+ "\t<li>'assigned_from_geo'</li>\n",
+ "\t<li>'carb_nitro_ratio'</li>\n",
+ "\t<li>'cmin_rate'</li>\n",
+ "\t<li>'collection_date'</li>\n",
+ "\t<li>'common_name'</li>\n",
+ "\t<li>'country'</li>\n",
+ "\t<li>'depth'</li>\n",
+ "\t<li>'dna_extracted'</li>\n",
+ "\t<li>'elevation'</li>\n",
+ "\t<li>'env_biome'</li>\n",
+ "\t<li>'env_feature'</li>\n",
+ "\t<li>'env_matter'</li>\n",
+ "\t<li>'host_subject_id'</li>\n",
+ "\t<li>'latitude'</li>\n",
+ "\t<li>'longitude'</li>\n",
+ "\t<li>'ph'</li>\n",
+ "\t<li>'physical_specimen_remaining'</li>\n",
+ "\t<li>'project_name'</li>\n",
+ "\t<li>'public'</li>\n",
+ "\t<li>'sample_type'</li>\n",
+ "\t<li>'silt_clay'</li>\n",
+ "\t<li>'soil_moisture_deficit'</li>\n",
+ "\t<li>'soil_type'</li>\n",
+ "\t<li>'specific_location'</li>\n",
+ "\t<li>'taxon_id'</li>\n",
+ "\t<li>'texture'</li>\n",
+ "\t<li>'title'</li>\n",
+ "\t<li>'tot_org_carb'</li>\n",
+ "\t<li>'tot_org_nitro'</li>\n",
+ "\t<li>'Description'</li>\n",
+ "\t<li>'ph2'</li>\n",
+ "\t<li>'ph3'</li>\n",
+ "\t<li>'ph4'</li>\n",
+ "\t<li>'ph_rounded'</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 'BarcodeSequence'\n",
+ "\\item 'LinkerPrimerSequence'\n",
+ "\\item 'barcode\\_read\\_group\\_tag'\n",
+ "\\item 'dna\\_extracted\\_prep'\n",
+ "\\item 'experiment\\_alias'\n",
+ "\\item 'experiment\\_center'\n",
+ "\\item 'experiment\\_design\\_description'\n",
+ "\\item 'experiment\\_title'\n",
+ "\\item 'instrument\\_name'\n",
+ "\\item 'key\\_seq'\n",
+ "\\item 'library\\_construction\\_protocol'\n",
+ "\\item 'linker'\n",
+ "\\item 'pcr\\_primers'\n",
+ "\\item 'physical\\_specimen\\_remaining\\_prep'\n",
+ "\\item 'platform'\n",
+ "\\item 'pool\\_member\\_name'\n",
+ "\\item 'pool\\_proportion'\n",
+ "\\item 'primer\\_read\\_group\\_tag'\n",
+ "\\item 'region'\n",
+ "\\item 'run\\_alias'\n",
+ "\\item 'run\\_center'\n",
+ "\\item 'run\\_date'\n",
+ "\\item 'run\\_prefix'\n",
+ "\\item 'samp\\_size'\n",
+ "\\item 'sample\\_center'\n",
+ "\\item 'sample\\_type\\_prep'\n",
+ "\\item 'sequencing\\_meth'\n",
+ "\\item 'study\\_center'\n",
+ "\\item 'study\\_ref'\n",
+ "\\item 'target\\_gene'\n",
+ "\\item 'target\\_subfragment'\n",
+ "\\item 'altitude'\n",
+ "\\item 'annual\\_season\\_precpt'\n",
+ "\\item 'annual\\_season\\_temp'\n",
+ "\\item 'anonymized\\_name'\n",
+ "\\item 'assigned\\_from\\_geo'\n",
+ "\\item 'carb\\_nitro\\_ratio'\n",
+ "\\item 'cmin\\_rate'\n",
+ "\\item 'collection\\_date'\n",
+ "\\item 'common\\_name'\n",
+ "\\item 'country'\n",
+ "\\item 'depth'\n",
+ "\\item 'dna\\_extracted'\n",
+ "\\item 'elevation'\n",
+ "\\item 'env\\_biome'\n",
+ "\\item 'env\\_feature'\n",
+ "\\item 'env\\_matter'\n",
+ "\\item 'host\\_subject\\_id'\n",
+ "\\item 'latitude'\n",
+ "\\item 'longitude'\n",
+ "\\item 'ph'\n",
+ "\\item 'physical\\_specimen\\_remaining'\n",
+ "\\item 'project\\_name'\n",
+ "\\item 'public'\n",
+ "\\item 'sample\\_type'\n",
+ "\\item 'silt\\_clay'\n",
+ "\\item 'soil\\_moisture\\_deficit'\n",
+ "\\item 'soil\\_type'\n",
+ "\\item 'specific\\_location'\n",
+ "\\item 'taxon\\_id'\n",
+ "\\item 'texture'\n",
+ "\\item 'title'\n",
+ "\\item 'tot\\_org\\_carb'\n",
+ "\\item 'tot\\_org\\_nitro'\n",
+ "\\item 'Description'\n",
+ "\\item 'ph2'\n",
+ "\\item 'ph3'\n",
+ "\\item 'ph4'\n",
+ "\\item 'ph\\_rounded'\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 'BarcodeSequence'\n",
+ "2. 'LinkerPrimerSequence'\n",
+ "3. 'barcode_read_group_tag'\n",
+ "4. 'dna_extracted_prep'\n",
+ "5. 'experiment_alias'\n",
+ "6. 'experiment_center'\n",
+ "7. 'experiment_design_description'\n",
+ "8. 'experiment_title'\n",
+ "9. 'instrument_name'\n",
+ "10. 'key_seq'\n",
+ "11. 'library_construction_protocol'\n",
+ "12. 'linker'\n",
+ "13. 'pcr_primers'\n",
+ "14. 'physical_specimen_remaining_prep'\n",
+ "15. 'platform'\n",
+ "16. 'pool_member_name'\n",
+ "17. 'pool_proportion'\n",
+ "18. 'primer_read_group_tag'\n",
+ "19. 'region'\n",
+ "20. 'run_alias'\n",
+ "21. 'run_center'\n",
+ "22. 'run_date'\n",
+ "23. 'run_prefix'\n",
+ "24. 'samp_size'\n",
+ "25. 'sample_center'\n",
+ "26. 'sample_type_prep'\n",
+ "27. 'sequencing_meth'\n",
+ "28. 'study_center'\n",
+ "29. 'study_ref'\n",
+ "30. 'target_gene'\n",
+ "31. 'target_subfragment'\n",
+ "32. 'altitude'\n",
+ "33. 'annual_season_precpt'\n",
+ "34. 'annual_season_temp'\n",
+ "35. 'anonymized_name'\n",
+ "36. 'assigned_from_geo'\n",
+ "37. 'carb_nitro_ratio'\n",
+ "38. 'cmin_rate'\n",
+ "39. 'collection_date'\n",
+ "40. 'common_name'\n",
+ "41. 'country'\n",
+ "42. 'depth'\n",
+ "43. 'dna_extracted'\n",
+ "44. 'elevation'\n",
+ "45. 'env_biome'\n",
+ "46. 'env_feature'\n",
+ "47. 'env_matter'\n",
+ "48. 'host_subject_id'\n",
+ "49. 'latitude'\n",
+ "50. 'longitude'\n",
+ "51. 'ph'\n",
+ "52. 'physical_specimen_remaining'\n",
+ "53. 'project_name'\n",
+ "54. 'public'\n",
+ "55. 'sample_type'\n",
+ "56. 'silt_clay'\n",
+ "57. 'soil_moisture_deficit'\n",
+ "58. 'soil_type'\n",
+ "59. 'specific_location'\n",
+ "60. 'taxon_id'\n",
+ "61. 'texture'\n",
+ "62. 'title'\n",
+ "63. 'tot_org_carb'\n",
+ "64. 'tot_org_nitro'\n",
+ "65. 'Description'\n",
+ "66. 'ph2'\n",
+ "67. 'ph3'\n",
+ "68. 'ph4'\n",
+ "69. 'ph_rounded'\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ " [1] \"BarcodeSequence\" \"LinkerPrimerSequence\" \n",
+ " [3] \"barcode_read_group_tag\" \"dna_extracted_prep\" \n",
+ " [5] \"experiment_alias\" \"experiment_center\" \n",
+ " [7] \"experiment_design_description\" \"experiment_title\" \n",
+ " [9] \"instrument_name\" \"key_seq\" \n",
+ "[11] \"library_construction_protocol\" \"linker\" \n",
+ "[13] \"pcr_primers\" \"physical_specimen_remaining_prep\"\n",
+ "[15] \"platform\" \"pool_member_name\" \n",
+ "[17] \"pool_proportion\" \"primer_read_group_tag\" \n",
+ "[19] \"region\" \"run_alias\" \n",
+ "[21] \"run_center\" \"run_date\" \n",
+ "[23] \"run_prefix\" \"samp_size\" \n",
+ "[25] \"sample_center\" \"sample_type_prep\" \n",
+ "[27] \"sequencing_meth\" \"study_center\" \n",
+ "[29] \"study_ref\" \"target_gene\" \n",
+ "[31] \"target_subfragment\" \"altitude\" \n",
+ "[33] \"annual_season_precpt\" \"annual_season_temp\" \n",
+ "[35] \"anonymized_name\" \"assigned_from_geo\" \n",
+ "[37] \"carb_nitro_ratio\" \"cmin_rate\" \n",
+ "[39] \"collection_date\" \"common_name\" \n",
+ "[41] \"country\" \"depth\" \n",
+ "[43] \"dna_extracted\" \"elevation\" \n",
+ "[45] \"env_biome\" \"env_feature\" \n",
+ "[47] \"env_matter\" \"host_subject_id\" \n",
+ "[49] \"latitude\" \"longitude\" \n",
+ "[51] \"ph\" \"physical_specimen_remaining\" \n",
+ "[53] \"project_name\" \"public\" \n",
+ "[55] \"sample_type\" \"silt_clay\" \n",
+ "[57] \"soil_moisture_deficit\" \"soil_type\" \n",
+ "[59] \"specific_location\" \"taxon_id\" \n",
+ "[61] \"texture\" \"title\" \n",
+ "[63] \"tot_org_carb\" \"tot_org_nitro\" \n",
+ "[65] \"Description\" \"ph2\" \n",
+ "[67] \"ph3\" \"ph4\" \n",
+ "[69] \"ph_rounded\" "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "colnames(demo)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "89"
+ ],
+ "text/latex": [
+ "89"
+ ],
+ "text/markdown": [
+ "89"
+ ],
+ "text/plain": [
+ "[1] 89"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>8.02</li>\n",
+ "\t<li>6.02</li>\n",
+ "\t<li>6.95</li>\n",
+ "\t<li>5.52</li>\n",
+ "\t<li>7.53</li>\n",
+ "\t<li>5.99</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 8.02\n",
+ "\\item 6.02\n",
+ "\\item 6.95\n",
+ "\\item 5.52\n",
+ "\\item 7.53\n",
+ "\\item 5.99\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 8.02\n",
+ "2. 6.02\n",
+ "3. 6.95\n",
+ "4. 5.52\n",
+ "5. 7.53\n",
+ "6. 5.99\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 8.02 6.02 6.95 5.52 7.53 5.99"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# metadata\n",
+ "mf <- read.csv(\"../88soils/88soils_modified_metadata.txt\", sep='\\t', row.names=1)\n",
+ "y <- mf$ph[match(rownames(count), rownames(mf))]\n",
+ "length(y)\n",
+ "head(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "'matrix'"
+ ],
+ "text/latex": [
+ "'matrix'"
+ ],
+ "text/markdown": [
+ "'matrix'"
+ ],
+ "text/plain": [
+ "[1] \"matrix\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "'numeric'"
+ ],
+ "text/latex": [
+ "'numeric'"
+ ],
+ "text/markdown": [
+ "'numeric'"
+ ],
+ "text/plain": [
+ "[1] \"numeric\""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check datatype\n",
+ "class(taxa); class(y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# save processed data\n",
+ "save(y, taxa, file='../88soils/soils_ph.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/2.3 soils_results_cts.ipynb b/data_application/notebooks_application/2.3 soils_results_cts.ipynb
new file mode 100755
index 0000000..34fdcc6
--- /dev/null
+++ b/data_application/notebooks_application/2.3 soils_results_cts.ipynb
@@ -0,0 +1,402 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Soil microbiome data application results for continuous outcome"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### method comparisons"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../88soils/results_cts/soils_ph_compLasso.RData')\n",
+ "load('../88soils/results_cts/soils_ph_elnet.RData')\n",
+ "load('../88soils/results_cts/soils_ph_lasso.RData')\n",
+ "load('../88soils/results_cts/soils_ph_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.39</li>\n",
+ "\t<li>0.46</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.39\n",
+ "\\item 0.46\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.39\n",
+ "2. 0.46\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.39 0.46"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_compLasso$stab_index, out_compLasso$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.31</li>\n",
+ "\t<li>0.34</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.31\n",
+ "\\item 0.34\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.31\n",
+ "2. 0.34\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.31 0.34"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_lasso$stab_index, out_lasso$MSE_mean) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.16</li>\n",
+ "\t<li>0.23</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.16\n",
+ "\\item 0.23\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.16\n",
+ "2. 0.23\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.16 0.23"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_elnet$stab_index, out_elnet$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<ol>\n",
+ "\t<li>0.04</li>\n",
+ "\t<li>0.26</li>\n",
+ "</ol>\n"
+ ],
+ "text/latex": [
+ "\\begin{enumerate*}\n",
+ "\\item 0.04\n",
+ "\\item 0.26\n",
+ "\\end{enumerate*}\n"
+ ],
+ "text/markdown": [
+ "1. 0.04\n",
+ "2. 0.26\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "[1] 0.04 0.26"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "c(out_rf$stab_index, out_rf$MSE_mean)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table>\n",
+ "<thead><tr><th>dataset</th><th>method</th><th>mse</th><th>stability</th></tr></thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>soil_88</td><td>lasso</td><td>0.34</td><td>0.31</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>elnet</td><td>0.23</td><td>0.16</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>rf</td><td>0.26</td><td>0.04</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>compLasso</td><td>0.46</td><td>0.39</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"
+ ],
+ "text/latex": [
+ "\\begin{tabular}{r|llll}\n",
+ " dataset & method & mse & stability\\\\\n",
+ "\\hline\n",
+ "\t soil\\_88 & lasso & 0.34 & 0.31 \\\\\n",
+ "\t soil\\_88 & elnet & 0.23 & 0.16 \\\\\n",
+ "\t soil\\_88 & rf & 0.26 & 0.04 \\\\\n",
+ "\t soil\\_88 & compLasso & 0.46 & 0.39 \\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "| dataset | method | mse | stability |\n",
+ "|---|---|---|---|\n",
+ "| soil_88 | lasso | 0.34 | 0.31 |\n",
+ "| soil_88 | elnet | 0.23 | 0.16 |\n",
+ "| soil_88 | rf | 0.26 | 0.04 |\n",
+ "| soil_88 | compLasso | 0.46 | 0.39 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method mse stability\n",
+ "1 soil_88 lasso 0.34 0.31 \n",
+ "2 soil_88 elnet 0.23 0.16 \n",
+ "3 soil_88 rf 0.26 0.04 \n",
+ "4 soil_88 compLasso 0.46 0.39 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# combine and export results\n",
+ "soil_88 = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(soil_88) = c('dataset', 'method', 'mse', 'stability')\n",
+ "soil_88$dataset = 'soil_88'\n",
+ "soil_88$method = c('lasso', 'elnet', 'rf', 'compLasso')\n",
+ "soil_88$mse = c(out_lasso$MSE_mean, out_elnet$MSE_mean, out_rf$MSE_mean, out_compLasso$MSE_mean)\n",
+ "soil_88$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_compLasso$stab_index)\n",
+ "soil_88"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### hypothesis testing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load('../88soils/results_cts/soils_ph_boot_compLasso.RData')\n",
+ "load('../88soils/results_cts/soils_ph_boot_rf.RData')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.3595"
+ ],
+ "text/latex": [
+ "0.3595"
+ ],
+ "text/markdown": [
+ "0.3595"
+ ],
+ "text/plain": [
+ "[1] 0.3595"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl>\n",
+ "\t<dt>2.5%</dt>\n",
+ "\t\t<dd>0.28</dd>\n",
+ "\t<dt>97.5%</dt>\n",
+ "\t\t<dd>0.44</dd>\n",
+ "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.28\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.44\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.2897.5%\n",
+ ": 0.44\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ " 0.28 0.44 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.0811150164524993"
+ ],
+ "text/latex": [
+ "0.0811150164524993"
+ ],
+ "text/markdown": [
+ "0.0811150164524993"
+ ],
+ "text/plain": [
+ "[1] 0.08111502"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl>\n",
+ "\t<dt>2.5%</dt>\n",
+ "\t\t<dd>-0.283912044397921</dd>\n",
+ "\t<dt>97.5%</dt>\n",
+ "\t\t<dd>0.945808111396662</dd>\n",
+ "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -0.283912044397921\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.945808111396662\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -0.28391204439792197.5%\n",
+ ": 0.945808111396662\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-0.2839120 0.9458081 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_mse = (unlist(boot_compLasso$MSE_list) - unlist(boot_rf$MSE_list)) # use all 100*100 MSEs\n",
+ "mean(diff_mse)\n",
+ "quantile(diff_mse, probs = c(0.025, 0.975)) \n",
+ "# CI contains zero: compLasso is not significantly different from RF based on MSE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data_application/notebooks_application/3_applications_bin_results.ipynb b/data_application/notebooks_application/3_applications_bin_results.ipynb
new file mode 100644
index 0000000..fa46c90
--- /dev/null
+++ b/data_application/notebooks_application/3_applications_bin_results.ipynb
@@ -0,0 +1,505 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Results for real microbiome data applications"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dir = '/panfs/panfs1.ucsd.edu/panscratch/lij014/Stability_2020/data_applications/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### BMI dataset application"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load(paste0(dir, '/BMI_binary_GenCompLasso.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_lasso.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_elnet.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_rf.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table>\n",
+ "<caption>A data.frame: 4 × 4</caption>\n",
+ "<thead>\n",
+ "\t<tr><th>dataset</th><th>method</th><th>ROC</th><th>stability</th></tr>\n",
+ "\t<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th></tr>\n",
+ "</thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>bmi_gut</td><td>lasso</td><td>0.63</td><td>0.14</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>elnet</td><td>0.78</td><td>0.19</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>rf</td><td>1.00</td><td>0.01</td></tr>\n",
+ "\t<tr><td>bmi_gut</td><td>compLasso</td><td>0.85</td><td>0.29</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"
+ ],
+ "text/latex": [
+ "A data.frame: 4 × 4\n",
+ "\\begin{tabular}{llll}\n",
+ " dataset & method & ROC & stability\\\\\n",
+ " <chr> & <chr> & <dbl> & <dbl>\\\\\n",
+ "\\hline\n",
+ "\t bmi\\_gut & lasso & 0.63 & 0.14\\\\\n",
+ "\t bmi\\_gut & elnet & 0.78 & 0.19\\\\\n",
+ "\t bmi\\_gut & rf & 1.00 & 0.01\\\\\n",
+ "\t bmi\\_gut & compLasso & 0.85 & 0.29\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 4 × 4\n",
+ "\n",
+ "| dataset <chr> | method <chr> | ROC <dbl> | stability <dbl> |\n",
+ "|---|---|---|---|\n",
+ "| bmi_gut | lasso | 0.63 | 0.14 |\n",
+ "| bmi_gut | elnet | 0.78 | 0.19 |\n",
+ "| bmi_gut | rf | 1.00 | 0.01 |\n",
+ "| bmi_gut | compLasso | 0.85 | 0.29 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method ROC stability\n",
+ "1 bmi_gut lasso 0.63 0.14 \n",
+ "2 bmi_gut elnet 0.78 0.19 \n",
+ "3 bmi_gut rf 1.00 0.01 \n",
+ "4 bmi_gut compLasso 0.85 0.29 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "bmi_gut = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(bmi_gut) = c('dataset', 'method', 'ROC', 'stability')\n",
+ "bmi_gut$dataset = 'bmi_gut'\n",
+ "bmi_gut$method = c('lasso', 'elnet', 'rf', 'compLasso')\n",
+ "bmi_gut$ROC = c(out_lasso$ROC_mean, out_elnet$ROC_mean, out_rf$ROC_mean, out_GenCompLasso$ROC_mean)\n",
+ "bmi_gut$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_GenCompLasso$stab_index)\n",
+ "bmi_gut"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# testing\n",
+ "load(paste0(dir, '/BMI_binary_boot_rf.RData'))\n",
+ "load(paste0(dir, '/BMI_binary_boot_compLasso.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.2997"
+ ],
+ "text/latex": [
+ "0.2997"
+ ],
+ "text/markdown": [
+ "0.2997"
+ ],
+ "text/plain": [
+ "[1] 0.2997"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl><dt>2.5%</dt><dd>0.11</dd><dt>97.5%</dt><dd>0.41525</dd></dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.11\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.41525\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.1197.5%\n",
+ ": 0.41525\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "0.11000 0.41525 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "-0.0942798529453605"
+ ],
+ "text/latex": [
+ "-0.0942798529453605"
+ ],
+ "text/markdown": [
+ "-0.0942798529453605"
+ ],
+ "text/plain": [
+ "[1] -0.09427985"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl><dt>2.5%</dt><dd>-0.19020132947925</dd><dt>97.5%</dt><dd>-0.020967051775781</dd></dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -0.19020132947925\n",
+ "\\item[97.5\\textbackslash{}\\%] -0.020967051775781\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -0.1902013294792597.5%\n",
+ ": -0.020967051775781\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-0.19020133 -0.02096705 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_ROC = (unlist(boot_compLasso$ROC_list) - unlist(boot_rf$ROC_list)) # use all 100*100 ROCs\n",
+ "mean(diff_ROC)\n",
+ "quantile(diff_ROC, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly different from RF based on ROC,\n",
+ "# although the upper bound is very close to zero (ROC is now on a similar scale to Stability, yet Stability still differentiates the methods better)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 88 soils dataset application"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "load(paste0(dir, '/soils_binary_ph_GenCompLasso.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_lasso.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_elnet.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_rf.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table>\n",
+ "<caption>A data.frame: 4 × 4</caption>\n",
+ "<thead>\n",
+ "\t<tr><th>dataset</th><th>method</th><th>ROC</th><th>stability</th></tr>\n",
+ "\t<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th></tr>\n",
+ "</thead>\n",
+ "<tbody>\n",
+ "\t<tr><td>soil_88</td><td>lasso</td><td>0.90</td><td>0.28</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>elnet</td><td>0.94</td><td>0.32</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>rf</td><td>1.00</td><td>0.03</td></tr>\n",
+ "\t<tr><td>soil_88</td><td>compLasso</td><td>0.96</td><td>0.46</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n"
+ ],
+ "text/latex": [
+ "A data.frame: 4 × 4\n",
+ "\\begin{tabular}{llll}\n",
+ " dataset & method & ROC & stability\\\\\n",
+ " <chr> & <chr> & <dbl> & <dbl>\\\\\n",
+ "\\hline\n",
+ "\t soil\\_88 & lasso & 0.90 & 0.28\\\\\n",
+ "\t soil\\_88 & elnet & 0.94 & 0.32\\\\\n",
+ "\t soil\\_88 & rf & 1.00 & 0.03\\\\\n",
+ "\t soil\\_88 & compLasso & 0.96 & 0.46\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 4 × 4\n",
+ "\n",
+ "| dataset <chr> | method <chr> | ROC <dbl> | stability <dbl> |\n",
+ "|---|---|---|---|\n",
+ "| soil_88 | lasso | 0.90 | 0.28 |\n",
+ "| soil_88 | elnet | 0.94 | 0.32 |\n",
+ "| soil_88 | rf | 1.00 | 0.03 |\n",
+ "| soil_88 | compLasso | 0.96 | 0.46 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " dataset method ROC stability\n",
+ "1 soil_88 lasso 0.90 0.28 \n",
+ "2 soil_88 elnet 0.94 0.32 \n",
+ "3 soil_88 rf 1.00 0.03 \n",
+ "4 soil_88 compLasso 0.96 0.46 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "soil_88 = as.data.frame(matrix(NA, nrow=4, ncol=4))\n",
+ "colnames(soil_88) = c('dataset', 'method', 'ROC', 'stability')\n",
+ "soil_88$dataset = 'soil_88'\n",
+ "soil_88$method = c('lasso', 'elnet', 'rf', 'compLasso')\n",
+ "soil_88$ROC = c(out_lasso$ROC_mean, out_elnet$ROC_mean, out_rf$ROC_mean, out_GenCompLasso$ROC_mean)\n",
+ "soil_88$stability = c(out_lasso$stab_index, out_elnet$stab_index, out_rf$stab_index, out_GenCompLasso$stab_index)\n",
+ "soil_88"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# testing\n",
+ "load(paste0(dir, '/soils_binary_ph_boot_rf.RData'))\n",
+ "load(paste0(dir, '/soils_binary_ph_boot_compLasso.RData'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "0.4323"
+ ],
+ "text/latex": [
+ "0.4323"
+ ],
+ "text/markdown": [
+ "0.4323"
+ ],
+ "text/plain": [
+ "[1] 0.4323"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "<dl><dt>2.5%</dt><dd>0.37</dd><dt>97.5%</dt><dd>0.5</dd></dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] 0.37\n",
+ "\\item[97.5\\textbackslash{}\\%] 0.5\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": 0.3797.5%\n",
+ ": 0.5\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ " 0.37 0.50 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_stab = (boot_compLasso$stab_index - boot_rf$stab_index)\n",
+ "mean(diff_stab)\n",
+ "quantile(diff_stab, probs = c(0.025, 0.975)) \n",
+ "# CI doesn't contain zero: compLasso is significantly more stable than RF"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "-0.0247120820917325"
+ ],
+ "text/latex": [
+ "-0.0247120820917325"
+ ],
+ "text/markdown": [
+ "-0.0247120820917325"
+ ],
+ "text/plain": [
+ "[1] -0.02471208"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+       "<dl>\n",
+       "\t<dt>2.5%</dt>\n",
+       "\t\t<dd>-0.0838574423480084</dd>\n",
+       "\t<dt>97.5%</dt>\n",
+       "\t\t<dd>0</dd>\n",
+       "</dl>\n"
+ ],
+ "text/latex": [
+ "\\begin{description*}\n",
+ "\\item[2.5\\textbackslash{}\\%] -0.0838574423480084\n",
+ "\\item[97.5\\textbackslash{}\\%] 0\n",
+ "\\end{description*}\n"
+ ],
+ "text/markdown": [
+ "2.5%\n",
+ ": -0.083857442348008497.5%\n",
+ ": 0\n",
+ "\n"
+ ],
+ "text/plain": [
+ " 2.5% 97.5% \n",
+ "-0.08385744 0.00000000 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "diff_ROC = (unlist(boot_compLasso$ROC_list) - unlist(boot_rf$ROC_list)) # use all 100*100 ROCs\n",
+ "mean(diff_ROC)\n",
+ "quantile(diff_ROC, probs = c(0.025, 0.975)) \n",
+    "# CI contains zero: compLasso is not significantly different from RF based on ROC"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Min. 1st Qu. Median Mean 3rd Qu. Max. \n",
+ "-0.24949 -0.04206 -0.02250 -0.02471 0.00000 0.03769 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "summary(diff_ROC)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/ms_writing_figs/.DS_Store b/ms_writing_figs/.DS_Store
deleted file mode 100755
index 5008ddf..0000000
Binary files a/ms_writing_figs/.DS_Store and /dev/null differ
diff --git a/simulations/.DS_Store b/simulations/.DS_Store
deleted file mode 100755
index 33cb786..0000000
Binary files a/simulations/.DS_Store and /dev/null differ
diff --git a/simulations/code_sim_bin/.Rhistory b/simulations/code_sim_bin/.Rhistory
new file mode 100644
index 0000000..b24e8d4
--- /dev/null
+++ b/simulations/code_sim_bin/.Rhistory
@@ -0,0 +1,58 @@
+6000.02/200.89
+512.18+438.53+430.93+403.07+161.12+94.60+72.56+24.66+0.60
+2138.25-1480.6
+85.02+243.51
+10+85.53+55.85+26.47+7.92+107.19+33.69
+110+85.02+250
+2138.25-143.32
+2188.21
+2188.21/201.16
+2188.21/201.1
+2188.21/201.2
+175.21
+175.21/87.6
+2806.70+1245+444.69
+7479.45-1121.20-4496.39
+5000-1625*2
+813.98+1750
+2940*2+2560
+4500+1500+1000
+1440*12
+x <- "hello xxx other stuff"
+x
+sub(" xxx.*", "", x)
+x <- "hello (xxx) other stuff"
+x
+sub(" (.*", "", x)
+sub(" (*", "", x)
+x.split(' ')
+fake_variable <- ' Country name is (FR)'
+modified_fake_variable <- stringr::str_extract(string = fake_variable,
+pattern = "(?<=\\().*(?=\\))")
+modified_fake_variable
+modified_fake_variable <- stringr::str_extract(string = fake_variable,
+pattern = "(?>=\\()")
+modified_fake_variable
+sub("\\|.*", "", x)
+x
+sub("\\(.*", "", x)
+5.5*3+6*2
+174.21/87.11
+174.21/87
+761.62/12
+setwd("~/Documents/Stability/github_code_history/stability-analyses/data_application/code_applications/code_cts")
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('../../code_method/stab_data_applications.R')
+source('../../code_method/bootstrap_test_compLasso_rf.R')
+#####################################
+##### data preparation ##############
+#####################################
+load('../../88soils/88soils_genus_table.RData')
+source('../../../code_method/cv_method.R')
+load('../../88soils/88soils_genus_table.RData')
+setwd("~/Documents/Stability/github_code_history/stability-analyses/simulations/code_sim_bin")
+source('../../code_method/cv_method_binary_update.R')
+source('../../../code_method/cv_method_binary_update.R')
+source('../../../code_method/getStability.R')
+source('../../code_method/getStability.R')
diff --git a/simulations/code_sim_bin/block_results_binary_update.R b/simulations/code_sim_bin/block_results_binary_update.R
new file mode 100644
index 0000000..90817d8
--- /dev/null
+++ b/simulations/code_sim_bin/block_results_binary_update.R
@@ -0,0 +1,102 @@
+#####################################################################################
+### run all methods on simulated data with block correlation ##################
+#####################################################################################
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+library(FSA)
+source('../../code_method/getStability.R')
+source('../../code_method/cv_method_binary_update.R')
+source('cv_sim_apply_binary_update.R')
+
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+## correlation strength
+rou.list = seq(0.1, 0.9, 0.2)
+
+files = NULL
+for (rou in rou.list){
+ for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_block_corr', rou, paste('P', p, 'N', n, sep='_'), '.RData', sep=''))
+ }
+}
+
+#------------------------
+# compare methods
+#------------------------
+i <- as.numeric(Sys.getenv("PBS_ARRAYID"))
+
+###################
+#### Lasso ########
+###################
+print('Lasso')
+file_name = files[[i]]
+results_block_lasso = sim_evaluate_cv(sim_file=file_name, method='lasso', family='binomial')
+
+# when i = 73, sim_idx = 4, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+# results_block_lasso = sim_evaluate_cv(sim_file=file_name, method='lasso',
+# family='binomial', seednum=5231)
+
+save(file=paste0(dir, '/binary_update/block_Lasso_binary_', i, '.RData'), results_block_lasso, file_name)
+
+
+#########################
+#### Elastic Net ########
+#########################
+print('Elnet')
+file_name = files[[i]]
+results_block_elnet= sim_evaluate_cv(sim_file=file_name, method='elnet', family='binomial')
+
+# when i = 73, sim_idx = 4, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+results_block_elnet= sim_evaluate_cv(sim_file=file_name, method='elnet',
+ family='binomial', seednum=5231)
+
+save(file=paste0(dir, '/binary_update/block_Elnet_binary_', i, '.RData'), results_block_elnet, file_name)
+
+############################
+#### Random Forests ########
+############################
+print('Random Forests')
+file_name = files[[i]]
+results_block_rf = sim_evaluate_cv(sim_file=file_name, method='RF')
+
+# when i = 73, sim_idx = 4, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+results_block_rf = sim_evaluate_cv(sim_file=file_name, method='RF', seednum=5231)
+
+save(file=paste0(dir, '/binary_update/block_RF_binary_', i, '.RData'), results_block_rf, file_name)
+
+# #########################################################
+# #### Generalized compositional Lasso ########
+# #########################################################
+print('Generalized Compositional Lasso')
+file_name = files[[i]]
+results_block_GenCompLasso = sim_evaluate_cv(sim_file=file_name,
+ method='GenCompLasso')
+
+save(file=paste0(dir, '/binary_update/block_GenCompLasso_binary_', i, '.RData'),
+ results_block_GenCompLasso, file_name)
+
+results_block_GenCompLasso_dataSplit = sim_evaluate_cv(sim_file=file_name,
+ method='GenCompLasso',
+ data.split=TRUE)
+
+# when i = 73, sim_idx = 4, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+# results_block_GenCompLasso_dataSplit = sim_evaluate_cv(sim_file=file_name,
+# method='GenCompLasso',
+# data.split=TRUE, seednum=5231)
+
diff --git a/simulations/code_sim_bin/boot_sim_binary.R b/simulations/code_sim_bin/boot_sim_binary.R
new file mode 100644
index 0000000..92167e0
--- /dev/null
+++ b/simulations/code_sim_bin/boot_sim_binary.R
@@ -0,0 +1,46 @@
+#-----------------------------------------------------------------------------------------------
+# hypothesis testing based on bootstrapped confidence intervals in selected simulation scenarios
+#-----------------------------------------------------------------------------------------------
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+library(FSA)
+source('../../code_method/cv_method_binary_update.R')
+source('../../code_method/getStability.R')
+source('../../code_method/bootstrap_test_compLasso_rf_binary.R')
+
+M <- as.numeric(Sys.getenv("PBS_ARRAYID"))
+
+if (M == 1){
+ print('toeplitz: random forests')
+ toe_rf = boot_stab_sim(num_boot=100, sim_file= paste0(dir, '/sim_toeplitz_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'RF', seednum=31, ratio.training=0.8, fold.cv=10,
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05)
+ save(toe_rf, file=paste0(dir, '/binary_update/boot_toe_RF_binary.RData'))
+}else if (M == 2){
+ print('toeplitz: generalized compositional lasso')
+ toe_genCompLasso = boot_stab_sim(num_boot=100, sim_file= paste0(dir, '/sim_toeplitz_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'GenCompLasso', seednum=31, ratio.training=0.8, fold.cv=10,
+ lambda.coda=seq(0.1, 0.2, 0.01))
+ save(toe_genCompLasso, file=paste0(dir, '/binary_update/boot_toe_genCompLasso_binary.RData'))
+}else if (M == 3){
+ print('block: random forests')
+ block_rf = boot_stab_sim(num_boot=100, sim_file= paste0(dir, '/sim_block_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'RF', seednum=31, ratio.training=0.8, fold.cv=10,
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05)
+
+ save(block_rf, file=paste0(dir, '/binary_update/boot_block_RF_binary.RData'))
+}else if (M == 4){
+ print('block: generalized compositional lasso')
+ block_genCompLasso = boot_stab_sim(num_boot=100, sim_file= paste0(dir, '/sim_block_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'GenCompLasso', seednum=31, ratio.training=0.8, fold.cv=10,
+ lambda.coda=seq(0.1, 0.2, 0.01))
+ save(block_genCompLasso, file=paste0(dir, '/binary_update/boot_block_genCompLasso_binary.RData'))
+}
+
+
+
+
+
+
diff --git a/simulations/code_sim_bin/cv_sim_apply_binary_update.R b/simulations/code_sim_bin/cv_sim_apply_binary_update.R
new file mode 100644
index 0000000..855c352
--- /dev/null
+++ b/simulations/code_sim_bin/cv_sim_apply_binary_update.R
@@ -0,0 +1,82 @@
+#######################################################################################
+### apply different feature selection methods to simulated data #######################
+#######################################################################################
+
+# library(FSA) # for se()
+# source('cv_method_binary_update.R')
+# source('getStability.R')
+
+sim_evaluate_cv = function(sim_file, method, seednum=31, ratio.training=0.8,
+ fold.cv=10, family='binomial',
+ lambda.grid=exp(seq(-4, -2, 0.2)),
+ alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05,
+ method.perm='altmann',
+ lambda.coda=seq(0.1, 0.2, 0.01), data.split=FALSE){
+ # load simulated data
+ load(sim_file, dat <- new.env())
+
+ idx.start = 1; idx.stop = 100 # 100 repetitions for each simulated scenario
+ rou = dat$sim_array[[1]]$rou # rou, n, p are same across all repetitions
+ n = dat$sim_array[[1]]$n
+ p = dat$sim_array[[1]]$p
+
+ # evaluating different methods
+ fp = fn = roc = NULL
+ stability.table = matrix(rep(0, (idx.stop - idx.start + 1) * p), ncol=p)
+ colnames(stability.table) = paste('V', seq(1:p), sep='')
+
+ for (i in idx.start:idx.stop){
+ print(paste0('simulation idx: ', i))
+ sub = dat$sim_array[[i]]
+ coef = sub$beta
+ coef.true = which(coef != 0)
+ y_binary = as.factor(ifelse(sub$Y >= median(sub$Y), 1, 0))
+
+ if (method == 'lasso'){
+ result = lasso_cv(y=y_binary, datx=sub$Z, seednum=seednum,family=family, lambda.choice='lambda.1se',
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ select.features = result$coef.chosen
+ } else if (method == 'elnet'){
+ result = elnet_cv(y=y_binary, datx=sub$Z, seednum=seednum,family=family, alpha.grid=alpha.grid,
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ select.features = result$coef.chosen
+ } else if (method == 'RF'){
+ result = randomForest_cv(y=y_binary, datx=sub$Z, seednum=seednum, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr, method.perm=method.perm)
+ select.features = result$coef.chosen
+ } else if (method == 'GenCompLasso'){
+      # note that generalized compositional lasso performs the X-to-Z transformation internally
+ result = gen_cons_lasso_cv (y=y_binary, datx=sub$X, seednum=seednum,
+ data.split=data.split,
+ ratio.training=ratio.training,
+ lambda.coda=lambda.coda)
+ select.features = result$coef.chosen
+ }
+
+ # false positives: shouldn't be chosen but chosen
+ fp = c(fp, length(setdiff(select.features, coef.true)))
+
+    # false negatives: should be chosen but were not
+ fn = c(fn, length(setdiff(coef.true, select.features)))
+
+    # classification performance (ROC)
+ roc = c(roc, result$ROC)
+
+ # stability table
+ stability.table[i, select.features] = 1
+ }
+
+ # store results
+ FP = paste(mean(fp), '(', round(se(fp),2), ')')
+ FN = paste(mean(fn), '(', round(se(fn),2), ')')
+ ROC = paste(round(mean(roc, na.rm=T),2), '(', round(se(roc, na.rm=T),2), ')')
+ Stab = round(getStability(stability.table)$stability, 2)
+
+ results=list(rou=rou, n=n, p=p, FP=FP, FN=FN, ROC=ROC, Stab=Stab,
+ Stab.table=stability.table, FP.list=fp, FN.list=fn, ROC.list=roc)
+
+}
+
+
+
diff --git a/simulations/code_sim_bin/ind_results_binary_update.R b/simulations/code_sim_bin/ind_results_binary_update.R
new file mode 100644
index 0000000..478b295
--- /dev/null
+++ b/simulations/code_sim_bin/ind_results_binary_update.R
@@ -0,0 +1,72 @@
+#####################################################################################
+### run all methods on simulated data with independent correlation ##################
+#####################################################################################
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+#dir = '../sim_data'
+
+library(FSA)
+source('../../code_method/cv_method_binary_update.R')
+source('../../code_method/getStability.R')
+source('cv_sim_apply_binary_update.R')
+
+
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+files = NULL
+for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))
+}
+
+#------------------------
+# compare methods
+#------------------------
+i <- as.numeric(Sys.getenv("PBS_ARRAYID"))
+
+# ###################
+# #### Lasso ########
+# ###################
+print('Lasso')
+file_name = files[[i]]
+results_ind_lasso = sim_evaluate_cv(sim_file=file_name, method='lasso', family='binomial')
+
+save(file=paste0(dir, '/binary_update/ind_Lasso_binary_', i, '.RData'), results_ind_lasso, file_name)
+
+# # #########################
+# # #### Elastic Net ########
+# # #########################
+print('Elnet')
+file_name = files[[i]]
+results_ind_elnet = sim_evaluate_cv(sim_file=file_name, method='elnet', family='binomial')
+
+save(file=paste0(dir, '/binary_update/ind_Elnet_binary_', i, '.RData'), results_ind_elnet, file_name)
+
+# ############################
+# #### Random Forests ########
+# ############################
+print('Random Forests')
+file_name = files[[i]]
+results_ind_rf = sim_evaluate_cv(sim_file=file_name, method='RF')
+save(file=paste0(dir, '/binary_update/ind_RF_binary_', i, '.RData'), results_ind_rf, file_name)
+
+# #########################################################
+# #### Generalized compositional Lasso ########
+# #########################################################
+print('Generalized Compositional Lasso')
+file_name = files[[i]]
+results_ind_GenCompLasso = sim_evaluate_cv (sim_file=file_name, method='GenCompLasso')
+
+save(file=paste0(dir, '/binary_update/ind_GenCompLasso_binary_', i, '.RData'),
+ results_ind_GenCompLasso, file_name)
+
diff --git a/simulations/code_sim_bin/run_sim_bin.sh b/simulations/code_sim_bin/run_sim_bin.sh
new file mode 100644
index 0000000..f7327c8
--- /dev/null
+++ b/simulations/code_sim_bin/run_sim_bin.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+#PBS -N sim_stab
+#PBS -l walltime=500:00:00
+#PBS -l nodes=1:ppn=10
+#PBS -l mem=20gb
+#PBS -V
+#PBS -j oe
+#PBS -d .
+#PBS -t 1-4%4
+#PBS -o messages_outputs/
+#PBS -e messages_errors/
+
+set -e
+cpus=$PBS_NUM_PPN
+
+export TMPDIR=/panfs/panfs1.ucsd.edu/panscratch/$USER/Stability_2020
+[ ! -d $TMPDIR ] && mkdir $TMPDIR
+export TMPDIR=$TMPDIR/sim_data
+[ ! -d $TMPDIR ] && mkdir $TMPDIR
+#tmp=$(mktemp -d --tmpdir)
+#export TMPDIR=$tmp
+#trap "rm -r $tmp; unset TMPDIR" EXIT
+
+# do something
+source activate r-env
+Rscript ind_results_binary_update.R $TMPDIR # t = 1-16%10
+Rscript toe_results_binary_update.R $TMPDIR # t = 1-80%10
+Rscript block_results_binary_update.R $TMPDIR # t = 1-80%10
+Rscript boot_sim_binary.R $TMPDIR # t = 1-4%4
+source deactivate r-env
+
+#mv $tmp/outdir ./outdir
diff --git a/simulations/code_sim_bin/toe_results_binary_update.R b/simulations/code_sim_bin/toe_results_binary_update.R
new file mode 100644
index 0000000..1a4986d
--- /dev/null
+++ b/simulations/code_sim_bin/toe_results_binary_update.R
@@ -0,0 +1,100 @@
+#####################################################################################
+### run all methods on simulated data with Toeplitz correlation ##################
+#####################################################################################
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+library(FSA)
+source('../../code_method/cv_method_binary_update.R')
+source('../../code_method/getStability.R')
+source('cv_sim_apply_binary_update.R')
+
+
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+## correlation strength
+rou.list = seq(0.1, 0.9, 0.2)
+
+files = NULL
+for (rou in rou.list){
+ for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_toeplitz_corr', rou, paste('P', p, 'N', n, sep='_'), '.RData', sep=''))
+ }
+}
+
+
+#------------------------
+# compare methods
+#------------------------
+i <- as.numeric(Sys.getenv("PBS_ARRAYID"))
+
+# ###################
+# #### Lasso ########
+# ###################
+print('Lasso')
+file_name = files[[i]]
+results_toe_lasso = sim_evaluate_cv(sim_file=file_name, method='lasso', family='binomial')
+
+# when i = 61, sim_idx = 16, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+results_toe_lasso = sim_evaluate_cv(sim_file=file_name, method='lasso', family='binomial', seednum=5231)
+
+save(file=paste0(dir, '/binary_update/toe_Lasso_binary_', i, '.RData'), results_toe_lasso, file_name)
+
+# # #########################
+# # #### Elastic Net ########
+# # #########################
+print('Elnet')
+file_name = files[[i]]
+results_toe_elnet = sim_evaluate_cv(sim_file=file_name, method='elnet', family='binomial')
+
+# when i = 61, sim_idx = 16, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+results_toe_elnet = sim_evaluate_cv(sim_file=file_name, method='elnet', family='binomial', seednum=5231)
+
+save(file=paste0(dir, '/binary_update/toe_Elnet_binary_', i, '.RData'), results_toe_elnet, file_name)
+
+# ############################
+# #### Random Forests ########
+# ############################
+print('Random Forests')
+file_name = files[[i]]
+results_toe_rf = sim_evaluate_cv(sim_file=file_name, method='RF')
+
+# when i = 61, sim_idx = 16, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+results_toe_rf = sim_evaluate_cv(sim_file=file_name, method='RF', seednum=5231)
+
+save(file=paste0(dir, '/binary_update/toe_RF_binary_', i, '.RData'), results_toe_rf, file_name)
+
+# #########################################################
+# #### Generalized compositional Lasso ########
+# #########################################################
+print('Generalized Compositional Lasso')
+file_name = files[[i]]
+results_toe_GenCompLasso = sim_evaluate_cv(sim_file=file_name, method='GenCompLasso')
+
+save(file=paste0(dir, '/binary_update/toe_GenCompLasso_binary_', i, '.RData'),
+ results_toe_GenCompLasso, file_name)
+
+results_toe_GenCompLasso_dataSplit = sim_evaluate_cv(sim_file=file_name,
+ method='GenCompLasso',
+ data.split=TRUE)
+
+# when i = 61, sim_idx = 16, ROC cannot be calculated as y.test has all 0s
+# thus change seed number to avoid this extreme example
+# results_toe_GenCompLasso_dataSplit = sim_evaluate_cv(sim_file=file_name,
+# method='GenCompLasso',
+# data.split=TRUE, seednum=5231)
+
diff --git a/simulations/code_sim_cts/CL_sim_apply.R b/simulations/code_sim_cts/CL_sim_apply.R
new file mode 100755
index 0000000..5f909ec
--- /dev/null
+++ b/simulations/code_sim_cts/CL_sim_apply.R
@@ -0,0 +1,87 @@
+#####################################################################
+#### Compositional Lasso on simulation results ######################
+#####################################################################
+library(FSA) # for se()
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('cv_sim_apply.R')
+
+dir = '../sim_data'
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+###########################################
+#### Independent simulations ##############
+###########################################
+files = NULL
+for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))
+}
+
+results_ind_compLasso = NULL
+for (i in 1:length(files)){ # parallel computing not working
+ print(i)
+ results_ind_compLasso[[i]] = sim_evaluate_cv(sim_file=files[i], method='compLasso')
+}
+save(file=paste0(dir, '/independent_compLasso.RData'), results_ind_compLasso)
+
+###########################################
+#### Toeplitz simulations ##############
+###########################################
+## correlation strength
+rou.list = seq(0.1, 0.9, 0.2)
+
+files = NULL
+for (rou in rou.list){
+ for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_toeplitz_corr', rou, paste('P', p, 'N', n, sep='_'), '.RData', sep=''))
+ }
+}
+
+results_toe_compLasso = NULL
+for (i in 1:length(files)){ # parallel computing not working
+ print(i)
+ results_toe_compLasso[[i]] = sim_evaluate_cv(sim_file=files[i], method='compLasso')
+}
+save(file=paste0(dir, '/toe_compLasso.RData'), results_toe_compLasso)
+
+###########################################
+#### Block simulations ####################
+###########################################
+rou.list = seq(0.1, 0.9, 0.2)
+
+files = NULL
+for (rou in rou.list){
+ for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_block_corr', rou, paste('P', p, 'N', n, sep='_'), '.RData', sep=''))
+ }
+}
+
+results_block_compLasso = NULL
+for (i in 1:length(files)){ # parallel computing not working
+ print(i)
+ results_block_compLasso[[i]] = sim_evaluate_cv(sim_file=files[i], method='compLasso')
+}
+save(file=paste0(dir, '/block_compLasso.RData'), results_block_compLasso)
+
+
+
+
+
+
+
+
+
diff --git a/simulations/code_sim_cts/block_results.R b/simulations/code_sim_cts/block_results.R
new file mode 100755
index 0000000..22c8818
--- /dev/null
+++ b/simulations/code_sim_cts/block_results.R
@@ -0,0 +1,75 @@
+#####################################################################################
+### run all methods on simulated data with block correlation ##################
+#####################################################################################
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('cv_sim_apply.R')
+
+
+library(FSA)
+library(foreach)
+library(doParallel)
+numCores <- detectCores() - 2
+registerDoParallel(numCores)
+
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+## correlation strength
+rou.list = seq(0.1, 0.9, 0.2)
+
+files = NULL
+for (rou in rou.list){
+ for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_block_corr', rou, paste('P', p, 'N', n, sep='_'), '.RData', sep=''))
+ }
+}
+
+
+##################
+### Lasso ########
+##################
+print('Lasso')
+results_block_lasso = foreach(i = iter(files)) %dopar%{
+ print(i)
+ sim_evaluate_cv(sim_file=i, method='lasso')
+}
+
+save(file=paste0(dir, '/block_Lasso.RData'), results_block_lasso)
+
+
+#########################
+#### Elastic Net ########
+#########################
+print('Elnet')
+results_block_elnet= foreach(i = iter(files)) %dopar%{
+ print(i)
+ sim_evaluate_cv(sim_file=i, method='elnet')
+}
+
+save(file=paste0(dir, '/block_Elnet.RData'), results_block_elnet)
+
+############################
+#### Random Forests ########
+############################
+print('Random Forests')
+results_block_rf = foreach(i = iter(files)) %dopar%{
+ print(i)
+ sim_evaluate_cv(sim_file=i, method='RF')
+}
+
+save(file=paste0(dir, '/block_RF.RData'), results_block_rf)
+
diff --git a/simulations/code_sim_cts/boot_CL_testing.R b/simulations/code_sim_cts/boot_CL_testing.R
new file mode 100755
index 0000000..494efbf
--- /dev/null
+++ b/simulations/code_sim_cts/boot_CL_testing.R
@@ -0,0 +1,31 @@
+#########################################################################################################
+### This is to use bootstrap for testing on compositional lasso #################
+#########################################################################################################
+
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('../../code_method/bootstrap_test_compLasso_rf.R')
+
+dir = '../sim_data'
+
+toe_lin = boot_stab(num_boot=100, sim_file= paste0(dir, '/sim_toeplitz_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'compLasso', seednum=31, ratio.training=0.8, fold.cv=10,
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05)
+
+save(toe_lin, file=paste0(dir, '/boot_toe_compLasso.RData'))
+
+block_lin = boot_stab(num_boot=100, sim_file= paste0(dir, '/sim_block_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'compLasso', seednum=31, ratio.training=0.8, fold.cv=10,
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05)
+
+save(block_lin, file=paste0(dir, '/boot_block_compLasso.RData'))
+
+
+
+
+
+
diff --git a/simulations/code_sim_cts/boot_RF_testing.R b/simulations/code_sim_cts/boot_RF_testing.R
new file mode 100755
index 0000000..c2fc990
--- /dev/null
+++ b/simulations/code_sim_cts/boot_RF_testing.R
@@ -0,0 +1,32 @@
+#########################################################################################################
+### This is to use bootstrap for testing on random forests #################
+#########################################################################################################
+
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('../../code_method/bootstrap_test_compLasso_rf.R')
+
+# note: the function boot_stab has probably been renamed to boot_stab_sim()
+
+toe_rf = boot_stab(num_boot=100, sim_file= paste0(dir, '/sim_toeplitz_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'RF', seednum=31, ratio.training=0.8, fold.cv=10,
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05)
+
+save(toe_rf, file=paste0(dir, '/boot_toe_RF.RData'))
+
+block_rf = boot_stab(num_boot=100, sim_file= paste0(dir, '/sim_block_corr0.5P_1000_N_100.RData', sep=''),
+ method= 'RF', seednum=31, ratio.training=0.8, fold.cv=10,
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05)
+
+save(block_rf, file=paste0(dir, '/boot_block_RF.RData'))
+
+
+
+
+
+
+
diff --git a/simulations/code_sim_cts/cv_sim_apply.R b/simulations/code_sim_cts/cv_sim_apply.R
new file mode 100755
index 0000000..aa717e8
--- /dev/null
+++ b/simulations/code_sim_cts/cv_sim_apply.R
@@ -0,0 +1,91 @@
+#######################################################################################
+### apply different feature selection methods to simulated data #######################
+#######################################################################################
+
+#' library(FSA) # for se()
+#' source('cv_method.R')
+#' source('getStability.R')
+#' @title sim_evaluate_cv
+#' @description
+#' @param sim_file Path to a .RData file containing the pre-generated simulation data.
+#' @param method The machine learning algorithm to be used. This should be one of "lasso", "elnet", "RF", and "compLasso".
+#' @param seednum The seed number (default=31).
+#' @param ratio.training The ratio of the whole data assigned for model training (default=0.8).
+#' @param fold.cv The number of folds for cross-validation (default=10).
+#' @param family The family of linear regression models (only related to "lasso" or "elnet").
+#' @param lambda.grid The tuning range for the regularization parameter "lambda" of "lasso" or "elnet".
+#' @param alpha.grid The tuning range for the elastic-net mixing parameter "alpha".
+#' @param mtry.grid The tuning range for the "RF" hyperparameter "mtry", that is, number of variables to possibly split at in each node.
+#' @param num_trees The number of decision trees built in the RF.
+#' @param pval_thr The threshold for the estimated p value of RF importance scores.
+#' @param method.perm The permutation method for estimating the p value of RF importance scores.
+#' @details
+
+sim_evaluate_cv = function(sim_file, method, seednum=31, ratio.training=0.8, fold.cv=10, family='gaussian',
+ lambda.grid=exp(seq(-4, -2, 0.2)), alpha.grid=seq(0.1, 0.9, 0.1),
+ mtry.grid=seq(5, 25, 5), num_trees = 500, pval_thr = 0.05, method.perm='altmann'){
+ # load simulated data
+ load(sim_file, dat <- new.env())
+
+ idx.start = 1; idx.stop = 100 # 100 repetitions for each simulated scenario
+ rou = dat$sim_array[[1]]$rou # rou, n, p are same across all repetitions
+ n = dat$sim_array[[1]]$n
+ p = dat$sim_array[[1]]$p
+
+ # evaluating different methods
+ fp = fn = mse = OOB_rf = NULL
+ stability.table = matrix(rep(0, (idx.stop - idx.start + 1) * p), ncol=p)
+ colnames(stability.table) = paste('V', seq(1:p), sep='')
+
+ for (i in idx.start:idx.stop){
+ sub = dat$sim_array[[i]]
+ coef = sub$beta
+ coef.true = which(coef != 0)
+
+ if (method == 'lasso'){
+ result = lasso_cv(y=sub$Y, datx=sub$Z, seednum=seednum,family=family, lambda.choice='lambda.1se',
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ select.features = result$coef.chosen
+ } else if (method == 'elnet'){
+ result = elnet_cv(y=sub$Y, datx=sub$Z, seednum=seednum,family=family, alpha.grid=alpha.grid,
+ ratio.training=ratio.training, fold.cv=fold.cv, lambda.grid=lambda.grid)
+ select.features = result$coef.chosen
+ } else if (method == 'RF'){
+ result = randomForest_cv(y=sub$Y, datx=sub$Z, seednum=seednum, fold.cv=fold.cv,
+ num_trees=num_trees, mtry.grid = mtry.grid, pval_thr=pval_thr, method.perm=method.perm)
+ select.features = result$coef.chosen
+ } else if (method == 'compLasso'){
+ result = cons_lasso_cv(datx=sub$Z, y=sub$Y, seednum=seednum, ratio.training=ratio.training)
+ select.features = result$coef.chosen
+ }
+
+ # false positives: features selected although their true coefficient is zero
+ fp = c(fp, length(setdiff(select.features, coef.true)))
+
+ # false negatives: features with a nonzero true coefficient that were not selected
+ fn = c(fn, length(setdiff(coef.true, select.features)))
+
+ # prediction error
+ mse = c(mse, result$MSE)
+
+ if (method == 'RF'){
+ OOB_rf = c(OOB_rf, result$OOB)
+ }
+
+ # stability table
+ stability.table[i, select.features] = 1
+ }
+
+ # store results
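+ FP = paste(mean(fp), '(', round(se(fp),2), ')') # mean (SE) count of false positives; a count, not a rate
+ FN = paste(mean(fn), '(', round(se(fn),2), ')') # mean (SE) count of false negatives; a count, not a rate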
+ MSE = paste(round(mean(mse, na.rm=T),2), '(', round(se(mse, na.rm=T),2), ')')
+ Stab = round(getStability(stability.table)$stability, 2)
+
+ results=list(rou=rou, n=n, p=p, FP=FP, FN=FN, MSE=MSE, Stab=Stab,
+ Stab.table=stability.table, FP.list=fp, FN.list=fn, MSE.list=mse, OOB.list=OOB_rf)
+ return(results)
+}
+
+
+
diff --git a/simulations/code_sim_cts/ind_results.R b/simulations/code_sim_cts/ind_results.R
new file mode 100755
index 0000000..0544fbd
--- /dev/null
+++ b/simulations/code_sim_cts/ind_results.R
@@ -0,0 +1,68 @@
+#####################################################################################
+### run all methods on simulated data with independent correlation ##################
+#####################################################################################
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('cv_sim_apply.R')
+
+
+library(FSA)
+library(foreach)
+library(doParallel)
+numCores <- detectCores() - 2
+registerDoParallel(numCores)
+
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+files = NULL
+for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))
+}
+
+##################
+### Lasso ########
+##################
+print('Lasso')
+results_ind_lasso = foreach(i = iter(files)) %dopar%{
+ sim_evaluate_cv(sim_file=i, method='lasso')
+}
+
+save(file=paste0(dir, '/independent_Lasso.RData'), results_ind_lasso)
+
+
+#########################
+#### Elastic Net ########
+#########################
+print('Elnet')
+results_ind_elnet= foreach(i = iter(files)) %dopar%{
+ print(i)
+ sim_evaluate_cv(sim_file=i, method='elnet')
+}
+
+save(file=paste0(dir, '/independent_Elnet.RData'), results_ind_elnet)
+
+############################
+#### Random Forests ########
+############################
+print('RF')
+results_ind_rf = foreach(i = iter(files)) %dopar%{
+ print(i)
+ sim_evaluate_cv(sim_file=i, method='RF')
+}
+
+save(file=paste0(dir, '/independent_RF.RData'), results_ind_rf)
+
diff --git a/simulations/code_sim_cts/run_sim_cts.sh b/simulations/code_sim_cts/run_sim_cts.sh
new file mode 100755
index 0000000..af06b77
--- /dev/null
+++ b/simulations/code_sim_cts/run_sim_cts.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+
+#PBS -N ind_cv
+#PBS -l walltime=500:00:00
+#PBS -l nodes=1:ppn=10
+#PBS -l mem=50gb
+#PBS -V
+#PBS -j oe
+#PBS -d .
+
+set -e
+cpus=$PBS_NUM_PPN
+
+export TMPDIR=/panfs/panfs1.ucsd.edu/panscratch/$USER/Stability_2020
+[ ! -d $TMPDIR ] && mkdir $TMPDIR
+export TMPDIR=$TMPDIR/sim_data
+[ ! -d $TMPDIR ] && mkdir $TMPDIR
+#tmp=$(mktemp -d --tmpdir)
+#export TMPDIR=$tmp
+#trap "rm -r $tmp; unset TMPDIR" EXIT
+
+# run the simulation analyses inside the R conda environment
+source activate r-env
+Rscript ind_results.R $TMPDIR
+Rscript block_results.R $TMPDIR
+Rscript toe_results.R $TMPDIR
+Rscript boot_RF_testing.R $TMPDIR
+source deactivate r-env
+
+#mv $tmp/outdir ./outdir
diff --git a/simulations/code_sim_cts/toe_results.R b/simulations/code_sim_cts/toe_results.R
new file mode 100755
index 0000000..b582db8
--- /dev/null
+++ b/simulations/code_sim_cts/toe_results.R
@@ -0,0 +1,73 @@
+#####################################################################################
+### run all methods on simulated data with Toeplitz correlation ##################
+#####################################################################################
+args = commandArgs(trailingOnly=TRUE)
+print(args)
+dir = args[1]
+
+source('../../code_method/cv_method.R')
+source('../../code_method/getStability.R')
+source('cv_sim_apply.R')
+
+
+library(FSA)
+library(foreach)
+library(doParallel)
+numCores <- detectCores() - 2
+registerDoParallel(numCores)
+
+dim.list = list()
+size = c(50, 100, 500, 1000)
+idx = 0
+for (P in size){
+ for (N in size){
+ idx = idx + 1
+ dim.list[[idx]] = c(P=P, N=N)
+ }
+}
+
+## correlation strength
+rou.list = seq(0.1, 0.9, 0.2)
+
+files = NULL
+for (rou in rou.list){
+ for (dim in dim.list){
+ p = dim[1]
+ n = dim[2]
+ files = cbind(files, paste0(dir, '/sim_toeplitz_corr', rou, paste('P', p, 'N', n, sep='_'), '.RData'))
+ }
+}
+
+
+##################
+### Lasso ########
+##################
+print('Lasso')
+results_toe_lasso = foreach(i = iter(files)) %dopar%{
+ print(i)
+ sim_evaluate_cv(sim_file=i, method='lasso')
+}
+
+save(file=paste0(dir, '/toe_Lasso.RData'), results_toe_lasso)
+
+
+#########################
+#### Elastic Net ########
+#########################
+print('Elnet')
+results_toe_elnet= foreach(i = iter(files)) %dopar%{
+ sim_evaluate_cv(sim_file=i, method='elnet')
+}
+
+save(file=paste0(dir, '/toe_Elnet.RData'), results_toe_elnet)
+
+############################
+#### Random Forests ########
+############################
+print('Random Forests')
+results_toe_rf = foreach(i = iter(files)) %dopar%{
+ sim_evaluate_cv(sim_file=i, method='RF')
+}
+
+save(file=paste0(dir, '/toe_RF.RData'), results_toe_rf)
+
diff --git a/simulations/figures_combined/fig1_combined.png b/simulations/figures_combined/fig1_combined.png
new file mode 100644
index 0000000..96ad4ef
Binary files /dev/null and b/simulations/figures_combined/fig1_combined.png differ
diff --git a/simulations/figures_combined/fig2_combined.png b/simulations/figures_combined/fig2_combined.png
new file mode 100644
index 0000000..f03faff
Binary files /dev/null and b/simulations/figures_combined/fig2_combined.png differ
diff --git a/simulations/figures_combined/fig3_combined.png b/simulations/figures_combined/fig3_combined.png
new file mode 100644
index 0000000..d6f9bfb
Binary files /dev/null and b/simulations/figures_combined/fig3_combined.png differ
diff --git a/simulations/figures_combined/figS1_combined.png b/simulations/figures_combined/figS1_combined.png
new file mode 100644
index 0000000..f5faaf6
Binary files /dev/null and b/simulations/figures_combined/figS1_combined.png differ
diff --git a/simulations/figures_combined/figS2_combined.png b/simulations/figures_combined/figS2_combined.png
new file mode 100644
index 0000000..7a9e36c
Binary files /dev/null and b/simulations/figures_combined/figS2_combined.png differ
diff --git a/simulations/figures_combined/figS3_combined.png b/simulations/figures_combined/figS3_combined.png
new file mode 100644
index 0000000..6050d79
Binary files /dev/null and b/simulations/figures_combined/figS3_combined.png differ
diff --git a/simulations/figures_combined/figS4_combined.png b/simulations/figures_combined/figS4_combined.png
new file mode 100644
index 0000000..08917eb
Binary files /dev/null and b/simulations/figures_combined/figS4_combined.png differ
diff --git a/simulations/figures_combined/fig_cts_num_select.png b/simulations/figures_combined/fig_cts_num_select.png
new file mode 100644
index 0000000..645e0bc
Binary files /dev/null and b/simulations/figures_combined/fig_cts_num_select.png differ
diff --git a/simulations/figures_sim/.DS_Store b/simulations/figures_sim/.DS_Store
deleted file mode 100755
index 5008ddf..0000000
Binary files a/simulations/figures_sim/.DS_Store and /dev/null differ
diff --git a/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.1_sim_ind_lasso_binary_update-checkpoint.ipynb b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.1_sim_ind_lasso_binary_update-checkpoint.ipynb
new file mode 100644
index 0000000..7532ddd
--- /dev/null
+++ b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.1_sim_ind_lasso_binary_update-checkpoint.ipynb
@@ -0,0 +1,683 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Summarize lasso results on independent simulation scenarios for binary outcomes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dir = '/panfs/panfs1.ucsd.edu/panscratch/lij014/Stability_2020/sim_data'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dim.list = list()\n",
+ "size = c(50, 100, 500, 1000)\n",
+ "idx = 0\n",
+ "for (P in size){\n",
+ " for (N in size){\n",
+ " idx = idx + 1\n",
+ " dim.list[[idx]] = c(P=P, N=N)\n",
+ " }\n",
+ "}\n",
+ "\n",
+ "files = NULL\n",
+ "for (dim in dim.list){\n",
+ " p = dim[1]\n",
+ " n = dim[2]\n",
+ " files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "16"
+ ],
+ "text/latex": [
+ "16"
+ ],
+ "text/markdown": [
+ "16"
+ ],
+ "text/plain": [
+ "[1] 16"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "length(files)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[1] \"indx: 1\"\n",
+ "[1] \"indx: 2\"\n",
+ "[1] \"indx: 3\"\n",
+ "[1] \"indx: 4\"\n",
+ "[1] \"indx: 5\"\n",
+ "[1] \"indx: 6\"\n",
+ "[1] \"indx: 7\"\n",
+ "[1] \"indx: 8\"\n",
+ "[1] \"indx: 9\"\n",
+ "[1] \"indx: 10\"\n",
+ "[1] \"indx: 11\"\n",
+ "[1] \"indx: 12\"\n",
+ "[1] \"indx: 13\"\n",
+ "[1] \"indx: 14\"\n",
+ "[1] \"indx: 15\"\n",
+ "[1] \"indx: 16\"\n"
+ ]
+ }
+ ],
+ "source": [
+ "avg_FDR = NULL\n",
+ "table_toe = NULL\n",
+ "tmp_num_select = rep(0, length(files))\n",
+ "for (i in 1:length(files)){\n",
+ " print(paste0('indx: ', i))\n",
+ " load(paste0(dir, '/binary_update/ind_Lasso_binary_', i, '.RData')) \n",
+ " \n",
+ " table_toe = rbind(table_toe, results_ind_lasso[c('n', 'p', 'rou', 'FP', 'FN', 'ROC', 'Stab')])\n",
+ " tmp_num_select[i] = mean(rowSums(results_ind_lasso$Stab.table))\n",
+ " \n",
+ " # calculate FDR\n",
+ " load(file_name, dat <- new.env())\n",
+ " sub = dat$sim_array[[i]]\n",
+ " p = sub$p # take true values from 1st replicate of each simulated data\n",
+ " coef = sub$beta\n",
+ " coef.true = which(coef != 0)\n",
+ " \n",
+ " tt = results_ind_lasso$Stab.table\n",
+ " FDR = NULL # false discovery rate\n",
+ " for (r in 1:nrow(tt)){\n",
+ " FDR = c(FDR, length(setdiff(which(tt[r, ] !=0), coef.true))/sum(tt[r, ]))\n",
+ "\n",
+ " }\n",
+ " \n",
+ " avg_FDR = c(avg_FDR, mean(FDR, na.rm=T))\n",
+ "}\n",
+ "table_toe = as.data.frame(table_toe)\n",
+ "table_toe$num_select = tmp_num_select\n",
+ "table_toe$FDR = round(avg_FDR,2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 3.69 ( 0.25 ) | 2.5 ( 0.12 ) | 0.74 ( 0.01 ) | 0.33 | 6.19 | 0.37 |\n",
+ "\t2 | 100 | 50 | 0 | 7.16 ( 0.37 ) | 0.64 ( 0.08 ) | 0.85 ( 0.01 ) | 0.35 | 11.52 | 0.48 |\n",
+ "\t3 | 500 | 50 | 0 | 5.21 ( 0.21 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.53 | 10.21 | 0.39 |\n",
+ "\t4 | 1000 | 50 | 0 | 2.25 ( 0.11 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.8 | 7.25 | 0.15 |\n",
+ "\t5 | 50 | 100 | 0 | 5.13 ( 0.34 ) | 2.93 ( 0.1 ) | 0.73 ( 0.01 ) | 0.25 | 7.20 | 0.50 |\n",
+ "\t6 | 100 | 100 | 0 | 8.22 ( 0.44 ) | 1.03 ( 0.09 ) | 0.83 ( 0.01 ) | 0.32 | 12.19 | 0.54 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 3.69 ( 0.25 ) & 2.5 ( 0.12 ) & 0.74 ( 0.01 ) & 0.33 & 6.19 & 0.37\\\\\n",
+ "\t2 & 100 & 50 & 0 & 7.16 ( 0.37 ) & 0.64 ( 0.08 ) & 0.85 ( 0.01 ) & 0.35 & 11.52 & 0.48\\\\\n",
+ "\t3 & 500 & 50 & 0 & 5.21 ( 0.21 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.53 & 10.21 & 0.39\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 2.25 ( 0.11 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.8 & 7.25 & 0.15\\\\\n",
+ "\t5 & 50 & 100 & 0 & 5.13 ( 0.34 ) & 2.93 ( 0.1 ) & 0.73 ( 0.01 ) & 0.25 & 7.20 & 0.50\\\\\n",
+ "\t6 & 100 & 100 & 0 & 8.22 ( 0.44 ) & 1.03 ( 0.09 ) & 0.83 ( 0.01 ) & 0.32 & 12.19 & 0.54\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 3.69 ( 0.25 ) | 2.5 ( 0.12 ) | 0.74 ( 0.01 ) | 0.33 | 6.19 | 0.37 |\n",
+ "| 2 | 100 | 50 | 0 | 7.16 ( 0.37 ) | 0.64 ( 0.08 ) | 0.85 ( 0.01 ) | 0.35 | 11.52 | 0.48 |\n",
+ "| 3 | 500 | 50 | 0 | 5.21 ( 0.21 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.53 | 10.21 | 0.39 |\n",
+ "| 4 | 1000 | 50 | 0 | 2.25 ( 0.11 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.8 | 7.25 | 0.15 |\n",
+ "| 5 | 50 | 100 | 0 | 5.13 ( 0.34 ) | 2.93 ( 0.1 ) | 0.73 ( 0.01 ) | 0.25 | 7.20 | 0.50 |\n",
+ "| 6 | 100 | 100 | 0 | 8.22 ( 0.44 ) | 1.03 ( 0.09 ) | 0.83 ( 0.01 ) | 0.32 | 12.19 | 0.54 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select FDR \n",
+ "1 50 50 0 3.69 ( 0.25 ) 2.5 ( 0.12 ) 0.74 ( 0.01 ) 0.33 6.19 0.37\n",
+ "2 100 50 0 7.16 ( 0.37 ) 0.64 ( 0.08 ) 0.85 ( 0.01 ) 0.35 11.52 0.48\n",
+ "3 500 50 0 5.21 ( 0.21 ) 0 ( 0 ) 0.93 ( 0 ) 0.53 10.21 0.39\n",
+ "4 1000 50 0 2.25 ( 0.11 ) 0 ( 0 ) 0.93 ( 0 ) 0.8 7.25 0.15\n",
+ "5 50 100 0 5.13 ( 0.34 ) 2.93 ( 0.1 ) 0.73 ( 0.01 ) 0.25 7.20 0.50\n",
+ "6 100 100 0 8.22 ( 0.44 ) 1.03 ( 0.09 ) 0.83 ( 0.01 ) 0.32 12.19 0.54"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 23.83 ( 0.69 ) | 0 ( 0 ) | 0.92 ( 0 ) | 0.2 | 28.83 | 0.78 |\n",
+ "\t12 | 1000 | 500 | 0 | 11.38 ( 0.41 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.36 | 16.38 | 0.60 |\n",
+ "\t13 | 50 | 1000 | 0 | 9.77 ( 0.24 ) | 4.03 ( 0.08 ) | 0.67 ( 0.01 ) | 0.09 | 10.74 | 0.81 |\n",
+ "\t14 | 100 | 1000 | 0 | 11.91 ( 0.8 ) | 2.32 ( 0.1 ) | 0.78 ( 0.01 ) | 0.2 | 14.59 | 0.66 |\n",
+ "\t15 | 500 | 1000 | 0 | 30.84 ( 0.92 ) | 0 ( 0 ) | 0.91 ( 0 ) | 0.16 | 35.84 | 0.82 |\n",
+ "\t16 | 1000 | 1000 | 0 | 19.96 ( 0.61 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.24 | 24.96 | 0.74 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 23.83 ( 0.69 ) & 0 ( 0 ) & 0.92 ( 0 ) & 0.2 & 28.83 & 0.78\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 11.38 ( 0.41 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.36 & 16.38 & 0.60\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 9.77 ( 0.24 ) & 4.03 ( 0.08 ) & 0.67 ( 0.01 ) & 0.09 & 10.74 & 0.81\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 11.91 ( 0.8 ) & 2.32 ( 0.1 ) & 0.78 ( 0.01 ) & 0.2 & 14.59 & 0.66\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 30.84 ( 0.92 ) & 0 ( 0 ) & 0.91 ( 0 ) & 0.16 & 35.84 & 0.82\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 19.96 ( 0.61 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.24 & 24.96 & 0.74\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 23.83 ( 0.69 ) | 0 ( 0 ) | 0.92 ( 0 ) | 0.2 | 28.83 | 0.78 |\n",
+ "| 12 | 1000 | 500 | 0 | 11.38 ( 0.41 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.36 | 16.38 | 0.60 |\n",
+ "| 13 | 50 | 1000 | 0 | 9.77 ( 0.24 ) | 4.03 ( 0.08 ) | 0.67 ( 0.01 ) | 0.09 | 10.74 | 0.81 |\n",
+ "| 14 | 100 | 1000 | 0 | 11.91 ( 0.8 ) | 2.32 ( 0.1 ) | 0.78 ( 0.01 ) | 0.2 | 14.59 | 0.66 |\n",
+ "| 15 | 500 | 1000 | 0 | 30.84 ( 0.92 ) | 0 ( 0 ) | 0.91 ( 0 ) | 0.16 | 35.84 | 0.82 |\n",
+ "| 16 | 1000 | 1000 | 0 | 19.96 ( 0.61 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.24 | 24.96 | 0.74 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select\n",
+ "11 500 500 0 23.83 ( 0.69 ) 0 ( 0 ) 0.92 ( 0 ) 0.2 28.83 \n",
+ "12 1000 500 0 11.38 ( 0.41 ) 0 ( 0 ) 0.93 ( 0 ) 0.36 16.38 \n",
+ "13 50 1000 0 9.77 ( 0.24 ) 4.03 ( 0.08 ) 0.67 ( 0.01 ) 0.09 10.74 \n",
+ "14 100 1000 0 11.91 ( 0.8 ) 2.32 ( 0.1 ) 0.78 ( 0.01 ) 0.2 14.59 \n",
+ "15 500 1000 0 30.84 ( 0.92 ) 0 ( 0 ) 0.91 ( 0 ) 0.16 35.84 \n",
+ "16 1000 1000 0 19.96 ( 0.61 ) 0 ( 0 ) 0.93 ( 0 ) 0.24 24.96 \n",
+ " FDR \n",
+ "11 0.78\n",
+ "12 0.60\n",
+ "13 0.81\n",
+ "14 0.66\n",
+ "15 0.82\n",
+ "16 0.74"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n",
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# export result\n",
+ "result.table_toe <- apply(table_toe,2,as.character)\n",
+ "rownames(result.table_toe) = rownames(table_toe)\n",
+ "result.table_toe = as.data.frame(result.table_toe)\n",
+ "\n",
+ "# extract numbers only for 'n' & 'p'\n",
+ "result.table_toe$n = tidyr::extract_numeric(result.table_toe$n)\n",
+ "result.table_toe$p = tidyr::extract_numeric(result.table_toe$p)\n",
+ "result.table_toe$ratio = result.table_toe$p / result.table_toe$n\n",
+ "\n",
+ "result.table_toe = result.table_toe[c('n', 'p', 'rou', 'ratio', 'Stab', 'ROC', 'FP', 'FN', 'num_select', 'FDR')]\n",
+ "colnames(result.table_toe)[1:4] = c('N', 'P', 'Corr', 'Ratio')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert the measurements of interest to numeric\n",
+ "result.table_toe$Stab = as.numeric(as.character(result.table_toe$Stab))\n",
+ "result.table_toe$num_select = as.numeric(as.character(result.table_toe$num_select))\n",
+ "\n",
+ "result.table_toe$ROC_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$ROC))\n",
+ "result.table_toe$FP_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FP))\n",
+ "result.table_toe$FN_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FN))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 0 × 13\n",
+ "\n",
+ "\tN | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t<dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 0 × 13\n",
+ "\\begin{tabular}{lllllllllllll}\n",
+ " N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 0 × 13\n",
+ "\n",
+ "| N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select FDR ROC_mean FP_mean FN_mean"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check whether any missing values exist\n",
+ "result.table_toe[rowSums(is.na(result.table_toe)) > 0,]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1.00 | 0.33 | 0.74 ( 0.01 ) | 3.69 ( 0.25 ) | 2.5 ( 0.12 ) | 6.19 | 0.37 | 0.74 | 3.69 | 2.50 |\n",
+ "\t2 | 100 | 50 | 0 | 0.50 | 0.35 | 0.85 ( 0.01 ) | 7.16 ( 0.37 ) | 0.64 ( 0.08 ) | 11.52 | 0.48 | 0.85 | 7.16 | 0.64 |\n",
+ "\t3 | 500 | 50 | 0 | 0.10 | 0.53 | 0.93 ( 0 ) | 5.21 ( 0.21 ) | 0 ( 0 ) | 10.21 | 0.39 | 0.93 | 5.21 | 0.00 |\n",
+ "\t4 | 1000 | 50 | 0 | 0.05 | 0.80 | 0.93 ( 0 ) | 2.25 ( 0.11 ) | 0 ( 0 ) | 7.25 | 0.15 | 0.93 | 2.25 | 0.00 |\n",
+ "\t5 | 50 | 100 | 0 | 2.00 | 0.25 | 0.73 ( 0.01 ) | 5.13 ( 0.34 ) | 2.93 ( 0.1 ) | 7.20 | 0.5 | 0.73 | 5.13 | 2.93 |\n",
+ "\t6 | 100 | 100 | 0 | 1.00 | 0.32 | 0.83 ( 0.01 ) | 8.22 ( 0.44 ) | 1.03 ( 0.09 ) | 12.19 | 0.54 | 0.83 | 8.22 | 1.03 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1.00 & 0.33 & 0.74 ( 0.01 ) & 3.69 ( 0.25 ) & 2.5 ( 0.12 ) & 6.19 & 0.37 & 0.74 & 3.69 & 2.50\\\\\n",
+ "\t2 & 100 & 50 & 0 & 0.50 & 0.35 & 0.85 ( 0.01 ) & 7.16 ( 0.37 ) & 0.64 ( 0.08 ) & 11.52 & 0.48 & 0.85 & 7.16 & 0.64\\\\\n",
+ "\t3 & 500 & 50 & 0 & 0.10 & 0.53 & 0.93 ( 0 ) & 5.21 ( 0.21 ) & 0 ( 0 ) & 10.21 & 0.39 & 0.93 & 5.21 & 0.00\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.05 & 0.80 & 0.93 ( 0 ) & 2.25 ( 0.11 ) & 0 ( 0 ) & 7.25 & 0.15 & 0.93 & 2.25 & 0.00\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2.00 & 0.25 & 0.73 ( 0.01 ) & 5.13 ( 0.34 ) & 2.93 ( 0.1 ) & 7.20 & 0.5 & 0.73 & 5.13 & 2.93\\\\\n",
+ "\t6 & 100 & 100 & 0 & 1.00 & 0.32 & 0.83 ( 0.01 ) & 8.22 ( 0.44 ) & 1.03 ( 0.09 ) & 12.19 & 0.54 & 0.83 & 8.22 & 1.03\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1.00 | 0.33 | 0.74 ( 0.01 ) | 3.69 ( 0.25 ) | 2.5 ( 0.12 ) | 6.19 | 0.37 | 0.74 | 3.69 | 2.50 |\n",
+ "| 2 | 100 | 50 | 0 | 0.50 | 0.35 | 0.85 ( 0.01 ) | 7.16 ( 0.37 ) | 0.64 ( 0.08 ) | 11.52 | 0.48 | 0.85 | 7.16 | 0.64 |\n",
+ "| 3 | 500 | 50 | 0 | 0.10 | 0.53 | 0.93 ( 0 ) | 5.21 ( 0.21 ) | 0 ( 0 ) | 10.21 | 0.39 | 0.93 | 5.21 | 0.00 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.05 | 0.80 | 0.93 ( 0 ) | 2.25 ( 0.11 ) | 0 ( 0 ) | 7.25 | 0.15 | 0.93 | 2.25 | 0.00 |\n",
+ "| 5 | 50 | 100 | 0 | 2.00 | 0.25 | 0.73 ( 0.01 ) | 5.13 ( 0.34 ) | 2.93 ( 0.1 ) | 7.20 | 0.5 | 0.73 | 5.13 | 2.93 |\n",
+ "| 6 | 100 | 100 | 0 | 1.00 | 0.32 | 0.83 ( 0.01 ) | 8.22 ( 0.44 ) | 1.03 ( 0.09 ) | 12.19 | 0.54 | 0.83 | 8.22 | 1.03 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select\n",
+ "1 50 50 0 1.00 0.33 0.74 ( 0.01 ) 3.69 ( 0.25 ) 2.5 ( 0.12 ) 6.19 \n",
+ "2 100 50 0 0.50 0.35 0.85 ( 0.01 ) 7.16 ( 0.37 ) 0.64 ( 0.08 ) 11.52 \n",
+ "3 500 50 0 0.10 0.53 0.93 ( 0 ) 5.21 ( 0.21 ) 0 ( 0 ) 10.21 \n",
+ "4 1000 50 0 0.05 0.80 0.93 ( 0 ) 2.25 ( 0.11 ) 0 ( 0 ) 7.25 \n",
+ "5 50 100 0 2.00 0.25 0.73 ( 0.01 ) 5.13 ( 0.34 ) 2.93 ( 0.1 ) 7.20 \n",
+ "6 100 100 0 1.00 0.32 0.83 ( 0.01 ) 8.22 ( 0.44 ) 1.03 ( 0.09 ) 12.19 \n",
+ " FDR ROC_mean FP_mean FN_mean\n",
+ "1 0.37 0.74 3.69 2.50 \n",
+ "2 0.48 0.85 7.16 0.64 \n",
+ "3 0.39 0.93 5.21 0.00 \n",
+ "4 0.15 0.93 2.25 0.00 \n",
+ "5 0.5 0.73 5.13 2.93 \n",
+ "6 0.54 0.83 8.22 1.03 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(result.table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 1.0 | 0.20 | 0.92 ( 0 ) | 23.83 ( 0.69 ) | 0 ( 0 ) | 28.83 | 0.78 | 0.92 | 23.83 | 0.00 |\n",
+ "\t12 | 1000 | 500 | 0 | 0.5 | 0.36 | 0.93 ( 0 ) | 11.38 ( 0.41 ) | 0 ( 0 ) | 16.38 | 0.6 | 0.93 | 11.38 | 0.00 |\n",
+ "\t13 | 50 | 1000 | 0 | 20.0 | 0.09 | 0.67 ( 0.01 ) | 9.77 ( 0.24 ) | 4.03 ( 0.08 ) | 10.74 | 0.81 | 0.67 | 9.77 | 4.03 |\n",
+ "\t14 | 100 | 1000 | 0 | 10.0 | 0.20 | 0.78 ( 0.01 ) | 11.91 ( 0.8 ) | 2.32 ( 0.1 ) | 14.59 | 0.66 | 0.78 | 11.91 | 2.32 |\n",
+ "\t15 | 500 | 1000 | 0 | 2.0 | 0.16 | 0.91 ( 0 ) | 30.84 ( 0.92 ) | 0 ( 0 ) | 35.84 | 0.82 | 0.91 | 30.84 | 0.00 |\n",
+ "\t16 | 1000 | 1000 | 0 | 1.0 | 0.24 | 0.93 ( 0 ) | 19.96 ( 0.61 ) | 0 ( 0 ) | 24.96 | 0.74 | 0.93 | 19.96 | 0.00 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 1.0 & 0.20 & 0.92 ( 0 ) & 23.83 ( 0.69 ) & 0 ( 0 ) & 28.83 & 0.78 & 0.92 & 23.83 & 0.00\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 0.5 & 0.36 & 0.93 ( 0 ) & 11.38 ( 0.41 ) & 0 ( 0 ) & 16.38 & 0.6 & 0.93 & 11.38 & 0.00\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20.0 & 0.09 & 0.67 ( 0.01 ) & 9.77 ( 0.24 ) & 4.03 ( 0.08 ) & 10.74 & 0.81 & 0.67 & 9.77 & 4.03\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 10.0 & 0.20 & 0.78 ( 0.01 ) & 11.91 ( 0.8 ) & 2.32 ( 0.1 ) & 14.59 & 0.66 & 0.78 & 11.91 & 2.32\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 2.0 & 0.16 & 0.91 ( 0 ) & 30.84 ( 0.92 ) & 0 ( 0 ) & 35.84 & 0.82 & 0.91 & 30.84 & 0.00\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.0 & 0.24 & 0.93 ( 0 ) & 19.96 ( 0.61 ) & 0 ( 0 ) & 24.96 & 0.74 & 0.93 & 19.96 & 0.00\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 1.0 | 0.20 | 0.92 ( 0 ) | 23.83 ( 0.69 ) | 0 ( 0 ) | 28.83 | 0.78 | 0.92 | 23.83 | 0.00 |\n",
+ "| 12 | 1000 | 500 | 0 | 0.5 | 0.36 | 0.93 ( 0 ) | 11.38 ( 0.41 ) | 0 ( 0 ) | 16.38 | 0.6 | 0.93 | 11.38 | 0.00 |\n",
+ "| 13 | 50 | 1000 | 0 | 20.0 | 0.09 | 0.67 ( 0.01 ) | 9.77 ( 0.24 ) | 4.03 ( 0.08 ) | 10.74 | 0.81 | 0.67 | 9.77 | 4.03 |\n",
+ "| 14 | 100 | 1000 | 0 | 10.0 | 0.20 | 0.78 ( 0.01 ) | 11.91 ( 0.8 ) | 2.32 ( 0.1 ) | 14.59 | 0.66 | 0.78 | 11.91 | 2.32 |\n",
+ "| 15 | 500 | 1000 | 0 | 2.0 | 0.16 | 0.91 ( 0 ) | 30.84 ( 0.92 ) | 0 ( 0 ) | 35.84 | 0.82 | 0.91 | 30.84 | 0.00 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.0 | 0.24 | 0.93 ( 0 ) | 19.96 ( 0.61 ) | 0 ( 0 ) | 24.96 | 0.74 | 0.93 | 19.96 | 0.00 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN \n",
+ "11 500 500 0 1.0 0.20 0.92 ( 0 ) 23.83 ( 0.69 ) 0 ( 0 ) \n",
+ "12 1000 500 0 0.5 0.36 0.93 ( 0 ) 11.38 ( 0.41 ) 0 ( 0 ) \n",
+ "13 50 1000 0 20.0 0.09 0.67 ( 0.01 ) 9.77 ( 0.24 ) 4.03 ( 0.08 )\n",
+ "14 100 1000 0 10.0 0.20 0.78 ( 0.01 ) 11.91 ( 0.8 ) 2.32 ( 0.1 ) \n",
+ "15 500 1000 0 2.0 0.16 0.91 ( 0 ) 30.84 ( 0.92 ) 0 ( 0 ) \n",
+ "16 1000 1000 0 1.0 0.24 0.93 ( 0 ) 19.96 ( 0.61 ) 0 ( 0 ) \n",
+ " num_select FDR ROC_mean FP_mean FN_mean\n",
+ "11 28.83 0.78 0.92 23.83 0.00 \n",
+ "12 16.38 0.6 0.93 11.38 0.00 \n",
+ "13 10.74 0.81 0.67 9.77 4.03 \n",
+ "14 14.59 0.66 0.78 11.91 2.32 \n",
+ "15 35.84 0.82 0.91 30.84 0.00 \n",
+ "16 24.96 0.74 0.93 19.96 0.00 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(result.table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 16 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1.00 | 0.33 | 0.74 ( 0.01 ) | 3.69 ( 0.25 ) | 2.5 ( 0.12 ) | 6.19 | 0.37 | 0.74 | 3.69 | 2.50 |\n",
+ "\t2 | 100 | 50 | 0 | 0.50 | 0.35 | 0.85 ( 0.01 ) | 7.16 ( 0.37 ) | 0.64 ( 0.08 ) | 11.52 | 0.48 | 0.85 | 7.16 | 0.64 |\n",
+ "\t3 | 500 | 50 | 0 | 0.10 | 0.53 | 0.93 ( 0 ) | 5.21 ( 0.21 ) | 0 ( 0 ) | 10.21 | 0.39 | 0.93 | 5.21 | 0.00 |\n",
+ "\t4 | 1000 | 50 | 0 | 0.05 | 0.80 | 0.93 ( 0 ) | 2.25 ( 0.11 ) | 0 ( 0 ) | 7.25 | 0.15 | 0.93 | 2.25 | 0.00 |\n",
+ "\t5 | 50 | 100 | 0 | 2.00 | 0.25 | 0.73 ( 0.01 ) | 5.13 ( 0.34 ) | 2.93 ( 0.1 ) | 7.20 | 0.5 | 0.73 | 5.13 | 2.93 |\n",
+ "\t6 | 100 | 100 | 0 | 1.00 | 0.32 | 0.83 ( 0.01 ) | 8.22 ( 0.44 ) | 1.03 ( 0.09 ) | 12.19 | 0.54 | 0.83 | 8.22 | 1.03 |\n",
+ "\t7 | 500 | 100 | 0 | 0.20 | 0.40 | 0.92 ( 0 ) | 8.89 ( 0.29 ) | 0 ( 0 ) | 13.89 | 0.55 | 0.92 | 8.89 | 0.00 |\n",
+ "\t8 | 1000 | 100 | 0 | 0.10 | 0.69 | 0.93 ( 0 ) | 3.53 ( 0.16 ) | 0 ( 0 ) | 8.53 | 0.27 | 0.93 | 3.53 | 0.00 |\n",
+ "\t9 | 50 | 500 | 0 | 10.00 | 0.13 | 0.7 ( 0.01 ) | 8.05 ( 0.28 ) | 3.74 ( 0.09 ) | 9.31 | 0.74 | 0.70 | 8.05 | 3.74 |\n",
+ "\t10 | 100 | 500 | 0 | 5.00 | 0.24 | 0.79 ( 0.01 ) | 11.14 ( 0.64 ) | 1.71 ( 0.1 ) | 14.43 | 0.63 | 0.79 | 11.14 | 1.71 |\n",
+ "\t11 | 500 | 500 | 0 | 1.00 | 0.20 | 0.92 ( 0 ) | 23.83 ( 0.69 ) | 0 ( 0 ) | 28.83 | 0.78 | 0.92 | 23.83 | 0.00 |\n",
+ "\t12 | 1000 | 500 | 0 | 0.50 | 0.36 | 0.93 ( 0 ) | 11.38 ( 0.41 ) | 0 ( 0 ) | 16.38 | 0.6 | 0.93 | 11.38 | 0.00 |\n",
+ "\t13 | 50 | 1000 | 0 | 20.00 | 0.09 | 0.67 ( 0.01 ) | 9.77 ( 0.24 ) | 4.03 ( 0.08 ) | 10.74 | 0.81 | 0.67 | 9.77 | 4.03 |\n",
+ "\t14 | 100 | 1000 | 0 | 10.00 | 0.20 | 0.78 ( 0.01 ) | 11.91 ( 0.8 ) | 2.32 ( 0.1 ) | 14.59 | 0.66 | 0.78 | 11.91 | 2.32 |\n",
+ "\t15 | 500 | 1000 | 0 | 2.00 | 0.16 | 0.91 ( 0 ) | 30.84 ( 0.92 ) | 0 ( 0 ) | 35.84 | 0.82 | 0.91 | 30.84 | 0.00 |\n",
+ "\t16 | 1000 | 1000 | 0 | 1.00 | 0.24 | 0.93 ( 0 ) | 19.96 ( 0.61 ) | 0 ( 0 ) | 24.96 | 0.74 | 0.93 | 19.96 | 0.00 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 16 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1.00 & 0.33 & 0.74 ( 0.01 ) & 3.69 ( 0.25 ) & 2.5 ( 0.12 ) & 6.19 & 0.37 & 0.74 & 3.69 & 2.50\\\\\n",
+ "\t2 & 100 & 50 & 0 & 0.50 & 0.35 & 0.85 ( 0.01 ) & 7.16 ( 0.37 ) & 0.64 ( 0.08 ) & 11.52 & 0.48 & 0.85 & 7.16 & 0.64\\\\\n",
+ "\t3 & 500 & 50 & 0 & 0.10 & 0.53 & 0.93 ( 0 ) & 5.21 ( 0.21 ) & 0 ( 0 ) & 10.21 & 0.39 & 0.93 & 5.21 & 0.00\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.05 & 0.80 & 0.93 ( 0 ) & 2.25 ( 0.11 ) & 0 ( 0 ) & 7.25 & 0.15 & 0.93 & 2.25 & 0.00\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2.00 & 0.25 & 0.73 ( 0.01 ) & 5.13 ( 0.34 ) & 2.93 ( 0.1 ) & 7.20 & 0.5 & 0.73 & 5.13 & 2.93\\\\\n",
+ "\t6 & 100 & 100 & 0 & 1.00 & 0.32 & 0.83 ( 0.01 ) & 8.22 ( 0.44 ) & 1.03 ( 0.09 ) & 12.19 & 0.54 & 0.83 & 8.22 & 1.03\\\\\n",
+ "\t7 & 500 & 100 & 0 & 0.20 & 0.40 & 0.92 ( 0 ) & 8.89 ( 0.29 ) & 0 ( 0 ) & 13.89 & 0.55 & 0.92 & 8.89 & 0.00\\\\\n",
+ "\t8 & 1000 & 100 & 0 & 0.10 & 0.69 & 0.93 ( 0 ) & 3.53 ( 0.16 ) & 0 ( 0 ) & 8.53 & 0.27 & 0.93 & 3.53 & 0.00\\\\\n",
+ "\t9 & 50 & 500 & 0 & 10.00 & 0.13 & 0.7 ( 0.01 ) & 8.05 ( 0.28 ) & 3.74 ( 0.09 ) & 9.31 & 0.74 & 0.70 & 8.05 & 3.74\\\\\n",
+ "\t10 & 100 & 500 & 0 & 5.00 & 0.24 & 0.79 ( 0.01 ) & 11.14 ( 0.64 ) & 1.71 ( 0.1 ) & 14.43 & 0.63 & 0.79 & 11.14 & 1.71\\\\\n",
+ "\t11 & 500 & 500 & 0 & 1.00 & 0.20 & 0.92 ( 0 ) & 23.83 ( 0.69 ) & 0 ( 0 ) & 28.83 & 0.78 & 0.92 & 23.83 & 0.00\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 0.50 & 0.36 & 0.93 ( 0 ) & 11.38 ( 0.41 ) & 0 ( 0 ) & 16.38 & 0.6 & 0.93 & 11.38 & 0.00\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20.00 & 0.09 & 0.67 ( 0.01 ) & 9.77 ( 0.24 ) & 4.03 ( 0.08 ) & 10.74 & 0.81 & 0.67 & 9.77 & 4.03\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 10.00 & 0.20 & 0.78 ( 0.01 ) & 11.91 ( 0.8 ) & 2.32 ( 0.1 ) & 14.59 & 0.66 & 0.78 & 11.91 & 2.32\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 2.00 & 0.16 & 0.91 ( 0 ) & 30.84 ( 0.92 ) & 0 ( 0 ) & 35.84 & 0.82 & 0.91 & 30.84 & 0.00\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.00 & 0.24 & 0.93 ( 0 ) & 19.96 ( 0.61 ) & 0 ( 0 ) & 24.96 & 0.74 & 0.93 & 19.96 & 0.00\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 16 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1.00 | 0.33 | 0.74 ( 0.01 ) | 3.69 ( 0.25 ) | 2.5 ( 0.12 ) | 6.19 | 0.37 | 0.74 | 3.69 | 2.50 |\n",
+ "| 2 | 100 | 50 | 0 | 0.50 | 0.35 | 0.85 ( 0.01 ) | 7.16 ( 0.37 ) | 0.64 ( 0.08 ) | 11.52 | 0.48 | 0.85 | 7.16 | 0.64 |\n",
+ "| 3 | 500 | 50 | 0 | 0.10 | 0.53 | 0.93 ( 0 ) | 5.21 ( 0.21 ) | 0 ( 0 ) | 10.21 | 0.39 | 0.93 | 5.21 | 0.00 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.05 | 0.80 | 0.93 ( 0 ) | 2.25 ( 0.11 ) | 0 ( 0 ) | 7.25 | 0.15 | 0.93 | 2.25 | 0.00 |\n",
+ "| 5 | 50 | 100 | 0 | 2.00 | 0.25 | 0.73 ( 0.01 ) | 5.13 ( 0.34 ) | 2.93 ( 0.1 ) | 7.20 | 0.5 | 0.73 | 5.13 | 2.93 |\n",
+ "| 6 | 100 | 100 | 0 | 1.00 | 0.32 | 0.83 ( 0.01 ) | 8.22 ( 0.44 ) | 1.03 ( 0.09 ) | 12.19 | 0.54 | 0.83 | 8.22 | 1.03 |\n",
+ "| 7 | 500 | 100 | 0 | 0.20 | 0.40 | 0.92 ( 0 ) | 8.89 ( 0.29 ) | 0 ( 0 ) | 13.89 | 0.55 | 0.92 | 8.89 | 0.00 |\n",
+ "| 8 | 1000 | 100 | 0 | 0.10 | 0.69 | 0.93 ( 0 ) | 3.53 ( 0.16 ) | 0 ( 0 ) | 8.53 | 0.27 | 0.93 | 3.53 | 0.00 |\n",
+ "| 9 | 50 | 500 | 0 | 10.00 | 0.13 | 0.7 ( 0.01 ) | 8.05 ( 0.28 ) | 3.74 ( 0.09 ) | 9.31 | 0.74 | 0.70 | 8.05 | 3.74 |\n",
+ "| 10 | 100 | 500 | 0 | 5.00 | 0.24 | 0.79 ( 0.01 ) | 11.14 ( 0.64 ) | 1.71 ( 0.1 ) | 14.43 | 0.63 | 0.79 | 11.14 | 1.71 |\n",
+ "| 11 | 500 | 500 | 0 | 1.00 | 0.20 | 0.92 ( 0 ) | 23.83 ( 0.69 ) | 0 ( 0 ) | 28.83 | 0.78 | 0.92 | 23.83 | 0.00 |\n",
+ "| 12 | 1000 | 500 | 0 | 0.50 | 0.36 | 0.93 ( 0 ) | 11.38 ( 0.41 ) | 0 ( 0 ) | 16.38 | 0.6 | 0.93 | 11.38 | 0.00 |\n",
+ "| 13 | 50 | 1000 | 0 | 20.00 | 0.09 | 0.67 ( 0.01 ) | 9.77 ( 0.24 ) | 4.03 ( 0.08 ) | 10.74 | 0.81 | 0.67 | 9.77 | 4.03 |\n",
+ "| 14 | 100 | 1000 | 0 | 10.00 | 0.20 | 0.78 ( 0.01 ) | 11.91 ( 0.8 ) | 2.32 ( 0.1 ) | 14.59 | 0.66 | 0.78 | 11.91 | 2.32 |\n",
+ "| 15 | 500 | 1000 | 0 | 2.00 | 0.16 | 0.91 ( 0 ) | 30.84 ( 0.92 ) | 0 ( 0 ) | 35.84 | 0.82 | 0.91 | 30.84 | 0.00 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.00 | 0.24 | 0.93 ( 0 ) | 19.96 ( 0.61 ) | 0 ( 0 ) | 24.96 | 0.74 | 0.93 | 19.96 | 0.00 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN \n",
+ "1 50 50 0 1.00 0.33 0.74 ( 0.01 ) 3.69 ( 0.25 ) 2.5 ( 0.12 ) \n",
+ "2 100 50 0 0.50 0.35 0.85 ( 0.01 ) 7.16 ( 0.37 ) 0.64 ( 0.08 )\n",
+ "3 500 50 0 0.10 0.53 0.93 ( 0 ) 5.21 ( 0.21 ) 0 ( 0 ) \n",
+ "4 1000 50 0 0.05 0.80 0.93 ( 0 ) 2.25 ( 0.11 ) 0 ( 0 ) \n",
+ "5 50 100 0 2.00 0.25 0.73 ( 0.01 ) 5.13 ( 0.34 ) 2.93 ( 0.1 ) \n",
+ "6 100 100 0 1.00 0.32 0.83 ( 0.01 ) 8.22 ( 0.44 ) 1.03 ( 0.09 )\n",
+ "7 500 100 0 0.20 0.40 0.92 ( 0 ) 8.89 ( 0.29 ) 0 ( 0 ) \n",
+ "8 1000 100 0 0.10 0.69 0.93 ( 0 ) 3.53 ( 0.16 ) 0 ( 0 ) \n",
+ "9 50 500 0 10.00 0.13 0.7 ( 0.01 ) 8.05 ( 0.28 ) 3.74 ( 0.09 )\n",
+ "10 100 500 0 5.00 0.24 0.79 ( 0.01 ) 11.14 ( 0.64 ) 1.71 ( 0.1 ) \n",
+ "11 500 500 0 1.00 0.20 0.92 ( 0 ) 23.83 ( 0.69 ) 0 ( 0 ) \n",
+ "12 1000 500 0 0.50 0.36 0.93 ( 0 ) 11.38 ( 0.41 ) 0 ( 0 ) \n",
+ "13 50 1000 0 20.00 0.09 0.67 ( 0.01 ) 9.77 ( 0.24 ) 4.03 ( 0.08 )\n",
+ "14 100 1000 0 10.00 0.20 0.78 ( 0.01 ) 11.91 ( 0.8 ) 2.32 ( 0.1 ) \n",
+ "15 500 1000 0 2.00 0.16 0.91 ( 0 ) 30.84 ( 0.92 ) 0 ( 0 ) \n",
+ "16 1000 1000 0 1.00 0.24 0.93 ( 0 ) 19.96 ( 0.61 ) 0 ( 0 ) \n",
+ " num_select FDR ROC_mean FP_mean FN_mean\n",
+ "1 6.19 0.37 0.74 3.69 2.50 \n",
+ "2 11.52 0.48 0.85 7.16 0.64 \n",
+ "3 10.21 0.39 0.93 5.21 0.00 \n",
+ "4 7.25 0.15 0.93 2.25 0.00 \n",
+ "5 7.20 0.5 0.73 5.13 2.93 \n",
+ "6 12.19 0.54 0.83 8.22 1.03 \n",
+ "7 13.89 0.55 0.92 8.89 0.00 \n",
+ "8 8.53 0.27 0.93 3.53 0.00 \n",
+ "9 9.31 0.74 0.70 8.05 3.74 \n",
+ "10 14.43 0.63 0.79 11.14 1.71 \n",
+ "11 28.83 0.78 0.92 23.83 0.00 \n",
+ "12 16.38 0.6 0.93 11.38 0.00 \n",
+ "13 10.74 0.81 0.67 9.77 4.03 \n",
+ "14 14.59 0.66 0.78 11.91 2.32 \n",
+ "15 35.84 0.82 0.91 30.84 0.00 \n",
+ "16 24.96 0.74 0.93 19.96 0.00 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "result.table_toe\n",
+ "\n",
+ "## export\n",
+ "write.table(result.table_toe, '../results_summary_bin/sim_ind_lasso_binary.txt', \n",
+ " sep='\\t', row.names=FALSE)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.2_sim_ind_elnet_binary_update-checkpoint.ipynb b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.2_sim_ind_elnet_binary_update-checkpoint.ipynb
new file mode 100644
index 0000000..da8ff21
--- /dev/null
+++ b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.2_sim_ind_elnet_binary_update-checkpoint.ipynb
@@ -0,0 +1,680 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Summarize elastic net results on independent simulation scenarios for binary outcomes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dir = '/panfs/panfs1.ucsd.edu/panscratch/lij014/Stability_2020/sim_data'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dim.list = list()\n",
+ "size = c(50, 100, 500, 1000)\n",
+ "idx = 0\n",
+ "for (P in size){\n",
+ " for (N in size){\n",
+ " idx = idx + 1\n",
+ " dim.list[[idx]] = c(P=P, N=N)\n",
+ " }\n",
+ "}\n",
+ "\n",
+ "files = NULL\n",
+ "for (dim in dim.list){\n",
+ " p = dim[1]\n",
+ " n = dim[2]\n",
+ " files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "16"
+ ],
+ "text/latex": [
+ "16"
+ ],
+ "text/markdown": [
+ "16"
+ ],
+ "text/plain": [
+ "[1] 16"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "length(files)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[1] \"indx: 1\"\n",
+ "[1] \"indx: 2\"\n",
+ "[1] \"indx: 3\"\n",
+ "[1] \"indx: 4\"\n",
+ "[1] \"indx: 5\"\n",
+ "[1] \"indx: 6\"\n",
+ "[1] \"indx: 7\"\n",
+ "[1] \"indx: 8\"\n",
+ "[1] \"indx: 9\"\n",
+ "[1] \"indx: 10\"\n",
+ "[1] \"indx: 11\"\n",
+ "[1] \"indx: 12\"\n",
+ "[1] \"indx: 13\"\n",
+ "[1] \"indx: 14\"\n",
+ "[1] \"indx: 15\"\n",
+ "[1] \"indx: 16\"\n"
+ ]
+ }
+ ],
+ "source": [
+ "avg_FDR = NULL\n",
+ "table_toe = NULL\n",
+ "tmp_num_select = rep(0, length(files))\n",
+ "for (i in 1:length(files)){\n",
+ " print(paste0('indx: ', i))\n",
+ " load(paste0(dir, '/binary_update/ind_Elnet_binary_', i, '.RData')) \n",
+ " \n",
+ " table_toe = rbind(table_toe, results_ind_elnet[c('n', 'p', 'rou', 'FP', 'FN', 'ROC', 'Stab')])\n",
+ " tmp_num_select[i] = mean(rowSums(results_ind_elnet$Stab.table))\n",
+ " \n",
+ " # calculate FDR\n",
+ " load(file_name, dat <- new.env())\n",
+ " sub = dat$sim_array[[i]]\n",
+ " p = sub$p # take true values from 1st replicate of each simulated data\n",
+ " coef = sub$beta\n",
+ " coef.true = which(coef != 0)\n",
+ " \n",
+ " tt = results_ind_elnet$Stab.table\n",
+ " FDR = NULL # false discovery rate\n",
+ " for (r in 1:nrow(tt)){\n",
+ " FDR = c(FDR, length(setdiff(which(tt[r, ] !=0), coef.true))/sum(tt[r, ]))\n",
+ "\n",
+ " }\n",
+ " \n",
+ " avg_FDR = c(avg_FDR, mean(FDR, na.rm=TRUE))\n",
+ "}\n",
+ "table_toe = as.data.frame(table_toe)\n",
+ "table_toe$num_select = tmp_num_select\n",
+ "table_toe$FDR = round(avg_FDR,2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |
\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 12.04 ( 0.89 ) | 1.42 ( 0.11 ) | 0.76 ( 0.01 ) | 0.15 | 15.62 | 0.61 |
\n",
+ "\t2 | 100 | 50 | 0 | 11.44 ( 0.76 ) | 0.46 ( 0.07 ) | 0.84 ( 0.01 ) | 0.23 | 15.98 | 0.56 |
\n",
+ "\t3 | 500 | 50 | 0 | 8.74 ( 0.72 ) | 0 ( 0 ) | 0.92 ( 0 ) | 0.36 | 13.74 | 0.45 |
\n",
+ "\t4 | 1000 | 50 | 0 | 7.56 ( 0.72 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.41 | 12.56 | 0.40 |
\n",
+ "\t5 | 50 | 100 | 0 | 18.92 ( 1.7 ) | 1.81 ( 0.12 ) | 0.72 ( 0.01 ) | 0.1 | 22.11 | 0.72 |
\n",
+ "\t6 | 100 | 100 | 0 | 12.48 ( 0.9 ) | 0.96 ( 0.09 ) | 0.82 ( 0.01 ) | 0.22 | 16.52 | 0.58 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 12.04 ( 0.89 ) & 1.42 ( 0.11 ) & 0.76 ( 0.01 ) & 0.15 & 15.62 & 0.61\\\\\n",
+ "\t2 & 100 & 50 & 0 & 11.44 ( 0.76 ) & 0.46 ( 0.07 ) & 0.84 ( 0.01 ) & 0.23 & 15.98 & 0.56\\\\\n",
+ "\t3 & 500 & 50 & 0 & 8.74 ( 0.72 ) & 0 ( 0 ) & 0.92 ( 0 ) & 0.36 & 13.74 & 0.45\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 7.56 ( 0.72 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.41 & 12.56 & 0.40\\\\\n",
+ "\t5 & 50 & 100 & 0 & 18.92 ( 1.7 ) & 1.81 ( 0.12 ) & 0.72 ( 0.01 ) & 0.1 & 22.11 & 0.72\\\\\n",
+ "\t6 & 100 & 100 & 0 & 12.48 ( 0.9 ) & 0.96 ( 0.09 ) & 0.82 ( 0.01 ) & 0.22 & 16.52 & 0.58\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 12.04 ( 0.89 ) | 1.42 ( 0.11 ) | 0.76 ( 0.01 ) | 0.15 | 15.62 | 0.61 |\n",
+ "| 2 | 100 | 50 | 0 | 11.44 ( 0.76 ) | 0.46 ( 0.07 ) | 0.84 ( 0.01 ) | 0.23 | 15.98 | 0.56 |\n",
+ "| 3 | 500 | 50 | 0 | 8.74 ( 0.72 ) | 0 ( 0 ) | 0.92 ( 0 ) | 0.36 | 13.74 | 0.45 |\n",
+ "| 4 | 1000 | 50 | 0 | 7.56 ( 0.72 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.41 | 12.56 | 0.40 |\n",
+ "| 5 | 50 | 100 | 0 | 18.92 ( 1.7 ) | 1.81 ( 0.12 ) | 0.72 ( 0.01 ) | 0.1 | 22.11 | 0.72 |\n",
+ "| 6 | 100 | 100 | 0 | 12.48 ( 0.9 ) | 0.96 ( 0.09 ) | 0.82 ( 0.01 ) | 0.22 | 16.52 | 0.58 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select FDR \n",
+ "1 50 50 0 12.04 ( 0.89 ) 1.42 ( 0.11 ) 0.76 ( 0.01 ) 0.15 15.62 0.61\n",
+ "2 100 50 0 11.44 ( 0.76 ) 0.46 ( 0.07 ) 0.84 ( 0.01 ) 0.23 15.98 0.56\n",
+ "3 500 50 0 8.74 ( 0.72 ) 0 ( 0 ) 0.92 ( 0 ) 0.36 13.74 0.45\n",
+ "4 1000 50 0 7.56 ( 0.72 ) 0 ( 0 ) 0.93 ( 0 ) 0.41 12.56 0.40\n",
+ "5 50 100 0 18.92 ( 1.7 ) 1.81 ( 0.12 ) 0.72 ( 0.01 ) 0.1 22.11 0.72\n",
+ "6 100 100 0 12.48 ( 0.9 ) 0.96 ( 0.09 ) 0.82 ( 0.01 ) 0.22 16.52 0.58"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |
\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 16.79 ( 1.47 ) | 0 ( 0 ) | 0.92 ( 0 ) | 0.27 | 21.79 | 0.61 |
\n",
+ "\t12 | 1000 | 500 | 0 | 11.47 ( 1.14 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.36 | 16.47 | 0.48 |
\n",
+ "\t13 | 50 | 1000 | 0 | 72.88 ( 8.55 ) | 2.89 ( 0.13 ) | 0.67 ( 0.01 ) | 0.02 | 74.99 | 0.91 |
\n",
+ "\t14 | 100 | 1000 | 0 | 29.95 ( 4.05 ) | 2 ( 0.1 ) | 0.76 ( 0.01 ) | 0.09 | 32.95 | 0.77 |
\n",
+ "\t15 | 500 | 1000 | 0 | 19.47 ( 1.76 ) | 0 ( 0 ) | 0.91 ( 0 ) | 0.24 | 24.47 | 0.62 |
\n",
+ "\t16 | 1000 | 1000 | 0 | 18.47 ( 1.76 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.25 | 23.47 | 0.58 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 16.79 ( 1.47 ) & 0 ( 0 ) & 0.92 ( 0 ) & 0.27 & 21.79 & 0.61\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 11.47 ( 1.14 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.36 & 16.47 & 0.48\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 72.88 ( 8.55 ) & 2.89 ( 0.13 ) & 0.67 ( 0.01 ) & 0.02 & 74.99 & 0.91\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 29.95 ( 4.05 ) & 2 ( 0.1 ) & 0.76 ( 0.01 ) & 0.09 & 32.95 & 0.77\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 19.47 ( 1.76 ) & 0 ( 0 ) & 0.91 ( 0 ) & 0.24 & 24.47 & 0.62\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 18.47 ( 1.76 ) & 0 ( 0 ) & 0.93 ( 0 ) & 0.25 & 23.47 & 0.58\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 16.79 ( 1.47 ) | 0 ( 0 ) | 0.92 ( 0 ) | 0.27 | 21.79 | 0.61 |\n",
+ "| 12 | 1000 | 500 | 0 | 11.47 ( 1.14 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.36 | 16.47 | 0.48 |\n",
+ "| 13 | 50 | 1000 | 0 | 72.88 ( 8.55 ) | 2.89 ( 0.13 ) | 0.67 ( 0.01 ) | 0.02 | 74.99 | 0.91 |\n",
+ "| 14 | 100 | 1000 | 0 | 29.95 ( 4.05 ) | 2 ( 0.1 ) | 0.76 ( 0.01 ) | 0.09 | 32.95 | 0.77 |\n",
+ "| 15 | 500 | 1000 | 0 | 19.47 ( 1.76 ) | 0 ( 0 ) | 0.91 ( 0 ) | 0.24 | 24.47 | 0.62 |\n",
+ "| 16 | 1000 | 1000 | 0 | 18.47 ( 1.76 ) | 0 ( 0 ) | 0.93 ( 0 ) | 0.25 | 23.47 | 0.58 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select\n",
+ "11 500 500 0 16.79 ( 1.47 ) 0 ( 0 ) 0.92 ( 0 ) 0.27 21.79 \n",
+ "12 1000 500 0 11.47 ( 1.14 ) 0 ( 0 ) 0.93 ( 0 ) 0.36 16.47 \n",
+ "13 50 1000 0 72.88 ( 8.55 ) 2.89 ( 0.13 ) 0.67 ( 0.01 ) 0.02 74.99 \n",
+ "14 100 1000 0 29.95 ( 4.05 ) 2 ( 0.1 ) 0.76 ( 0.01 ) 0.09 32.95 \n",
+ "15 500 1000 0 19.47 ( 1.76 ) 0 ( 0 ) 0.91 ( 0 ) 0.24 24.47 \n",
+ "16 1000 1000 0 18.47 ( 1.76 ) 0 ( 0 ) 0.93 ( 0 ) 0.25 23.47 \n",
+ " FDR \n",
+ "11 0.61\n",
+ "12 0.48\n",
+ "13 0.91\n",
+ "14 0.77\n",
+ "15 0.62\n",
+ "16 0.58"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n",
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# export result\n",
+ "result.table_toe <- apply(table_toe,2,as.character)\n",
+ "rownames(result.table_toe) = rownames(table_toe)\n",
+ "result.table_toe = as.data.frame(result.table_toe)\n",
+ "\n",
+ "# extract numbers only for 'n' & 'p'\n",
+ "result.table_toe$n = tidyr::extract_numeric(result.table_toe$n)\n",
+ "result.table_toe$p = tidyr::extract_numeric(result.table_toe$p)\n",
+ "result.table_toe$ratio = result.table_toe$p / result.table_toe$n\n",
+ "\n",
+ "result.table_toe = result.table_toe[c('n', 'p', 'rou', 'ratio', 'Stab', 'ROC', 'FP', 'FN', 'num_select', 'FDR')]\n",
+ "colnames(result.table_toe)[1:4] = c('N', 'P', 'Corr', 'Ratio')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert the measurements of interest to numeric\n",
+ "result.table_toe$Stab = as.numeric(as.character(result.table_toe$Stab))\n",
+ "result.table_toe$num_select = as.numeric(as.character(result.table_toe$num_select))\n",
+ "\n",
+ "result.table_toe$ROC_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$ROC))\n",
+ "result.table_toe$FP_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FP))\n",
+ "result.table_toe$FN_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FN))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 0 × 13\n",
+ "\n",
+ "\tN | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |
\n",
+ "\t<dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 0 × 13\n",
+ "\\begin{tabular}{lllllllllllll}\n",
+ " N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 0 × 13\n",
+ "\n",
+ "| N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select FDR ROC_mean FP_mean FN_mean"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check whether any missing values exist\n",
+ "result.table_toe[rowSums(is.na(result.table_toe)) > 0,]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |
\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1.00 | 0.15 | 0.76 ( 0.01 ) | 12.04 ( 0.89 ) | 1.42 ( 0.11 ) | 15.62 | 0.61 | 0.76 | 12.04 | 1.42 |
\n",
+ "\t2 | 100 | 50 | 0 | 0.50 | 0.23 | 0.84 ( 0.01 ) | 11.44 ( 0.76 ) | 0.46 ( 0.07 ) | 15.98 | 0.56 | 0.84 | 11.44 | 0.46 |
\n",
+ "\t3 | 500 | 50 | 0 | 0.10 | 0.36 | 0.92 ( 0 ) | 8.74 ( 0.72 ) | 0 ( 0 ) | 13.74 | 0.45 | 0.92 | 8.74 | 0.00 |
\n",
+ "\t4 | 1000 | 50 | 0 | 0.05 | 0.41 | 0.93 ( 0 ) | 7.56 ( 0.72 ) | 0 ( 0 ) | 12.56 | 0.4 | 0.93 | 7.56 | 0.00 |
\n",
+ "\t5 | 50 | 100 | 0 | 2.00 | 0.10 | 0.72 ( 0.01 ) | 18.92 ( 1.7 ) | 1.81 ( 0.12 ) | 22.11 | 0.72 | 0.72 | 18.92 | 1.81 |
\n",
+ "\t6 | 100 | 100 | 0 | 1.00 | 0.22 | 0.82 ( 0.01 ) | 12.48 ( 0.9 ) | 0.96 ( 0.09 ) | 16.52 | 0.58 | 0.82 | 12.48 | 0.96 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1.00 & 0.15 & 0.76 ( 0.01 ) & 12.04 ( 0.89 ) & 1.42 ( 0.11 ) & 15.62 & 0.61 & 0.76 & 12.04 & 1.42\\\\\n",
+ "\t2 & 100 & 50 & 0 & 0.50 & 0.23 & 0.84 ( 0.01 ) & 11.44 ( 0.76 ) & 0.46 ( 0.07 ) & 15.98 & 0.56 & 0.84 & 11.44 & 0.46\\\\\n",
+ "\t3 & 500 & 50 & 0 & 0.10 & 0.36 & 0.92 ( 0 ) & 8.74 ( 0.72 ) & 0 ( 0 ) & 13.74 & 0.45 & 0.92 & 8.74 & 0.00\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.05 & 0.41 & 0.93 ( 0 ) & 7.56 ( 0.72 ) & 0 ( 0 ) & 12.56 & 0.4 & 0.93 & 7.56 & 0.00\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2.00 & 0.10 & 0.72 ( 0.01 ) & 18.92 ( 1.7 ) & 1.81 ( 0.12 ) & 22.11 & 0.72 & 0.72 & 18.92 & 1.81\\\\\n",
+ "\t6 & 100 & 100 & 0 & 1.00 & 0.22 & 0.82 ( 0.01 ) & 12.48 ( 0.9 ) & 0.96 ( 0.09 ) & 16.52 & 0.58 & 0.82 & 12.48 & 0.96\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1.00 | 0.15 | 0.76 ( 0.01 ) | 12.04 ( 0.89 ) | 1.42 ( 0.11 ) | 15.62 | 0.61 | 0.76 | 12.04 | 1.42 |\n",
+ "| 2 | 100 | 50 | 0 | 0.50 | 0.23 | 0.84 ( 0.01 ) | 11.44 ( 0.76 ) | 0.46 ( 0.07 ) | 15.98 | 0.56 | 0.84 | 11.44 | 0.46 |\n",
+ "| 3 | 500 | 50 | 0 | 0.10 | 0.36 | 0.92 ( 0 ) | 8.74 ( 0.72 ) | 0 ( 0 ) | 13.74 | 0.45 | 0.92 | 8.74 | 0.00 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.05 | 0.41 | 0.93 ( 0 ) | 7.56 ( 0.72 ) | 0 ( 0 ) | 12.56 | 0.4 | 0.93 | 7.56 | 0.00 |\n",
+ "| 5 | 50 | 100 | 0 | 2.00 | 0.10 | 0.72 ( 0.01 ) | 18.92 ( 1.7 ) | 1.81 ( 0.12 ) | 22.11 | 0.72 | 0.72 | 18.92 | 1.81 |\n",
+ "| 6 | 100 | 100 | 0 | 1.00 | 0.22 | 0.82 ( 0.01 ) | 12.48 ( 0.9 ) | 0.96 ( 0.09 ) | 16.52 | 0.58 | 0.82 | 12.48 | 0.96 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN \n",
+ "1 50 50 0 1.00 0.15 0.76 ( 0.01 ) 12.04 ( 0.89 ) 1.42 ( 0.11 )\n",
+ "2 100 50 0 0.50 0.23 0.84 ( 0.01 ) 11.44 ( 0.76 ) 0.46 ( 0.07 )\n",
+ "3 500 50 0 0.10 0.36 0.92 ( 0 ) 8.74 ( 0.72 ) 0 ( 0 ) \n",
+ "4 1000 50 0 0.05 0.41 0.93 ( 0 ) 7.56 ( 0.72 ) 0 ( 0 ) \n",
+ "5 50 100 0 2.00 0.10 0.72 ( 0.01 ) 18.92 ( 1.7 ) 1.81 ( 0.12 )\n",
+ "6 100 100 0 1.00 0.22 0.82 ( 0.01 ) 12.48 ( 0.9 ) 0.96 ( 0.09 )\n",
+ " num_select FDR ROC_mean FP_mean FN_mean\n",
+ "1 15.62 0.61 0.76 12.04 1.42 \n",
+ "2 15.98 0.56 0.84 11.44 0.46 \n",
+ "3 13.74 0.45 0.92 8.74 0.00 \n",
+ "4 12.56 0.4 0.93 7.56 0.00 \n",
+ "5 22.11 0.72 0.72 18.92 1.81 \n",
+ "6 16.52 0.58 0.82 12.48 0.96 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(result.table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |
\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 1.0 | 0.27 | 0.92 ( 0 ) | 16.79 ( 1.47 ) | 0 ( 0 ) | 21.79 | 0.61 | 0.92 | 16.79 | 0.00 |
\n",
+ "\t12 | 1000 | 500 | 0 | 0.5 | 0.36 | 0.93 ( 0 ) | 11.47 ( 1.14 ) | 0 ( 0 ) | 16.47 | 0.48 | 0.93 | 11.47 | 0.00 |
\n",
+ "\t13 | 50 | 1000 | 0 | 20.0 | 0.02 | 0.67 ( 0.01 ) | 72.88 ( 8.55 ) | 2.89 ( 0.13 ) | 74.99 | 0.91 | 0.67 | 72.88 | 2.89 |
\n",
+ "\t14 | 100 | 1000 | 0 | 10.0 | 0.09 | 0.76 ( 0.01 ) | 29.95 ( 4.05 ) | 2 ( 0.1 ) | 32.95 | 0.77 | 0.76 | 29.95 | 2.00 |
\n",
+ "\t15 | 500 | 1000 | 0 | 2.0 | 0.24 | 0.91 ( 0 ) | 19.47 ( 1.76 ) | 0 ( 0 ) | 24.47 | 0.62 | 0.91 | 19.47 | 0.00 |
\n",
+ "\t16 | 1000 | 1000 | 0 | 1.0 | 0.25 | 0.93 ( 0 ) | 18.47 ( 1.76 ) | 0 ( 0 ) | 23.47 | 0.58 | 0.93 | 18.47 | 0.00 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 1.0 & 0.27 & 0.92 ( 0 ) & 16.79 ( 1.47 ) & 0 ( 0 ) & 21.79 & 0.61 & 0.92 & 16.79 & 0.00\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 0.5 & 0.36 & 0.93 ( 0 ) & 11.47 ( 1.14 ) & 0 ( 0 ) & 16.47 & 0.48 & 0.93 & 11.47 & 0.00\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20.0 & 0.02 & 0.67 ( 0.01 ) & 72.88 ( 8.55 ) & 2.89 ( 0.13 ) & 74.99 & 0.91 & 0.67 & 72.88 & 2.89\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 10.0 & 0.09 & 0.76 ( 0.01 ) & 29.95 ( 4.05 ) & 2 ( 0.1 ) & 32.95 & 0.77 & 0.76 & 29.95 & 2.00\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 2.0 & 0.24 & 0.91 ( 0 ) & 19.47 ( 1.76 ) & 0 ( 0 ) & 24.47 & 0.62 & 0.91 & 19.47 & 0.00\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.0 & 0.25 & 0.93 ( 0 ) & 18.47 ( 1.76 ) & 0 ( 0 ) & 23.47 & 0.58 & 0.93 & 18.47 & 0.00\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 1.0 | 0.27 | 0.92 ( 0 ) | 16.79 ( 1.47 ) | 0 ( 0 ) | 21.79 | 0.61 | 0.92 | 16.79 | 0.00 |\n",
+ "| 12 | 1000 | 500 | 0 | 0.5 | 0.36 | 0.93 ( 0 ) | 11.47 ( 1.14 ) | 0 ( 0 ) | 16.47 | 0.48 | 0.93 | 11.47 | 0.00 |\n",
+ "| 13 | 50 | 1000 | 0 | 20.0 | 0.02 | 0.67 ( 0.01 ) | 72.88 ( 8.55 ) | 2.89 ( 0.13 ) | 74.99 | 0.91 | 0.67 | 72.88 | 2.89 |\n",
+ "| 14 | 100 | 1000 | 0 | 10.0 | 0.09 | 0.76 ( 0.01 ) | 29.95 ( 4.05 ) | 2 ( 0.1 ) | 32.95 | 0.77 | 0.76 | 29.95 | 2.00 |\n",
+ "| 15 | 500 | 1000 | 0 | 2.0 | 0.24 | 0.91 ( 0 ) | 19.47 ( 1.76 ) | 0 ( 0 ) | 24.47 | 0.62 | 0.91 | 19.47 | 0.00 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.0 | 0.25 | 0.93 ( 0 ) | 18.47 ( 1.76 ) | 0 ( 0 ) | 23.47 | 0.58 | 0.93 | 18.47 | 0.00 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN \n",
+ "11 500 500 0 1.0 0.27 0.92 ( 0 ) 16.79 ( 1.47 ) 0 ( 0 ) \n",
+ "12 1000 500 0 0.5 0.36 0.93 ( 0 ) 11.47 ( 1.14 ) 0 ( 0 ) \n",
+ "13 50 1000 0 20.0 0.02 0.67 ( 0.01 ) 72.88 ( 8.55 ) 2.89 ( 0.13 )\n",
+ "14 100 1000 0 10.0 0.09 0.76 ( 0.01 ) 29.95 ( 4.05 ) 2 ( 0.1 ) \n",
+ "15 500 1000 0 2.0 0.24 0.91 ( 0 ) 19.47 ( 1.76 ) 0 ( 0 ) \n",
+ "16 1000 1000 0 1.0 0.25 0.93 ( 0 ) 18.47 ( 1.76 ) 0 ( 0 ) \n",
+ " num_select FDR ROC_mean FP_mean FN_mean\n",
+ "11 21.79 0.61 0.92 16.79 0.00 \n",
+ "12 16.47 0.48 0.93 11.47 0.00 \n",
+ "13 74.99 0.91 0.67 72.88 2.89 \n",
+ "14 32.95 0.77 0.76 29.95 2.00 \n",
+ "15 24.47 0.62 0.91 19.47 0.00 \n",
+ "16 23.47 0.58 0.93 18.47 0.00 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(result.table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 16 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |
\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1.00 | 0.15 | 0.76 ( 0.01 ) | 12.04 ( 0.89 ) | 1.42 ( 0.11 ) | 15.62 | 0.61 | 0.76 | 12.04 | 1.42 |
\n",
+ "\t2 | 100 | 50 | 0 | 0.50 | 0.23 | 0.84 ( 0.01 ) | 11.44 ( 0.76 ) | 0.46 ( 0.07 ) | 15.98 | 0.56 | 0.84 | 11.44 | 0.46 |
\n",
+ "\t3 | 500 | 50 | 0 | 0.10 | 0.36 | 0.92 ( 0 ) | 8.74 ( 0.72 ) | 0 ( 0 ) | 13.74 | 0.45 | 0.92 | 8.74 | 0.00 |
\n",
+ "\t4 | 1000 | 50 | 0 | 0.05 | 0.41 | 0.93 ( 0 ) | 7.56 ( 0.72 ) | 0 ( 0 ) | 12.56 | 0.4 | 0.93 | 7.56 | 0.00 |
\n",
+ "\t5 | 50 | 100 | 0 | 2.00 | 0.10 | 0.72 ( 0.01 ) | 18.92 ( 1.7 ) | 1.81 ( 0.12 ) | 22.11 | 0.72 | 0.72 | 18.92 | 1.81 |
\n",
+ "\t6 | 100 | 100 | 0 | 1.00 | 0.22 | 0.82 ( 0.01 ) | 12.48 ( 0.9 ) | 0.96 ( 0.09 ) | 16.52 | 0.58 | 0.82 | 12.48 | 0.96 |
\n",
+ "\t7 | 500 | 100 | 0 | 0.20 | 0.32 | 0.92 ( 0 ) | 11.78 ( 0.99 ) | 0 ( 0 ) | 16.78 | 0.52 | 0.92 | 11.78 | 0.00 |
\n",
+ "\t8 | 1000 | 100 | 0 | 0.10 | 0.34 | 0.93 ( 0 ) | 10.71 ( 1.03 ) | 0 ( 0 ) | 15.71 | 0.47 | 0.93 | 10.71 | 0.00 |
\n",
+ "\t9 | 50 | 500 | 0 | 10.00 | 0.03 | 0.68 ( 0.01 ) | 57.17 ( 5.72 ) | 2.27 ( 0.13 ) | 59.90 | 0.88 | 0.68 | 57.17 | 2.27 |
\n",
+ "\t10 | 100 | 500 | 0 | 5.00 | 0.15 | 0.77 ( 0.01 ) | 20.84 ( 2.08 ) | 1.49 ( 0.1 ) | 24.35 | 0.69 | 0.77 | 20.84 | 1.49 |
\n",
+ "\t11 | 500 | 500 | 0 | 1.00 | 0.27 | 0.92 ( 0 ) | 16.79 ( 1.47 ) | 0 ( 0 ) | 21.79 | 0.61 | 0.92 | 16.79 | 0.00 |
\n",
+ "\t12 | 1000 | 500 | 0 | 0.50 | 0.36 | 0.93 ( 0 ) | 11.47 ( 1.14 ) | 0 ( 0 ) | 16.47 | 0.48 | 0.93 | 11.47 | 0.00 |
\n",
+ "\t13 | 50 | 1000 | 0 | 20.00 | 0.02 | 0.67 ( 0.01 ) | 72.88 ( 8.55 ) | 2.89 ( 0.13 ) | 74.99 | 0.91 | 0.67 | 72.88 | 2.89 |
\n",
+ "\t14 | 100 | 1000 | 0 | 10.00 | 0.09 | 0.76 ( 0.01 ) | 29.95 ( 4.05 ) | 2 ( 0.1 ) | 32.95 | 0.77 | 0.76 | 29.95 | 2.00 |
\n",
+ "\t15 | 500 | 1000 | 0 | 2.00 | 0.24 | 0.91 ( 0 ) | 19.47 ( 1.76 ) | 0 ( 0 ) | 24.47 | 0.62 | 0.91 | 19.47 | 0.00 |
\n",
+ "\t16 | 1000 | 1000 | 0 | 1.00 | 0.25 | 0.93 ( 0 ) | 18.47 ( 1.76 ) | 0 ( 0 ) | 23.47 | 0.58 | 0.93 | 18.47 | 0.00 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 16 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1.00 & 0.15 & 0.76 ( 0.01 ) & 12.04 ( 0.89 ) & 1.42 ( 0.11 ) & 15.62 & 0.61 & 0.76 & 12.04 & 1.42\\\\\n",
+ "\t2 & 100 & 50 & 0 & 0.50 & 0.23 & 0.84 ( 0.01 ) & 11.44 ( 0.76 ) & 0.46 ( 0.07 ) & 15.98 & 0.56 & 0.84 & 11.44 & 0.46\\\\\n",
+ "\t3 & 500 & 50 & 0 & 0.10 & 0.36 & 0.92 ( 0 ) & 8.74 ( 0.72 ) & 0 ( 0 ) & 13.74 & 0.45 & 0.92 & 8.74 & 0.00\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.05 & 0.41 & 0.93 ( 0 ) & 7.56 ( 0.72 ) & 0 ( 0 ) & 12.56 & 0.4 & 0.93 & 7.56 & 0.00\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2.00 & 0.10 & 0.72 ( 0.01 ) & 18.92 ( 1.7 ) & 1.81 ( 0.12 ) & 22.11 & 0.72 & 0.72 & 18.92 & 1.81\\\\\n",
+ "\t6 & 100 & 100 & 0 & 1.00 & 0.22 & 0.82 ( 0.01 ) & 12.48 ( 0.9 ) & 0.96 ( 0.09 ) & 16.52 & 0.58 & 0.82 & 12.48 & 0.96\\\\\n",
+ "\t7 & 500 & 100 & 0 & 0.20 & 0.32 & 0.92 ( 0 ) & 11.78 ( 0.99 ) & 0 ( 0 ) & 16.78 & 0.52 & 0.92 & 11.78 & 0.00\\\\\n",
+ "\t8 & 1000 & 100 & 0 & 0.10 & 0.34 & 0.93 ( 0 ) & 10.71 ( 1.03 ) & 0 ( 0 ) & 15.71 & 0.47 & 0.93 & 10.71 & 0.00\\\\\n",
+ "\t9 & 50 & 500 & 0 & 10.00 & 0.03 & 0.68 ( 0.01 ) & 57.17 ( 5.72 ) & 2.27 ( 0.13 ) & 59.90 & 0.88 & 0.68 & 57.17 & 2.27\\\\\n",
+ "\t10 & 100 & 500 & 0 & 5.00 & 0.15 & 0.77 ( 0.01 ) & 20.84 ( 2.08 ) & 1.49 ( 0.1 ) & 24.35 & 0.69 & 0.77 & 20.84 & 1.49\\\\\n",
+ "\t11 & 500 & 500 & 0 & 1.00 & 0.27 & 0.92 ( 0 ) & 16.79 ( 1.47 ) & 0 ( 0 ) & 21.79 & 0.61 & 0.92 & 16.79 & 0.00\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 0.50 & 0.36 & 0.93 ( 0 ) & 11.47 ( 1.14 ) & 0 ( 0 ) & 16.47 & 0.48 & 0.93 & 11.47 & 0.00\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20.00 & 0.02 & 0.67 ( 0.01 ) & 72.88 ( 8.55 ) & 2.89 ( 0.13 ) & 74.99 & 0.91 & 0.67 & 72.88 & 2.89\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 10.00 & 0.09 & 0.76 ( 0.01 ) & 29.95 ( 4.05 ) & 2 ( 0.1 ) & 32.95 & 0.77 & 0.76 & 29.95 & 2.00\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 2.00 & 0.24 & 0.91 ( 0 ) & 19.47 ( 1.76 ) & 0 ( 0 ) & 24.47 & 0.62 & 0.91 & 19.47 & 0.00\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.00 & 0.25 & 0.93 ( 0 ) & 18.47 ( 1.76 ) & 0 ( 0 ) & 23.47 & 0.58 & 0.93 & 18.47 & 0.00\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 16 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1.00 | 0.15 | 0.76 ( 0.01 ) | 12.04 ( 0.89 ) | 1.42 ( 0.11 ) | 15.62 | 0.61 | 0.76 | 12.04 | 1.42 |\n",
+ "| 2 | 100 | 50 | 0 | 0.50 | 0.23 | 0.84 ( 0.01 ) | 11.44 ( 0.76 ) | 0.46 ( 0.07 ) | 15.98 | 0.56 | 0.84 | 11.44 | 0.46 |\n",
+ "| 3 | 500 | 50 | 0 | 0.10 | 0.36 | 0.92 ( 0 ) | 8.74 ( 0.72 ) | 0 ( 0 ) | 13.74 | 0.45 | 0.92 | 8.74 | 0.00 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.05 | 0.41 | 0.93 ( 0 ) | 7.56 ( 0.72 ) | 0 ( 0 ) | 12.56 | 0.4 | 0.93 | 7.56 | 0.00 |\n",
+ "| 5 | 50 | 100 | 0 | 2.00 | 0.10 | 0.72 ( 0.01 ) | 18.92 ( 1.7 ) | 1.81 ( 0.12 ) | 22.11 | 0.72 | 0.72 | 18.92 | 1.81 |\n",
+ "| 6 | 100 | 100 | 0 | 1.00 | 0.22 | 0.82 ( 0.01 ) | 12.48 ( 0.9 ) | 0.96 ( 0.09 ) | 16.52 | 0.58 | 0.82 | 12.48 | 0.96 |\n",
+ "| 7 | 500 | 100 | 0 | 0.20 | 0.32 | 0.92 ( 0 ) | 11.78 ( 0.99 ) | 0 ( 0 ) | 16.78 | 0.52 | 0.92 | 11.78 | 0.00 |\n",
+ "| 8 | 1000 | 100 | 0 | 0.10 | 0.34 | 0.93 ( 0 ) | 10.71 ( 1.03 ) | 0 ( 0 ) | 15.71 | 0.47 | 0.93 | 10.71 | 0.00 |\n",
+ "| 9 | 50 | 500 | 0 | 10.00 | 0.03 | 0.68 ( 0.01 ) | 57.17 ( 5.72 ) | 2.27 ( 0.13 ) | 59.90 | 0.88 | 0.68 | 57.17 | 2.27 |\n",
+ "| 10 | 100 | 500 | 0 | 5.00 | 0.15 | 0.77 ( 0.01 ) | 20.84 ( 2.08 ) | 1.49 ( 0.1 ) | 24.35 | 0.69 | 0.77 | 20.84 | 1.49 |\n",
+ "| 11 | 500 | 500 | 0 | 1.00 | 0.27 | 0.92 ( 0 ) | 16.79 ( 1.47 ) | 0 ( 0 ) | 21.79 | 0.61 | 0.92 | 16.79 | 0.00 |\n",
+ "| 12 | 1000 | 500 | 0 | 0.50 | 0.36 | 0.93 ( 0 ) | 11.47 ( 1.14 ) | 0 ( 0 ) | 16.47 | 0.48 | 0.93 | 11.47 | 0.00 |\n",
+ "| 13 | 50 | 1000 | 0 | 20.00 | 0.02 | 0.67 ( 0.01 ) | 72.88 ( 8.55 ) | 2.89 ( 0.13 ) | 74.99 | 0.91 | 0.67 | 72.88 | 2.89 |\n",
+ "| 14 | 100 | 1000 | 0 | 10.00 | 0.09 | 0.76 ( 0.01 ) | 29.95 ( 4.05 ) | 2 ( 0.1 ) | 32.95 | 0.77 | 0.76 | 29.95 | 2.00 |\n",
+ "| 15 | 500 | 1000 | 0 | 2.00 | 0.24 | 0.91 ( 0 ) | 19.47 ( 1.76 ) | 0 ( 0 ) | 24.47 | 0.62 | 0.91 | 19.47 | 0.00 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.00 | 0.25 | 0.93 ( 0 ) | 18.47 ( 1.76 ) | 0 ( 0 ) | 23.47 | 0.58 | 0.93 | 18.47 | 0.00 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN \n",
+ "1 50 50 0 1.00 0.15 0.76 ( 0.01 ) 12.04 ( 0.89 ) 1.42 ( 0.11 )\n",
+ "2 100 50 0 0.50 0.23 0.84 ( 0.01 ) 11.44 ( 0.76 ) 0.46 ( 0.07 )\n",
+ "3 500 50 0 0.10 0.36 0.92 ( 0 ) 8.74 ( 0.72 ) 0 ( 0 ) \n",
+ "4 1000 50 0 0.05 0.41 0.93 ( 0 ) 7.56 ( 0.72 ) 0 ( 0 ) \n",
+ "5 50 100 0 2.00 0.10 0.72 ( 0.01 ) 18.92 ( 1.7 ) 1.81 ( 0.12 )\n",
+ "6 100 100 0 1.00 0.22 0.82 ( 0.01 ) 12.48 ( 0.9 ) 0.96 ( 0.09 )\n",
+ "7 500 100 0 0.20 0.32 0.92 ( 0 ) 11.78 ( 0.99 ) 0 ( 0 ) \n",
+ "8 1000 100 0 0.10 0.34 0.93 ( 0 ) 10.71 ( 1.03 ) 0 ( 0 ) \n",
+ "9 50 500 0 10.00 0.03 0.68 ( 0.01 ) 57.17 ( 5.72 ) 2.27 ( 0.13 )\n",
+ "10 100 500 0 5.00 0.15 0.77 ( 0.01 ) 20.84 ( 2.08 ) 1.49 ( 0.1 ) \n",
+ "11 500 500 0 1.00 0.27 0.92 ( 0 ) 16.79 ( 1.47 ) 0 ( 0 ) \n",
+ "12 1000 500 0 0.50 0.36 0.93 ( 0 ) 11.47 ( 1.14 ) 0 ( 0 ) \n",
+ "13 50 1000 0 20.00 0.02 0.67 ( 0.01 ) 72.88 ( 8.55 ) 2.89 ( 0.13 )\n",
+ "14 100 1000 0 10.00 0.09 0.76 ( 0.01 ) 29.95 ( 4.05 ) 2 ( 0.1 ) \n",
+ "15 500 1000 0 2.00 0.24 0.91 ( 0 ) 19.47 ( 1.76 ) 0 ( 0 ) \n",
+ "16 1000 1000 0 1.00 0.25 0.93 ( 0 ) 18.47 ( 1.76 ) 0 ( 0 ) \n",
+ " num_select FDR ROC_mean FP_mean FN_mean\n",
+ "1 15.62 0.61 0.76 12.04 1.42 \n",
+ "2 15.98 0.56 0.84 11.44 0.46 \n",
+ "3 13.74 0.45 0.92 8.74 0.00 \n",
+ "4 12.56 0.4 0.93 7.56 0.00 \n",
+ "5 22.11 0.72 0.72 18.92 1.81 \n",
+ "6 16.52 0.58 0.82 12.48 0.96 \n",
+ "7 16.78 0.52 0.92 11.78 0.00 \n",
+ "8 15.71 0.47 0.93 10.71 0.00 \n",
+ "9 59.90 0.88 0.68 57.17 2.27 \n",
+ "10 24.35 0.69 0.77 20.84 1.49 \n",
+ "11 21.79 0.61 0.92 16.79 0.00 \n",
+ "12 16.47 0.48 0.93 11.47 0.00 \n",
+ "13 74.99 0.91 0.67 72.88 2.89 \n",
+ "14 32.95 0.77 0.76 29.95 2.00 \n",
+ "15 24.47 0.62 0.91 19.47 0.00 \n",
+ "16 23.47 0.58 0.93 18.47 0.00 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "result.table_toe\n",
+ "\n",
+ "## export\n",
+ "write.table(result.table_toe, '../results_summary_bin/sim_ind_Elnet_binary.txt', sep='\\t', row.names=F)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.3_sim_ind_rf_binary_update-checkpoint.ipynb b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.3_sim_ind_rf_binary_update-checkpoint.ipynb
new file mode 100644
index 0000000..6dfa2c6
--- /dev/null
+++ b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.3_sim_ind_rf_binary_update-checkpoint.ipynb
@@ -0,0 +1,694 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Summarize random forest results on the Independent simulation scenarios for binary outcomes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dir = '/panfs/panfs1.ucsd.edu/panscratch/lij014/Stability_2020/sim_data'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dim.list = list()\n",
+ "size = c(50, 100, 500, 1000)\n",
+ "idx = 0\n",
+ "for (P in size){\n",
+ " for (N in size){\n",
+ " idx = idx + 1\n",
+ " dim.list[[idx]] = c(P=P, N=N)\n",
+ " }\n",
+ "}\n",
+ "\n",
+ "files = NULL\n",
+ "for (dim in dim.list){\n",
+ " p = dim[1]\n",
+ " n = dim[2]\n",
+ " files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "16"
+ ],
+ "text/latex": [
+ "16"
+ ],
+ "text/markdown": [
+ "16"
+ ],
+ "text/plain": [
+ "[1] 16"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "length(files)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[1] \"indx: 1\"\n",
+ "[1] \"indx: 2\"\n",
+ "[1] \"indx: 3\"\n",
+ "[1] \"indx: 4\"\n",
+ "[1] \"indx: 5\"\n",
+ "[1] \"indx: 6\"\n",
+ "[1] \"indx: 7\"\n",
+ "[1] \"indx: 8\"\n",
+ "[1] \"indx: 9\"\n",
+ "[1] \"indx: 10\"\n",
+ "[1] \"indx: 11\"\n",
+ "[1] \"indx: 12\"\n",
+ "[1] \"indx: 13\"\n",
+ "[1] \"indx: 14\"\n",
+ "[1] \"indx: 15\"\n",
+ "[1] \"indx: 16\"\n"
+ ]
+ }
+ ],
+ "source": [
+ "avg_FDR = NULL\n",
+ "table_toe = NULL\n",
+ "tmp_num_select = rep(0, length(files))\n",
+ "for (i in 1:length(files)){\n",
+ " print(paste0('indx: ', i))\n",
+ " load(paste0(dir, '/binary_update/ind_RF_binary_', i, '.RData')) \n",
+ " \n",
+ " table_toe = rbind(table_toe, results_ind_rf[c('n', 'p', 'rou', 'FP', 'FN', 'ROC', 'Stab')])\n",
+ " tmp_num_select[i] = mean(rowSums(results_ind_rf$Stab.table))\n",
+ " \n",
+ " # calculate FDR\n",
+ " load(files[i], dat <- new.env())\n",
+ " sub = dat$sim_array[[i]]\n",
+ " p = sub$p # take true values from 1st replicate of each simulated data\n",
+ " coef = sub$beta\n",
+ " coef.true = which(coef != 0)\n",
+ " \n",
+ " tt = results_ind_rf$Stab.table\n",
+ " FDR = NULL # false discovery rate per replicate: false positives / number of selected features\n",
+ " for (r in 1:nrow(tt)){\n",
+ " FDR = c(FDR, length(setdiff(which(tt[r, ] !=0), coef.true))/sum(tt[r, ]))\n",
+ "\n",
+ " }\n",
+ " \n",
+ " avg_FDR = c(avg_FDR, mean(FDR, na.rm=T))\n",
+ "}\n",
+ "table_toe = as.data.frame(table_toe)\n",
+ "table_toe$num_select = tmp_num_select\n",
+ "table_toe$FDR = round(avg_FDR,2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1 ( 0 ) | 6 ( 0 ) | 1 ( 0 ) | NaN | 0.00 | NaN |\n",
+ "\t2 | 100 | 50 | 0 | 1.97 ( 0.13 ) | 4.62 ( 0.08 ) | 1 ( 0 ) | 0.1 | 3.34 | 0.55 |\n",
+ "\t3 | 500 | 50 | 0 | 1.26 ( 0.12 ) | 2.19 ( 0.08 ) | 1 ( 0 ) | 0.54 | 5.07 | 0.22 |\n",
+ "\t4 | 1000 | 50 | 0 | 0.86 ( 0.09 ) | 1.12 ( 0.08 ) | 1 ( 0 ) | 0.73 | 5.74 | 0.13 |\n",
+ "\t5 | 50 | 100 | 0 | 1 ( 0 ) | 6 ( 0 ) | 1 ( 0 ) | NaN | 0.00 | NaN |\n",
+ "\t6 | 100 | 100 | 0 | 4.29 ( 0.17 ) | 4.79 ( 0.09 ) | 1 ( 0 ) | 0.05 | 5.50 | 0.77 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1 ( 0 ) & 6 ( 0 ) & 1 ( 0 ) & NaN & 0.00 & NaN\\\\\n",
+ "\t2 & 100 & 50 & 0 & 1.97 ( 0.13 ) & 4.62 ( 0.08 ) & 1 ( 0 ) & 0.1 & 3.34 & 0.55\\\\\n",
+ "\t3 & 500 & 50 & 0 & 1.26 ( 0.12 ) & 2.19 ( 0.08 ) & 1 ( 0 ) & 0.54 & 5.07 & 0.22\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.86 ( 0.09 ) & 1.12 ( 0.08 ) & 1 ( 0 ) & 0.73 & 5.74 & 0.13\\\\\n",
+ "\t5 & 50 & 100 & 0 & 1 ( 0 ) & 6 ( 0 ) & 1 ( 0 ) & NaN & 0.00 & NaN\\\\\n",
+ "\t6 & 100 & 100 & 0 & 4.29 ( 0.17 ) & 4.79 ( 0.09 ) & 1 ( 0 ) & 0.05 & 5.50 & 0.77\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1 ( 0 ) | 6 ( 0 ) | 1 ( 0 ) | NaN | 0.00 | NaN |\n",
+ "| 2 | 100 | 50 | 0 | 1.97 ( 0.13 ) | 4.62 ( 0.08 ) | 1 ( 0 ) | 0.1 | 3.34 | 0.55 |\n",
+ "| 3 | 500 | 50 | 0 | 1.26 ( 0.12 ) | 2.19 ( 0.08 ) | 1 ( 0 ) | 0.54 | 5.07 | 0.22 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.86 ( 0.09 ) | 1.12 ( 0.08 ) | 1 ( 0 ) | 0.73 | 5.74 | 0.13 |\n",
+ "| 5 | 50 | 100 | 0 | 1 ( 0 ) | 6 ( 0 ) | 1 ( 0 ) | NaN | 0.00 | NaN |\n",
+ "| 6 | 100 | 100 | 0 | 4.29 ( 0.17 ) | 4.79 ( 0.09 ) | 1 ( 0 ) | 0.05 | 5.50 | 0.77 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select FDR \n",
+ "1 50 50 0 1 ( 0 ) 6 ( 0 ) 1 ( 0 ) NaN 0.00 NaN\n",
+ "2 100 50 0 1.97 ( 0.13 ) 4.62 ( 0.08 ) 1 ( 0 ) 0.1 3.34 0.55\n",
+ "3 500 50 0 1.26 ( 0.12 ) 2.19 ( 0.08 ) 1 ( 0 ) 0.54 5.07 0.22\n",
+ "4 1000 50 0 0.86 ( 0.09 ) 1.12 ( 0.08 ) 1 ( 0 ) 0.73 5.74 0.13\n",
+ "5 50 100 0 1 ( 0 ) 6 ( 0 ) 1 ( 0 ) NaN 0.00 NaN\n",
+ "6 100 100 0 4.29 ( 0.17 ) 4.79 ( 0.09 ) 1 ( 0 ) 0.05 5.50 0.77"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 23.08 ( 0.51 ) | 2.34 ( 0.1 ) | 1 ( 0 ) | 0.09 | 26.74 | 0.86 |\n",
+ "\t12 | 1000 | 500 | 0 | 22.72 ( 0.46 ) | 1.34 ( 0.08 ) | 1 ( 0 ) | 0.14 | 27.38 | 0.83 |\n",
+ "\t13 | 50 | 1000 | 0 | 1 ( 0 ) | 6 ( 0 ) | 1 ( 0 ) | NaN | 0.00 | NaN |\n",
+ "\t14 | 100 | 1000 | 0 | 46.76 ( 0.67 ) | 4.86 ( 0.09 ) | 1 ( 0 ) | 0.01 | 47.90 | 0.98 |\n",
+ "\t15 | 500 | 1000 | 0 | 47.82 ( 0.66 ) | 2.5 ( 0.1 ) | 1 ( 0 ) | 0.04 | 51.32 | 0.93 |\n",
+ "\t16 | 1000 | 1000 | 0 | 47.14 ( 0.77 ) | 1.46 ( 0.08 ) | 1 ( 0 ) | 0.07 | 51.68 | 0.91 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 23.08 ( 0.51 ) & 2.34 ( 0.1 ) & 1 ( 0 ) & 0.09 & 26.74 & 0.86\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 22.72 ( 0.46 ) & 1.34 ( 0.08 ) & 1 ( 0 ) & 0.14 & 27.38 & 0.83\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 1 ( 0 ) & 6 ( 0 ) & 1 ( 0 ) & NaN & 0.00 & NaN\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 46.76 ( 0.67 ) & 4.86 ( 0.09 ) & 1 ( 0 ) & 0.01 & 47.90 & 0.98\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 47.82 ( 0.66 ) & 2.5 ( 0.1 ) & 1 ( 0 ) & 0.04 & 51.32 & 0.93\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 47.14 ( 0.77 ) & 1.46 ( 0.08 ) & 1 ( 0 ) & 0.07 & 51.68 & 0.91\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 23.08 ( 0.51 ) | 2.34 ( 0.1 ) | 1 ( 0 ) | 0.09 | 26.74 | 0.86 |\n",
+ "| 12 | 1000 | 500 | 0 | 22.72 ( 0.46 ) | 1.34 ( 0.08 ) | 1 ( 0 ) | 0.14 | 27.38 | 0.83 |\n",
+ "| 13 | 50 | 1000 | 0 | 1 ( 0 ) | 6 ( 0 ) | 1 ( 0 ) | NaN | 0.00 | NaN |\n",
+ "| 14 | 100 | 1000 | 0 | 46.76 ( 0.67 ) | 4.86 ( 0.09 ) | 1 ( 0 ) | 0.01 | 47.90 | 0.98 |\n",
+ "| 15 | 500 | 1000 | 0 | 47.82 ( 0.66 ) | 2.5 ( 0.1 ) | 1 ( 0 ) | 0.04 | 51.32 | 0.93 |\n",
+ "| 16 | 1000 | 1000 | 0 | 47.14 ( 0.77 ) | 1.46 ( 0.08 ) | 1 ( 0 ) | 0.07 | 51.68 | 0.91 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select FDR \n",
+ "11 500 500 0 23.08 ( 0.51 ) 2.34 ( 0.1 ) 1 ( 0 ) 0.09 26.74 0.86\n",
+ "12 1000 500 0 22.72 ( 0.46 ) 1.34 ( 0.08 ) 1 ( 0 ) 0.14 27.38 0.83\n",
+ "13 50 1000 0 1 ( 0 ) 6 ( 0 ) 1 ( 0 ) NaN 0.00 NaN\n",
+ "14 100 1000 0 46.76 ( 0.67 ) 4.86 ( 0.09 ) 1 ( 0 ) 0.01 47.90 0.98\n",
+ "15 500 1000 0 47.82 ( 0.66 ) 2.5 ( 0.1 ) 1 ( 0 ) 0.04 51.32 0.93\n",
+ "16 1000 1000 0 47.14 ( 0.77 ) 1.46 ( 0.08 ) 1 ( 0 ) 0.07 51.68 0.91"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n",
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# export result\n",
+ "result.table_toe <- apply(table_toe,2,as.character)\n",
+ "rownames(result.table_toe) = rownames(table_toe)\n",
+ "result.table_toe = as.data.frame(result.table_toe)\n",
+ "\n",
+ "# extract numbers only for 'n' & 'p'\n",
+ "result.table_toe$n = tidyr::extract_numeric(result.table_toe$n)\n",
+ "result.table_toe$p = tidyr::extract_numeric(result.table_toe$p)\n",
+ "result.table_toe$ratio = result.table_toe$p / result.table_toe$n\n",
+ "\n",
+ "result.table_toe = result.table_toe[c('n', 'p', 'rou', 'ratio', 'Stab', 'ROC', 'FP', 'FN', 'num_select', 'FDR')]\n",
+ "colnames(result.table_toe)[1:4] = c('N', 'P', 'Corr', 'Ratio')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert the measurements of interest to numeric\n",
+ "result.table_toe$Stab = as.numeric(as.character(result.table_toe$Stab))\n",
+ "result.table_toe$num_select = as.numeric(as.character(result.table_toe$num_select))\n",
+ "\n",
+ "result.table_toe$ROC_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$ROC))\n",
+ "result.table_toe$FP_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FP))\n",
+ "result.table_toe$FN_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FN))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 4 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "\t5 | 50 | 100 | 0 | 2 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "\t9 | 50 | 500 | 0 | 10 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "\t13 | 50 | 1000 | 0 | 20 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 4 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0 & NaN & 1 & 1 & 6\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0 & NaN & 1 & 1 & 6\\\\\n",
+ "\t9 & 50 & 500 & 0 & 10 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0 & NaN & 1 & 1 & 6\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0 & NaN & 1 & 1 & 6\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 4 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "| 5 | 50 | 100 | 0 | 2 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "| 9 | 50 | 500 | 0 | 10 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "| 13 | 50 | 1000 | 0 | 20 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0 | NaN | 1 | 1 | 6 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select FDR ROC_mean\n",
+ "1 50 50 0 1 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0 NaN 1 \n",
+ "5 50 100 0 2 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0 NaN 1 \n",
+ "9 50 500 0 10 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0 NaN 1 \n",
+ "13 50 1000 0 20 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0 NaN 1 \n",
+ " FP_mean FN_mean\n",
+ "1 1 6 \n",
+ "5 1 6 \n",
+ "9 1 6 \n",
+ "13 1 6 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# check whether any missing values exist\n",
+ "result.table_toe[rowSums(is.na(result.table_toe)) > 0,]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "\t2 | 100 | 50 | 0 | 0.50 | 0.10 | 1 ( 0 ) | 1.97 ( 0.13 ) | 4.62 ( 0.08 ) | 3.34 | 0.55 | 1 | 1.97 | 4.62 |\n",
+ "\t3 | 500 | 50 | 0 | 0.10 | 0.54 | 1 ( 0 ) | 1.26 ( 0.12 ) | 2.19 ( 0.08 ) | 5.07 | 0.22 | 1 | 1.26 | 2.19 |\n",
+ "\t4 | 1000 | 50 | 0 | 0.05 | 0.73 | 1 ( 0 ) | 0.86 ( 0.09 ) | 1.12 ( 0.08 ) | 5.74 | 0.13 | 1 | 0.86 | 1.12 |\n",
+ "\t5 | 50 | 100 | 0 | 2.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "\t6 | 100 | 100 | 0 | 1.00 | 0.05 | 1 ( 0 ) | 4.29 ( 0.17 ) | 4.79 ( 0.09 ) | 5.50 | 0.77 | 1 | 4.29 | 4.79 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1.00 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t2 & 100 & 50 & 0 & 0.50 & 0.10 & 1 ( 0 ) & 1.97 ( 0.13 ) & 4.62 ( 0.08 ) & 3.34 & 0.55 & 1 & 1.97 & 4.62\\\\\n",
+ "\t3 & 500 & 50 & 0 & 0.10 & 0.54 & 1 ( 0 ) & 1.26 ( 0.12 ) & 2.19 ( 0.08 ) & 5.07 & 0.22 & 1 & 1.26 & 2.19\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.05 & 0.73 & 1 ( 0 ) & 0.86 ( 0.09 ) & 1.12 ( 0.08 ) & 5.74 & 0.13 & 1 & 0.86 & 1.12\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2.00 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t6 & 100 & 100 & 0 & 1.00 & 0.05 & 1 ( 0 ) & 4.29 ( 0.17 ) & 4.79 ( 0.09 ) & 5.50 & 0.77 & 1 & 4.29 & 4.79\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 2 | 100 | 50 | 0 | 0.50 | 0.10 | 1 ( 0 ) | 1.97 ( 0.13 ) | 4.62 ( 0.08 ) | 3.34 | 0.55 | 1 | 1.97 | 4.62 |\n",
+ "| 3 | 500 | 50 | 0 | 0.10 | 0.54 | 1 ( 0 ) | 1.26 ( 0.12 ) | 2.19 ( 0.08 ) | 5.07 | 0.22 | 1 | 1.26 | 2.19 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.05 | 0.73 | 1 ( 0 ) | 0.86 ( 0.09 ) | 1.12 ( 0.08 ) | 5.74 | 0.13 | 1 | 0.86 | 1.12 |\n",
+ "| 5 | 50 | 100 | 0 | 2.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 6 | 100 | 100 | 0 | 1.00 | 0.05 | 1 ( 0 ) | 4.29 ( 0.17 ) | 4.79 ( 0.09 ) | 5.50 | 0.77 | 1 | 4.29 | 4.79 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select FDR \n",
+ "1 50 50 0 1.00 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 NaN \n",
+ "2 100 50 0 0.50 0.10 1 ( 0 ) 1.97 ( 0.13 ) 4.62 ( 0.08 ) 3.34 0.55\n",
+ "3 500 50 0 0.10 0.54 1 ( 0 ) 1.26 ( 0.12 ) 2.19 ( 0.08 ) 5.07 0.22\n",
+ "4 1000 50 0 0.05 0.73 1 ( 0 ) 0.86 ( 0.09 ) 1.12 ( 0.08 ) 5.74 0.13\n",
+ "5 50 100 0 2.00 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 NaN \n",
+ "6 100 100 0 1.00 0.05 1 ( 0 ) 4.29 ( 0.17 ) 4.79 ( 0.09 ) 5.50 0.77\n",
+ " ROC_mean FP_mean FN_mean\n",
+ "1 1 1.00 6.00 \n",
+ "2 1 1.97 4.62 \n",
+ "3 1 1.26 2.19 \n",
+ "4 1 0.86 1.12 \n",
+ "5 1 1.00 6.00 \n",
+ "6 1 4.29 4.79 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(result.table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |
\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |
\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 1.0 | 0.09 | 1 ( 0 ) | 23.08 ( 0.51 ) | 2.34 ( 0.1 ) | 26.74 | 0.86 | 1 | 23.08 | 2.34 |
\n",
+ "\t12 | 1000 | 500 | 0 | 0.5 | 0.14 | 1 ( 0 ) | 22.72 ( 0.46 ) | 1.34 ( 0.08 ) | 27.38 | 0.83 | 1 | 22.72 | 1.34 |
\n",
+ "\t13 | 50 | 1000 | 0 | 20.0 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |
\n",
+ "\t14 | 100 | 1000 | 0 | 10.0 | 0.01 | 1 ( 0 ) | 46.76 ( 0.67 ) | 4.86 ( 0.09 ) | 47.90 | 0.98 | 1 | 46.76 | 4.86 |
\n",
+ "\t15 | 500 | 1000 | 0 | 2.0 | 0.04 | 1 ( 0 ) | 47.82 ( 0.66 ) | 2.5 ( 0.1 ) | 51.32 | 0.93 | 1 | 47.82 | 2.50 |
\n",
+ "\t16 | 1000 | 1000 | 0 | 1.0 | 0.07 | 1 ( 0 ) | 47.14 ( 0.77 ) | 1.46 ( 0.08 ) | 51.68 | 0.91 | 1 | 47.14 | 1.46 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 1.0 & 0.09 & 1 ( 0 ) & 23.08 ( 0.51 ) & 2.34 ( 0.1 ) & 26.74 & 0.86 & 1 & 23.08 & 2.34\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 0.5 & 0.14 & 1 ( 0 ) & 22.72 ( 0.46 ) & 1.34 ( 0.08 ) & 27.38 & 0.83 & 1 & 22.72 & 1.34\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20.0 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 10.0 & 0.01 & 1 ( 0 ) & 46.76 ( 0.67 ) & 4.86 ( 0.09 ) & 47.90 & 0.98 & 1 & 46.76 & 4.86\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 2.0 & 0.04 & 1 ( 0 ) & 47.82 ( 0.66 ) & 2.5 ( 0.1 ) & 51.32 & 0.93 & 1 & 47.82 & 2.50\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.0 & 0.07 & 1 ( 0 ) & 47.14 ( 0.77 ) & 1.46 ( 0.08 ) & 51.68 & 0.91 & 1 & 47.14 & 1.46\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 1.0 | 0.09 | 1 ( 0 ) | 23.08 ( 0.51 ) | 2.34 ( 0.1 ) | 26.74 | 0.86 | 1 | 23.08 | 2.34 |\n",
+ "| 12 | 1000 | 500 | 0 | 0.5 | 0.14 | 1 ( 0 ) | 22.72 ( 0.46 ) | 1.34 ( 0.08 ) | 27.38 | 0.83 | 1 | 22.72 | 1.34 |\n",
+ "| 13 | 50 | 1000 | 0 | 20.0 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 14 | 100 | 1000 | 0 | 10.0 | 0.01 | 1 ( 0 ) | 46.76 ( 0.67 ) | 4.86 ( 0.09 ) | 47.90 | 0.98 | 1 | 46.76 | 4.86 |\n",
+ "| 15 | 500 | 1000 | 0 | 2.0 | 0.04 | 1 ( 0 ) | 47.82 ( 0.66 ) | 2.5 ( 0.1 ) | 51.32 | 0.93 | 1 | 47.82 | 2.50 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.0 | 0.07 | 1 ( 0 ) | 47.14 ( 0.77 ) | 1.46 ( 0.08 ) | 51.68 | 0.91 | 1 | 47.14 | 1.46 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select\n",
+ "11 500 500 0 1.0 0.09 1 ( 0 ) 23.08 ( 0.51 ) 2.34 ( 0.1 ) 26.74 \n",
+ "12 1000 500 0 0.5 0.14 1 ( 0 ) 22.72 ( 0.46 ) 1.34 ( 0.08 ) 27.38 \n",
+ "13 50 1000 0 20.0 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 \n",
+ "14 100 1000 0 10.0 0.01 1 ( 0 ) 46.76 ( 0.67 ) 4.86 ( 0.09 ) 47.90 \n",
+ "15 500 1000 0 2.0 0.04 1 ( 0 ) 47.82 ( 0.66 ) 2.5 ( 0.1 ) 51.32 \n",
+ "16 1000 1000 0 1.0 0.07 1 ( 0 ) 47.14 ( 0.77 ) 1.46 ( 0.08 ) 51.68 \n",
+ " FDR ROC_mean FP_mean FN_mean\n",
+ "11 0.86 1 23.08 2.34 \n",
+ "12 0.83 1 22.72 1.34 \n",
+ "13 NaN 1 1.00 6.00 \n",
+ "14 0.98 1 46.76 4.86 \n",
+ "15 0.93 1 47.82 2.50 \n",
+ "16 0.91 1 47.14 1.46 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(result.table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 16 × 13\n",
+ "\n",
+ "\t | N | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 1.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "\t2 | 100 | 50 | 0 | 0.50 | 0.10 | 1 ( 0 ) | 1.97 ( 0.13 ) | 4.62 ( 0.08 ) | 3.34 | 0.55 | 1 | 1.97 | 4.62 |\n",
+ "\t3 | 500 | 50 | 0 | 0.10 | 0.54 | 1 ( 0 ) | 1.26 ( 0.12 ) | 2.19 ( 0.08 ) | 5.07 | 0.22 | 1 | 1.26 | 2.19 |\n",
+ "\t4 | 1000 | 50 | 0 | 0.05 | 0.73 | 1 ( 0 ) | 0.86 ( 0.09 ) | 1.12 ( 0.08 ) | 5.74 | 0.13 | 1 | 0.86 | 1.12 |\n",
+ "\t5 | 50 | 100 | 0 | 2.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "\t6 | 100 | 100 | 0 | 1.00 | 0.05 | 1 ( 0 ) | 4.29 ( 0.17 ) | 4.79 ( 0.09 ) | 5.50 | 0.77 | 1 | 4.29 | 4.79 |\n",
+ "\t7 | 500 | 100 | 0 | 0.20 | 0.40 | 1 ( 0 ) | 3.42 ( 0.16 ) | 2.01 ( 0.09 ) | 7.41 | 0.44 | 1 | 3.42 | 2.01 |\n",
+ "\t8 | 1000 | 100 | 0 | 0.10 | 0.49 | 1 ( 0 ) | 3.39 ( 0.19 ) | 1.14 ( 0.07 ) | 8.25 | 0.38 | 1 | 3.39 | 1.14 |\n",
+ "\t9 | 50 | 500 | 0 | 10.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "\t10 | 100 | 500 | 0 | 5.00 | 0.01 | 1 ( 0 ) | 24.68 ( 0.48 ) | 4.5 ( 0.09 ) | 26.18 | 0.94 | 1 | 24.68 | 4.50 |\n",
+ "\t11 | 500 | 500 | 0 | 1.00 | 0.09 | 1 ( 0 ) | 23.08 ( 0.51 ) | 2.34 ( 0.1 ) | 26.74 | 0.86 | 1 | 23.08 | 2.34 |\n",
+ "\t12 | 1000 | 500 | 0 | 0.50 | 0.14 | 1 ( 0 ) | 22.72 ( 0.46 ) | 1.34 ( 0.08 ) | 27.38 | 0.83 | 1 | 22.72 | 1.34 |\n",
+ "\t13 | 50 | 1000 | 0 | 20.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "\t14 | 100 | 1000 | 0 | 10.00 | 0.01 | 1 ( 0 ) | 46.76 ( 0.67 ) | 4.86 ( 0.09 ) | 47.90 | 0.98 | 1 | 46.76 | 4.86 |\n",
+ "\t15 | 500 | 1000 | 0 | 2.00 | 0.04 | 1 ( 0 ) | 47.82 ( 0.66 ) | 2.5 ( 0.1 ) | 51.32 | 0.93 | 1 | 47.82 | 2.50 |\n",
+ "\t16 | 1000 | 1000 | 0 | 1.00 | 0.07 | 1 ( 0 ) | 47.14 ( 0.77 ) | 1.46 ( 0.08 ) | 51.68 | 0.91 | 1 | 47.14 | 1.46 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 16 × 13\n",
+ "\\begin{tabular}{r|lllllllllllll}\n",
+ " & N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 1.00 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t2 & 100 & 50 & 0 & 0.50 & 0.10 & 1 ( 0 ) & 1.97 ( 0.13 ) & 4.62 ( 0.08 ) & 3.34 & 0.55 & 1 & 1.97 & 4.62\\\\\n",
+ "\t3 & 500 & 50 & 0 & 0.10 & 0.54 & 1 ( 0 ) & 1.26 ( 0.12 ) & 2.19 ( 0.08 ) & 5.07 & 0.22 & 1 & 1.26 & 2.19\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 0.05 & 0.73 & 1 ( 0 ) & 0.86 ( 0.09 ) & 1.12 ( 0.08 ) & 5.74 & 0.13 & 1 & 0.86 & 1.12\\\\\n",
+ "\t5 & 50 & 100 & 0 & 2.00 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t6 & 100 & 100 & 0 & 1.00 & 0.05 & 1 ( 0 ) & 4.29 ( 0.17 ) & 4.79 ( 0.09 ) & 5.50 & 0.77 & 1 & 4.29 & 4.79\\\\\n",
+ "\t7 & 500 & 100 & 0 & 0.20 & 0.40 & 1 ( 0 ) & 3.42 ( 0.16 ) & 2.01 ( 0.09 ) & 7.41 & 0.44 & 1 & 3.42 & 2.01\\\\\n",
+ "\t8 & 1000 & 100 & 0 & 0.10 & 0.49 & 1 ( 0 ) & 3.39 ( 0.19 ) & 1.14 ( 0.07 ) & 8.25 & 0.38 & 1 & 3.39 & 1.14\\\\\n",
+ "\t9 & 50 & 500 & 0 & 10.00 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t10 & 100 & 500 & 0 & 5.00 & 0.01 & 1 ( 0 ) & 24.68 ( 0.48 ) & 4.5 ( 0.09 ) & 26.18 & 0.94 & 1 & 24.68 & 4.50\\\\\n",
+ "\t11 & 500 & 500 & 0 & 1.00 & 0.09 & 1 ( 0 ) & 23.08 ( 0.51 ) & 2.34 ( 0.1 ) & 26.74 & 0.86 & 1 & 23.08 & 2.34\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 0.50 & 0.14 & 1 ( 0 ) & 22.72 ( 0.46 ) & 1.34 ( 0.08 ) & 27.38 & 0.83 & 1 & 22.72 & 1.34\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 20.00 & NaN & 1 ( 0 ) & 1 ( 0 ) & 6 ( 0 ) & 0.00 & NaN & 1 & 1.00 & 6.00\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 10.00 & 0.01 & 1 ( 0 ) & 46.76 ( 0.67 ) & 4.86 ( 0.09 ) & 47.90 & 0.98 & 1 & 46.76 & 4.86\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 2.00 & 0.04 & 1 ( 0 ) & 47.82 ( 0.66 ) & 2.5 ( 0.1 ) & 51.32 & 0.93 & 1 & 47.82 & 2.50\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.00 & 0.07 & 1 ( 0 ) & 47.14 ( 0.77 ) & 1.46 ( 0.08 ) & 51.68 & 0.91 & 1 & 47.14 & 1.46\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 16 × 13\n",
+ "\n",
+ "| | N <dbl> | P <dbl> | Corr <fct> | Ratio <dbl> | Stab <dbl> | ROC <fct> | FP <fct> | FN <fct> | num_select <dbl> | FDR <fct> | ROC_mean <dbl> | FP_mean <dbl> | FN_mean <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 1.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 2 | 100 | 50 | 0 | 0.50 | 0.10 | 1 ( 0 ) | 1.97 ( 0.13 ) | 4.62 ( 0.08 ) | 3.34 | 0.55 | 1 | 1.97 | 4.62 |\n",
+ "| 3 | 500 | 50 | 0 | 0.10 | 0.54 | 1 ( 0 ) | 1.26 ( 0.12 ) | 2.19 ( 0.08 ) | 5.07 | 0.22 | 1 | 1.26 | 2.19 |\n",
+ "| 4 | 1000 | 50 | 0 | 0.05 | 0.73 | 1 ( 0 ) | 0.86 ( 0.09 ) | 1.12 ( 0.08 ) | 5.74 | 0.13 | 1 | 0.86 | 1.12 |\n",
+ "| 5 | 50 | 100 | 0 | 2.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 6 | 100 | 100 | 0 | 1.00 | 0.05 | 1 ( 0 ) | 4.29 ( 0.17 ) | 4.79 ( 0.09 ) | 5.50 | 0.77 | 1 | 4.29 | 4.79 |\n",
+ "| 7 | 500 | 100 | 0 | 0.20 | 0.40 | 1 ( 0 ) | 3.42 ( 0.16 ) | 2.01 ( 0.09 ) | 7.41 | 0.44 | 1 | 3.42 | 2.01 |\n",
+ "| 8 | 1000 | 100 | 0 | 0.10 | 0.49 | 1 ( 0 ) | 3.39 ( 0.19 ) | 1.14 ( 0.07 ) | 8.25 | 0.38 | 1 | 3.39 | 1.14 |\n",
+ "| 9 | 50 | 500 | 0 | 10.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 10 | 100 | 500 | 0 | 5.00 | 0.01 | 1 ( 0 ) | 24.68 ( 0.48 ) | 4.5 ( 0.09 ) | 26.18 | 0.94 | 1 | 24.68 | 4.50 |\n",
+ "| 11 | 500 | 500 | 0 | 1.00 | 0.09 | 1 ( 0 ) | 23.08 ( 0.51 ) | 2.34 ( 0.1 ) | 26.74 | 0.86 | 1 | 23.08 | 2.34 |\n",
+ "| 12 | 1000 | 500 | 0 | 0.50 | 0.14 | 1 ( 0 ) | 22.72 ( 0.46 ) | 1.34 ( 0.08 ) | 27.38 | 0.83 | 1 | 22.72 | 1.34 |\n",
+ "| 13 | 50 | 1000 | 0 | 20.00 | NaN | 1 ( 0 ) | 1 ( 0 ) | 6 ( 0 ) | 0.00 | NaN | 1 | 1.00 | 6.00 |\n",
+ "| 14 | 100 | 1000 | 0 | 10.00 | 0.01 | 1 ( 0 ) | 46.76 ( 0.67 ) | 4.86 ( 0.09 ) | 47.90 | 0.98 | 1 | 46.76 | 4.86 |\n",
+ "| 15 | 500 | 1000 | 0 | 2.00 | 0.04 | 1 ( 0 ) | 47.82 ( 0.66 ) | 2.5 ( 0.1 ) | 51.32 | 0.93 | 1 | 47.82 | 2.50 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.00 | 0.07 | 1 ( 0 ) | 47.14 ( 0.77 ) | 1.46 ( 0.08 ) | 51.68 | 0.91 | 1 | 47.14 | 1.46 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " N P Corr Ratio Stab ROC FP FN num_select\n",
+ "1 50 50 0 1.00 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 \n",
+ "2 100 50 0 0.50 0.10 1 ( 0 ) 1.97 ( 0.13 ) 4.62 ( 0.08 ) 3.34 \n",
+ "3 500 50 0 0.10 0.54 1 ( 0 ) 1.26 ( 0.12 ) 2.19 ( 0.08 ) 5.07 \n",
+ "4 1000 50 0 0.05 0.73 1 ( 0 ) 0.86 ( 0.09 ) 1.12 ( 0.08 ) 5.74 \n",
+ "5 50 100 0 2.00 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 \n",
+ "6 100 100 0 1.00 0.05 1 ( 0 ) 4.29 ( 0.17 ) 4.79 ( 0.09 ) 5.50 \n",
+ "7 500 100 0 0.20 0.40 1 ( 0 ) 3.42 ( 0.16 ) 2.01 ( 0.09 ) 7.41 \n",
+ "8 1000 100 0 0.10 0.49 1 ( 0 ) 3.39 ( 0.19 ) 1.14 ( 0.07 ) 8.25 \n",
+ "9 50 500 0 10.00 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 \n",
+ "10 100 500 0 5.00 0.01 1 ( 0 ) 24.68 ( 0.48 ) 4.5 ( 0.09 ) 26.18 \n",
+ "11 500 500 0 1.00 0.09 1 ( 0 ) 23.08 ( 0.51 ) 2.34 ( 0.1 ) 26.74 \n",
+ "12 1000 500 0 0.50 0.14 1 ( 0 ) 22.72 ( 0.46 ) 1.34 ( 0.08 ) 27.38 \n",
+ "13 50 1000 0 20.00 NaN 1 ( 0 ) 1 ( 0 ) 6 ( 0 ) 0.00 \n",
+ "14 100 1000 0 10.00 0.01 1 ( 0 ) 46.76 ( 0.67 ) 4.86 ( 0.09 ) 47.90 \n",
+ "15 500 1000 0 2.00 0.04 1 ( 0 ) 47.82 ( 0.66 ) 2.5 ( 0.1 ) 51.32 \n",
+ "16 1000 1000 0 1.00 0.07 1 ( 0 ) 47.14 ( 0.77 ) 1.46 ( 0.08 ) 51.68 \n",
+ " FDR ROC_mean FP_mean FN_mean\n",
+ "1 NaN 1 1.00 6.00 \n",
+ "2 0.55 1 1.97 4.62 \n",
+ "3 0.22 1 1.26 2.19 \n",
+ "4 0.13 1 0.86 1.12 \n",
+ "5 NaN 1 1.00 6.00 \n",
+ "6 0.77 1 4.29 4.79 \n",
+ "7 0.44 1 3.42 2.01 \n",
+ "8 0.38 1 3.39 1.14 \n",
+ "9 NaN 1 1.00 6.00 \n",
+ "10 0.94 1 24.68 4.50 \n",
+ "11 0.86 1 23.08 2.34 \n",
+ "12 0.83 1 22.72 1.34 \n",
+ "13 NaN 1 1.00 6.00 \n",
+ "14 0.98 1 46.76 4.86 \n",
+ "15 0.93 1 47.82 2.50 \n",
+ "16 0.91 1 47.14 1.46 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "result.table_toe\n",
+ "\n",
+ "## export\n",
+ "write.table(result.table_toe, '../results_summary_bin/sim_ind_RF_binary.txt', sep='\\t', row.names=F)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.4_sim_ind_compLasso_binary_update-checkpoint.ipynb b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.4_sim_ind_compLasso_binary_update-checkpoint.ipynb
new file mode 100644
index 0000000..e765396
--- /dev/null
+++ b/simulations/notebooks_sim_bin/.ipynb_checkpoints/0.4_sim_ind_compLasso_binary_update-checkpoint.ipynb
@@ -0,0 +1,677 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Summarize compositional lasso results on independent simulation scenarios for binary outcomes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dir = '/panfs/panfs1.ucsd.edu/panscratch/lij014/Stability_2020/sim_data'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dim.list = list()\n",
+ "size = c(50, 100, 500, 1000)\n",
+ "idx = 0\n",
+ "for (P in size){\n",
+ " for (N in size){\n",
+ " idx = idx + 1\n",
+ " dim.list[[idx]] = c(P=P, N=N)\n",
+ " }\n",
+ "}\n",
+ "\n",
+ "files = NULL\n",
+ "for (dim in dim.list){\n",
+ " p = dim[1]\n",
+ " n = dim[2]\n",
+ " files = cbind(files, paste0(dir, '/sim_independent_', paste('P', p, 'N', n, sep='_'), '.RData'))\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "16"
+ ],
+ "text/latex": [
+ "16"
+ ],
+ "text/markdown": [
+ "16"
+ ],
+ "text/plain": [
+ "[1] 16"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "length(files)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[1] \"indx: 1\"\n",
+ "[1] \"indx: 2\"\n",
+ "[1] \"indx: 3\"\n",
+ "[1] \"indx: 4\"\n",
+ "[1] \"indx: 5\"\n",
+ "[1] \"indx: 6\"\n",
+ "[1] \"indx: 7\"\n",
+ "[1] \"indx: 8\"\n",
+ "[1] \"indx: 9\"\n",
+ "[1] \"indx: 10\"\n",
+ "[1] \"indx: 11\"\n",
+ "[1] \"indx: 12\"\n",
+ "[1] \"indx: 13\"\n",
+ "[1] \"indx: 14\"\n",
+ "[1] \"indx: 15\"\n",
+ "[1] \"indx: 16\"\n"
+ ]
+ }
+ ],
+ "source": [
+ "avg_FDR = NULL\n",
+ "table_toe = NULL\n",
+ "tmp_num_select = rep(0, length(files))\n",
+ "for (i in 1:length(files)){\n",
+ " print(paste0('indx: ', i))\n",
+ " load(paste0(dir, '/binary_update/ind_GenCompLasso_binary_', i, '.RData')) \n",
+ " \n",
+ " table_toe = rbind(table_toe, results_ind_GenCompLasso[c('n', 'p', 'rou', 'FP', 'FN', 'ROC', 'Stab')])\n",
+ " tmp_num_select[i] = mean(rowSums(results_ind_GenCompLasso$Stab.table))\n",
+ " \n",
+ " # calculate FDR\n",
+ " load(files[i], dat <- new.env()) # load the simulated data for scenario i into its own environment\n",
+ " sub = dat$sim_array[[i]]\n",
+ " p = sub$p # take true values from 1st replicate of each simulated data\n",
+ " coef = sub$beta\n",
+ " coef.true = which(coef != 0)\n",
+ " \n",
+ " tt = results_ind_GenCompLasso$Stab.table\n",
+ " FDR = NULL # false discovery rate\n",
+ " for (r in 1:nrow(tt)){\n",
+ " FDR = c(FDR, length(setdiff(which(tt[r, ] !=0), coef.true))/sum(tt[r, ]))\n",
+ "\n",
+ " }\n",
+ " \n",
+ " avg_FDR = c(avg_FDR, mean(FDR, na.rm=T))\n",
+ "}\n",
+ "table_toe = as.data.frame(table_toe)\n",
+ "table_toe$num_select = tmp_num_select\n",
+ "table_toe$FDR = round(avg_FDR,2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t1 | 50 | 50 | 0 | 16.38 ( 0.96 ) | 0.61 ( 0.06 ) | 0.98 ( 0 ) | 0.13 | 21.77 | 0.71 |\n",
+ "\t2 | 100 | 50 | 0 | 7.94 ( 0.54 ) | 0.34 ( 0.06 ) | 0.93 ( 0 ) | 0.32 | 13.60 | 0.54 |\n",
+ "\t3 | 500 | 50 | 0 | 2.67 ( 0.6 ) | 0.1 ( 0.03 ) | 0.91 ( 0 ) | 0.63 | 8.57 | 0.19 |\n",
+ "\t4 | 1000 | 50 | 0 | 1.61 ( 0.48 ) | 0.04 ( 0.02 ) | 0.91 ( 0 ) | 0.75 | 7.57 | 0.09 |\n",
+ "\t5 | 50 | 100 | 0 | 39.18 ( 2.14 ) | 0.63 ( 0.07 ) | 0.99 ( 0 ) | 0.06 | 44.55 | 0.85 |\n",
+ "\t6 | 100 | 100 | 0 | 16.29 ( 1.16 ) | 0.47 ( 0.06 ) | 0.94 ( 0 ) | 0.2 | 21.82 | 0.70 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 50 & 50 & 0 & 16.38 ( 0.96 ) & 0.61 ( 0.06 ) & 0.98 ( 0 ) & 0.13 & 21.77 & 0.71\\\\\n",
+ "\t2 & 100 & 50 & 0 & 7.94 ( 0.54 ) & 0.34 ( 0.06 ) & 0.93 ( 0 ) & 0.32 & 13.60 & 0.54\\\\\n",
+ "\t3 & 500 & 50 & 0 & 2.67 ( 0.6 ) & 0.1 ( 0.03 ) & 0.91 ( 0 ) & 0.63 & 8.57 & 0.19\\\\\n",
+ "\t4 & 1000 & 50 & 0 & 1.61 ( 0.48 ) & 0.04 ( 0.02 ) & 0.91 ( 0 ) & 0.75 & 7.57 & 0.09\\\\\n",
+ "\t5 & 50 & 100 & 0 & 39.18 ( 2.14 ) & 0.63 ( 0.07 ) & 0.99 ( 0 ) & 0.06 & 44.55 & 0.85\\\\\n",
+ "\t6 & 100 & 100 & 0 & 16.29 ( 1.16 ) & 0.47 ( 0.06 ) & 0.94 ( 0 ) & 0.2 & 21.82 & 0.70\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 1 | 50 | 50 | 0 | 16.38 ( 0.96 ) | 0.61 ( 0.06 ) | 0.98 ( 0 ) | 0.13 | 21.77 | 0.71 |\n",
+ "| 2 | 100 | 50 | 0 | 7.94 ( 0.54 ) | 0.34 ( 0.06 ) | 0.93 ( 0 ) | 0.32 | 13.60 | 0.54 |\n",
+ "| 3 | 500 | 50 | 0 | 2.67 ( 0.6 ) | 0.1 ( 0.03 ) | 0.91 ( 0 ) | 0.63 | 8.57 | 0.19 |\n",
+ "| 4 | 1000 | 50 | 0 | 1.61 ( 0.48 ) | 0.04 ( 0.02 ) | 0.91 ( 0 ) | 0.75 | 7.57 | 0.09 |\n",
+ "| 5 | 50 | 100 | 0 | 39.18 ( 2.14 ) | 0.63 ( 0.07 ) | 0.99 ( 0 ) | 0.06 | 44.55 | 0.85 |\n",
+ "| 6 | 100 | 100 | 0 | 16.29 ( 1.16 ) | 0.47 ( 0.06 ) | 0.94 ( 0 ) | 0.2 | 21.82 | 0.70 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select FDR \n",
+ "1 50 50 0 16.38 ( 0.96 ) 0.61 ( 0.06 ) 0.98 ( 0 ) 0.13 21.77 0.71\n",
+ "2 100 50 0 7.94 ( 0.54 ) 0.34 ( 0.06 ) 0.93 ( 0 ) 0.32 13.60 0.54\n",
+ "3 500 50 0 2.67 ( 0.6 ) 0.1 ( 0.03 ) 0.91 ( 0 ) 0.63 8.57 0.19\n",
+ "4 1000 50 0 1.61 ( 0.48 ) 0.04 ( 0.02 ) 0.91 ( 0 ) 0.75 7.57 0.09\n",
+ "5 50 100 0 39.18 ( 2.14 ) 0.63 ( 0.07 ) 0.99 ( 0 ) 0.06 44.55 0.85\n",
+ "6 100 100 0 16.29 ( 1.16 ) 0.47 ( 0.06 ) 0.94 ( 0 ) 0.2 21.82 0.70"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "\t | n | p | rou | FP | FN | ROC | Stab | num_select | FDR |\n",
+ "\t | <list> | <list> | <list> | <list> | <list> | <list> | <list> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\t11 | 500 | 500 | 0 | 8.32 ( 1.15 ) | 0.1 ( 0.03 ) | 0.91 ( 0 ) | 0.43 | 14.22 | 0.41 |\n",
+ "\t12 | 1000 | 500 | 0 | 4.63 ( 1.57 ) | 0 ( 0 ) | 0.91 ( 0 ) | 0.6 | 10.63 | 0.21 |\n",
+ "\t13 | 50 | 1000 | 0 | 406.32 ( 7.33 ) | 0.73 ( 0.07 ) | 1 ( 0 ) | 0.01 | 411.59 | 0.99 |\n",
+ "\t14 | 100 | 1000 | 0 | 283.24 ( 7.31 ) | 0.49 ( 0.06 ) | 1 ( 0 ) | 0.02 | 288.75 | 0.98 |\n",
+ "\t15 | 500 | 1000 | 0 | 9.4 ( 1.07 ) | 0.07 ( 0.03 ) | 0.91 ( 0 ) | 0.4 | 15.33 | 0.47 |\n",
+ "\t16 | 1000 | 1000 | 0 | 1.91 ( 0.29 ) | 0.01 ( 0.01 ) | 0.91 ( 0 ) | 0.82 | 7.90 | 0.18 |\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 9\n",
+ "\\begin{tabular}{r|lllllllll}\n",
+ " & n & p & rou & FP & FN & ROC & Stab & num\\_select & FDR\\\\\n",
+ " & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t11 & 500 & 500 & 0 & 8.32 ( 1.15 ) & 0.1 ( 0.03 ) & 0.91 ( 0 ) & 0.43 & 14.22 & 0.41\\\\\n",
+ "\t12 & 1000 & 500 & 0 & 4.63 ( 1.57 ) & 0 ( 0 ) & 0.91 ( 0 ) & 0.6 & 10.63 & 0.21\\\\\n",
+ "\t13 & 50 & 1000 & 0 & 406.32 ( 7.33 ) & 0.73 ( 0.07 ) & 1 ( 0 ) & 0.01 & 411.59 & 0.99\\\\\n",
+ "\t14 & 100 & 1000 & 0 & 283.24 ( 7.31 ) & 0.49 ( 0.06 ) & 1 ( 0 ) & 0.02 & 288.75 & 0.98\\\\\n",
+ "\t15 & 500 & 1000 & 0 & 9.4 ( 1.07 ) & 0.07 ( 0.03 ) & 0.91 ( 0 ) & 0.4 & 15.33 & 0.47\\\\\n",
+ "\t16 & 1000 & 1000 & 0 & 1.91 ( 0.29 ) & 0.01 ( 0.01 ) & 0.91 ( 0 ) & 0.82 & 7.90 & 0.18\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 9\n",
+ "\n",
+ "| | n <list> | p <list> | rou <list> | FP <list> | FN <list> | ROC <list> | Stab <list> | num_select <dbl> | FDR <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|\n",
+ "| 11 | 500 | 500 | 0 | 8.32 ( 1.15 ) | 0.1 ( 0.03 ) | 0.91 ( 0 ) | 0.43 | 14.22 | 0.41 |\n",
+ "| 12 | 1000 | 500 | 0 | 4.63 ( 1.57 ) | 0 ( 0 ) | 0.91 ( 0 ) | 0.6 | 10.63 | 0.21 |\n",
+ "| 13 | 50 | 1000 | 0 | 406.32 ( 7.33 ) | 0.73 ( 0.07 ) | 1 ( 0 ) | 0.01 | 411.59 | 0.99 |\n",
+ "| 14 | 100 | 1000 | 0 | 283.24 ( 7.31 ) | 0.49 ( 0.06 ) | 1 ( 0 ) | 0.02 | 288.75 | 0.98 |\n",
+ "| 15 | 500 | 1000 | 0 | 9.4 ( 1.07 ) | 0.07 ( 0.03 ) | 0.91 ( 0 ) | 0.4 | 15.33 | 0.47 |\n",
+ "| 16 | 1000 | 1000 | 0 | 1.91 ( 0.29 ) | 0.01 ( 0.01 ) | 0.91 ( 0 ) | 0.82 | 7.90 | 0.18 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " n p rou FP FN ROC Stab num_select FDR \n",
+ "11 500 500 0 8.32 ( 1.15 ) 0.1 ( 0.03 ) 0.91 ( 0 ) 0.43 14.22 0.41\n",
+ "12 1000 500 0 4.63 ( 1.57 ) 0 ( 0 ) 0.91 ( 0 ) 0.6 10.63 0.21\n",
+ "13 50 1000 0 406.32 ( 7.33 ) 0.73 ( 0.07 ) 1 ( 0 ) 0.01 411.59 0.99\n",
+ "14 100 1000 0 283.24 ( 7.31 ) 0.49 ( 0.06 ) 1 ( 0 ) 0.02 288.75 0.98\n",
+ "15 500 1000 0 9.4 ( 1.07 ) 0.07 ( 0.03 ) 0.91 ( 0 ) 0.4 15.33 0.47\n",
+ "16 1000 1000 0 1.91 ( 0.29 ) 0.01 ( 0.01 ) 0.91 ( 0 ) 0.82 7.90 0.18"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "tail(table_toe)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n",
+ "extract_numeric() is deprecated: please use readr::parse_number() instead\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# export result\n",
+ "result.table_toe <- apply(table_toe,2,as.character)\n",
+ "rownames(result.table_toe) = rownames(table_toe)\n",
+ "result.table_toe = as.data.frame(result.table_toe)\n",
+ "\n",
+ "# extract numbers only for 'n' & 'p'\n",
+ "result.table_toe$n = tidyr::extract_numeric(result.table_toe$n)\n",
+ "result.table_toe$p = tidyr::extract_numeric(result.table_toe$p)\n",
+ "result.table_toe$ratio = result.table_toe$p / result.table_toe$n\n",
+ "\n",
+ "result.table_toe = result.table_toe[c('n', 'p', 'rou', 'ratio', 'Stab', 'ROC', 'FP', 'FN', 'num_select', 'FDR')]\n",
+ "colnames(result.table_toe)[1:4] = c('N', 'P', 'Corr', 'Ratio')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert the measurements of interest to numeric\n",
+ "result.table_toe$Stab = as.numeric(as.character(result.table_toe$Stab))\n",
+ "# result.table_toe$ROC_mean = as.numeric(substr(result.table_toe$ROC, start=1, stop=4))\n",
+ "# result.table_toe$FP_mean = as.numeric(substr(result.table_toe$FP, start=1, stop=4))\n",
+ "# result.table_toe$FN_mean = as.numeric(substr(result.table_toe$FN, start=1, stop=4))\n",
+ "# result.table_toe$FN_mean[is.na(result.table_toe$FN_mean)] = 0\n",
+ "result.table_toe$num_select = as.numeric(as.character(result.table_toe$num_select))\n",
+ "\n",
+ "result.table_toe$ROC_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$ROC))\n",
+ "result.table_toe$FP_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FP))\n",
+ "result.table_toe$FN_mean = as.numeric(sub(\"\\\\(.*\", \"\", result.table_toe$FN))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "A data.frame: 0 × 13\n",
+ "\n",
+ "\tN | P | Corr | Ratio | Stab | ROC | FP | FN | num_select | FDR | ROC_mean | FP_mean | FN_mean |\n",
+ "\t<dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | <fct> | <dbl> | <fct> | <dbl> | <dbl> | <dbl> |\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A data.frame: 0 × 13\n",
+ "\\begin{tabular}{lllllllllllll}\n",
+ " N & P & Corr & Ratio & Stab & ROC & FP & FN & num\\_select & FDR & ROC\\_mean & FP\\_mean & FN\\_mean\\\\\n",
+ " & & & & & & & & &