code for simulation studies and experimental microbiome data applications in Stability manuscript

knightlab-analyses/stability-analyses

stability-analyses

This repository contains the code used to reproduce all simulation studies and experimental microbiome data applications in the Stability manuscript.

I. General code:

code_method folder: contains the general method code used for both continuous and binary outcomes

getStability.R: function to calculate the Stability Index

cv_method.R: code for the 4 selected feature selection methods, with user-defined parameter grids and cross-validation for parameter tuning, applied to continuous outcomes

cv_method_binary_update.R: code for the 4 selected feature selection methods, with user-defined parameter grids and cross-validation for parameter tuning, applied to binary outcomes

stab_data_applications.R: function to perform bootstrap-based hypothesis testing for continuous outcomes

stab_data_applications_binary.R: function to perform bootstrap-based hypothesis testing for binary outcomes

bootstrap_test_compLasso_rf.R: general functions for comparing feature selection methods with bootstrap-based hypothesis testing for continuous outcomes

bootstrap_test_compLasso_rf_binary.R: general functions for comparing feature selection methods with bootstrap-based hypothesis testing for binary outcomes

source code for compositional lasso (continuous outcome) is available at: https://www.math.pku.edu.cn/teachers/linw/software.html
source code for compositional lasso (binary outcome) is available at: https://github.com/UVic-omics/Microbiome-Variable-Selection
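For orientation, the kind of computation getStability.R performs can be sketched as follows. This is a minimal illustration in Python, not the repository's R code, and it assumes the Stability Index is the estimator of Nogueira et al. (2018), which this README does not state explicitly. The function name `stability_index` and the input layout (one row of 0/1 selection flags per resampled dataset) are hypothetical.

```python
def stability_index(selections):
    """Stability of feature selection across M resampled datasets.

    selections: list of M rows, each a list of p flags (1 = feature selected).
    Assumption: this follows the Nogueira et al. (2018) estimator; the actual
    getStability.R may differ in inputs and details.
    """
    m = len(selections)
    if m < 2:
        raise ValueError("need at least two selection rows")
    p = len(selections[0])

    # Selection frequency of each feature across the M rows.
    freqs = [sum(row[f] for row in selections) / m for f in range(p)]

    # Unbiased sample variance of each feature's 0/1 selection indicator.
    variances = [m / (m - 1) * q * (1 - q) for q in freqs]

    # Average number of selected features per row.
    k_bar = sum(sum(row) for row in selections) / m

    denom = (k_bar / p) * (1 - k_bar / p)
    if denom == 0:
        raise ValueError("undefined when no features or all features are always selected")

    return 1 - (sum(variances) / p) / denom


# Identical selections on every resample give perfect stability.
print(stability_index([[1, 0, 1, 0]] * 3))  # 1.0
```

Under this assumed definition, the index equals 1 when every resample selects the same features and decreases as the selected sets disagree.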

II. Simulation part (within simulations folder):

sim_data_generation folder: contains code to generate simulated data

sim_dat_ind_toeplitz: code to generate simulated data with Independent and Toeplitz correlation designs
sim_dat_block.R: code to generate simulated data with Block correlation design
run_sim_data.sh: bash commands for running simulation data generation code on HPC
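The three correlation designs named above can be illustrated with a short sketch. This is Python rather than the repository's R, and the parameterizations are assumptions, since this README does not define them: the Toeplitz design is taken to have entries rho^|i-j|, the Block design to have a constant correlation rho within equally sized blocks and zero between blocks, and the Independent design to be the identity matrix. The function names are hypothetical.

```python
def toeplitz_corr(p, rho):
    """Assumed Toeplitz design: correlation decays as rho ** |i - j|."""
    return [[float(rho) ** abs(i - j) for j in range(p)] for i in range(p)]


def block_corr(p, block_size, rho):
    """Assumed Block design: correlation rho within a block, 0 between blocks."""
    return [
        [
            1.0 if i == j
            else (float(rho) if i // block_size == j // block_size else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


def independent_corr(p):
    """Independent design: identity correlation matrix."""
    return toeplitz_corr(p, 0.0)


for row in toeplitz_corr(3, 0.5):
    print(row)
```

A matrix built this way would typically serve as the covariance of a multivariate normal draw; how the repository turns it into simulated microbiome data is not described here.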

code_sim_cts folder: contains code to reproduce simulation results for continuous outcomes

cv_sim_apply.R: general functions for applying the selected feature selection methods to simulated data with continuous outcomes

1. compute Stability and MSE for different simulation scenarios
ind_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Independent design and continuous outcomes
toe_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Toeplitz design and continuous outcomes
block_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Block design and continuous outcomes
CL_sim_apply.R: code for obtaining results for compositional lasso in all simulation correlation designs with continuous outcomes

2. hypothesis testing with bootstrap for selected simulation scenarios
boot_CL_testing.R: code for calculating bootstrapped confidence intervals for the compositional lasso method in simulated data with continuous outcomes
boot_RF_testing.R: code for calculating bootstrapped confidence intervals for the random forests method in simulated data with continuous outcomes

3. bash commands
run_sim_cts.sh: bash commands for running simulation code for continuous outcomes on HPC
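The bootstrapped confidence intervals computed by boot_CL_testing.R and boot_RF_testing.R can be illustrated with a generic percentile bootstrap. This is a minimal Python sketch under stated assumptions: the README does not say which interval type or statistic the scripts use, so the percentile method, the `bootstrap_ci` name, and the sample mean used in the example are all hypothetical stand-ins (the scripts presumably bootstrap a quantity such as the Stability Index).

```python
import random


def bootstrap_ci(values, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(values).

    Assumption: a plain percentile interval; the repository's scripts may
    use a different interval construction.
    """
    rng = random.Random(seed)
    n = len(values)

    # Resample with replacement and recompute the statistic each time.
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )

    # Read the interval endpoints off the sorted bootstrap statistics.
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)


low, high = bootstrap_ci([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], mean)
print(low, high)
```

To compare two methods in this style, one would bootstrap the difference of their statistics and check whether the resulting interval contains zero.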

code_sim_bin folder: contains code to reproduce simulation results for binary outcomes

cv_sim_apply_binary_update.R: general functions for applying the selected feature selection methods to simulated data with binary outcomes

1. compute Stability and AUC for different simulation scenarios
ind_results_binary_update.R: code for comparing all 4 methods in simulated data with Independent design and binary outcomes
toe_results_binary_update.R: code for comparing all 4 methods in simulated data with Toeplitz design and binary outcomes
block_results_binary_update.R: code for comparing all 4 methods in simulated data with Block design and binary outcomes

2. hypothesis testing with bootstrap for selected simulation scenarios
boot_sim_binary.R: code for calculating bootstrapped confidence intervals for the compositional lasso and random forests methods in simulated data with binary outcomes

3. bash commands
run_sim_bin.sh: bash commands for running simulation code for binary outcomes on HPC

notebooks_sim_cts folder: contains notebooks (R) that summarize simulation results for continuous outcomes

notebooks_sim_bin folder: contains notebooks (R) that summarize simulation results for binary outcomes

results_summary_cts folder: contains the tables output by the notebooks in the notebooks_sim_cts folder

results_summary_bin folder: contains the tables output by the notebooks in the notebooks_sim_bin folder

figures_combined folder: contains figures for both continuous and binary outcomes, generated by notebook 6_make_figures_combined in the notebooks_sim_bin folder

III. Application part (within data_application folder):

code_cts folder: contains code for real data applications to the BMI & soil datasets for continuous outcomes

code_bin folder: contains code for real data applications to the BMI & soil datasets for binary outcomes

notebooks_applications folder: contains notebooks (R) that summarize microbiome application results for continuous and binary outcomes

88soils folder: contains data and application results for the soil dataset

BMI folder: contains data and application results for the BMI dataset