Lexi René
This code implements a Bayesian approach to causal mediation analysis,
using a zero-one inflated beta regression model in STAN. To illustrate
the method, the JOBS II data is used. This dataset can be found in the R
package
mediation
.
The JOBS II study was a randomized field experiment that investigated
the efficacy of a job training intervention on unemployed workers. In
this dataset there are 899 observations, containing no missing values.
The potential outcome variable, depress2
, is a continuous measure of
depressive symptoms post-treatment (participation in job-skills
workshops). The mediator variable, job_seek
, is a continuous measure
of job search self-efficacy. The treatment variable, treat
is an
indicator variable for whether the participant was randomly selected for
the JOBS II training program.
For pre-processing of the data, the predictors are scaled using the R
function scale
. The scaled matrices are designed to include a column
for the intercept. Both the mediator and the outcome variables are
scaled to be between 0 and 1.
library(rstan)
rstan_options(auto_write = TRUE)
library(mediation)
library(ggplot2)
library(tictoc)
library(bayesplot)
library(kableExtra)
library(ggpubr)
library(forcats)
require(reticulate)
# library(shinystan)
library(tidyr)
library(grid)
data(jobs); invisible(names(jobs))
num_seed = 1810201
## create data
scaled_z <- scale(jobs[,c('econ_hard','sex','age')])
trt <- jobs$treat
## scale mediate and outcome
y <- normalize(jobs$depress2)
med <- normalize(jobs$job_seek)
qplot(y, geom = 'histogram', binwidth = .025, ylab = 'Frequency', xlab = 'Measure of Depression', fill=I('white'), col=I('blue')) + theme_bw() + theme(panel.grid.minor = element_blank())
qplot(med, geom = 'histogram', binwidth = .025, ylab = 'Frequency', xlab = 'Measure of Confidence/Self- Efficacy in Job Search', fill=I('white'), col=I('blue')) + theme_bw() + theme(panel.grid.minor = element_blank())
Using the potential (counterfactual) outcome framework, the causal effect of the job training program can be defined as the difference between two potential outcomes. One potential outcome is realized if the subject participates in the training program and the other potential outcome is realized if the subject does not participate.
Suppose we use to represent the measure of depression for the subject, , the measure of confidence/self efficacy in the job search for the subject, and , the binary indicator variable for the subject’s treatment/participation in the JOBS II training program; takes on the values of (participation in the training) or (otherwise). The depression level of subject is independent of subject (). In addition, since the treatment in the JOBS II study is randomized, is statistically independent of the potential outcomes; we can write this as . The observed value for the depression level can be denoted by , where , which can result in two potential values. For example, would be the observed depression level for subject , if subject actually participated in the training program; in this case, the unobserved outcome for subject is the level of depression if they did not participate in the training program. We will use to represent the potential level of depression that would result under the treatment status for subject . In addition, in causal mediation analysis, the potential outcome also depends on the mediator. In the context of this study, this implies that the level of job search self-efficacy can be affected by participation in the program, which can be represented by ; which also has two potential values and . The potential mediator value of subject are independent of the treatment status for subject (). Therefore, we will update the potential outcome to be denoted as and also note that the potential outcome for subject is independent of both the treatment status and the mediator value of subject ().
The statistical independence between the treatment assignment and the potential outcome allows us to compute the average causal effect as the observed mean difference between the treatment and control group:
Under the counterfactual/potential outcome framework, only one potential outcome of is observed, which we will denote by . Let be a vector of baseline covariate for each subject and be the support of the distribution of ; in addition, the support of is . To identify the effects of treatment and mediation, we assume sequential ignorability, as per Imai et al, by assuming the following two statements of conditional independence hold:
where and for , and, all and .
These ignorability assumptions are made sequentially. The first part of the assumption assumes that given the observed confounders, prior to treatment, the treatment is ignorable. In terms of statistical independence, the observed pre-treatment is independent of the potential outcomes and potential mediators. The second part of sequential ignorability states that the mediator is ignorable given the observed treatment and pre-treatment confounders; meaning that the potential outcome and mediator are unconfounded on the past observations and confounders.
The indirect effect of the treatment on the outcome, through the mediating variable is defined as the causal mediation effect (Imai et al., 2010), for :
The following definitions are defined as the effect of the treatment , on the outcome, through the mediation variable:
The average causal mediation effect is defined by:
The direct effect is defined by:
The average direct effect is defined by:
The total effect is defined by:
Lastly, the average total effect is defined by:
Under the assumptions from and , Imai et al. then showed that the distribution of the potential outcomes is nonparametrically identified:
This result allows us to estimate the potential outcome and mediators that we are unable to observe.
The density of a random variable with a beta distribution, where , can be re-parametrized (Ferrari & Cribari-Neto 2004) to be defined as:
Note for : denotes the gamma function, , and ; is a precision parameter, where for a fixed , there is an inverse relationship between and .
Using we
further assume the following regression models for both the response
variables, depress2
, ,
and job_seek
, , to
follow a zero-one inflated beta (ZOIB) distribution, as they lie within
the bounds [0,1]. The cumulative distribution function of the random
variable under a ZOIB
distribution is:
here is the probability that the response is equal to zero, is the probability that the response is equal to one, given the probability that the response is not equal to zero, is the expected value of the beta distribution, is the precision of the beta distribution, and and are shape parameters; and (Ferrari & Cribari-Neto 2004).
Ultimately, and , where .The moment for the density of and its’ variance can be written as:
For any random variable , where we assume
where the link function is a logit link function; , is the inverse of the link function that map values to a proportion between 0 and 1. is the Multivariate Normal distribution, and represents distribution of the coefficients, excluding the intercept; the intercept of every coefficient is assigned a uniform prior distribution.
For the mediator model, is a matrix containing the intercept, the baseline covariates, , and the treatment variable . For the outcome model, is a matrix containing the intercept, the baseline covariates, , the treatment variable , and the mediator variable under a specified treatment.
The steps above will be programmed in STAN.
The STAN model accepts the following values stored in a list:
- n - the total number of observations
- np - the total number of predictors,excluding the intercept and the treatment
- sim - the total number of iterations per chain
- y - the outcome variable scaled between 0 and 1; vector
- m - the mediator variable scaled between 0 and 1; vector
- a - the treatment variable; vector
- z - the data matrix of scaled predictors
- alpha_cov_m - the covariance for the normal prior set on alpha; used to model m
- gamma_cov_m - the covariance for the normal prior set on gamma; used to model m
- mu_cov_m - the covariance for the normal prior set on mu; used to model m
- phi_cov_m - the covariance for the normal prior set on phi; used to model m
- alpha_cov_y - the covariance for the normal prior set on alpha; used to model y
- gamma_cov_y - the covariance for the normal prior set on gamma; used to model y
- mu_cov_y - the covariance for the normal prior set on mu; used to model y
- phi_cov_y - the covariance for the normal prior set on phi; used to model y
jobs_data <-
list(n = nrow(scaled_z),
np = ncol(scaled_z),
sim = 100*round(nrow(scaled_z),-2),
y = y,
m = med,
a = trt,
z = scaled_z,
## cov_m: prior for coefficients of the mediator model; include treatment, do NOT include the intercept or mediator
alpha_cov_m = diag(5, ncol(scaled_z)+1), ## == np + 1
gamma_cov_m = diag(5, ncol(scaled_z)+1),
mu_cov_m = diag(5, ncol(scaled_z)+1),
phi_cov_m = diag(5, ncol(scaled_z)+1),
## cov_y: prior for coefficients of the outcome model; include the mediator and treatment, do not include the intercept
alpha_cov_y = diag(5, ncol(scaled_z)+2), ## == np + 2
gamma_cov_y = diag(5, ncol(scaled_z)+2),
mu_cov_y = diag(5, ncol(scaled_z)+2),
phi_cov_y = diag(5, ncol(scaled_z)+2)
)
This model will return:
- coef_mediator - alpha, gamma, mu, phi; coefficients for the mediator model (1:iterations,1:np,1:4)
- coef_outcome - alpha, gamma, mu, phi; coefficients for the outcome model (1:iterations,1:np+1,1:4)
- tau - total effect (length = total iterations)
- delta - causal effect (1:iterations, 2) where [a = 0, a = 1]
- zeta - direct effect (1:iterations, 2) where [a = 0, a = 1]
We can fit the model in Stan with the following code .
## functions{
## matrix calc_zoib_par(matrix x_f,matrix coef_f){
## vector[rows(x_f)] p_f;
## vector[rows(x_f)] q_f;
## matrix[rows(x_f), cols(coef_f)] x_theta;
## matrix[rows(x_f), cols(coef_f)] params_hold;
## matrix[rows(x_f), 2] new_alpha_gamma;
## matrix[rows(x_f), 2] p_and_q;
##
## x_theta = x_f * coef_f;
## x_theta[,1] = inv_logit(x_theta[,1]);
## x_theta[,2] = inv_logit(x_theta[,2]);
## x_theta[,3] = inv_logit(x_theta[,3]);
## x_theta[,4] = exp(x_theta[,4]);
## p_f = x_theta[,3] .* x_theta[,4];
## q_f = x_theta[,4] - p_f;
##
## p_and_q = append_col(p_f,q_f);
## new_alpha_gamma = append_col(x_theta[,1], x_theta[,2]);
## params_hold = append_col(new_alpha_gamma, p_and_q);
## return params_hold ;
## }
##
## matrix calc_pred(matrix param_pred, int num_trt_splits){
## matrix[(cols(param_pred)-num_trt_splits), rows(param_pred)] wt;
## int sim = rows(param_pred);
## int d = 1;
## int i = 1;
##
## while( d < cols(param_pred)-3){
## wt[d,] = to_row_vector(param_pred[,i]);
## wt[d+1,] = to_row_vector((rep_vector(1,sim)-param_pred[,i]) .* param_pred[,i+1]);
## wt[d+2,] = to_row_vector((rep_vector(1,sim)-param_pred[,i]) .* (rep_vector(1,sim)-param_pred[,i+1]));
## d += 3;
## i += 4;
## }
## return wt;
## }
## }
## data{
## int n;
## int np; // number of parameters excluding intercept and treatment
## int sim;
## vector<lower=0, upper=1>[n] y;
## vector<lower=0, upper=1>[n] m;
## vector[n] a; //treatment variable
## matrix[n, np] z;
## matrix[np+1, np+1] alpha_cov_m;
## matrix[np+1, np+1] gamma_cov_m;
## matrix[np+1, np+1] mu_cov_m;
## matrix[np+1, np+1] phi_cov_m;
## matrix[np+2, np+2] alpha_cov_y;
## matrix[np+2, np+2] gamma_cov_y;
## matrix[np+2, np+2] mu_cov_y;
## matrix[np+2, np+2] phi_cov_y;
## }
## transformed data{
## matrix[n, np+2] x; //ncol(z), trt, int
## matrix[n, np+3] x_out;
## x = append_col(append_col(rep_vector(1,n),z), a); //insert col for intercept of 1s
## x_out = append_col(x, m);
## }
## parameters{
## matrix[np+2, 4] coef_mediator;
## matrix[np+3, 4] coef_outcome;
## }
## model{
## matrix[n, 4] all_params_m;
## matrix[n, 4] all_params_y;
## all_params_m = calc_zoib_par(x, coef_mediator);
## all_params_y = calc_zoib_par(x_out, coef_outcome);
##
## // coefficients for mediator model; does not include the mediator
## coef_mediator[2:,1] ~ multi_normal(rep_vector(0,cols(x)-1), alpha_cov_m);
## coef_mediator[2:,2] ~ multi_normal(rep_vector(0,cols(x)-1), gamma_cov_m);
## coef_mediator[2:,3] ~ multi_normal(rep_vector(0,cols(x)-1), mu_cov_m);
## coef_mediator[2:,4] ~ multi_normal(rep_vector(0,cols(x)-1), phi_cov_m);
##
## // coefficients for outcome model; includes the mediator
## coef_outcome[2:,1] ~ multi_normal(rep_vector(0,cols(x_out)-1), alpha_cov_y);
## coef_outcome[2:,2] ~ multi_normal(rep_vector(0,cols(x_out)-1), gamma_cov_y);
## coef_outcome[2:,3] ~ multi_normal(rep_vector(0,cols(x_out)-1), mu_cov_y);
## coef_outcome[2:,4] ~ multi_normal(rep_vector(0,cols(x_out)-1), phi_cov_y);
##
## // zero one inflated beta likelihood
## for (i in 1:n) {
## if (y[i] == 0) {
## target += log(all_params_y[i,1]) ;
## } else if (y[i] == 1) {
## target += log1m(all_params_y[i,1]) + log(all_params_y[i,2]);
## } else {
## target += log1m(all_params_y[i,1]) + log1m(all_params_y[i,2]) + beta_lpdf(y[i] | all_params_y[i,3], all_params_y[i,4]);
## }
##
## if (m[i] == 0) {
## target += log(all_params_m[i,1]);
## } else if (m[i] == 1) {
## target += log1m(all_params_m[i,1]) + log(all_params_m[i,2]);
## } else {
## target += log1m(all_params_m[i,1]) + log1m(all_params_m[i,2]) + beta_lpdf(m[i] | all_params_m[i,3], all_params_m[i,4]);
## }
## }
## }
## generated quantities{
## real tau;
## vector[2] delta;
## vector[2] zeta;
## {
## int index;
## matrix[sim, 2] pred_m;
## matrix[sim, 4] pred_y;
## matrix[sim, 16] param_pred_y;
## matrix[sim, 8] param_pred_m;
## vector[rows(z)] wt;
## matrix[6, sim] wt_m; // piecewise density: three possible outcomes for density per trt
## matrix[12, sim] wt_y;
## matrix[sim, np+2] X_sample;
## matrix[sim, np+2] X_m0;
## matrix[sim, np+2] X_m1;
## matrix[sim, np+3] X_y0_m0;
## matrix[sim, np+3] X_y0_m1;
## matrix[sim, np+3] X_y1_m0;
## matrix[sim, np+3] X_y1_m1;
##
## wt = dirichlet_rng(rep_vector(1, rows(z)));
## for (j in 1:sim){
## index = categorical_rng(wt);
## X_sample[j,:] = x[index,:];
## }
##
## // assign treatment values for each sample
## X_m0 = X_sample;
## X_m0[:,cols(x)] = rep_vector(0, sim); // cols(x) == index for last col of x, which is treat
##
## X_m1 = X_sample;
## X_m1[:,cols(x)] = rep_vector(1, sim);
##
## // calculate new alpha, gamma, mu, phi
## param_pred_m[,1:4] = calc_zoib_par(X_m0, coef_mediator);
## param_pred_m[,5:8] = calc_zoib_par(X_m1, coef_mediator);
##
## wt_m = calc_pred(param_pred_m, 2);
##
## for(k in 1:sim){
## int index_wtm0 = categorical_rng(wt_m[1:3,k]);
## int index_wtm1 = categorical_rng(wt_m[4:6,k]);
##
## if (index_wtm0 == 1){ pred_m[k,1] = 0;}
## else if (index_wtm0 == 2){pred_m[k,1] = 1;}
## else if (index_wtm0 == 3){pred_m[k,1] = beta_rng(param_pred_m[k,3],param_pred_m[k,4]);}
##
## if (index_wtm1 == 1){ pred_m[k,2] = 0;}
## else if (index_wtm1 == 2){pred_m[k,2] = 1;}
## else if (index_wtm1 == 3){pred_m[k,2] = beta_rng(param_pred_m[k,7],param_pred_m[k,8]);}
## }
##
## X_y0_m0 = append_col(X_m0, pred_m[,1]);
## X_y0_m1 = append_col(X_m0, pred_m[,2]);
## X_y1_m1 = append_col(X_m1, pred_m[,2]);
## X_y1_m0 = append_col(X_m1, pred_m[,1]);
##
## //coef_mediator is np+2 x 4. it includes the mediator
## param_pred_y[,1:4] = calc_zoib_par(X_y0_m0, coef_outcome);
## param_pred_y[,5:8] = calc_zoib_par(X_y0_m1, coef_outcome);
## param_pred_y[,9:12] = calc_zoib_par(X_y1_m1, coef_outcome);
## param_pred_y[,13:16] = calc_zoib_par(X_y1_m0, coef_outcome);
##
## wt_y = calc_pred(param_pred_y, 4);
##
## for(h in 1:sim){
## int index_y0m0 = categorical_rng(wt_y[1:3,h]);
## int index_y0m1 = categorical_rng(wt_y[4:6,h]);
## int index_y1m1 = categorical_rng(wt_y[7:9,h]);
## int index_y1m0 = categorical_rng(wt_y[10:12,h]);
##
## if (index_y0m0 == 1){pred_y[h,1] = 0;}
## else if (index_y0m0 == 2){pred_y[h,1] = 1;}
## else if (index_y0m0 == 3){pred_y[h,1] = beta_rng(param_pred_y[h,3],param_pred_y[h,4]);}
##
## if (index_y0m1 == 1){ pred_y[h,2] = 0;}
## else if (index_y0m1 == 2){pred_y[h,2] = 1;}
## else if (index_y0m1 == 3){pred_y[h,2] = beta_rng(param_pred_y[h,7],param_pred_y[h,8]);}
##
## if (index_y1m1 == 1){ pred_y[h,3] = 0;}
## else if (index_y1m1 == 2){pred_y[h,3] = 1;}
## else if (index_y1m1 == 3){pred_y[h,3] = beta_rng(param_pred_y[h,11],param_pred_y[h,12]);}
##
## if (index_y1m0 == 1){ pred_y[h,4] = 0;}
## else if (index_y1m0 == 2){pred_y[h,4] = 1;}
## else if (index_y1m0 == 3){pred_y[h,4] = beta_rng(param_pred_y[h,15],param_pred_y[h,16]);}
## }
##
## delta[1] = mean(pred_y[:,2]) - mean(pred_y[:,1]);
## delta[2] = mean(pred_y[:,3]) - mean(pred_y[:,4]);
## zeta[1] = mean(pred_y[:,4]) - mean(pred_y[:,1]);
## zeta[2] = mean(pred_y[:,3]) - mean(pred_y[:,2]);
## tau = mean(pred_y[:,3]) - mean(pred_y[:,1]);
##
## } // end of local variables
## } //end of generated quantities
## $coef_mediator
## [1] 4000 5 4
##
## $coef_outcome
## [1] 4000 6 4
##
## $tau
## [1] 4000
##
## $delta
## [1] 4000 2
##
## $zeta
## [1] 4000 2
##
## $lp__
## [1] 4000
When assessing STAN output, one of the things that you want to check is whether the chains are converging, and that they are converging to the same area. Some of the recommended convergence checks include monitoring the potential scale reduction (PSR) factor, statistic, and using visual checks, e.g. traceplots. evaluates the mixing of the chains by comparing the variation between the chains to the variation within the chains. “The condition of being ‘near’ 1 depends on the problem at hand, but we generally have been satisfied with setting 1.1 as a threshold” (Gelman et al., 2004).
Additionally, since the Markov Chain Monte Carlo (MCMC) does not return independent draws, the simulations within each chain will show some level of autocorrelation. This autocorrelation increases the uncertainty of the estimation of posterior quantities. The amount by which this autocorrelation increases in estimates can be measured by the effective sample size (ESS), , which should be large so that it can provide a measure of precision; is the ‘effective number of independent simulation draws’.
Assessing the Markov Chain Monte Carlo
Min. |
1st Qu. |
Median |
Mean |
3rd Qu. |
Max. |
|
---|---|---|---|---|---|---|
R̂ |
0.9992 |
0.9995 |
0.9999 |
1.0000 |
1.0003 |
1.0027 |
1868 |
2812 |
4272 |
3896 |
4599 |
6478 |
Bayesian and Frequentist Estimated Effects Results
Estimated Causal Effects |
Effect |
Est. |
SD |
Z-Score |
p-value |
95% LCI |
95% UCI |
---|---|---|---|---|---|---|---|
Bayesian |
δ̅(0) |
-0.0129 |
0.0157 |
-0.8173 |
0.4138 |
-0.0449 |
0.0173 |
δ̅(1) |
-0.0119 |
0.0150 |
-0.7944 |
0.4270 |
-0.0416 |
0.0169 |
|
ζ̅(0) |
-0.0342 |
0.0442 |
-0.7737 |
0.4391 |
-0.1226 |
0.0503 |
|
ζ̅(1) |
-0.0333 |
0.0439 |
-0.7570 |
0.4490 |
-0.1202 |
0.0513 |
|
τ̅ |
-0.0461 |
0.0460 |
-1.0024 |
0.3161 |
-0.1387 |
0.0413 |
|
Frequentist |
δ̅ |
-0.0157 |
0.0117 |
-1.3400 |
0.1800 |
-0.0411 |
0.0074 |
ζ̅ |
-0.0403 |
0.0483 |
-0.8300 |
0.4040 |
-0.1191 |
0.0539 |
|
τ̅ |
-0.0560 |
0.0472 |
-1.1900 |
0.2360 |
-0.1347 |
0.0406 |
- Ferrari, S., and Cribari-Neto, F. (2004). “Beta regression for modelling rates and proportions,” Journal of Applied Statistics, 31(7), 799-815
- Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian data analysis (2nd ed.)
- Imai, K., Keele, L., and Tingley, D. (2010),“A General Approach to Causal Mediation Analysis,”Psychological Methods, 15(4), 309–334