
Time-varying Mediation Analysis with Binary Exposure and Single Mediator
Source:vignettes/single_mediator.Rmd
single_mediator.Rmd
This example illustrates how to use tvmedg()
function to
conduct time-varying mediation analysis with binary exposure, single
binary mediator, and binary outcome. We will use simulated
monthly longitudinal data, where each row represents
one month of follow-up per individual. The dataset includes the
following variables:
Exposure:
Mediator:
Outcome:
Time-varying confounders: binary, and continuous
Time-fixed confounders:
age
,sex
,ow
, andrisk
Time index:
mm
, indicating the month of follow-up
Since, g-computation is computationally intensive, we will leverage
parallel computing using the foreach
and
doParallel
packages to improve efficiency
library(tvmedg)
head(sim_data)
#> id mm Ap Mp L1 L2 L3 Yp age sex ow risk lastid
#> 1 1 1 0 0 0 100.0000 80.00000 0 16.52949 0 0 0 0
#> 2 1 2 0 0 0 125.1296 102.02885 0 16.52949 0 0 0 0
#> 3 1 3 0 0 0 116.4990 98.99688 0 16.52949 0 0 0 0
#> 4 1 4 0 0 0 131.9247 104.07117 0 16.52949 0 0 0 0
#> 5 1 5 0 0 0 109.5959 103.24813 0 16.52949 0 0 0 0
#> 6 1 6 0 0 0 124.2403 95.82793 0 16.52949 0 0 0 0
Point estimates
First example is when the mediator precedes the time-varying confounders (). This corresponds to the following DAG

DAG for the context that mediator precedes the time-varying confounders
To specify the causal ordering
,
set the argument tvar_to_med = FALSE
(the default). In
contrast, if the time-varying confounder is assumed to precede the
mediator
(),
use tvar_to_med = TRUE
.
The current version of tvmedg
supports lag-based
functional forms for variable histories. In this example, we
specify 2 lags. By default, logistic regression is used for binary
variables, and linear regression is used for continuous variables. For
continuous variables and time indices, spline-based
models can be used by specifying: sp_list
,
sp_type
, and sp_df
. This example sets a cubic
spline with 3 degrees of freedom for the month variable mm
.
Additional variables, spline types (e.g., “ns”), and df can be added to
these arguments to model nonlinear effects. Lastly, we run the algorithm
using 1000 Monte Carlo samples.
library(doParallel)
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
cl <- makeCluster(2)
registerDoParallel(cl)
med_MtoL <- tvmedg(data = sim_data,
id = "id",
basec = c("age","sex","ow","risk"),
expo = c("Ap"),
med = c("Mp"),
tvar = c("L1","L2","L3"),
outc = c("Yp"),
time = c("mm"),
lag = 2,
tvar_to_med = F,
cont_exp = F,
mreg = "binomial",
lreg = c("binomial","gaussian","gaussian"),
yreg = "binomial",
sp_list = c("mm"),
sp_type = c("bs"),
sp_df = c(3),
followup = 12,
seed = 123,
montecarlo = 1000,
boot = F,
parallel = TRUE)
#> Q(a,a): 0.23
#> Q(a,a*): 0.045
#> Q(a*,a*): 0.01
#> Indirect (rIE): 0.185
#> Direct (rDE): 0.035
#> Total (rTE): 0.22
#> Proportional explain: 0.841
#> Total time elapsed: 6.367038 mins
stopCluster(cl)
Confidence interval
tvmedg
implements non-parametric
bootstrap to obtain confidence intervals for effect estimates.
This can be enabled by setting boot = TRUE
, along with
specifying the number of bootstrap samples via nboot
, and
the desired confidence level ci
(e.g.,
ci = 0.95
).
For illustration purposes, and to reduce computational burden in this example, we run only 5 bootstrap iterations.
cl <- makeCluster(2)
registerDoParallel(cl)
med_MtoL_ci <- tvmedg(data = sim_data,
id = "id",
basec = c("age","sex","ow","risk"),
expo = c("Ap"),
med = c("Mp"),
tvar = c("L1","L2","L3"),
outc = c("Yp"),
time = c("mm"),
lag = 2,
tvar_to_med = F,
cont_exp = F,
mreg = "binomial",
lreg = c("binomial","gaussian","gaussian"),
yreg = "binomial",
sp_list = c("mm"),
sp_type = c("bs"),
sp_df = c(3),
followup = 12,
seed = 123,
montecarlo = 1000,
boot = T,
nboot = 5,
ci = .95,
parallel = TRUE)
#> Q(a,a): 0.815 (0, 0.921)
#> Q(a,a*): 0.791 (0, 0.906)
#> Q(a*,a*): 0 (0, 0.27)
#> Indirect (rIE): 0.024 (-0.015, 0.138)
#> Direct (rDE): 0.791 (-0.267, 0.905)
#> Total (rTE): 0.815 (-0.206, 0.92)
#> Proportional explain: 0.029 (-0.293, 0.694)
#> Total time elapsed: 32.26849 mins
stopCluster(cl)
Validation
Diagnostic tools for checking the g-computation algorithm are limited. However, we should expect the observed outcome trajectory to lie between the predicted values of , , and over time.
To aid in this evaluation, tvmedg
provides two
diagnostic plots:
A plot comparing the observed outcome with the simulated outcomes under the three exposure–mediator scenarios across time.
A plot of the cumulative outcomes over time under each scenario.
Time-varying plot
plot(med_MtoL, "tvY")
Cumulative plot
plot(med_MtoL, "cumY")