Generate causal forest data — generate_causal

The following DGPs are available for benchmarking purposes:

"simple": tau = max(X1, 0), e = 0.4 + 0.2 * 1(X1 > 0).
"aw1": equation (27) of https://arxiv.org/pdf/1510.04342.pdf
"aw2": equation (28) of https://arxiv.org/pdf/1510.04342.pdf
"aw3": confounding is from "aw1" and tau is from "aw2"
"aw3reverse": Same as aw3, but HTEs anticorrelated with baseline
"ai1": "Setup 1" from section 6 of https://arxiv.org/pdf/1504.01132.pdf
"ai2": "Setup 2" from section 6 of https://arxiv.org/pdf/1504.01132.pdf
"kunzel": "Simulation 1" from A.1 in https://arxiv.org/pdf/1706.03461.pdf
"nw1": "Setup A" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
"nw2": "Setup B" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
"nw3": "Setup C" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
"nw4": "Setup D" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

generate_causal_data(
  n,
  p,
  sigma.m = 1,
  sigma.tau = 0.1,
  sigma.noise = 1,
  dgp = c("simple", "aw1", "aw2", "aw3", "aw3reverse", "ai1", "ai2", "kunzel", "nw1",
    "nw2", "nw3", "nw4")
)

Arguments

n	The number of observations.
p	The number of covariates (note: the minimum varies by DGP).
sigma.m	The standard deviation of the unconditional mean of Y. Default is 1.
sigma.tau	The standard deviation of the treatment effect. Default is 0.1.
sigma.noise	The conditional variance of Y. Default is 1.
dgp	The kind of dgp. Default is "simple".

Value

A list consisting of: X, Y, W, tau, m, e, dgp.

Details

Each DGP is parameterized by X: observables, m: conditional mean of Y, tau: treatment effect, e: propensity scores, V: conditional variance of Y.

The following rescaled data is returned m = m / sd(m) * sigma.m, tau = tau / sd(tau) * sigma.tau, V = V / mean(V) * sigma.noise^2, W = rbinom(e), Y = m + (W - e) * tau + sqrt(V) + rnorm(n).

Examples

# \donttest{
# Generate simple benchmark data
data <- generate_causal_data(100, 5, dgp = "simple")
# Generate data from Wager and Athey (2018)
data <- generate_causal_data(100, 5, dgp = "aw1")
data2 <- generate_causal_data(100, 5, dgp = "aw2")
# }