The following DGPs are available for benchmarking purposes:

  • "simple": tau = max(X1, 0), e = 0.4 + 0.2 * 1(X1 > 0).

  • "aw1": equation (27) of https://arxiv.org/pdf/1510.04342.pdf

  • "aw2": equation (28) of https://arxiv.org/pdf/1510.04342.pdf

  • "aw3": confounding is from "aw1" and tau is from "aw2"

  • "aw3reverse": same as "aw3", but with heterogeneous treatment effects (HTEs) anticorrelated with the baseline

  • "ai1": "Setup 1" from section 6 of https://arxiv.org/pdf/1504.01132.pdf

  • "ai2": "Setup 2" from section 6 of https://arxiv.org/pdf/1504.01132.pdf

  • "kunzel": "Simulation 1" from A.1 in https://arxiv.org/pdf/1706.03461.pdf

  • "nw1": "Setup A" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw2": "Setup B" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw3": "Setup C" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw4": "Setup D" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

generate_causal_data(
  n,
  p,
  sigma.m = 1,
  sigma.tau = 0.1,
  sigma.noise = 1,
  dgp = c("simple", "aw1", "aw2", "aw3", "aw3reverse", "ai1", "ai2", "kunzel", "nw1",
    "nw2", "nw3", "nw4")
)

Arguments

n

The number of observations.

p

The number of covariates (note: the minimum varies by DGP).

sigma.m

The standard deviation of m, the conditional mean of Y given X (see Details). Default is 1.

sigma.tau

The standard deviation of the treatment effect. Default is 0.1.

sigma.noise

The noise scale: the conditional variance V of Y is rescaled to have mean sigma.noise^2 (see Details). Default is 1.

dgp

The DGP to use, one of the names listed above. Default is "simple".

Value

A list consisting of: X, Y, W, tau, m, e, dgp.

Details

Each DGP is parameterized by X: observables, m: conditional mean of Y, tau: treatment effect, e: propensity scores, V: conditional variance of Y.

The following rescaled data is returned: m = m / sd(m) * sigma.m, tau = tau / sd(tau) * sigma.tau, V = V / mean(V) * sigma.noise^2, W = rbinom(n, 1, e), Y = m + (W - e) * tau + sqrt(V) * rnorm(n).
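
The rescaling can be sketched in base R for the "simple" DGP. This is an illustration under stated assumptions, not the package's implementation: the covariates are assumed to be standard normal, the baseline m and the constant V are hypothetical choices, and the noise term is taken to be standard normal noise scaled by sqrt(V).

```r
set.seed(1)
n <- 200
p <- 5
sigma.m <- 1
sigma.tau <- 0.1
sigma.noise <- 1

# Assumption: standard normal covariates (not stated in the text above).
X <- matrix(rnorm(n * p), n, p)

# "simple" DGP as described: tau = max(X1, 0), e = 0.4 + 0.2 * 1(X1 > 0).
tau <- pmax(X[, 1], 0)
e <- 0.4 + 0.2 * (X[, 1] > 0)

# Hypothetical baseline and constant conditional variance, for illustration only.
m <- X[, 2]
V <- rep(1, n)

# Rescale so that sd(m) = sigma.m, sd(tau) = sigma.tau, mean(V) = sigma.noise^2.
m <- m / sd(m) * sigma.m
tau <- tau / sd(tau) * sigma.tau
V <- V / mean(V) * sigma.noise^2

# Draw treatment from the propensity scores, then build the outcome.
W <- rbinom(n, 1, e)
Y <- m + (W - e) * tau + sqrt(V) * rnorm(n)
```

Because W - e has mean zero given X, the conditional mean of Y given X stays equal to m, while the difference in expected outcome between W = 1 and W = 0 is tau.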

Examples

# \donttest{
# Generate simple benchmark data
data <- generate_causal_data(100, 5, dgp = "simple")
# Generate data from Wager and Athey (2018)
data <- generate_causal_data(100, 5, dgp = "aw1")
data2 <- generate_causal_data(100, 5, dgp = "aw2")
# }
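
A further sketch shows how the returned list might be inspected. It assumes generate_causal_data() is provided by the grf package, which this page does not name, so treat the library() call as an assumption.

```r
# Assumption: generate_causal_data() is exported by the grf package.
library(grf)

data <- generate_causal_data(n = 500, p = 5, sigma.tau = 0.5, dgp = "simple")

# Components listed under Value.
names(data)

# The realized treatment fraction should be close to the average propensity score.
mean(data$W)
mean(data$e)

# After rescaling, the treatment effects have standard deviation sigma.tau.
sd(data$tau)
```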