In the case of a causal forest with binary treatment, we provide estimates of one of the following:

  • The average treatment effect (target.sample = all): E[Y(1) - Y(0)]

  • The average treatment effect on the treated (target.sample = treated): E[Y(1) - Y(0) | Wi = 1]

  • The average treatment effect on the controls (target.sample = control): E[Y(1) - Y(0) | Wi = 0]

  • The overlap-weighted average treatment effect (target.sample = overlap): E[e(X) (1 - e(X)) (Y(1) - Y(0))] / E[e(X) (1 - e(X))], where e(x) = P[Wi = 1 | Xi = x].

This last estimand is recommended by Li, Morgan, and Zaslavsky (2018) in case of poor overlap (i.e., when the propensities e(x) may be very close to 0 or 1), as it doesn't involve dividing by estimated propensities.
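As a rough illustration of how the overlap-weighted estimand differs from the plain ATE, the sketch below plugs made-up values of e(x) and tau(x) = E[Y(1) - Y(0) | X = x] into the two formulas. The vectors e and tau are hypothetical, and this is only a plug-in calculation with known quantities, not the doubly robust estimate the function computes.

```r
# Hypothetical propensities and conditional treatment effects for four units.
# Units 3 and 4 have propensities close to 1 and 0, i.e., poor overlap.
e   <- c(0.5, 0.5, 0.99, 0.01)
tau <- c(1, 1, 5, 5)

# Plain average treatment effect: every unit counts equally.
ate <- mean(tau)

# Overlap weights e(x) * (1 - e(x)) shrink towards 0 when e(x) nears 0 or 1.
h <- e * (1 - e)
overlap.ate <- sum(h * tau) / sum(h)

print(c(ate = ate, overlap.ate = overlap.ate))
```

The units with extreme propensities contribute almost nothing to the overlap-weighted average, so it is pulled towards the effect among units with e(x) near 0.5, and no term is ever divided by e(x) or 1 - e(x).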

  target.sample = c("all", "treated", "control", "overlap"),
  method = c("AIPW", "TMLE"),
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500



forest
The trained forest.

target.sample
Which sample to aggregate treatment effects over. Note: Options other than "all" are only currently implemented for causal forests.

method
Method used for doubly robust inference. Can be either augmented inverse-propensity weighting (AIPW), or targeted maximum likelihood estimation (TMLE). Note: TMLE is currently only implemented for causal forests with a binary treatment.

subset
Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights
A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of causal_forests with a binary treatment, they are obtained via inverse-propensity weighting.

compliance.score
Only used with instrumental forests. An estimate of the causal effect of Z on W, i.e., Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0], which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights
In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.


An estimate of the average treatment effect, along with standard error.


In the case of a causal forest with continuous treatment, we provide estimates of the average partial effect, i.e., E[Cov[W, Y | X] / Var[W | X]]. In the case of a binary treatment, the average partial effect matches the average treatment effect. Computing the average partial effect is somewhat more involved, as the relevant doubly robust scores require an estimate of Var[Wi | Xi = x]. By default, we get such estimates by training an auxiliary forest; however, these weights can also be passed manually by specifying debiasing.weights.
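The claim that the average partial effect matches the average treatment effect for a binary treatment can be checked with a short derivation. The notation mu_w(x) = E[Y | X = x, W = w] is introduced here for illustration, and the sketch assumes 0 < e(x) < 1.

```latex
\begin{align*}
\operatorname{Var}[W \mid X = x] &= e(x)\,\bigl(1 - e(x)\bigr), \\
\operatorname{Cov}[W, Y \mid X = x]
  &= E[W Y \mid X = x] - e(x)\, E[Y \mid X = x] \\
  &= e(x)\,\mu_1(x) - e(x)\,\bigl( e(x)\,\mu_1(x) + (1 - e(x))\,\mu_0(x) \bigr) \\
  &= e(x)\,\bigl(1 - e(x)\bigr)\,\bigl( \mu_1(x) - \mu_0(x) \bigr), \\
\frac{\operatorname{Cov}[W, Y \mid X = x]}{\operatorname{Var}[W \mid X = x]}
  &= \mu_1(x) - \mu_0(x).
\end{align*}
```

Under unconfoundedness, mu_1(x) - mu_0(x) equals E[Y(1) - Y(0) | X = x], so taking the expectation over X recovers E[Y(1) - Y(0)].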

In the case of instrumental forests with a binary treatment, we provide an estimate of the Average (Conditional) Local Average Treatment Effect (ACLATE). Specifically, given an outcome Y, treatment W and instrument Z, the (conditional) local average treatment effect is tau(x) = Cov[Y, Z | X = x] / Cov[W, Z | X = x]. This is the quantity that is estimated with an instrumental forest. It can be interpreted causally in various ways. Given a homogeneity assumption, tau(x) is simply the CATE at x. When W is binary and there are no "defiers", Imbens and Angrist (1994) show that tau(x) can be interpreted as an average treatment effect on compliers. This function provides an estimate of tau = E[tau(X)]. See Chernozhukov et al. (2022) for a discussion, and Section 5.2 of Athey and Wager (2021) for an example using forests.
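To make the ratio Cov[Y, Z] / Cov[W, Z] concrete, here is a minimal base-R simulation of its unconditional version (no covariates X). The data-generating process is entirely made up for illustration: a randomized binary instrument, an unobserved confounder U, no defiers, and a homogeneous treatment effect of 2. It does not use an instrumental forest.

```r
set.seed(1)
n <- 10000

Z <- rbinom(n, 1, 0.5)               # randomized binary instrument
U <- rnorm(n)                        # unobserved confounder (hypothetical)
complier <- rbinom(n, 1, 0.6) == 1   # compliers take the treatment iff Z = 1

# Non-compliers ignore Z and select into treatment based on U (no defiers).
W <- ifelse(complier, Z, as.integer(U > 0))

# Outcome with a homogeneous treatment effect of 2, confounded by U.
Y <- 2 * W + U + rnorm(n)

# Instrumental-variable ratio: uses only the variation in W induced by Z.
wald <- cov(Y, Z) / cov(W, Z)

# Naive regression slope of Y on W: picks up the confounding through U.
naive <- cov(Y, W) / var(W)

print(c(wald = wald, naive = naive))
```

In this setup the ratio should land near the true effect of 2, while the naive slope is biased upwards because units with large U are both more likely to be treated and have larger outcomes.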

If clusters are specified, then each unit gets equal weight by default. For example, if there are 10 clusters with 1 unit each and per-cluster ATE = 1, and there are 10 clusters with 19 units each and per-cluster ATE = 0, then the overall ATE is 0.05 (additional sample.weights allow for custom weighting). If equalize.cluster.weights = TRUE, each cluster gets equal weight and the overall ATE is 0.5.
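The arithmetic in this example can be reproduced in a few lines of base R. The variable names below are chosen for illustration only, and the calculation works directly on the stated per-cluster ATEs rather than on a fitted forest.

```r
# 10 clusters with 1 unit each (ATE = 1) and 10 clusters with 19 units each (ATE = 0).
cluster.size <- c(rep(1, 10), rep(19, 10))
cluster.ate  <- c(rep(1, 10), rep(0, 10))

# Each unit gets equal weight: clusters count in proportion to their size.
unit.weighted <- weighted.mean(cluster.ate, w = cluster.size)  # 10 / 200 = 0.05

# Each cluster gets equal weight: a simple average over clusters.
cluster.weighted <- mean(cluster.ate)                          # 10 / 20 = 0.5

print(c(unit.weighted = unit.weighted, cluster.weighted = cluster.weighted))
```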


Athey, Susan, and Stefan Wager. "Policy Learning With Observational Data." Econometrica 89.1 (2021): 133-161.

Chernozhukov, Victor, Juan Carlos Escanciano, Hidehiko Ichimura, Whitney K. Newey, and James M. Robins. "Locally robust semiparametric estimation." Econometrica 90(4), 2022.

Imbens, Guido W., and Joshua D. Angrist. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62(2), 1994.

Li, Fan, Kari Lock Morgan, and Alan M. Zaslavsky. "Balancing covariates via propensity score weighting." Journal of the American Statistical Association 113(521), 2018.

Mayer, Imke, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager, and Julie Josse. "Doubly robust treatment effect estimation with missing attributes." Annals of Applied Statistics 14(3), 2020.

Robins, James M., and Andrea Rotnitzky. "Semiparametric efficiency in multivariate regression models with missing data." Journal of the American Statistical Association 90(429), 1995.


# \donttest{
# Train a causal forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
c.forest <- causal_forest(X, Y, W)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
c.pred <- predict(c.forest, X.test)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(c.forest, target.sample = "all")
#>    estimate     std.err
#> -0.09675011  0.40735444

# Estimate the conditional average treatment effect on the treated sample (CATT).
# We don't expect much difference between the CATE and the CATT in this example,
# since treatment assignment was randomized.
average_treatment_effect(c.forest, target.sample = "treated")
#>    estimate     std.err
#> -0.06564791  0.40577566

# Estimate the conditional average treatment effect on samples with positive X[,1].
average_treatment_effect(c.forest, target.sample = "all", subset = X[, 1] > 0)
#>  estimate   std.err
#> 0.6126443 0.5750878

# Example for causal forests with a continuous treatment.
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 2]))) + rnorm(n)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)
tau.hat <- predict(tau.forest)
average_treatment_effect(tau.forest)
#>   estimate    std.err
#> 0.41722579 0.02495619

average_treatment_effect(tau.forest, subset = X[, 1] > 0)
#>   estimate    std.err
#> 0.78518987 0.03564022
# }