Consider a rule S(Xi) assigning scores to units in decreasing order of treatment prioritization. In the case of a forest with binary treatment, we provide estimates of the following, where 1/n <= q <= 1 represents the fraction of treated units:

  • The Rank-Weighted Average Treatment Effect (RATE): \(\int_{0}^{1} \alpha(q) \, TOC(q; S) \, dq\), where \(\alpha\) is a weighting method corresponding to either `AUTOC` or `QINI`.

  • The Targeting Operating Characteristic (TOC): \(E[Y(1) - Y(0) \mid F(S(X_i)) \geq 1 - q] - E[Y(1) - Y(0)]\), where F(.) is the distribution function of S(Xi).

The Targeting Operating Characteristic (TOC) is a curve comparing the benefit of treating only a certain fraction q of units (as prioritized by S(Xi)) to the overall average treatment effect. The Rank-Weighted Average Treatment Effect (RATE) is a weighted sum of this curve, and is a measure designed to identify prioritization rules that effectively target treatment (and can thus be used to test for the presence of heterogeneous treatment effects).
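
To make the two definitions concrete, here is a minimal base R sketch that computes a TOC curve and grid approximations of the two RATE targets on simulated data. It is a toy illustration, not the estimator used by this function: the individual treatment effects `tau`, the scores `S`, and the helper `toc()` are hypothetical, and the effects are treated as known, which real data never allows. The weights alpha(q) = 1 for AUTOC and alpha(q) = q for QINI follow the definitions in the reference below.

```r
# Toy illustration only: the individual treatment effects `tau` are simulated
# and treated as known, which is never the case with real data.
set.seed(1)
n <- 1000
x <- rnorm(n)
tau <- pmax(x, 0)   # hypothetical true effects: only units with x > 0 benefit
S <- x              # hypothetical priority scores (higher = treat first)

# TOC(q): mean effect among the top q fraction by score, minus the overall mean.
toc <- function(q, scores, tau) {
  ord <- order(scores, decreasing = TRUE)
  k <- max(1, round(q * length(tau)))
  mean(tau[ord[seq_len(k)]]) - mean(tau)
}

q.grid <- seq(0.1, 1, by = 0.1)
toc.curve <- sapply(q.grid, toc, scores = S, tau = tau)

# Crude grid approximations of the integral, with alpha(q) = 1 for AUTOC
# and alpha(q) = q for QINI.
autoc <- mean(toc.curve)
qini <- mean(q.grid * toc.curve)

print(data.frame(q = q.grid, TOC = toc.curve))
print(c(AUTOC = autoc, QINI = qini))
```

Because the scores rank units perfectly by their true effect in this toy setup, the curve starts high, decreases with q, and reaches zero at q = 1, where everyone is treated.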

rank_average_treatment_effect(
  forest,
  priorities,
  target = c("AUTOC", "QINI"),
  q = seq(0.1, 1, by = 0.1),
  R = 200,
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500
)

Arguments

forest

The evaluation set forest.

priorities

Treatment prioritization scores S(Xi) for the units used to train the evaluation forest. Two prioritization rules can be compared by supplying a two-column array or named list of priorities. WARNING: for valid statistical performance, these scores should be constructed independently from the evaluation forest training data.

target

The type of RATE estimate. Options are "AUTOC" (exhibits greater power when only a small subset of the population experiences nontrivial heterogeneous treatment effects) or "QINI" (exhibits greater power when the entire population experiences diffuse or substantial heterogeneous treatment effects). Default is "AUTOC".

q

The grid q to compute the TOC curve on. Default is (10%, 20%, ..., 100%).

R

Number of bootstrap replicates for SEs. Default is 200.

subset

Specifies subset of the training examples over which we estimate the RATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights

A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of causal_forests with a binary treatment, they are obtained via inverse-propensity weighting.
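
As a rough illustration of what inverse-propensity debiasing weights can look like for a binary treatment, the base R sketch below builds them from a propensity score. This is one common form, and whether it matches the package internals exactly is an assumption. The names `e.hat` and `ipw.weights` are hypothetical, and the data are simulated.

```r
# Sketch only: assumes a binary treatment W and a known (or separately
# estimated) propensity score e.hat = P[W = 1 | X]; both are simulated here.
set.seed(42)
n <- 8
e.hat <- rep(0.5, n)        # hypothetical propensity scores
W <- rbinom(n, 1, e.hat)    # simulated treatment assignment

# Inverse-propensity style weights: 1 / e.hat for treated units and
# -1 / (1 - e.hat) for control units.
ipw.weights <- (W - e.hat) / (e.hat * (1 - e.hat))

print(cbind(W = W, weight = ipw.weights))
```

A vector built this way has one entry per unit, which is the shape the argument expects.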

compliance.score

Only used with instrumental forests. An estimate of the causal effect of Z on W, i.e., Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0], which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights

In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.

Value

A list of class `rank_average_treatment_effect` with elements

  • estimate: the RATE estimate.

  • std.err: bootstrapped standard error of RATE.

  • target: the type of estimate.

  • TOC: a data.frame with the Targeting Operating Characteristic curve estimated on grid q, along with bootstrapped SEs.
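
The `estimate` and `std.err` elements can be combined into a simple test of whether the RATE differs from zero. The sketch below uses a plain list as a stand-in for the returned object, with values copied from the first example output on this page. It assumes a normal approximation, in line with the 1.96 multiplier used in the Examples. The names `z`, `p.value`, and `ci` are illustrative.

```r
# Stand-in for the returned object: a plain list with the documented
# `estimate` and `std.err` elements (values copied from the example output).
rate <- list(estimate = -0.2363237, std.err = 0.0259136)

# z-statistic and two-sided p-value under a normal approximation (assumed).
z <- rate$estimate / rate$std.err
p.value <- 2 * pnorm(-abs(z))

# Matching 95 % confidence interval.
ci <- rate$estimate + c(lower = -1.96, upper = 1.96) * rate$std.err

print(c(z = z, p.value = p.value))
print(ci)
```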

References

Yadlowsky, Steve, Scott Fleming, Nigam Shah, Emma Brunskill, and Stefan Wager. "Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects." arXiv preprint arXiv:2111.07966, 2021.

Examples

# \donttest{
library(grf)

# Train a causal forest to estimate a CATE based priority ranking
n <- 1500
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
event.prob <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2])))
Y <- rbinom(n, 1, event.prob)
train <- sample(1:n, n / 2)
cf.priority <- causal_forest(X[train, ], Y[train], W[train])

# Compute a prioritization based on estimated treatment effects.
# -1: in this example the treatment should reduce the risk of an event occurring.
priority.cate <- -1 * predict(cf.priority, X[-train, ])$predictions

# Estimate AUTOC on held out data.
cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(cf.eval, priority.cate)
rate
#>    estimate   std.err             target
#>  -0.2363237 0.0259136 priorities | AUTOC

# Plot the Targeting Operating Characteristic curve.
plot(rate)

# Compute a prioritization based on baseline risk.
rf.risk <- regression_forest(X[train[W[train] == 0], ], Y[train[W[train] == 0]])
priority.risk <- predict(rf.risk, X[-train, ])$predictions

# Test if two RATEs are equal.
rate.diff <- rank_average_treatment_effect(cf.eval, cbind(priority.cate, priority.risk))
rate.diff
#>     estimate    std.err                                target
#>  -0.23632366 0.02387460                 priority.cate | AUTOC
#>  -0.06582898 0.02564700                 priority.risk | AUTOC
#>  -0.17049468 0.03241963 priority.cate - priority.risk | AUTOC

# Construct a 95 % confidence interval.
# (a significant result suggests that there are HTEs and that the prioritization rule is effective
# at stratifying the sample based on them. Conversely, a non-significant result suggests that either
# there are no HTEs or the treatment prioritization rule does not predict them effectively.)
rate.diff$estimate + data.frame(lower = -1.96 * rate.diff$std.err,
                                upper = 1.96 * rate.diff$std.err,
                                row.names = rate.diff$target)
#>                                            lower       upper
#> priority.cate | AUTOC                 -0.2831179 -0.18952943
#> priority.risk | AUTOC                 -0.1160971 -0.01556087
#> priority.cate - priority.risk | AUTOC -0.2340372 -0.10695220
# }