Consider a rule $$S(X_i)$$ assigning scores to units in decreasing order of treatment prioritization. In the case of a forest with binary treatment, we provide estimates of the following, where $$1/n \leq q \leq 1$$ represents the fraction of treated units:

• The Rank-Weighted Average Treatment Effect (RATE): $$\int_{0}^{1} \alpha(q) TOC(q; S) dq$$, where $$\alpha$$ is a weighting method corresponding to either AUTOC or QINI.

• The Targeting Operating Characteristic (TOC): $$E[Y_i(1) - Y_i(0) | F(S(X_i)) \geq 1 - q] - E[Y_i(1) - Y_i(0)]$$, where $$F(\cdot)$$ is the distribution function of $$S(X_i)$$.
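For reference, the two weighting methods can be written out explicitly. The display below follows the definitions in the Yadlowsky et al. (2021) paper cited on this page, as we understand them, and is a restatement rather than an additional result:

$$\alpha_{AUTOC}(q) = 1, \qquad \alpha_{QINI}(q) = q,$$

so that AUTOC is the plain area under the TOC curve, $$\int_{0}^{1} TOC(q; S) dq$$, while QINI is $$\int_{0}^{1} q \, TOC(q; S) dq$$, which places more weight on larger treated fractions.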

The Targeting Operating Characteristic (TOC) is a curve comparing the benefit of treating only a certain fraction q of units (as prioritized by $$S(X_i)$$) to the overall average treatment effect. The Rank-Weighted Average Treatment Effect (RATE) is a weighted sum of this curve, and is a measure designed to identify prioritization rules that effectively target treatment (and can thus be used to test for the presence of heterogeneous treatment effects).
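To make the relationship between the TOC curve and the RATE concrete, here is a minimal base R sketch. It assumes the unit-level treatment effects are known exactly, which is never true in practice, where they are replaced by doubly robust scores. The helper `toc_oracle` and the vectors `tau` and `priority` are hypothetical names introduced for illustration only and are not part of the package.

```r
# Illustrative sketch: oracle unit-level effects are assumed known here.
# toc_oracle is a hypothetical helper, not a grf function.
toc_oracle <- function(tau, priority) {
  n <- length(tau)
  ord <- order(priority, decreasing = TRUE)      # highest priority first
  top.means <- cumsum(tau[ord]) / seq_len(n)     # mean effect among the top k units
  data.frame(q = seq_len(n) / n, toc = top.means - mean(tau))
}

tau <- c(4, 3, 2, 1, 0, 0, 0, 0, 0, 0)           # hypothetical treatment effects

good <- toc_oracle(tau, priority = tau)          # ranks units by their true effect
bad <- toc_oracle(tau, priority = -tau)          # ranks units in the reverse order

# Riemann-sum approximations of the two RATE weightings on the grid q = k / n.
autoc.good <- mean(good$toc)                     # weight 1 (AUTOC)
qini.good <- mean(good$q * good$toc)             # weight q (QINI)
autoc.bad <- mean(bad$toc)
```

In this toy setup the TOC always ends at zero for q = 1, since treating everyone recovers the overall average effect. A rule that ranks units well yields a positive RATE, and a rule that ranks them backwards yields a negative one.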

rank_average_treatment_effect(
  forest,
  priorities,
  target = c("AUTOC", "QINI"),
  q = seq(0.1, 1, by = 0.1),
  R = 200,
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500
)

## Arguments

forest
The evaluation set forest.

priorities
Treatment prioritization scores $$S(X_i)$$ for the units used to train the evaluation forest. Two prioritization rules can be compared by supplying a two-column array or named list of priorities (yielding paired standard errors that account for the correlation between RATE metrics estimated on the same evaluation data). WARNING: for valid statistical performance, these scores should be constructed independently from the evaluation forest training data.

target
The type of RATE estimate. Options are "AUTOC" (exhibits greater power when only a small subset of the population experiences nontrivial heterogeneous treatment effects) or "QINI" (exhibits greater power when the entire population experiences diffuse or substantial heterogeneous treatment effects). Default is "AUTOC".

q
The grid q to compute the TOC curve on. Default is (10%, 20%, ..., 100%).

R
Number of bootstrap replicates for SEs. Default is 200.

subset
Specifies subset of the training examples over which we estimate the RATE. WARNING: For valid statistical performance, the subset should be defined only using features $$X_i$$, not using the treatment $$W_i$$ or the outcome $$Y_i$$.

debiasing.weights
A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of a causal_forest with a binary treatment, they are obtained via inverse-propensity weighting.

compliance.score
Only used with instrumental forests. An estimate of the causal effect of Z on W, i.e., $$\Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0]$$, which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights
In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.

## Value

A list of class rank_average_treatment_effect with elements

• estimate: the RATE estimate.

• std.err: bootstrapped standard error of RATE.

• target: the type of estimate.

• TOC: a data.frame with the Targeting Operating Characteristic curve estimated on grid q, along with bootstrapped SEs.
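The `estimate` and `std.err` elements can be combined into a simple test of whether the RATE differs from zero. The sketch below is one way to do this in base R, assuming a normal approximation for the estimator (the same approximation behind the 1.96 multiplier used in the examples on this page). The numbers are copied from the example output on this page; with a fitted object you would use `rate$estimate` and `rate$std.err` instead.

```r
# Values copied from the example output on this page (illustration only).
estimate <- -0.2290516
std.err <- 0.02730834

# Two-sided z-test of H0: RATE = 0, assuming approximate normality.
z <- estimate / std.err
p.value <- 2 * pnorm(-abs(z))

# Matching 95% confidence interval.
ci <- estimate + c(lower = -1, upper = 1) * qnorm(0.975) * std.err
```

A small p-value here is read the same way as a confidence interval that excludes zero: it suggests that heterogeneous treatment effects are present and that the prioritization rule picks up on them.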

## References

Yadlowsky, Steve, Scott Fleming, Nigam Shah, Emma Brunskill, and Stefan Wager. "Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects." arXiv preprint arXiv:2111.07966, 2021.

## See also

rank_average_treatment_effect.fit for computing a RATE with user-supplied doubly robust scores.

## Examples

# \donttest{
# Train a causal forest to estimate a CATE based priority ranking
n <- 1500
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
event.prob <- 1 / (1 + exp(2*(pmax(2*X[, 1], 0) * W - X[, 2])))
Y <- rbinom(n, 1, event.prob)
train <- sample(1:n, n / 2)
cf.priority <- causal_forest(X[train, ], Y[train], W[train])

# Compute a prioritization based on estimated treatment effects.
# -1: in this example the treatment should reduce the risk of an event occurring.
priority.cate <- -1 * predict(cf.priority, X[-train, ])$predictions

# Estimate AUTOC on held out data.
cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(cf.eval, priority.cate)
rate
#>    estimate    std.err             target
#>  -0.2290516 0.02730834 priorities | AUTOC

# Plot the Targeting Operating Characteristic curve.
plot(rate)

# Compute a prioritization based on baseline risk.
rf.risk <- regression_forest(X[train[W[train] == 0], ], Y[train[W[train] == 0]])
priority.risk <- predict(rf.risk, X[-train, ])$predictions

# Test if two RATEs are equal.
rate.diff <- rank_average_treatment_effect(cf.eval, cbind(priority.cate, priority.risk))
rate.diff
#>     estimate    std.err                                target
#>  -0.22905160 0.02491746                 priority.cate | AUTOC
#>  -0.03267124 0.03048472                 priority.risk | AUTOC
#>  -0.19638036 0.03097604 priority.cate - priority.risk | AUTOC

# Construct a 95% confidence interval.
# (A significant result suggests that there are HTEs and that the prioritization rule is effective
# at stratifying the sample based on them. Conversely, a non-significant result suggests that either
# there are no HTEs or the treatment prioritization rule does not predict them effectively.)
rate.diff$estimate + data.frame(lower = -1.96 * rate.diff$std.err,
                                upper = 1.96 * rate.diff$std.err,
                                row.names = rate.diff$target)
#>                                             lower       upper
#> priority.cate | AUTOC                 -0.27788982 -0.18021337
#> priority.risk | AUTOC                 -0.09242128  0.02707881
#> priority.cate - priority.risk | AUTOC -0.25709340 -0.13566732
# }