Ch22 Target Trail Emulation

As discussed in Part I, causal inference from observational data can be viewed as an attempt to emulate a hypothetical randomized trial, which we refer to as the target trial. Since we now have all the tools that are needed to tackle causal inferences with time-varying treatments, this chapter generalizes the concept of the target trial to sustained treatment strategies and outlines a unified framework for causal inference, regardless of whether the data arose from a randomized experimental or an observational study.

正如在第一部分中讨论的那样,可以将观察数据的因果推断看作是模仿 hypothetical randomized trial 的尝试,我们将其称为目标试验 (target trial)。由于我们现在拥有处理随时间变化的因果推断所需的所有工具,因此本章 generalizes the concept of the target trial to sustained treatment strategies,并概述了因果推论的统一框架,无论数据是否来自随机实验或观察性研究。

This chapter also describes a taxonomy of causal effects that may be of interest when emulating a target trial, including intention-to-treat and per-protocol effects. Valid estimation of those causal effects generally requires data on time-varying prognostic factors and treatments, as well as appropriate adjustment for those time-varying factors using \(g\)-methods. It is precisely the development of \(g\)-methods that makes the concepts discussed here something more than a formal exercise: if data are available, the effects of interest can now be validly estimated.

本章还描述了 a taxonomy of causal effects that may be of interest when emulating a target trial**, 包括intention-to-treat and per-protocol effects. 要对这些因果效应进行有效估计,通常需要 data on time-varying prognostic factors and treatments, as well as appropriate adjustment for those time-varying factors using \(g\)-methods. 正是 \(g\) 方法的发展,使这里讨论的概念比正式的实践更重要:如果有数据可用,现在就可以有效地估计出感兴趣的因果效应。

Table of Contents

  1. The target trial

  2. Causal effects in randomized trials

  3. Causal effects in observational analyses that emulate a target trial

  4. Time zero

  5. A unified analysis for causal inference

The target trail

为了确定思路,请考虑一项随机试验,以评估抗逆转录病毒疗法对HIV阳性个体5年死亡风险的影响。Eligible participants–over 18 years of age, no AIDS, no previous used of antiretroviral therapy–are randomly assigned to either treatment strategy \(g\) or treatment strategy \(g'\) at the start of follow-up \(k = 0\) (baseline). Their follow-up starts at the time of assignment and ends at death, loss to follow-up, or 60 months after baseline, whichever occurs earlier. Of course, as in any trial, not all participants adhere to the treatment strategies defined in the trial protocol. That is, there are deviations from protocol.

Our trial is a pragmatic trial. 特别是,参与者及其治疗医师知道他们所接受的治疗 (i.e., the treatment assignment is not blinded), 没有人接受安慰剂 (i.e., both strategies \(g\) and \(g'\) involve either active treatments or no treatment), 并且与研究之外的普通患者一样,对参与者进行频繁和严格的监控。 A pragmatic trial is preferable when the goal is quantifying the effects of treatment strategies under realistic conditions, including that physicians and participants are aware of their care received by the latter.

If conducting this pragmatic randomized trial were not possible, we may attempt to emulate it through the analysis of existing observational data. We then refer to the trial as the target trial for our observational analysis.

如果不可能进行这种实用的随机试验,我们可以尝试通过对现有观测数据的分析来模拟它。然后,我们将该试验称为目标试验,以进行观察性分析。

Specifying the protocol of the target trial 是一种有用的手段 to clarify the causal question of interest that we wish our observational analysis to answer. 至少,我们需要指定 protocol 的以下关键组成部分: eligibility criteria, start and end of follow-up, treatment strategies, outcomes of interest, causal contrast, and data analysis plan. 请注意,目标试验方案的精确规范可能需要对可用数据进行一些探索。 For example, only after having determined that the data included information on HIV diagnosis, we can reasonably propose to emulate a target trial of HIV-positive individuals.

我们在第3章中介绍了目标试验的概念。但是,第I部分和第II部分仅涉及比较固定治疗的简单目标试验。现在,我们准备讨论现实的目标试验 that compare sustained treatment strategies like \(g_1\) “take therapy continuously during the follow-up, unless a contraindication or toxicity arises” and \(g_0\) “ refrain(避免) from taking therapy during the follow-up”. The next section defines causal effects of interest in (real and emulated) randomized trials concerned with sustained treatment strategies. Additional contrasts of sustained strategies–referred to as direct effects–are described in Technical Point 22.1.

Causal effects in randomized trials

Let us review three types of causal effects that may be of interest in a randomized trial. 为此,我们需要一些熟悉的符号。 Let \(A_k\) take value 1 if the individual receives therapy at time \(k\) and 0 otherwise, and \(C_k\) take value 0 if the individual remains uncensored at time \(k\) and 1 otherwise, for \(k = 0, 1, ..., K\) with \(K = 59\). Our trial will assign eligible individuals to either the strategy \(g_1\) “receive treatment \(A_k = 1\) continuously during the follow-up unless a contraindication or toxicity arises” or the strategy \(g_0\) “receive treatment \(A_k = 0\) continuously during the follow-up”. The assignment indicator \(Z\) takes value 1 if the individual is assigned to \(g_1\) and 0 if assigned to \(g_0\).

In the previous chapters of Part III, we were interested in the causal effect of treatment on an outcome \(Y\) measured at the end of follow-up. Here we extend our description to causal effects on a failure time outcome. (因果估计量)That is, the goal of our trial is to estimate the causal effect on survival (see Technical Point 22.3). Let \(D_k\) be an indicator for death (1: yes, 0: no) by month \(k=1,2,...,K+1\).

Intensiton-to-treat effect

First, let us consider the effect of assignment to the treatment strategy, regardless of treatment actually received. This effect, commonly known as the intention-to-treat effect, is defined by a contrast of the outcome distribution under the interventions:

  • be assigned to strategy \(g_1\) at baseline and remain under study until the end of follow-up

  • be assigned to strategy \(g_0\) at baseline and remain under study until the end of follow-up

The intention-to-treat effect at time \(k\) can then be expressed as the contrast of the counterfactual risks of death \(P(D_k^{z=1, \bar{c}_k = \bar{0}}=1) - P(D_k^{z=0, \bar{c}_k = \bar{0}}=1)\) under assignment to strategy \(g_1(z=1)\) versus\(g_0(z=0)\) if nobody had been lost to follow-up through time \(k(\bar{c}_k=0)\).

In some randomized trials, assignment to and initiation of the treatment strategies occur simultaneously. That is, all individuals assigned to strategy \(g_1\) start to receive treatment at time 0, regardless of whether they continue taking it after baseline, and no individuals assigned to strategy \(g_0\) receive treatment at time 0, regardless of whether they start taking it after baseline. In those cases, the intention-to-treat effect is not only the effect of assignment but also the effect of initiation of treatment, e.g., \(P(D_k^{a_0=1, \bar{c}_k = \bar{0}}=1) - P(D_k^{a_0=0, \bar{c}_k = \bar{0}}=1)\).

(意向性治疗的特点) The intention-to-treat effect is agnostic about any treatment decisions made after baseline, including discontinuation or initiation of the treatments of interest, use of non-approved concomitant treatments, or any other deviations from protocol. This agnosticism implies that the magnitude of the intention-to-treat effect may heavily depend on the particular patterns of deviation from protocol that occur during the conduct of each trial. Two studies with the same protocol but conducted in different settings may have different intention-to-treat effect estimates with neither of them being biased.

Per-protocol effect

Second, let us consider the effect of receiving the interventions as specified in the study protocol. We refer to this effect as the per-protocol effect. The per-protocol effect is defined by a contrast of the outcome distribution under the interventions:

  • receive treatment strategy \(g_1\) continuously between baseline \(k = 0\) and end of follow-up

  • receive treatment strategy \(g_0\) continuously between baseline \(k = 0\) and end of follow-up

The per-protocol effect at time \(k\) can then be expressed as the contrast of the counterfactual risks of death \(P(D_k^{g_1, \bar{c}_k = \bar{0}}=1) - P(D_k^{g_0, \bar{c}_k = \bar{0}}=1)\) under full adherence to strategy \(g_1\) versus \(g_0\) if nobody had been lost to follow-up through time \(k(\bar{c}_k=0)\).

Sensible trial protocols will not mandate that treatment be continued no matter what happens to the individual. For example, our strategy \(g_1\) of continuous treatment mandates treatment discontinuation when a contraindication or toxicity arises. That is, the per-protocol effect generally involves the comparison of dynamic strategies (“do this, if \(X\) happens then do this other thing”) rather than static strategies (“do this, no matter what happens”). Remember that we already made this point in Fine Point 19.2.

明智的试验方案不会强制要求继续治疗,无论个人情况如何。例如,当出现禁忌症或毒性反应时,我们的连续治疗策略 \(g_1\) 要求停止治疗。也就是说,按协议的效果通常涉及动态策略(“如果\(X\)发生,则执行此操作,然后做另一件事”)的比较,而不是静态策略(“无论发生什么,执行此操作”)的比较。请记住,我们已经在Fine Point 19.2中指出了这一点。

Sometimes the study protocol is not explicit about the dynamic nature of the treatment strategies. For example, the protocol may simplify the description of strategy \(g_1\) as “receive treatment \(A = 1\) continuously during the follow- up” without explicitly stating that the therapy must be discontinued “when a contraindication or toxicity arises”. 策略 \(g_1\) 的这种简化描述可能会引起误解。 Specifically, an individual assigned to \(g_1\) who discontinues therapy because of toxicity should not be labeled as someone who is not adhering to strategy \(g_1\). In fact, that person is perfectly adhering to strategy \(g_1\) as (it should have been) stated in the protocol. When doing otherwise is not an option in the real world, discontinuation of the originally assigned treatment or initiation of other treatments cannot possibly be considered a deviation from protocol. 由于 per-protocol effect 是由现实策略的对比定义的,因此它与因果推理研究特别相关,后者旨在为现实世界中的决策提供证据。

实际上, the per-protocol effect 通常是推断的隐含目标. For example, often investigators question the fidelity of the interventions implemented in the study to the interventions described in the protocol, and say that there is “bias”. This language indicates that the investigators are really interested in comparing the interventions implemented during the follow-up as specified in the protocol (i.e., the per-protocol effect) and not in the effect of assignment to the interventions at baseline (i.e., the intention-to-treat effect) because non-adherence after baseline cannot possibly bias the effect of assignment at baseline.

Effect of receiving interventions

Finally, let us consider the effect of receiving interventions other than the ones specified in the study protocol. Suppose that, while our trial is being conducted, a consensus started to emerge that strategy \(g_0\) “receive treatment \(A = 0\) continuously during the follow-up” is inferior to strategy \(g_1\). Therefore some physicians began to recommend initiation of therapy when the clinical course worsened, e.g., when the CD4 cell count (\(L_k\)) first dropped below 200 \(cells/\mu L\). As a result, many individuals in the trial who were assigned to strategy \(g_0\) actually followed the modified strategy \(g_0'\) “receive treatment \(A_k = 0\) continuously during the follow-up but, after \(L_k \leq 200\), switch to treatment \(A_k = 1\)”. The contrast of outcome distributions under the interventions

  • receive treatment strategy \(g_1\) continuously between baseline \(k = 0\) and end of follow-up

  • receive treatment strategy \(g_0'\) continuously between baseline \(k = 0\) and end of follow-up

corresponds to neither the intention-to-treat effect nor the original per-protocol effect. Rather, it is a question about the per-protocol effect in a hypothetical trial in which individuals are randomized to either strategy \(g_1\) or \(g_0'\) .

This example illustrates how causal effects of interest(that do not correspond to the original per-protocol effect) can be conceptualized as per-protocol effects in target trials that can be emulated using the randomized trial data.

有趣的是, if the strategies of interest differ from those in the actual trial, it is actually disadvantageous to have all participants in the actual trial adhere to the strategies specified in the protocol. 具体而言, complete adherence implies that the trial data cannot be used to emulate a target trial with a different protocol (because no individuals followed the protocol of the target trial in the actual data). 例如, a randomized trial with full adherence in which HIV-positive individuals are assigned to different CD4 cell count thresholds at which to initiate antiretroviral therapy is of little use to emulate a trial in which individuals are assigned to either continuous treatment or no treatment, and vice versa. (正是这种 noncompliance 使我们能够使用来自给定随机试验的数据,来模拟其他随机试验,这些试验可以回答不同的,可能更相关的因果问题。) It is precisely the noncompliance that allows us to use the data from a given randomized trial to emulate other randomized trials that answer different, perhaps more relevant, causal questions.

Causal effects in observational analyses that emulate a target trial

The causal effects described above for randomized trials can be analogously defined for observational analyses that emulate a target trial.

The observational analog of the intention-to-treat effect is defined by a contrast of the outcome distribution under the hypothetical interventions

  • initiate treatment \(A_0 = 1\) at baseline and remain under study until the end of follow-up

  • initiate treatment \(A_0 = 0\) at baseline and remain under study until the end of follow-up

This observational analog of the intention-to-treat effect at time \(k\) can then be expressed as the contrast of the counterfactual risks of death \(P(D_k^{a_0=1, \bar{c}_k = \bar{0}}=1) - P(D_k^{a_0=0, \bar{c}_k = \bar{0}}=1)\). That is, it corresponds to the intention-to-treat effect in a target trial in which assignment to and initiation of the strategies occurs simultaneously.(也就是说,它对应于目标试验中的意向治疗效果,在该试验中,策略的分配和启动同时发生。 )

The observational analog of the intention-to-treat effect differs slightly from the intention-to-treat effect in trials in which some individuals assigned to a particular strategy may never initiate it.

In our example, we would estimate an observational analog of the intention-to-treat effect by comparing individuals who do and do not initiate antiretroviral therapy at baseline. This observational effect differs from the intention-to-treat effect of a target trial in which some individuals assigned to \(g_1\) do not take any dose of treatment. Yet a hypothetical intervention on initiation, as opposed to assignment, of treatment preserves a key feature of the intention-to-treat effect: interventions are defined solely by events occurring at baseline.(干预措施仅由基线发生的事件定义。 )

The observational analog of the per-protocol effect is defined identically as that for the target trial. In randomized trials we differentiated between the original per-protocol effect and the per-protocol effects in alternative target trials. In observational studies this difference is unnecessary because, in the absence of a pre-specified protocol, each per-protocol effect corresponds to a particular target trial. In general, we can only use observational data to emu- late target trials whose intended interventions are actually followed by at least some individuals in the study. In some settings, however, investigators may be willing to use modeling, e.g., dose-response structural models, to extrapolate beyond the interventions that are atually present in the data.

每种方案效果的观察类似物的定义与目标试验的相似。在随机试验中,我们将原始的基于方案的效果与替代方案目标试验中的基于方案的效果区分开来。在观察性研究中,这种差异是不必要的,因为在没有预先指定的方案的情况下,每种方案的效果均对应于特定的目标试验。一般而言,我们只能使用观察数据来进行目标试验,该试验的预期干预实际上在研究中至少受到某些人的关注。然而,在某些情况下,研究人员可能愿意使用建模(例如剂量响应结构模型)来推断数据中通常不存在的干预措施。

An advantage of defining the causal effects in observational studies in ref- erence to those in the target trial is that we are then forced to be explicit about the strategies that are compared. Once we adopt this viewpoint, it is obvious that certain comparisons cannot be translated into a contrast between hypothetical interventions and therefore should be avoided, at least when the goal of the analysis is to help decision makers. Revisit Sections 3.5 and 3.6 if necessary.

Another advantage of an explicit definition of causal effects in observational studies is clarity. As discussed in Fine Point 9.4, there is a widespread view that the intention-to-treat effect measures the effectiveness of treatment (loosely defined: the effect of treatment that would be observed under realistic condi- tions), whereas the per-protocol effect measures efficacy (loosely defined: the effect of treatment that would be observed under perfect conditions). This view is especially problematic when interested in sustained treatment strategies: it is often difficult to argue that a per-protocol effect of sustained strategies in a realistic setting measures efficacy, or that the intention-to-treat effect in the presence of uncertainty about the benefits (or harms) of treatment measures the effectiveness after those benefits (or harms) are proven. As a result, the labels “effectiveness” and “efficacy” are ambiguous in settings with sustained strategies over long periods. An explicit definition of the treatment strategies that define the causal effect of interest is more informative because decision makers need information about the effect of well-defined causal interventions.

Time zero

A crucial component of target trial emulation is the determination of the start of follow-up, also referred to as baseline or time zero, in the observational analysis. Eligibility criteria need to be met at that point but not later; study outcomes begin to be counted after that point but not earlier.

目标试验仿真的重要组成部分是观察分析中确定开始随访,也称为基线或零时区。此时需要满足资格标准,但不晚于此;研究结果从那时起开始计算,但没有更早。

In randomized experiments, the time zero for each individual is the time when they are assigned to a treatment strategy while meeting the eligibil- ity criteria. For example, in our randomized trial of antiretroviral therapy, time zero is, the time when the treatment strategies are assigned (the time of randomization), which usually occurs shortly before, or at the same time as, treatment is initiated. We do not start the follow-up, say, 2 years before or after randomization. Starting before randomization would not be reason- able because the treatment strategies had yet to be assigned at that time and the eligibility criteria had not been defined, much less met; starting follow-up after randomization is potentially biased as deaths during the first two years of the trial would thereby be excluded from the analysis. If treatment had a short-term effect on mortality, it would be missed. Even more problematic, if treatment does indeed have a short-term effect, then more susceptible individ- uals would have died by year 2 in the group assigned to active treatment but not in the other group. This differential proportion of susceptible individuals after two years destroys the baseline comparability achieved by randomization and opens the door to selection bias.

在随机实验中,每个人的零时间是指他们在符合资格标准时被分配到治疗策略的时间。例如,在我们的抗逆转录病毒疗法随机试验中,时间零是指分配治疗策略的时间(随机化时间),通常发生在治疗开始前不久或同时发生。我们不会在随机分组之前或之后的2年内开始随访。在随机分组之前开始是没有道理的,因为那时还没有分配治疗策略,并且还没有定义资格标准,更不用说了。随机分组后开始随访可能会产生偏差,因为该试验的前两年死亡可能会从分析中排除。如果治疗对死亡率有短期影响,则将被忽略。更成问题的是,如果治疗确实确实具有短期效果,那么在接受积极治疗的组中,更易感的个体将在第二年死亡,而另一组则没有。两年后,易感个体的这种差异比例破坏了随机化获得的基线可比性,并为选择偏见打开了大门。

A unified analysis for causal inference

Unifying the causal analysis of randomized and observational studies requires a common language to describe both types of studies. The concept of the target trial provides that common language.(目标试验的概念提供了该通用语言。) Aside from baseline randomization, there are no other necessary differences between analyses of observational data that emulate a target trial and of true randomized trials. That is, a randomized trial can be viewed as a follow-up study with baseline randomiza- tion and observational longitudinal data as a follow-up study without baseline randomization.

实际上,只有三件事可以将数据与随机实验和观察研究区分开来。 In randomized experiments, - (i) no baseline confounding is expected because of randomization, - (ii) the randomization probabilities are known, and - (iii) the assignment to a treatment strategy is known for each individual at baseline.

An observational analysis can emulate (i) if one measures and appropriately adjusts for a sufficient set of covariates, and (ii) if the model for treatment assignment is correctly specified. Interestingly, (iii) is not necessary for estimating the per-protocol effect in neither randomized experiments nor observational studies because efficient estimators (that are functions of the sufficient statistic) do not use this information.

The similarities between follow-up studies with and without baseline randomization are increasingly apparent in the health and social sciences as a growing number of randomized experiments attempt to estimate the effects of sustained treatment strategies over long periods in real world settings. These studies are a far cry from the short experiments in highly controlled settings that put randomized trials at the top of the hierarchy of study designs in the early days of clinical research. 长期持续治疗策略的随机实验, with their potential for substantial deviations from protocol (e.g., imperfect adherence to the assigned strategy, loss to follow-up), 是受制于混淆和选择偏差 that we have learned to associate exclusively with observational studies. In particular, when estimating a per-protocol effect, both randomized trials and observational studies may need adjustment for time-varying prognostic factors that predict drop-out (selection bias) and treatment (confounding).

In view of these similarities, one might expect that randomized experiments and observational studies would be analyzed similarly, except for the fact that adjustment for baseline confounders is typically necessary in observational studies. In practice, however, the typical analysis of randomized experiments and observational studies differs radically, which is both perplexing and, as we argue below, problematic.

鉴于这些相似之处,人们可能希望对随机实验和观察性研究进行类似的分析,只是在观察性研究中通常需要对基线混杂因素进行调整。然而,在实践中,对随机实验和观察性研究的典型分析在根本上是不同的,这既令人困惑,又如我们下面所讨论的那样是有问题的。

In summary, the analysis of randomized trials and observational studies should be similar. If we feel compelled to adjust for time-varying confounding and selection bias in the analysis of observational studies, we should feel equally compelled to adjust for post-randomization confounding and selection bias in the analysis of randomized trials. The only necessary difference between follow-up studies with and without baseline randomization is, precisely, baseline randomization. That is, adjustment for baseline confounding will not be generally required in intention-to-treat analyses of randomized trials. However, adjustment for post-baseline (time-varying) factors will generally be necessary for per-protocol analyses of both randomized trials and observational studies. A unified approach to causal inference for sustained treatment strategies is possible based on the target trial concept and on g-methods.

总之,随机试验和观察性研究的分析应相似。如果我们在观察性研究的分析中被迫适应时变的混杂和选择偏见,那么在随机试验的分析中我们也应被迫适应随机化后的混杂和选择偏见。基线随机化与不基线随机化的随访研究之间唯一必要的区别就是基线随机化。也就是说,在随机试验的意向治疗分析中通常不需要调整基线混淆。但是,对于随机试验和观察性研究的每方案分析,通常需要调整基线后(时变)因素。基于目标试验概念和g方法,可以采用统一的方法对持续治疗策略进行因果推理。

Historically, randomized experiments have been considered far superior to observational studies for the purpose of making causal inferences and aiding decision-making. Unfortunately, randomized experiments are not always available because they may be expensive, infeasible, unethical, or simply untimely to support an urgent decision. Therefore, as much as we may like randomization, many decisions will need to be made in the absence of evidence from randomized trials. When we cannot conduct the randomized experiment that would answer our causal question, we resort to observational analyses. It is therefore important to use a sound approach to design and analyze observational studies. Making the target trial explicit is one step in that direction. When the goal is to assist decision making, the analysis of existing observational data need to explicitly emulate a trial and be evaluated with respect to how well they emulate their target trial.

从历史上看,出于做出因果推理和辅助决策的目的,随机实验被认为远远优于观察性研究。不幸的是,随机实验并非总是可用,因为它们可能昂贵,不可行,不道德或只是不及时地支持紧急决定。因此,就我们可能希望的随机性而言,在缺乏随机试验证据的情况下,需要做出许多决定。当我们无法进行能够回答因果问题的随机实验时,我们会进行观察性分析。因此,重要的是要采用合理的方法来设计和分析观测研究。明确目标试验是朝这个方向迈出的一步。当目标是协助决策时,需要对现有观察性数据进行分析以明确模拟试验,并就其模拟目标试验的效果进行评估。