The Prolonged Follow-up Fallacy of Randomized Controlled Trials

Originally published by our sister publication Anesthesiology News

I recently listened to a Zoom presentation that displayed or referred to a vast volume of solid, statistically significant clinical data proving the efficacy of metabolic/bariatric surgery for weight reduction and comorbidity resolution. The session, however, concluded with two recommendations: the need for more randomized controlled trials (RCTs) and the need for longer RCTs.

The first recommendation for additional RCTs to legitimize a surgical intervention in the face of overwhelming affirmative data is a hallmark of many internists’ opinions, especially for those whose palliative, perpetual care for a disease entity may be threatened by a cure or a prolonged mitigation. At the same time, their own nonsurgical therapeutic protocols have had far fewer RCTs, if any at all.

The second recommendation for ever longer RCTs has also received the approbation of many practicing clinical surgeons (trialists forever). I question the logic of this perspective, and I will cite five potential fallacies of the precept that the longer an RCT, the better it is. My assessment is Bayesian in nature: namely, that in addition to examining what you are looking for in an RCT, you pay attention to what has already become evident.

Fallacy 1: The Intention-to-Treat Principle

In every RCT, a certain number of subjects will not adhere to their randomization assignment. Some assigned to the trial’s intervention will balk and not follow through, particularly if the modality involves an operative or invasive procedure, unpleasant side effects, or off-putting follow-up testing. Some subjects assigned to controls, although having agreed to randomization, will regret not receiving the intervention and go elsewhere to obtain the procedure or drug.

A standard commandment of trial statisticians is that an RCT needs to employ intention-to-treat analysis, in which every subject is analyzed in the group to which he or she was randomized, regardless of the treatment actually received. To the minds of most clinicians, certainly my own, this rule has always escaped logic and the Bayesian approach to reality. An RCT asks if an intervention achieves remission, a disease-free interval, a cure, a prolongation of well-being or even increased life expectancy. If a failure to abide by randomization results in enough intervention group patients not receiving the intervention and/or enough control group patients receiving the intervention, the trial outcome may not answer the hypothesis (or null hypothesis, again to obey the reverse logic of trial statisticians). Does such a trial serve the purpose of unmasking reality, or does it confound truth and detrimentally affect disease management? If a trial result biased by the intention-to-treat principle still achieves statistical significance (conventionally, P<0.05: less than a 5% probability of seeing results at least this extreme if chance alone were at work), the finding is all the stronger for having survived this convention; however, if the intention-to-treat bias is robust enough to raise the P value above 0.05, we should at least also analyze the outcomes data on the basis of the patients who actually followed their randomization assignment.

How does this bear on the perception that the longer a trial's follow-up, the better it is? If intention-to-treat analysis introduces bias and conclusions contrary to reality, then as the number of trial subjects who abandon their randomization assignment increases over time, the RCT conclusion becomes more misleading.
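The dilution described above can be made concrete with a little arithmetic. The following is a minimal sketch, not an analysis of any actual trial: the event rates (30% untreated, 20% treated) and the departure fractions are invented for illustration, the function name is hypothetical, and the sketch assumes that a subject's risk depends only on the treatment actually received, which real nonadherent subjects may well violate.

```python
def itt_risk_difference(p_control, p_treated, nonadherence, crossover):
    """Expected absolute risk reduction seen by an intention-to-treat analysis.

    p_control    -- event rate in subjects who do NOT receive the intervention
    p_treated    -- event rate in subjects who DO receive the intervention
    nonadherence -- fraction of the intervention arm that never receives it
    crossover    -- fraction of the control arm that obtains it elsewhere

    Assumption (illustrative only): risk depends solely on the treatment
    actually received, not on the kind of person who departs from assignment.
    """
    # Each arm is analyzed as randomized, so its event rate is a blend of
    # treated and untreated subjects.
    itt_intervention_arm = (1 - nonadherence) * p_treated + nonadherence * p_control
    itt_control_arm = (1 - crossover) * p_control + crossover * p_treated
    return itt_control_arm - itt_intervention_arm


# Hypothetical numbers: the same true benefit, viewed through growing
# departure from the randomization assignment as follow-up lengthens.
for departure in (0.0, 0.10, 0.25, 0.40):
    diff = itt_risk_difference(0.30, 0.20, departure, departure)
    print(f"departure {departure:.0%} per arm -> apparent risk reduction {diff:.3f}")
```

Under these assumptions, the apparent benefit is the true benefit multiplied by (1 - nonadherence - crossover), so it shrinks steadily as more subjects depart from their assigned arm, even though the intervention itself works exactly as well as before.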

Fallacy 2: Participant Dropouts

A good trial design accounts for an adequate study population—intervention subjects and controls—allowing for trial dropouts. However, the dropout group, unknown at the start of a trial, is different from the trial adherents. They are not as compliant with a personal commitment as the subjects who remain faithful to the trial’s follow-up precepts. Certain participants assigned to control or intervention will become bored with, or balk at, the follow-up requirements and just drop out.

If the dropout rate is disproportionate between the control and intervention groups, it may influence outcomes. If disproportionate group dropouts continue over time, trial accuracy can be compromised and trial follow-up duration becomes a detriment rather than an asset.
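A small calculation suggests how dropouts who differ from adherents could distort what is observed. This is a sketch under stated assumptions rather than a model of any real trial: the function name, the 30% true event rate and the annual dropout rates are all hypothetical, and the sketch assumes that subjects destined to have an event leave the trial at a different constant yearly rate than those who are not.

```python
def completer_event_rate(true_rate, annual_dropout_event, annual_dropout_no_event, years):
    """Event rate observed among subjects still in the trial after `years`.

    true_rate               -- real event rate in the full randomized arm
    annual_dropout_event    -- yearly dropout probability for subjects who
                               will have the event (hypothetical)
    annual_dropout_no_event -- yearly dropout probability for subjects who
                               will not (hypothetical)

    Assumption (illustrative only): dropout is a constant yearly probability
    that differs between the two kinds of subject.
    """
    # Fraction of the original arm remaining in each stratum after follow-up.
    kept_event = true_rate * (1 - annual_dropout_event) ** years
    kept_no_event = (1 - true_rate) * (1 - annual_dropout_no_event) ** years
    return kept_event / (kept_event + kept_no_event)


# Hypothetical arm with a true 30% event rate in which subjects doing poorly
# drop out at 10% per year and the rest at 5% per year.
for years in (0, 2, 5, 10):
    rate = completer_event_rate(0.30, 0.10, 0.05, years)
    print(f"{years:2d} years of follow-up -> observed event rate {rate:.3f}")
```

When the two dropout rates are equal, the observed rate stays at the true rate no matter how long the follow-up. When they differ, the observed rate drifts further from the truth with every added year, and if the drift differs between the intervention and control arms, so does the comparison between them.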

Fallacy 3: Unaccounted Intrinsic Subject Variation

Everyone is born with a genetically determined future, modifiable, of course, by circumstance. This predestination for developing or avoiding certain diseases, as well as family life expectancy, can never be fully or even partly accounted for in the screening before randomization in an RCT. To ensure a uniform population before randomization, screening can include age, sex, recent significant afflictions, personal habits, social status and other variables, but can rarely account for genetic adaptability, survival mechanisms and metabolic response mechanisms. I remember a presentation at an American College of Surgeons annual congress, by past executive director Dr. David Hoyt, on variations in survival among accident victims with identical traumatic injuries, even after matching on age, sex and other variables. Randomization, therefore, cannot account for the intrinsic life force of each RCT participant. Over time, the discrepancies in this essential determinant will become more apparent and may covertly influence outcomes and, thereby, the validity of study conclusions.

Fallacy 4: External Variability of Environment

After randomization and the start of a trial, the environmental circumstances of trial participants will most certainly vary. Many individuals will change their personal habits, such as smoking; some will move to other parts of the country; personal wealth and privileges will vary; and so on. These environmental influences may well alter outcomes in the intervention and control groups in ways that obscure the answer to the question asked by the RCT. With the passage of time, these distortions will multiply as the number of participants affected by external influences increases. Once again, time may not be conducive to achieving the true outcomes of an RCT.

Fallacy 5: Everyone Dies

Returning to the Zoom presentation at the beginning of this article, a broad and conservative reading of the literature indicates that type 2 diabetes remission is achieved after metabolic/bariatric surgery, in particular with gastric bypass, in about 75% of patients for two to three years, and in about 50% for 10 years. Yet certain internists and diabetologists have used this declining remission rate after metabolic/bariatric surgery to disparage operative intervention. If these data were for a therapy (medical or surgical) for a malignancy, no practitioner would denigrate these results. On the contrary, they would hail them as a therapeutic triumph. Furthermore, because the cardiovascular, ocular, neurologic and other ravages of type 2 diabetes take about 20 years to become fully manifest, who would not welcome a 10-year postponement of these tragic outcomes for half of an affected population?

In the 1990 official report of the POSCH (Program on the Surgical Control of the Hyperlipidemias) trial, the combined end point, at 9.7 years average patient follow-up, of death due to coronary heart disease and confirmed nonfatal myocardial infarction was 35% lower in the intervention group (P<0.001) (N Engl J Med 1990;323:946-955). In 1992, the confirmatory statin trials were published, emphasizing the benefits of lipid modification. Maximum national attention on cholesterol as a risk factor for atherosclerotic cardiovascular disease followed. It was only in 2010, however, that POSCH follow-up data demonstrated a statistically significant (P<0.05) increase in life expectancy in the intervention group compared with controls, and that of only one year (Ann Surg 2010;251[6]:1034-1040). Yet by 1967, 43 years earlier, the U.S. mortality rate for atherosclerotic coronary artery disease had begun to decline, a decline reinforced by the cholesterol trials published in the 1990s. Would it have been worth a 40-year, or at least 30-year, wait before national advocacy for lipid modification?

If the “big data” national databases of today had been available in the 1990s, a computer-generated, artificial intelligence-based surrogate assessment of the nearly 10-year POSCH data available in 1990 would have yielded a P value well below the magic number of 0.05 for an anticipated extension of life expectancy far longer than one year.

Again, the longer an RCT, the weaker affirmative findings may become. In longer trials, truths can be lost, benefits forgone and lives detrimentally modified. With respect to overall mortality, or its reciprocal, life expectancy, everyone dies. An RCT is not a contest to determine the last person standing.

Reflections

Unlimited allocation of the time for statistical end point assessment can be flawed because a trial’s affirmative probability may not only increase with time, but may decrease due to bias introduced by intention-to-treat, participant dropouts, intrinsic subject variation, environmental factors, and the inevitability of participants aging and dying. In essence, R=O/T, where R=reliability, O=outcomes and T=time, may have as much, or more, validity as R=O×T.

RCTs, or meta-analyses of RCTs, are the gold (diamond, astatine) standard for clinical knowledge and therapeutics. These powerful instruments should be used to promote, not to prolong or deny, available laudable patient care.

Buchwald is a professor emeritus of surgery and biomedical engineering, and the Owen H. and Sarah Davidson Wangensteen Chair in Experimental Surgery, at the University of Minnesota, in Minneapolis.

Commentary