Trial by Error, Continued: More on Graded Exercise from Peter White and The Lancet

By David Tuller, DrPH

[June 30, 2017: This post has been corrected and revised.]

Professor Peter White and colleagues have published yet another study in The Lancet promoting graded exercise as an appropriate intervention for the illness they refer to as “chronic fatigue syndrome” but that is more appropriately called “myalgic encephalomyelitis.” (Two compromise terms, ME/CFS and CFS/ME, satisfy no one.) This new article exhibits the range of problems found repeatedly in this body of research, including the reliance on subjective outcomes for an open-label trial, unusual outcome-switching, and self-serving presentations of data.

In short, this latest study seeks to bolster the crumbling evidence base for the PACE/CBT/GET paradigm by reporting modest benefits for graded exercise. But as with previous research espousing this approach, even the unimpressive results reported here cannot be taken seriously by scientists who understand basic research standards.

The full name of the article is: “Guided graded exercise self-help plus specialist medical care vs. specialist medical care alone for chronic fatigue syndrome (GETSET): a pragmatic randomised controlled trial.” It involved 211 patients, split into two groups. Both groups received at least one meeting with a doctor—what the study called “specialist medical care.” The intervention group also received up to 90 minutes of instruction from a physiotherapist on how to pursue a self-guided graded exercise program.

The results presented are short-term: 12 weeks after trial entry. The investigators reported very modest benefits for the intervention in scores for self-reported fatigue, along with improvements in scores for self-reported physical function that were not clinically significant. There has already been some terrific analysis of the study’s shortcomings on patient forums, so I’m just going to make a few points here.

The study design itself incorporates a huge and fundamental flaw: the unreliable combination of an open-label study with subjective outcomes. Experts outside the influence of Sir Simon Wessely, most notably Jonathan Edwards, an emeritus professor of medicine from University College London, have repeatedly highlighted this feature as rendering the findings meaningless. As Professor Edwards wrote recently in his commentary for the Journal of Health Psychology, this flaw alone makes the PACE trial “a non-starter in the eyes of any physician or clinical pharmacologist familiar with problems of systemic bias in trial execution.” Studies with this design, he explained, have been abandoned in other fields of medicine.

The difficulty of shielding such trials from systematic bias is the reason that studies are blinded in the first place. It is common sense that if you tell people in one group that the intervention they are getting should help them, and you neither give the intervention to people in another group nor encourage them to expect improvement, more people from the first group than the second are likely to tell you in the short term that they feel a bit better.

It does not mean that they have experienced any objective improvements. It also doesn’t mean that these self-reported benefits of the intervention will be apparent at long-term follow-up. In fact, follow-up studies in this body of research do not provide evidence of long-term differences between study groups.

Unlike in the PACE trial, the investigators chose not to test these subjective findings against objective outcomes. They acknowledge the absence of objective outcomes as a limitation but do not explain why they made the choice to exclude them. Presumably they remembered that PACE’s own objective measures (the six-minute walking test, the step-test for fitness, and whether people got off benefits and back to work) all failed to confirm the trial’s claims of success. In other trials, objective measurements of participants’ movements have also failed to document benefits from the non-pharmacological interventions tested for the illness.

[Correction/revision: The following three paragraphs replace material included in the original version posted on June 28, 2017]

In this new article, Professor White and his colleagues refer to the GETSET intervention as a “management approach.” The investigators fail to mention that a 2013 paper in Psychological Medicine purported to have proven that people could actually “recover” with GET. They mention patient surveys on reported harms from graded exercise, but they choose to omit the growing peer-reviewed literature on immunological and other dysfunctions of ME/CFS, from leading medical research centers around the world.

They also ignore the major 2015 report from the U.S. Institute of Medicine (now called the National Academy of Medicine). This report, which involved an extensive review of the literature, identified “post-exertional malaise” as the cardinal symptom, in the process proposing to rename the illness “systemic exertion intolerance disease.” Other research has also shed light on possible pathophysiological pathways involved in causing the severe relapses that characterize the disease.

If post-exertional malaise or (per the IOM) exertion intolerance is the cardinal symptom, then graded exercise in any form could be contraindicated. Professor White and his colleagues obviously do not have to agree with this interpretation of the recent research and reports. But the failure to mention and discuss these findings in an article investigating a graded exercise intervention demonstrates the investigators’ apparent unwillingness or inability to grapple with the current state of scientific knowledge in the field.

In terms of outcome-switching, the investigators report that they made a confusing (to me) change midway through the trial. Here’s how they explain it:

The original protocol had only one primary outcome measure, the SF-36 PF. However, when some eligible participants were found to have high SF-36 PF scores at randomisation (because of their illness affecting cognitive or social functions but not physical function), we decided to also include fatigue, using the CFQ, as a co-primary outcome. This decision was made mid-way through trial recruitment (on June 20, 2013, after recruitment of 99 [47%] patients), before any outcome data had been examined, and was approved by the Research Ethics Committee, the Data Monitoring and Ethics Committee, and the Trial Steering Committee.

I’m not a statistician or an epidemiologist, but it struck me as unusual that investigators would be allowed to make a change of this magnitude in the middle of a trial. If the specialized treatment centers were following the NICE guidelines and yet diagnosing people having high SF-36 physical function scores, one obvious possible explanation is they could have been misdiagnosing people as having chronic fatigue syndrome even if they were suffering from chronic fatigue for other reasons. In that case, I can understand that adding a fatigue outcome measure might make it easier to demonstrate evidence for improvement. But could this addition be justified from a methodological and statistical perspective?

For an answer, I turned to Bruce Levin, a professor of biostatistics at Columbia. Since Professor Levin first reviewed the PACE study at my request in 2015, he has been a staunch critic of the trial’s methodological choices and the decision by journals like The Lancet and Psychological Medicine to publish the questionable results. Here’s Professor Levin’s perspective on the choice to add fatigue [July 3, 2017: changed “physical function” to “fatigue”; as previously explained, “physical function” was already the primary outcome] as a co-primary outcome midway through the GETSET trial:

I think the main problem with this follows from the overarching issue of bias in unblinded studies.  Inflation of the type I error rate isn’t a problem because the significance criterion was adjusted to control that.  My concern is what the investigators could have observed and surmised regarding the treatment effect mid-way through the trial, even though they claim not to have looked at any A versus B comparisons.  Even an inkling that the pooled mean physical functioning outcome was too high could suggest a lack of treatment effect.

Obviously they were looking at baseline data in order to notice that “too many” subjects had non-disabled physical functioning.  There should have been no concern about imbalance between the groups in that regard, because they planned to adjust for baseline physical functioning, which would remove any chance imbalance.  [I assume that is the case—I haven’t seen the trial’s SAP.]  No, it seems that (once again) the investigators were second-guessing their own protocol and worrying about having “too little” room for improvement.  If the decision to change the primary endpoint (by adding a co-primary endpoint) was based on what they could see in this unblinded study, that would incur bias.

I find it astonishing that the investigators’ remedy for the perceived problem of “too many” non-disabled subjects was to add a co-primary endpoint.  If one is concerned about low power, the last thing one would ordinarily think of is adding a co-primary endpoint which reduces power, because the adjustment to control type I error makes it less likely to correctly declare statistical significance when the alternative hypothesis is true.

Furthermore, although mid-trial changes in protocol can be implemented without bias in so-called adaptive trial designs, it is important to note that such adaptations are contemplated a priori and built into the design of the study before it begins. This is the so-called “adaptive-by-design” feature.  Other ad hoc or post-hoc adaptations are to be avoided, especially in unblinded studies with self-reported endpoints.

So that’s the blunt assessment from an unimpeachable expert. The statisticians and numbers experts out there will understand the inner details of Professor Levin’s comments much better than I do, but the gist is certainly clear. That The Lancet has once again chosen to publish work of this low caliber is sad but predictable, given the journal’s record in this domain. Although the 2011 PACE paper was “fast-tracked” to publication, editor Richard Horton stated in a radio interview not long after the publication of the results that it had undergone “endless rounds” of peer review. He has not explained this apparent contradiction, despite my efforts to extract an answer from him.

This new publication again raises questions about the thoroughness and integrity of The Lancet’s reviewing and editorial processes. And the decision confirms what has been demonstrated repeatedly in this whole saga. Those who have tied their prestige and reputations to the PACE paradigm, like the editors of The Lancet, are willing to make themselves look ridiculous in their misguided and unconvincing efforts to defend this indefensible enterprise.


[June 30, 2017: Explanation for changes]:

In the original version, I quoted the following phrase from the new Lancet article and suggested it represented a change in direction for Professor White and his colleagues: “It is important to note that this trial was not designed to test causative factors in chronic fatigue syndrome, and the relative efficacy of a behavioural intervention does not imply that chronic fatigue syndrome is caused by psychological factors.”

I was wrong. Similar phrasing appeared in the 2011 Lancet article as well as other publications. My mistake was that I failed to recognize the distinction in the PACE vocabulary between “causative” and “perpetuating” factors of the illness. I have removed the inaccuracy and statements arising from it, and have revised the section to accommodate the correction. For full transparency, I include the original paragraphs below. Of course, I apologize to Professor White and his colleagues for the error.


[Original version]:

In this new article, Professor White appears to be back-pedaling away from a central claim in PACE. GETSET avoids arguing that chronic fatigue syndrome (per the article’s usage) is “reversible” with a graded exercise program. The investigators also fail to mention that a 2013 paper in Psychological Medicine purported to have proven that people could “recover” with GET. Instead, they here refer to the intervention as a “management approach.”

They also insist that any apparent benefit for this management approach would not suggest anything about the cause of the illness. “It is important to note that this trial was not designed to test causative factors in chronic fatigue syndrome, and the relative efficacy of a behavioural intervention does not imply that chronic fatigue syndrome is caused by psychological factors,” they write.

To those who understand the history of this body of research, this statement represents somewhat of a surprising shift. By directly contradicting the description of GET in The Lancet’s 2011 paper, the statement appears to eviscerate the rationale for the graded exercise intervention in the first place. The description in 2011 is very clear. Deconditioning and avoidance of activity are the causes of the continuing symptoms, and the syndrome is “reversible” by addressing these specific problems:

GET was done on the basis of deconditioning and exercise intolerance theories of chronic fatigue syndrome. These theories assume that the syndrome is perpetuated by reversible physiological changes of deconditioning and avoidance of activity. These changes result in the deconditioning being maintained and an increased perception of effort, leading to further inactivity. The aim of treatment was to help the participant gradually return to appropriate physical activities, reverse the deconditioning, and thereby reduce fatigue and disability.

There is no room in this 2011 description of GET for whatever other “causative factors” Professor White now seems to acknowledge could be implicated in the disease. Presumably those possible “causative factors” include ongoing pathological organic processes independent of the “reversible physiological changes” arising from deconditioning. (Professor White has long acknowledged that acute biological illnesses can trigger the onset of chronic fatigue syndrome. This initial illness is then presumed to have launched the downward cycle of deconditioning and fear of activity–the factors seen as responsible for perpetuating the symptoms after the acute illness has been resolved.)

The difficulty for this new study is that GET in PACE is based on the hypothesis that people are experiencing “reversible physiological changes” arising from nothing other than serious deconditioning. Does Professor White still believe in this hypothesis, or not? On the evidence of the current paper, he has disavowed it, without explicitly acknowledging this disavowal.

Since the PACE version of GET was designed to address reversible symptoms occurring in the explicit absence of organic disorders, what is Professor White’s current rationale for recommending a graded exercise approach? And if the underlying rationale for the intervention is no longer the absence of organic disorders, how can PACE itself or the NICE guidelines or Cochrane’s systematic reviews be credibly cited in support of it?

If I interpret the new paper correctly, it appears that Professor White and his team do not believe the PACE hypothesis that deconditioning and avoidance of activity are the sole causes of the perpetuation of the symptoms. They acknowledge that other possible “causative factors” could be involved. Given that change in perspective about the illness itself, I don’t understand the basis on which they are recommending graded exercise. I think they need to provide a new and plausible scientific explanation to support the continued testing of this approach on human subjects diagnosed with this disease.

This is especially so given the wealth of information that has emerged since 2011. The investigators mention patient surveys on reported harms from graded exercise, but they choose to ignore the growing peer-reviewed literature from leading medical research centers around the world. They also ignore the major 2015 report from the U.S. Institute of Medicine (now called the Academy of Medicine), which identified “post-exertional malaise” as the cardinal symptom, in the process renaming it “exertion intolerance.”

If “post-exertional malaise” or “exertion intolerance” is the cardinal symptom, then graded exercise in any form could be contraindicated. Professor White and his colleagues obviously do not have to agree with this reasonable interpretation of the recent research and reports. But the failure to mention and discuss these findings in an article investigating a graded exercise intervention demonstrates the investigators’ apparent unwillingness or inability to grapple with the current state of scientific knowledge in the field.

