Trial By Error, Continued: Did the PACE Trial Really Prove that Graded Exercise Is Safe?

By Julie Rehmeyer and David Tuller, DrPH

Julie Rehmeyer is a journalist and Ted Scripps Environmental Journalism Fellow at the University of Colorado, Boulder, who has written extensively about ME/CFS.

David Tuller is academic coordinator of the concurrent masters degree program in public health and journalism at the University of California, Berkeley.

Joining me for this episode of our ongoing saga is my friend and colleague Julie Rehmeyer. In my initial series, I only briefly touched on the PACE trial’s blanket claim of safety. Here we examine this key aspect of the study in more detail, which is complicated and requires a deep dive into technicalities. Sorry about that, but the claim is too consequential to ignore.

One of the most important and controversial claims from the PACE Trial was that graded exercise therapy is safe for patients with chronic fatigue syndrome (or ME/CFS, as U.S. government agencies now call it).

If this treatment is done by skilled people in an appropriate way, it actually is safe and can stand a very good chance of benefiting [patients], Michael Sharpe, one of the principal PACE investigators, told National Public Radio in 2011, shortly after The Lancet published the first results.

But to many in the ME/CFS community, this safety claim goes against the very essence of the disease. The hallmark of chronic fatigue syndrome, despite the name, is not actually fatigue but the body’s inability to tolerate too much exertion,a phenomenon that has been documented in exercise studies. All other symptoms, like sleep disorders, cognitive impairment, blood pressure regulation problems, and muscle pain, are exacerbated by physical or mental activity. An Institute of Medicine report this year even recommended that the illness be renamed to emphasize this central problem, choosing the name systemic exertion intolerance disease, or SEID. [see correction below]

A careful analysis shows that the PACE researchers’ attempts to prove safety were as flawed as their attempts to prove efficacy. However, while the trial reports gave enough information to establish that the treatments were not effective (in spite of the claims of success and recovery), they did not give enough information to establish whether they were safe (also in spite of their claims). We simply do not know.

I would be very skeptical in recommending a blanket statement that GET is safe, says Bruce Levin, a biostatistician at Columbia University, who has reviewed the PACE trial and found other methodological aspects indefensible. The aphorism that absence of evidence is not evidence of absence applies here. There is real difficulty interpreting these results.

* * * * * *

Assessing the PACE team’s safety claims is critical, because the belief that graded exercise is safe has had enormous consequences for patients. In the UK, graded exercise therapy is recommended for all mild to moderate ME/CFS patients by the National Institute for Health and Care Excellence, which strongly influences treatment across the country. In the US, the Centers for Disease Control and Prevention also recommends graded exercise.

Exertion intolerance, also called post-exertional malaise, presents ME/CFS patients with a quandary: They want to do as much as they can when they’re able, while not doing so much that they make themselves sicker later. Among themselves, they’ve worked out a strategy to accomplish that, which they call pacing. Because their energy levels fluctuate, they carefully monitor how they are feeling and adapt their activities to stay within the day’s energy envelope. This requires sensitive attunement to their symptoms in order to pick up on early signs of exacerbation and avoid exceeding their limits.

But according to the hypothesis behind the PACE study, this approach is all wrong. Because the investigators believe physical deconditioning rather than an organic disease perpetuated the many symptoms, they theorized that the key to getting better was to deliberately exceed current limits, gradually training the body to adapt to greater levels of activity. Rather than being sensitively attuned to symptoms, patients should ignore them, on the grounds that they have become obsessed about sensations most people would consider normal. Any increase in symptoms from exertion was explained as expected, transient and unimportant, the result of the body’s current state of weakness, not an underlying disease.

Many patients in the UK have tested this theory, since graded exercise therapy, or GET, is one of the few therapies available to patients there. And patient reports on the approach are very, very bad. In May 2015, the ME Association, a British charity, released a survey of patients’ experiences with GET, cognitive behavioral therapy, and pacing. The results suggested that GET was far and away the most dangerous. Of patients who received GET, 74 percent said that it had made them worse. In contrast, 18 percent said they were worse after cognitive behavior therapy and only 14 percent after pacing.

The survey is filled with reports similar to this one: My condition deteriorated significantly, becoming virtually housebound, spending most of my day in bed in significant pain and with extreme fatigue.

Anecdotal reports, however, don’t provide the proof of a randomized clinical trial. So this was one of the central issues at stake in the PACE study: Is it safe for patients to increase their activity on a set schedule while ignoring their symptoms?

* * * * * *

In the 2011 Lancet article with the first PACE results, the researchers reported that eight percent of all participants experienced a serious deterioration and less than two percent experienced a serious adverse reaction over the course of the year, without significant differences between the arms of the trial.

For patients to have a serious deterioration, their physical function score needed to drop by at least 20 points and they needed to report that their overall health was much worse or very much worse at two consecutive assessment periods (out of a total of three).

To have a serious adverse reaction, the patient needed to experience a persistent severe, i.e. significant deterioration, which was not defined, or to experience a major health-related event, such as a hospitalization or even death. Furthermore, a doctor needed to determine that the event was directly caused by the treatment, a decision that was made after the doctor was told which arm of the trial the patient was in.

Subsequent safety results were published in a 2014 article in the Journal of Psychosomatic Research. And this paper revealed a critical detail unmentioned in the Lancet paper: the six centers around England participating in the study appear to have applied the methods for assessing safety differently. That raises questions about how to interpret the results and whether the overall claims of safety can be taken at face value.

Beyond that issue, a major problem with the PACE investigators’ reporting on harms from exercise is that it looks as though participants might not have actually done much exercise. While the researchers stated the ambitious goal that participants would exercise for at least 30 minutes five times a week, they gave no information on how much exercise participants in fact did.

The trial’s objective outcomes suggest it may not have been much. The exercise patients were only able to walk 11 percent further in a walking test at the end of the trial than patients who hadn’t exercised. Even with this minimal improvement, participants were still severely disabled, with a poorer performance than patients with chronic heart failure, severe multiple sclerosis, or chronic obstructive pulmonary disorder.

On top of that, almost a third of those in the exercise arm who finished other aspects of the trial never completed the final walking test; if they couldn’t because they were too sick, that would skew the results. In addition, the participants in GET showed no improvement at all on a step test designed to measure fitness. Presumably, if the trial’s theory that patients suffered from deconditioning was correct, participants who had managed to exercise should have become more fit and performed better on these tests.

Tom Kindlon, a long-time patient and an expert on the clinical research, suggests that even if those in the exercise arm performed more graded exercise under the guidance of trial therapists, they may have simply cut back on other activities to compensate, as has been found in other studies of graded activity. He also notes that the therapists in the trial were possibly more cautious than therapists in everyday practice.

In the PACE Trial, there was a much greater focus on the issue of safety [than in previous studies of graded activity], with much greater monitoring of adverse events, says Kindlon, who published an analysis of the reporting of harms from trials of graded activity in ME/CFS, including PACE. In this scenario, it seems quite plausible that those running the trial and the clinicians would be very cautious about pushing participants to keep exercising when they had increased symptoms, as this could increase the chances the patients would say such therapies caused adverse events.

* * * * * *

Had the investigators stuck to their original plan, we would have more evidence to evaluate participants’ activity levels. Originally, participants were going to wear a wristwatch-sized ankle band called an actometer, similar to a FitBit, that would measure how many steps they took for a week at the beginning of the trial and for a week at the end.

A substantial increase in the number of steps over the course of the trial would have definitively established both that participants were exercising and that they weren’t decreasing other activity in order to do so.

But in reviewing the PACE Trial protocol, which was published in 2007, Kindlon noticed, to his surprise, that the researchers had abandoned this plan. Instead, they were asking participants to wear the actometers only at the beginning of the trial, but not at the end. Kindlon posted a comment on the journal’s website questioning this decision. He pointed out that in previous studies of graded activity, actometer measurements showed that patients were not moving more, even if they reported feeling better. Hence, the exercise program in that case in fact did not raise their overall activity levels.

In a posted response, White and his colleagues explained that they decided that a test that required participants to wear an actometer around their ankle for a week was too great a burden at the end of the trial. However, they had retained the actometer as a baseline measure, they wrote, to test as a moderator of outcome, that is, to determine factors that predicted which participants improved. The investigators also noted that the trial contained other objective outcome measures. (They subsequently dismissed the relevance of these objective measures after they failed to demonstrate efficacy.)

That answer didn’t make sense to Kindlon. They clearly don’t find it that great a burden that they drop it altogether as it is being used on patients before the start, he wrote in a follow-up comment. If they feel it was that big of a burden, it should probably have been dropped altogether.

* * * * *

The other major flaws that make it impossible to assess the validity of their safety claims are related to those that affected the PACE trial as a whole. In particular, problems related to four issues affected their methods for reporting harms: the case definition, changes in outcome measures after the trial began, lack of blinding, and encouraging participants to discount symptoms in a trial that relied on subjective endpoints.

First, the study’s primary case definition for identifying participants, called the Oxford criteria, was extremely broad; it required only six months of medically unexplained fatigue, with no other symptoms necessary. Indeed, 16% of the participants didn’t even have exercise intolerance, now recognized as the primary symptom of ME/CFS, and hence would not be expected to suffer serious exacerbations from exercise. The trial did use two additional case definitions to conduct sub-group analyses, but they didn’t break down the results on harms by the definition used. So we don’t know if the participants who met one of the more stringent definitions suffered more setbacks due to exercise.

Second, after the trial began, the researchers tightened their definition of harms, just as they had relaxed their methods of assessing improvement. In the protocol, for example, a steep drop in physical function since the previous assessment, or a slight decline in reported overall health, both qualified as a serious deterioration. However, as reported in The Lancet, the steep drop in physical function had to be sustained across two out of the trial’s three assessments rather than just since the previous one. And reported overall health had to be much worse or very much worse, not just slightly worse. The researchers also changed their protocol definition of a serious adverse reaction, making it more stringent.

The third problem was that the study was unblinded, so both participants and therapists knew the treatment being administered. Many participants were probably aware that the researchers themselves favored graded exercise therapy and another treatment, cognitive behavior therapy, which also involved increasing activity levels. Such information has been shown in other studies to lead to efforts to cooperate, which in this case could lead to lowered reporting of harms.

And finally, therapists were explicitly instructed to urge patients in the graded exercise and cognitive behavioral therapy arms to consider increased symptoms as a natural response to increased activity, a direct encouragement to downplay potential signals of physiological deterioration. Since the researchers were relying on self-reports about changes in functionality to assess harms, these therapeutic suggestions could have influenced the outcomes.

Clinicians or patients cannot take from this trial that it is safe to undertake graded exercise programs, Kindlon says. We simply do not know how much activity was performed by individual participants in this trial and under what circumstances; nor do we know what was the effect on those that did try to stick to the programs.

Correction: The original text stated that the Institute of Medicine report came out “this” year; that was accurate when it was written in late December but inaccurate by the time of publication.

Start typing and press enter to search