Trial By Error, Continued: Why has the PACE Study’s “Sister Trial” been “Disappeared” and Forgotten?

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent master's degree program in public health and journalism at the University of California, Berkeley.

In 2010, the BMJ published the results of the Fatigue Intervention by Nurses Evaluation, or FINE. The investigators for this companion trial to PACE, also funded by the Medical Research Council, reported no benefits to ME/CFS patients from the interventions tested.

In medical research, null findings often get ignored in favor of more exciting “positive” results. In this vein, the FINE trial seems to have vanished from the public discussion over the controversial findings from the PACE study. I thought it was important to re-focus some attention on this related effort to prove that “deconditioning” is the cause of the devastating symptoms of ME/CFS. (This piece is also too long but hopefully not quite as dense.)

An update on something else: I want to thank the public relations manager from Queen Mary University of London for clarifying his previous assertion that I did not seek comment from the PACE investigators before Virology Blog posted my story. In an e-mail, he explained that he did not mean to suggest that I hadn’t contacted them for interviews. He only meant, he wrote, that I hadn’t sent them my draft posts for comment before publication. He apologized for the misunderstanding.

I accept his apology, so that’s the end of the matter. In my return e-mail, however, I did let him know I was surprised at the expectation that I might have shared the draft with the PACE investigators before publication. I would not have done that whether or not they had granted me interviews. This is journalism, not peer-review. Different rules.

************************************************************************

In 2003, with much fanfare, the U.K. Medical Research Council announced that it would fund two major studies of non-pharmacological treatments for chronic fatigue syndrome. In addition to PACE, the agency decided to back a second, smaller study called “Fatigue Intervention by Nurses Evaluation,” or FINE. Because the PACE trial was targeting patients well enough to attend sessions at a medical clinic, the complementary FINE study was designed to test treatments for more severely ill patients.

(Chronic fatigue syndrome is also known as myalgic encephalomyelitis, CFS/ME, and ME/CFS, which has now been adopted by U.S. government agencies. The British investigators of FINE and PACE prefer to call it chronic fatigue syndrome, or sometimes CFS/ME.)

Alison Wearden, a psychologist at the University of Manchester, was the lead FINE investigator. She also sat on the PACE Trial Steering Committee and wrote an article about FINE for one of the PACE trial’s participant newsletters. The Medical Research Council and the PACE team referred to FINE as PACE’s “sister” trial. The two studies included the same two primary outcome measures, self-reported fatigue and physical function, and used the same scales to assess them.

The FINE results were published in BMJ in April, 2010. Yet when the first PACE results were published in The Lancet the following year, the investigators did not mention the FINE trial in the text. The trial has also been virtually ignored in the subsequent public debate over the results of the PACE trial and the effectiveness, or lack thereof, of the PACE approach.

What happened? Why has the FINE trial been “disappeared”?

*****

The main goal of the FINE trial was to test a treatment for homebound patients that adapted and combined elements of cognitive behavior therapy and graded exercise therapy, the two rehabilitative therapies being tested in PACE. The approach, called “pragmatic rehabilitation,” had been successfully tested in a small previous study. In FINE, the investigators planned to compare “pragmatic rehabilitation” with another intervention and with standard care from a general practitioner.

Here’s what the Medical Research Council wrote about the main intervention in an article in its newsletter, MRC Network, in the summer of 2003: “Pragmatic rehabilitation…is delivered by specially trained nurses, who give patients a detailed physiological explanation of symptom patterns. This is followed by a treatment programme focussing on graded exercise, sleep and relaxation.”

The second intervention arm featured a treatment called “supportive listening,” a patient-centered and non-directive counseling approach. This treatment presumed that patients might improve if they felt that the therapist empathized with them, took their concerns seriously, and allowed them to find their own approach to addressing the illness.

The Medical Research Council committed 1.3 million pounds to the FINE trial. The study was conducted in northwest England, with 296 patients recruited from primary care. Each intervention took place over 18 weeks and consisted of ten sessions–five home visits lasting up to 90 minutes alternating with five telephone conversations of up to 30 minutes.

As in the PACE trial, patients were selected using the Oxford criteria for chronic fatigue syndrome, defined as the presence of six months of medically unexplained fatigue, with no other symptoms required. The Oxford criteria have been widely criticized for yielding heterogeneous samples, and a report commissioned by the National Institutes of Health this year recommended that the case definition be “retired” for that reason.

More specific case definitions for the illness require the presence of core symptoms like post-exertional malaise, cognitive problems and sleep disorders, rather than just fatigue per se. Because the symptom called post-exertional malaise means that patients can suffer severe relapses after minimal exertion, many patients and advocacy organizations consider increases in activity to be potentially dangerous.

To be eligible for the FINE trial, participants needed to score 70 or less out of 100 on the physical function scale, the Medical Outcomes Study 36-Item Short Form Health Survey, known as the SF-36. They also needed to score a 4 or more out of 11 on the 11-item Chalder Fatigue Scale, with each item scored as either 0 or 1. On the fatigue scale, a higher score indicated greater fatigue.

Among other measures, the trial also included a key objective outcome–the “time to take 20 steps, (or number of steps taken, if this is not achieved) and maximum heart rate reached on a step-test.”

Participants were to be assessed on these measures at 20 weeks, which was right after the end of the treatment period, and again at 70 weeks, which was one year after the end of treatment. The FINE trial protocol, published in the journal BMC Medicine in 2006, warned that “short-term assessments of outcome in a chronic health condition such as CFS/ME can be misleading” and declared the 70-week assessment to be the “primary outcome point.”

*****

The theoretical model behind the FINE trial and pragmatic rehabilitation paralleled the PACE concept. The physical symptoms were presumed to be the result not of a pathological disease process but of “deconditioning” or “dysregulation” caused by sedentary behavior, accompanied by disrupted sleep cycles and stress. The sedentary behavior was itself presumed to be triggered by patients’ “unhelpful” conviction that they suffered from a progressive medical illness. Counteracting the deconditioning involved re-establishing normal sleep cycles, reducing anxiety levels and gently increasing physical exertion, even if patients remained homebound.

“The treatment [pragmatic rehabilitation] is based on a model proposing that CFS/ME is best understood as a consequence of physiological dysregulation associated with inactivity and disturbance of sleep and circadian rhythms,” stated the FINE trial protocol. “We have argued that these conditions…are often maintained by illness beliefs that lead to exercise-avoidance. The essential feature of the treatment is the provision of a detailed explanation for patients’ symptoms, couched in terms of the physiological dysregulation model, from which flows the rationale for a graded return to activity.”

On the FINE trial website, a 2004 presentation about pragmatic rehabilitation explained the illness in somewhat simpler terms, comparing it to “very severe jetlag.” After explaining how and why pragmatic rehabilitation led to physical improvement, the presentation offered this hopeful message, in boldface: “There is no disease–you have a right to full health. This is a good news diagnosis. Carefully built up exercise can reverse the condition. Go for 100% recovery.”

In contrast, patients, advocates and many leading scientists have completely rejected the PACE and FINE approach. They believe the evidence overwhelmingly points to an immunological and neurological disorder triggered by an initial infection or some other physiological insult. Last month, the National Institutes of Health ratified this perspective when it announced a major new push to seek biomedical answers to the disease, which it refers to as ME/CFS.

As in PACE, patients in the FINE trial were issued different treatment manuals depending upon their assigned study arm. The treatment manual for pragmatic rehabilitation repeatedly informed participants that the therapy could help them get better—even though the trial itself was designed to test the effectiveness of the therapy. (In the PACE trial, the manuals for the cognitive behavior and graded therapy arms also included many statements promoting the idea that the therapies could successfully treat the illness.)

“This booklet has been written with the help of patients who have made a full recovery from Chronic Fatigue Syndrome,” stated the FINE pragmatic rehabilitation manual on its second page. “Facts and information which were important to them in making this recovery have been included.” The manual noted that the patients who helped write it had been treated at the Royal Liverpool University Hospital but did not include more specific details about their “full recovery” from the illness.

Among the “facts and information” included in the manual were assertions that the trial participants, contrary to what they might themselves believe, had no persistent viral infection and “no underlying serious disease.” The manual promised them that pragmatic rehabilitation could help them overcome the illness and the deconditioning perpetuating it. “Instead of CFS controlling you, you can start to regain control of your body and your life,” stated the manual.

Finally, as in PACE, participants were encouraged to change their beliefs about their condition by “building the right thoughts for your recovery.” Participants were warned that “unhelpful thoughts”—such as the idea that continued symptoms indicated the presence of an organic disease and could not be attributed to deconditioning—“can put you off parts of the treatment programme and so delay or prevent recovery.”

The supportive listening manual did not similarly promote the idea that “recovery” from the illness was possible. During the sessions, the manual explained, “The listener, your therapist, will provide support and encourage you to find ways to cope by using your own resources to change, manage or adapt to difficulties…She will not tell you what to do, advise, coach or direct you.”

*****

A qualitative study about the challenges of the FINE research process, published by the investigators in the journal Implementation Science in 2011, shed light on how much the theoretical framework and the treatment approaches frustrated and angered trial participants. According to the interviews with some of the nurses, nurse supervisors, and participants involved in FINE, the home visits often bristled with tension over the different perceptions of what caused the illness and which interventions could help.

“At times, this lack of agreement over the nature of the condition and lack of acceptance as to the rationale behind the treatment led to conflict,” noted the FINE investigators in the qualitative paper. “A particularly difficult challenge of interacting with patients for the nurses and their supervisors was managing patients’ resistance to the treatment.”

One participant in the pragmatic rehabilitation arm, who apparently found it difficult to do what was expected, attributed this resistance to the insistence that deconditioning caused the symptoms and that activity would reverse them. “If all that was standing between me and recovery was the reconditioning I could work it out and do it, but what I have got is not just a reconditioning problem,” the participant said. “I have got something where there is damage and a complete lack of strength actually getting into the muscles and you can’t work with what you haven’t got in terms of energy.”

Another participant in the pragmatic rehabilitation arm was more blunt. “I kept arguing with her [the nurse administering the treatment] all the time because I didn’t agree with what she said,” said the participant, who ended up dropping out of the trial.

Some participants in the supportive listening arm also questioned the value of the treatment they were receiving, according to the study. “I mostly believe it was more physical than anything else, and I didn’t see how talking could truthfully, you know, if it was physical, do anything,” said one.

In fact, the theoretical orientation alienated some prospective participants as well, according to interviews the investigators conducted with some patients who declined to enter the trial. “It [the PR intervention] insisted that physiologically there was nothing wrong,” said one such patient. “There was nothing wrong with my glands, there was nothing wrong, that it was just deconditioned muscles. And I didn’t believe that…I can’t get well with treatment you don’t believe in.”

When patients challenged or criticized the therapeutic interventions, the study found, nurses sometimes felt their authority and expertise to be under threat. “They are testing you all the time,” said one nurse. Another reported: “That anger…it’s very wearing and demoralizing.”

One nurse remembered the difficulties she faced with a particular participant. “I used to go there and she would totally block me, she would sit with her arms folded, total silence in the house,” said the nurse. “It was tortuous for both of us.”

At times, nurses themselves responded to these difficult interactions with bouts of anger directed at the participants, according to a supervisor.

“Their frustration has reached the point where they sort of boiled over,” said the supervisor. “There is sort of feeling that the patient should be grateful and follow your advice, and in actual fact, what happens is the patient is quite resistant and there is this thing like you know, ‘The bastards don’t want to get better.’”

*****

BMJ published the FINE results in 2010. The FINE investigators found no statistically significant benefits from either pragmatic rehabilitation or supportive listening at 70 weeks. Despite these null findings one year after the end of the 18-week course of treatment, the mean scores of those in the pragmatic rehabilitation arm did show at 20 weeks a “clinically modest” but statistically significant reduction in fatigue—a drop of slightly more than one point on the 11-point fatigue scale. Even with that slight improvement, participants’ average fatigue scores remained well above the entry threshold used to establish disability, and the benefits were no longer statistically significant by the final assessment.

Despite the null findings at 70 weeks, the authors put a positive gloss on the results, reporting first in the abstract that fatigue was “significantly improved” at 20 weeks. Given the very modest one-point change in average fatigue scores, perhaps the FINE investigators intended to report instead that there was a “statistically significant improvement” at 20 weeks—an accurate phrase with a somewhat different meaning.

The abstract included another interesting linguistic element. While the trial protocol had designated the 70-week assessment as “the primary outcome point,” the abstract of the paper itself now stated that “the primary clinical outcomes were fatigue and physical functioning at the end of treatment (20 weeks) and 70 weeks from recruitment.”

Having redefined the trial’s primary outcome points to include the 20-week as well as the 70-week assessment, the investigators used the abstract to promote the positive effects found at the earlier point as the study’s main finding. Only after communicating these initial benefits did they note that the advantages for pragmatic rehabilitation later wore off. The FINE paper cited no oversight committee approval for this expanded interpretation of the trial’s primary outcome points, nor did it mention the protocol’s caveat about the “misleading” nature of short-term assessments in chronic health conditions.

In fact, within the text of the paper, the investigators noted that the “pre-designated outcome point” was 70 weeks. But they did not explain why the abstract nonetheless highlighted not that pre-designated endpoint but a post-hoc “primary” outcome point—the 20-week assessment.

A BMJ editorial that accompanied the FINE trial also accentuated the positive results at 20 weeks rather than the bad news at 70 weeks. According to the editorial’s subhead, pragmatic rehabilitation “has a short term benefit, but supportive listening does not.” The editorial did not note that this was not the pre-designated primary outcome point. The null results for that outcome point—the 70-week assessment—were not mentioned until later in the editorial.

*****

Patients and advocates soon began criticizing the study in the “rapid response” section of the BMJ website, citing its theoretical framework, the use of the broad Oxford criteria as a case definition, and the failure to provide the step-test outcomes, among other issues.

“The data provide strong evidence that the anxiety and deconditioning model of CFS/ME on which the trial is predicated is either wrong or, at best, incomplete,” wrote one patient. “These results are immensely important because they demonstrate that if a cure for CFS/ME is to be found, one must look beyond the psycho-behavioural paradigm.”

Another patient wrote that the study was “a wake-up call to the whole of the medical establishment” to take the illness seriously. One predicted “that there will those who say that the this trial failed because the patients were not trying hard enough.”

A physician from Australia sought to defend the interests not of patients but of the English language, decrying the lack of hyphens in the paper’s full title: “Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial.”

“The hyphen is a coupling between carriages of words to ensure unambiguous transmission of thought,” wrote the doctor. “Surely this should read ‘Nurse-led, home-based, self-help…’

“Lest English sink further into the Great Despond of ambiguity and non-sense [hyphen included in the original comment], may I implore the co-editors of the BMJ to be the vigilant watchdogs of our mother tongue which at the hands of a younger ‘texting’ generation is heading towards anarchy.” [The original comment did not include the expected comma between ‘tongue’ and ‘which.’]

*****

In a response on the BMJ website a month after publishing the study, the FINE investigators reported that they had conducted a post-hoc analysis with a different kind of scoring for the Chalder Fatigue Scale.

Instead of scoring the answers as 0 or 1 using what was called a bimodal scale, they rescored them using what was called a continuous scale, with values ranging from 0 to 3. The full range of possible scores now ran from 0 to 33, rather than 0 to 11. (As collected, the data for the Chalder Fatigue Scale allowed for either scoring system; however, the original entry criterion of 4 on the bimodal scale would translate into anywhere from 8 to as high as 19 on the revised scale.)
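
To see how the two scoring systems relate, here is a minimal sketch in Python. It assumes the standard Chalder item mappings (each item answered on a 0–3 Likert scale, collapsed to 0/0/1/1 for bimodal scoring); the numbers are illustrative, not trial data.

```python
# Each of the 11 Chalder items is answered on a 4-point Likert scale (0-3).
# Bimodal scoring collapses each response to 0/0/1/1.
ITEMS = 11
ENTRY_BIMODAL = 4  # FINE entry criterion: bimodal score of 4 or more

# An item endorsed bimodally (scored 1) must carry a Likert value of 2 or 3;
# an unendorsed item (scored 0) must carry a Likert value of 0 or 1.
min_continuous = ENTRY_BIMODAL * 2 + (ITEMS - ENTRY_BIMODAL) * 0
max_continuous = ENTRY_BIMODAL * 3 + (ITEMS - ENTRY_BIMODAL) * 1

print(min_continuous, max_continuous)  # -> 8 19
```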

With the revised scoring, they now reported a “clinically modest, but statistically significant effect” of pragmatic rehabilitation at 70 weeks—a reduction from baseline of about 2.5 points on the 0 to 33 scale. This final score represented some increase in fatigue from the 20-week interim assessment point.

In their comment on the website, the FINE investigators now reaffirmed that the 70-week assessment was “our primary outcome point.” This statement conformed to the protocol but differed from the suggestion in the BMJ paper that the 20-week results also represented “primary” outcomes. Given that the post-hoc rescoring allowed the investigators to report statistically significant results at the 70-week endpoint, this zig-zag back to the protocol language was perhaps not surprising.

In their comment, the FINE investigators also explained that they did not report their step-test results—their one objective measure of physical capacity–“due to a significant amount of missing data.” They did not provide an explanation for the missing data. (One obvious possible reason for missing data on an objective fitness test is that participants were too disabled to perform it at all.)

The FINE investigators did not address the question of whether the title of their paper should have included hyphens.

In the rapid comments, Tom Kindlon, a patient and advocate from a Dublin suburb, responded to the FINE investigators’ decision to report their new post-hoc analysis of the fatigue scale. He noted that the investigators themselves had chosen the bimodal scoring system for their study rather than the continuous method.

“I’m sure many pharmacological and non-pharmacological studies could look different if investigators decided to use a different scoring method or scale at the end, if the results weren’t as impressive as they’d hoped,” he wrote. “But that is not normally how medicine works. So, while it is interesting that the researchers have shared this data, I think the data in the main paper should be seen as the main data.”

*****

The FINE investigators have published a number of other papers arising from their study. In a 2013 paper on mediators of the effects of pragmatic rehabilitation, they reported that there were no differences between the three groups on the objective measure of physical capacity, the step test, despite their earlier decision not to publish the data in the BMJ paper.

Wearden herself presented the trial as a high point of her professional career in a 2013 interview for the website of the University of Manchester’s School of Psychological Sciences. “I suppose the thing I did that I’m most proud of is I ran a large treatment trial of pragmatic rehabilitation treatment for patients with chronic fatigue syndrome,” she said in the interview. “We successfully carried that trial out and found a treatment that improved patients’ fatigue, so that’s probably the thing that I’m most proud of.”

The interview did not mention that the improvement in patients’ fatigue was transient under the trial’s pre-designated scoring method, achieving statistical significance at the 70-week primary outcome point only after the investigators performed a post-hoc analysis and rescored the fatigue scale.

*****

The Science Media Centre, a self-styled “independent” purveyor of information about science and scientific research to journalists, has consistently shown an interest in research on what it calls CFS/ME. It held a press briefing for the first PACE results published in The Lancet in 2011, and has helped publicize the release of subsequent studies from the PACE team.

However, the Science Media Centre does not appear to have done anything to publicize the 2010 release of the FINE trial, despite its interest in the topic. A search of the center’s website for the lead FINE investigator, Alison Wearden, yielded no results. And a search for CFS/ME indicated that the first study embraced by the center’s publicity machine was the 2011 Lancet paper.

That might help explain why the FINE trial was virtually ignored by the media. A search on the LexisNexis database for “PACE trial” and “chronic fatigue syndrome” yielded 21 “newspaper” articles. (I put quotation marks around the word because I don’t know whether that number includes articles on newspaper websites that never appeared in the print product; the accuracy of the number is also in question because the list did not include two PACE-related articles that I wrote for The New York Times.)

Searches on the database combining “chronic fatigue syndrome” with either “FINE trial” or “pragmatic rehabilitation” yielded no results. (I used the version of LexisNexis Academic available to me through the University of California library system.)

Other researchers have also paid scant attention to the FINE trial, especially when compared to the PACE study. According to Google Scholar, the 2011 PACE paper in The Lancet has been cited 355 times. In contrast, the 2010 FINE paper in BMJ has only been cited 39 times.

*****

The PACE investigators likely exacerbated this virtual disappearance of the FINE trial by their decision not to mention it in their Lancet paper, despite its longstanding status as a “sister trial” and the relevance of the findings to their own study of cognitive behavior therapy and graded exercise therapy. The PACE investigators have not explained their reasons for ignoring the FINE trial. (I wrote about this lapse in my Virology Blog story, but in their response the PACE investigators did not mention it.)

This absence is particularly striking in light of the decision made by the PACE investigators to drop their protocol method of assessing the Chalder Fatigue Scale. In the protocol, their primary fatigue outcome was based on bimodal scoring on the 11-item fatigue scale. The protocol included continuous scoring on the fatigue scale, with the 0 to 33 scale, as a secondary outcome.

In the PACE paper itself, the investigators announced that they had dropped the bimodal scoring in favor of the continuous scoring “to more sensitively test our hypotheses of effectiveness.” They did not explain why they didn’t simply provide the findings under both scoring methods, since the data as collected allowed for both analyses. They also did not cite any references to support this mid-trial decision, nor did they explain what prompted it.

They certainly did not mention that PACE’s “sister” study, the FINE trial, had reported null results at the 70-week endpoint—that is, until the investigators rescored the data using a continuous scale rather than the bimodal scale used in the original paper.

The three main PACE investigators—psychiatrists Peter White and Michael Sharpe, and behavioral psychologist Trudie Chalder—did not respond to an e-mail request for comment on why their Lancet paper did not mention the FINE study, especially in reference to their post-hoc decision to change the method of scoring the fatigue scale. Lancet editor Richard Horton also did not respond to an e-mail request for an interview on whether he believed the Lancet paper should have included information about the FINE trial and its results.

*****

Update 11/9/15 10:46 PM: According to a list of published and in-process papers on the FINE trial website, the main FINE study was rejected by The Lancet before being accepted by BMJ, suggesting that The Lancet was at least aware of the trial well before it published the PACE study. That raises further questions about the absence of any mention of FINE and its null findings in the text of the PACE paper.

Trial By Error, Continued: Did the PACE Study Really Adopt a ‘Strict Criterion’ for Recovery?

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent master's degree program in public health and journalism at the University of California, Berkeley.

First, some comments: When Virology Blog posted my very, very, very long investigation of the PACE trial two weeks ago, I hoped that the information would gradually leak out beyond the ME/CFS world. So I’ve been overwhelmed by the response, to say the least, and technologically unprepared for my viral moment. I didn’t even have a photo on my Twitter profile until yesterday.

Given the speed at which events are unfolding, I thought it made sense to share a few thoughts, prompted by some of the reactions and comments and subsequent developments.

I approached this story as a journalist, not an academic. I read as much as I could and talked to a lot of people. I did not set out to write the definitive story about the PACE trial, document every single one of its many oddities, or credit everyone involved in bringing these problems to light. My goal was to explain what I recognized as some truly indefensible flaws in a clear, readable way that would resonate with scientists, public health and medical professionals, and others not necessarily immersed in the complicated history of this terrible disease.

To do that most effectively and maximize the impact, I had to find a story arc, some sort of narrative, to carry readers through 14,000 words and many dense explanations of statistical and epidemiologic concepts. After a couple of false starts, I settled on a patient and advocate, Tom Kindlon, as my “protagonist”—someone readers could understand and empathize with. Tom is smart, articulate, and passionate about good science–and he knows the PACE saga inside out. He was a terrific choice whose presence in the story, I think, made reading it a lot more bearable.

That decision in no way implied that Tom was the only possible choice or even the best possible choice. I built my work on the work of others, including many of those James Coyne recently referred to as “citizen-scientists.” Tom’s dedication to tracking and critiquing the research has been heroic, given his health struggles. But the same could be said, and should be said, of many others who have fought to raise awareness about the problems with PACE since the trial was announced in 2003.

The PACE study has generated many peer-reviewed publications and a healthy paper trail. My account of the story, notwithstanding its length, has significant gaps. I haven’t finished writing about PACE, so I hope to fill in some of them myself—as with today’s story on the 2011 Lancet commentary written by colleagues of Peter White, the lead PACE investigator. But I have no monopoly on this story, nor would I want one—the stakes are too high and too many years have already been wasted. Given the trial’s wealth of problems and its enormous influence and ramifications, there are plenty of PACE-related stories left for everyone to tackle.

I am, obviously, indebted to Tom—for his good humor, his willingness to trust me given so many unfair media portrayals of ME/CFS, and his patience when I peppered him with question after question via Facebook, Twitter, and e-mail.

I am also indebted to my friend Valerie Eliot Smith. We met when I began research on this project in July 2014; since then, she has become an indispensable resource, offering transatlantic support across multiple domains. Valerie has given me invaluable legal counsel, making sure that what I was writing was verifiable and, just as important, defensible—especially in the U.K. (I don’t want to know how many billable hours she has invested!) She has provided keen strategic advice. She has been a terrific editor, whose input greatly improved the story’s flow and readability. She has done all this, I realize, at some risk to her own health. I am lucky she decided to join me on this unexpected journey.

I would like to thank, as well, Dr. Malcolm Hooper, Margaret Williams, Dr. Nigel Speight, Dr. William Weir, Natalie Boulton, Lois Addy, and the Countess of Mar for their help and hospitality while I was in England researching the story last year. I will always cherish the House of Lords plastic bag that I received from the Countess. (The bag was stuffed with PACE-related reports and documents.)

So far, Richard Horton, the editor of The Lancet, has not responded to the criticisms documented in my story. As for the PACE investigators, they provided their own response last Friday on Virology Blog, followed by my rebuttal.

In seeking that opportunity for the PACE investigators to respond, a public relations representative from Queen Mary University of London, or QMUL, had approached Virology Blog. In e-mails to Dr. Racaniello, the public relations representative had suggested that “misinformation” and “inaccuracies” in my article had triggered social media “abuse” and could cause “reputational damage.”

These are serious charges, not to be taken lightly. Last Friday’s exchange has hopefully put an end to such claims. It seems unlikely that calling rituximab an “anti-inflammatory” rather than an “immunomodulatory” drug would trigger social media abuse or cause reputational damage.

Last week, in an effort to expedite Virology Blog’s publication of the PACE investigators’ response, the QMUL public relations representative further charged that I had not sought their input before the article was posted. This accusation goes to the heart of my professional integrity as a journalist. It is also untrue—as the public relations representative would have known had he read my piece or talked to the PACE investigators themselves. (Whether earlier publication of their response would have helped their case is another question.)

Disseminating false information to achieve goals is not usually an effective PR strategy. I have asked the QMUL public relations representative for an explanation as to why he conveyed false information to Dr. Racaniello in his attempt to advance the interests of the PACE investigators. I have also asked for an apology.


************************************************************************

Since 2011, the PACE investigators have released several papers, repeatedly generating enthusiastic news coverage about the possibility of “recovery”–coverage that has often drawn conclusions beyond what the publications themselves have reported.

The PACE researchers can’t control the media and don’t write headlines. But in at least one case, their actions appeared to stimulate inaccurate media accounts–and they made no apparent effort immediately afterwards to correct the resulting international coverage. The misinformation spread to medical and public health journals as well.

(I mentioned this episode, regarding the Lancet “comment” that accompanied the first PACE results in 2011, in my excruciatingly long series two weeks ago on Virology Blog. However, that series focused on the PACE study, and the comment itself raised additional issues that I did not have the chance to explore. Because the Lancet comment had such an impact on media coverage, and ultimately most likely on patient care, I felt it was important to return to it.)

The Lancet comment, written by Gijs Bleijenberg and Hans Knoop from the Expert Centre for Chronic Fatigue at Radboud University Nijmegen in the Netherlands, was called “Chronic fatigue syndrome: where to PACE from here?” It reported that 30 percent of those receiving the two rehabilitative interventions favored by the PACE investigators–cognitive behavior therapy and graded exercise therapy–had “recovered.” Moreover, these participants had “recovered” according to what the comment stated was the “strict criterion” used by the PACE study itself.

Yet the PACE investigators themselves did not make this claim in their paper. Rather, they reported that participants in the two rehabilitative arms were more likely to improve and to be within what they referred to as “the normal range” for physical function and fatigue, the study’s two primary outcome measures. (“Normal range” is a statistical concept that has no inherent connection to “normal functioning” or “recovery.” More on that below.)

In addition, the comment did not mention that 15 percent of those receiving only the baseline condition of “specialist medical care” also “recovered” according to the same criterion. Thus, only half of this 30 percent “recovery” rate could actually be attributed to the interventions.

The PACE investigators themselves reviewed the comment before publication.

Thanks to this inaccurate account of the PACE study’s reported findings, the claim of a 30 percent “recovery” rate dominated much of the news coverage. Trudie Chalder, one of the key PACE investigators, reinforced the message of the Lancet comment when she declared at the press conference announcing the PACE results that participants in the two rehabilitative interventions got “back to normal.”

Just as the PACE paper did not report that anyone had “recovered,” it also did not report that anyone got “back to normal.”

Three months later, the PACE authors acknowledged in correspondence in The Lancet that the paper did not discuss “recovery” at all and that they would be presenting “recovery” data in a subsequent paper. They did not explain, however, why they had not taken earlier steps to correct the apparently inaccurate news coverage about how patients in the trial had “recovered” and gotten “back to normal.”

*****

It is not unusual for journals, when they publish studies of significance, to also commission commentaries or editorials that discuss the implications of the findings. It is also not unusual for colleagues of a study’s authors to be asked to write such commentaries. In this case, Bleijenberg and Knoop were colleagues of Peter White, the lead PACE investigator. In 2007, the three had published, along with two other colleagues, a paper called “Is a full recovery possible after cognitive behavior therapy for chronic fatigue syndrome?” in the journal Psychotherapy and Psychosomatics.

(In their response last Friday to my Virology Blog story, the PACE investigators noted that they had published a “correction” to clarify that the 2011 Lancet paper was not about “recovery”; presumably, they were referring to the Lancet correspondence three months later. In their response to Virology Blog, they blamed the misconception on an “editorial…written by others.” But they did not mention that those “others” were White’s colleagues. In their response, they also did not explain why they did not “correct” this “recovery” claim during their pre-publication review of the comment, nor why Chalder spoke at the press conference of participants getting “back to normal.”)

In the Lancet comment, Bleijenberg and Knoop hailed the PACE team for its work. And here’s what they wrote about the trial’s primary outcome measures for physical function and fatigue: “PACE used a strict criterion for recovery: a score on both fatigue and physical function within the range of the mean plus (or minus) one standard deviation of a healthy person’s score.”

This statement was problematic for a number of reasons. Given that the PACE paper itself made no claims for “recovery,” Bleijenberg and Knoop’s assertion that it “used” any criterion for “recovery” at all was false. The PACE study protocol had outlined four specific criteria that constituted what the investigators referred to as “recovery.” Two of them were thresholds on the physical function and fatigue measures, but the Lancet paper did not present data for the other criteria and so could not report “recovery” rates.

Instead, the Lancet paper reported the rates of participants in all the groups who finished the study within what the researchers referred to as “the normal ranges” for physical function and fatigue. But as noted immediately by some in the patient community, these “normal ranges” featured a bizarre paradox: the thresholds for being “within the normal range” on both the physical function and fatigue scales indicated worse health than the entry thresholds required to demonstrate enough disability to qualify for the trial in the first place.
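
A hypothetical participant makes the paradox concrete. The sketch below assumes the SF-36 entry ceiling of 65 reported in the published PACE trial papers (a figure not stated in the text above) and the “normal range” floor of 60 discussed later in this piece:

```python
# SF-36 physical function runs 0-100; higher scores mean better function.
ENTRY_CEILING = 65       # eligible for the trial only if score <= 65
NORMAL_RANGE_FLOOR = 60  # counted as "within normal range" if score >= 60

baseline, final = 65, 60  # an invented participant, for illustration

assert baseline <= ENTRY_CEILING     # disabled enough to enter the trial
assert final >= NORMAL_RANGE_FLOOR   # yet counted "within normal range"
assert final < baseline              # despite scoring worse than at entry
```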

*****

To many patients and other readers, for the Lancet comment to refer to “normal range” scales in which entry and outcome criteria overlapped as a “strict criterion for recovery” defied logic and common sense. (According to data not included in the Lancet paper but obtained later by a patient through a freedom-of-information request, 13 percent of the total sample was already “within normal range” for physical function, fatigue or both at baseline, before any treatment began.)

In the Lancet comment, Bleijenberg and Knoop also noted that these “normal ranges” were based on “a healthy person’s score.” In other words, the “normal ranges” were purportedly derived from responses to the physical function and fatigue questionnaires by population-based samples of healthy people.

But this statement was also at odds with the facts. The source for the fatigue scale was a population of attendees at a medical practice—a population that could easily have had more health issues than a sample from the general population. And as the PACE authors themselves acknowledged in the Lancet correspondence several months after the initial publication, the SF-36 population-based scores they used to determine the physical function “normal range” were from an “adult” population, not the healthier, working-age population they had inaccurately referred to in The Lancet. (An “adult” population includes the elderly.)

The Lancet has never corrected this factual mistake in the PACE paper itself. The authors had described—inaccurately—how they derived a key outcome for one of their two primary measures. This error indisputably made the results appear better than they were, but only those who scrutinized the correspondence were aware of this discrepancy.

The Lancet comment, like the Lancet paper itself, has also never been corrected to indicate that the source population for the SF-36 responses was not a “healthy” population after all, but an “adult” one that included many elderly. The comment’s parallel claim that the source population for the fatigue scale “normal range” was “healthy” as well has also not been corrected.

Richard Horton, the editor of The Lancet, did not respond to a request for an interview to discuss whether he agreed that the “normal range” thresholds represented “a strict criterion for recovery.” Peter White, Trudie Chalder and Michael Sharpe, the lead PACE investigators, and Gijs Bleijenberg, the lead author of the Lancet comment, also did not respond to requests for interviews for this story.

*****

How did the PACE study end up with “normal ranges” in which participants could get worse and still be counted as having achieved the designated thresholds?

Here’s how: The investigators committed a major statistical error in determining the PACE “normal ranges.” They used a standard statistical formula designed for normally distributed populations — that is, populations in which most people score somewhere in the middle, with the rest falling off evenly on each side. When normally distributed populations are graphed, they form the classic bell curve. In PACE, however, the data they were analyzing was far from normally distributed. The population-based responses to the physical function and fatigue questionnaires were skewed—that is, clustered toward the healthy end rather than symmetrically spread around a mean value.

With a normally distributed set of data, a “normal range” using the standard formula used in PACE—taking the mean, plus/minus one standard deviation–contains 68 percent of the values. But when the values are clustered toward one end, as in the source populations for physical function and fatigue, a larger percentage ends up being included in a “normal range” calculated using this same formula. Other statistical methods can be used to calculate 68 percent of the values when a dataset does not form a normal distribution.
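
A quick simulation illustrates the point. This sketch uses an invented left-skewed population of 0–100 scores, not the actual SF-36 source data, and shows the mean-minus-one-standard-deviation formula capturing far more than 68 percent of such a population while dragging the lower threshold down:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented population: scores clustered toward the healthy ceiling of 100,
# with a long lower tail -- skewed like the SF-36 source data, but not it.
scores = np.clip(100 - rng.exponential(scale=16, size=100_000), 0, 100)

mean, sd = scores.mean(), scores.std()
lower = mean - sd  # the PACE-style "normal range" floor
inside = np.mean((scores >= lower) & (scores <= mean + sd))

print(f"mean = {mean:.1f}, sd = {sd:.1f}, floor = {lower:.1f}")
print(f"share within mean +/- 1 sd: {inside:.0%}")  # roughly 86%, not 68%
```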

If the standard formula is used on a population-based survey with scores clustered toward the healthier end, the result is an expanded “normal range” that pushes the lower threshold even lower, as happened with the PACE physical function scale. And in PACE, the threshold wasn’t just low–it was lower than the score required for entry into the trial. This score, of course, already represented severe disability, not “recovery” or being “back to normal”—and certainly not a “strict criterion” for anything.

Bleijenberg and Knoop, the comment authors, were themselves aware of the challenges faced in calculating accurate “normal ranges,” since the issue was addressed in the 2007 paper they co-wrote with Peter White. In this paper, White, Bleijenberg, and Knoop discussed the concerns related to determining a “normal range” from population data that was heavily clustered toward the healthy end of the scale. The paper noted that using the standard formula “assumed a normal distribution of scores” and generated different results under the “violation of the assumptions of normality.”

*****

Despite the caveats the three scientists included in this 2007 paper, Bleijenberg and Knoop’s 2011 Lancet comment did not mention these concerns about distortion arising from applying the standard statistical formula to values that were not normally distributed. (White and his colleagues also did not mention this problem in the PACE study itself.)

Moreover, the 2007 paper from White, Bleijenberg, and Knoop had identified a score of 80 on the SF-36 as representing “recovery”—a much higher “recovery” threshold than the SF-36 score of 60 that Bleijenberg and Knoop now declared to be a “strict criterion.” In the Lancet comment, the authors did not mention this major discrepancy, nor did they explain how and when they had changed their minds about whether an SF-36 score of 60 or 80 best represented “recovery.” (In 2011, White and his colleagues also did not mention this discrepancy between the score for “recovery” in the 2007 paper and the much lower “normal range” threshold in the PACE paper.)

Along with the PACE paper, The Lancet comment caused an uproar in the patient and advocacy communities–especially since the claim that 30 percent of participants in the rehabilitative arms “recovered” per a “strict criterion” was widely disseminated.

The comment apparently caused some internal consternation at The Lancet as well. In an e-mail to Margaret Williams, the pseudonym for a longtime clinical manager in the National Health Service who had complained about the Lancet comment, an editor at the journal, Zoe Mullan, agreed that the reference to “recovery” was problematic.

“Yes I do think we should correct the Bleijenberg and Knoop Comment, since White et al explicitly state that recovery will be reported in a separate report,” wrote Mullan in the e-mail. “I will let you know when we have done this.”

No correction was made, however.

*****

In 2012, to press the issue, the Countess of Mar pursued a complaint about the comment’s claim of “recovery” with the (now-defunct) Press Complaints Commission, a regulatory body established by the media industry that was authorized to investigate the conduct of news organizations. The countess, who frequently championed the cause of the ME/CFS patient community in Parliament’s House of Lords, had long questioned the scientific basis for supporting cognitive behavior therapy and graded exercise therapy, and she believed the Lancet comment’s claims of “recovery” contradicted the study itself.

In defending itself to the Press Complaints Commission, The Lancet acknowledged the earlier suggestion by a journal editor that the comment should be corrected.

“I can confirm that our editor of our Correspondence section, Zoe Mullan, did offer her personal opinion at the time, in which she said that she thought that we should correct the Comment,” wrote Lancet deputy editor Astrid James to the Press Complaints Commission, in an e-mail.

“Zoe made a mistake in not discussing this approach with a more senior member of our editorial team,” continued James in the e-mail. “Now, however, we have discussed this case at length with all members of The Lancet’s senior editorial team, and with Zoe, and we do not agree that there is a need to publish a correction.”

The Lancet now rejected the notion that the comment was inaccurate. Despite the explicit language in the comment identifying the “normal range” thresholds as the PACE trial’s own “strict criterion for recovery,” The Lancet argued in its response to the Press Complaints Commission that the authors were only expressing their personal opinion about what constituted “recovery.”

In other words, according to The Lancet, Bleijenberg and Knoop were not describing—wrongly—the conclusions of the PACE paper itself. They were describing their own interpretation of the findings. Therefore, the comment was not inaccurate and did not need to be corrected.

(In its response to the Press Complaints Commission, The Lancet did not explain why thresholds that purportedly represented a “strict criterion for recovery” overlapped with the entry criteria for disability.)

*****

The Press Complaints Commission issued its findings in early 2013. The commission agreed with the Countess of Mar that the statement about “recovery” in the Lancet comment was inaccurate. But the commission gave a slightly different reason. The commission accepted the Lancet’s argument that Bleijenberg and Knoop were trying to express their own opinion. The problem, the commission ruled, was that the comment itself didn’t make that point clear.

“The authors of the comment piece were clearly entitled to take a view on how ‘recovery’ should be defined among the patients in the trial,” wrote the commission. However, continued the decision: “The authors of the comment had failed to make clear that the 30 per cent figure for ‘recovery’ reflected their view that function within ‘normal range’ was an appropriate way of ‘operationalising’ recovery–rather than statistical analysis by the researchers based on the definition for recovery provided. This was a distinction of significance, particularly in the context of a comment on a clinical trial published in a medical journal. The comment was misleading on this point and raised a breach of Clause 1 (Accuracy) of the Code.”

However, this determination seemed based on a misreading of what Bleijenberg and Knoop had actually written: “PACE used a strict criterion for recovery.” That phrasing did not suggest that the authors were expressing their own opinion about “recovery.” Rather, it was a statement about how the PACE study itself purportedly defined “recovery.” And the statement was demonstrably untrue.

Compounding the confusion, the Press Complaints Commission decision noted that the Lancet comment had been discussed with the PACE investigators prior to publication. Since the phrase “strict criterion for recovery” had thus apparently been vetted by the PACE team itself, it remained unclear why the commission determined that Bleijenberg and Knoop were only expressing their own opinion.

The commission’s response left other questions unanswered. The commission noted that the Countess had pointed out that the “recovery” score for physical function cited by the commenters was lower than the score required for entry. Despite this obvious anomaly, the commission did not indicate whether it had asked The Lancet or Bleijenberg and Knoop to explain how such a nonsensical scale could be used to assess “recovery.”

*****

Notwithstanding the inaccuracy of the Lancet comment’s “recovery” claim, the commission also found that the journal had already taken “sufficient remedial action” to rectify the problem. The commission noted that the correspondence published after the trial had provided a prominent forum to debate concerns over the definition of “recovery.” The decision also noted that the PACE authors themselves had clarified in the correspondence that the actual “recovery” findings would be published in a subsequent paper.

In ruling that “sufficient remedial action” had already been taken, however, the commission did not mention the potential damage that already might have been caused by this inaccurate “recovery” claim. Given the comment’s declaration that 30 percent of participants in the cognitive behavior and graded exercise therapy arms had “recovered” according to a “strict criterion,” the message received worldwide dissemination—even though the PACE paper itself made no such claim.

Medical and public health journals, conflating the Lancet comment and the PACE study itself, also transmitted the 30 percent “recovery” rate directly to clinicians and others who treat or otherwise deal with ME/CFS patients.

The BMJ referred to the approximately 30 percent of patients who met the “normal range” thresholds as “cured.” A study in BMC Health Services Research cited PACE as having demonstrated “a recovery rate of 30-40%”—months after the PACE authors had issued their “correction” that their paper did not report on “recovery” at all. (Another mystery about the BMC Health Services Research report is the source of the 40 percent figure for “recovery.”) A 2013 paper in PLoS One similarly cited the PACE study—not the Lancet comment—and noted that 30 percent achieved a “full recovery.”

Given that relapsing after too much exertion is a core symptom of the illness, it is impossible to calculate the possible harms that could have arisen from this widespread dissemination of misinformation to health care professionals—all based on the flawed claim from the comment that 30 percent of participants had recovered according to the PACE study’s “strict criterion for recovery.”

And that “strict criterion,” it should be remembered, allowed participants to get worse and still be counted as better.

David Tuller responds to the PACE investigators

David Tuller’s three-installment investigation of the PACE trial for chronic fatigue syndrome, “Trial By Error,” has received enormous attention. Although the PACE investigators declined David’s efforts to interview them, they have now requested the right to reply. Today, virology blog posts their response to David’s story, and below, his response to their response. 

According to the communications department of Queen Mary University, the PACE investigators have been receiving abuse on social media as a result of David Tuller’s posts. When I published Mr. Tuller’s articles, my intent was to provide a forum for discussion of the controversial PACE results. Abuse of any kind should not have been, and must not be, part of that discourse. -vrr


Last December, I offered to fly to London to meet with the main PACE investigators to discuss my many concerns. They declined the offer. Dr. White cited my previous coverage of the issue as the reason and noted that “we think our work speaks for itself.” Efforts to reach out to them for interviews two weeks ago also proved unsuccessful.

After my story ran on virology blog last week, a public relations manager for medicine and dentistry in the marketing and communications department of Queen Mary University e-mailed Dr. Racaniello. He requested, on behalf of the PACE authors, the right to respond. (Queen Mary University is Dr. White’s home base.)

That response arrived Wednesday. My first inclination, when I read it, was that I had already rebutted most of their criticisms in my 14,000-word piece, so it seemed like a waste of time to engage in further extended debate.

Later in the day, however, the public relations manager for medicine and dentistry from the marketing and communications department of Queen Mary University e-mailed Dr. Racaniello again, with an urgent request to publish the response as soon as possible. The PACE investigators, he said, were receiving “a lot of abuse” on social media as a result of my posts, so they wanted to correct the “misinformation” as soon as possible.

Because I needed a day or two to prepare a careful response to the PACE team’s rebuttal, Dr. Racaniello agreed to post them together on Friday morning.

On Thursday, Dr. Racaniello received yet another appeal from the public relations manager for medicine and dentistry from the marketing and communications department of Queen Mary University. Dissatisfied with the Friday publishing timeline, he again urged expedited publication because “David’s blog posts contain a number of inaccuracies, may cause a considerable amount of reputational damage, and he did not seek comment from any of the study authors before the virology blog was published.”

The charge that I did not seek comment from the authors was at odds with the facts, as Dr. Racaniello knew. (It is always possible to argue about accuracy and reputational damage.) Given that much of the argument for expedited posting rested on the public relations manager’s obviously “dysfunctional cognition” that I had unfairly neglected to provide the PACE authors with an opportunity to respond, Dr. Racaniello decided to stick with his pre-planned posting schedule.

Before addressing the PACE investigators’ specific criticisms, I want to apologize sincerely to Dr. White, Dr. Chalder, Dr. Sharpe and their colleagues on behalf of anyone who might have interpreted my account of what went wrong with the PACE trial as license to target the investigators for “abuse.” That was obviously not my intention in examining their work, and I urge anyone engaging in such behavior to stop immediately. No one should have to suffer abuse, whether online or in the analog world, and all victims of abuse deserve enormous sympathy and compassion.

However, in this case, it seems I myself am being accused of having incited a campaign of social media “abuse” and potentially causing “reputational damage” through purportedly inaccurate and misinformed reporting. Because of the seriousness of these accusations, and because such accusations have a way of surfacing in news reports, I feel it is prudent to rebut the PACE authors’ criticisms in far more detail than I otherwise would. (I apologize in advance to the obsessives and others who feel they need to slog through this rebuttal; I urge you to take care not to over-exert yourself!)

In their effort to correct the “misinformation” and “inaccuracies” in my story about the PACE trial, the authors make claims and offer accounts similar to those they have previously presented in published comments and papers. In the past, astonishingly, journal editors, peer reviewers, reporters, public health officials, and the British medical and academic establishments have accepted these sorts of non-responsive responses as adequate explanations for some of the study’s fundamental flaws. I do not.

None of what they have written in their response actually addresses or resolves the core issues that I wrote about last week. They have ignored many of the questions raised in the article. In their response, they have also not mentioned the devastating criticisms of the trial from top researchers from Columbia, Stanford, University College London, and elsewhere. They have not addressed why major reports this year from the Institute of Medicine and the National Institutes of Health have presented portraits of the disease starkly at odds with the PACE framework and approach.

I will ignore their overview of the findings and will focus on the specific criticisms of my work. (I will, however, mention here that my piece discussed why their claims of cost-effectiveness for cognitive behavior therapy and graded exercise therapy are based on inaccurate statements in a paper published in PLoS One in 2012).

13% of patients had already “recovered” on entry into the trial

I did not write that 13% of the participants were “recovered” at baseline, as the PACE authors state. I wrote that they were “recovered” or already at the “recovery” thresholds for two specific indicators, physical function and fatigue, at baseline—a different statement, and an accurate one.

The authors acknowledge, in any event, that 13% of the sample was “within normal range” at baseline. For the 2013 paper in Psychological Medicine, these “normal range” thresholds were re-purposed as two of the four required “recovery” criteria.

And that raises the question: Why, at baseline, was 13% of the sample “within normal range” or “recovered” on any indicator in the first place? Why did entry criteria for disability overlap with outcome scores for being “within the normal range” or “recovered”? The PACE authors have never provided an explanation of this anomaly.

In their response, the authors state that they outlined other criteria that needed to be met for someone to be called “recovered.” This is true; as I wrote last week, participants needed to meet “recovery” criteria on four different indicators to be considered “recovered.” The PACE authors did not provide data for two of the indicators in the 2011 Lancet paper, so in that paper they could not report results for “recovery.”

However, at the press conference presenting the 2011 Lancet paper, Trudie Chalder referred to people who met the overlapping disability/”normal range” thresholds as having gotten “back to normal”—an explicit “recovery” claim. In a Lancet comment published along with the PACE study itself, colleagues of the PACE team referred to these bizarre “normal range” thresholds for physical function and fatigue as a “strict criterion for recovery.” As I documented, the Lancet comment was discussed with the PACE authors before publication; the phrase “strict criterion for recovery” obviously survived that discussion.

Much of the coverage of the 2011 paper reported that patients got “back to normal” or “recovered,” based on Dr. Chalder’s statement and the Lancet comment. The PACE authors made no public attempt to correct the record in the months after this apparently inaccurate news coverage; their eventual response was a letter in The Lancet. In the response to Virology Blog, they say that they were discussing “normal ranges” in the Lancet paper, and not “recovery.” Yet they have not explained why Chalder spoke about participants getting “back to normal” and why their colleagues wrote that the nonsensical “normal range” thresholds represented a “strict criterion for recovery.”

Moreover, they still have not responded to the essential questions: How does this analysis make sense? What are the implications for the findings if 13% are already “within normal range” or “recovered” on one of the two primary outcome measures? How can they be “disabled” enough on the two primary measures to qualify for the study if they’re already “within normal range” or “recovered”? And why did the PACE team use the wrong statistical methods for calculating their “normal ranges” when they knew that method was wrong for the data sources they had?

Bias was caused by a newsletter for patients giving quotes from patients and mentioning UK government guidance on management. A key investigator was on the guideline committee.

The PACE authors apparently believe it is appropriate to disseminate positive testimonials during a trial as long as the therapies or interventions are not mentioned. (James Coyne dissected this unusual position yesterday.)

This is their argument: “It seems very unlikely that this newsletter could have biased participants as any influence on their ratings would affect all treatment arms equally.” Apparently, the PACE investigators believe that if you bias all the arms of your study in a positive direction, you are not introducing bias into your study. It is hard to know what to say about this argument.

Furthermore, the PACE authors argue that the U.K. government’s new treatment guidelines had been widely reported. Therefore, they contend, it didn’t matter that–in the middle of a trial to test the efficacy of cognitive behavior therapy and graded exercise therapy–they had informed participants that the government had already approved cognitive behavior therapy and graded exercise therapy “based on the best available evidence.”

They are wrong. They introduced an uncontrolled, unpredictable co-intervention into their study, and they have no idea what the impact might have been on any of the four arms.

In their response, the PACE authors note that the participants’ newsletter article, in addition to cognitive behavior therapy and graded exercise therapy, included a third intervention, Activity Management. As they correctly note, I did not mention this third intervention in my Virology Blog story. The PACE authors now write: “These three (not two as David Tuller states) therapies were the ones being tested in the trial, so it is hard to see how this might lead to bias in the direction of one or other of these therapies.”

This statement is nonsense. Their third intervention was called “Adaptive Pacing Therapy,” and they developed it specifically for testing in the PACE trial. It is unclear why they now state that their third intervention was Activity Management, or why they think participants would know that Activity Management was synonymous with Adaptive Pacing Therapy. After all, cognitive behavior therapy and graded exercise therapy also involve some form of “activity management.” Precision in language matters in science.

Finally, the investigators say that Jessica Bavington, a co-author of the 2011 paper, had already left the PACE team before she served on the government committee that endorsed the PACE therapies. That might be, but it is irrelevant to the question that I raised in my piece: whether her dual role presented a conflict of interest that should have been disclosed to participants in the newsletter article about the U.K. treatment guidelines. The PACE newsletter article presented the U.K. guideline committee’s work as if it were independent of the PACE trial itself, when it was not.

Bias was caused by changing the two primary outcomes and how they were analyzed

 The PACE authors seem to think it is acceptable to change methods of assessing primary outcome measures during a trial as long as they get committee approval, announce it in the paper, and provide some sort of reasonable-sounding explanation as to why they made the change. They are wrong.

They need as well to justify the changes with references or citations that support their new interpretations of their indicators, and they need to conduct sensitivity analyses to assess the impact of the changes on their findings. Then they need to explain why their preferred findings are more robust than the initial, per-protocol findings. They did not take these steps for any of the many changes they made from their protocol.

The PACE authors mention the change from bimodal to Likert-style scoring on the Chalder Fatigue Scale. They repeat their previous explanation of why they made this change. But they have ignored what I wrote in my story—that the year before PACE was published, its “sister” study, called the FINE trial, had no significant findings on the physical function and fatigue scales at the end of the trial and only found modest benefits in a post-hoc analysis after making the same change in scoring that PACE later made. The FINE study was not mentioned in PACE. The PACE authors have not explained why they left out this significant information about their “sister” study.
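
To make the difference concrete, here is a minimal sketch of the two scoring systems in Python. The questionnaire responses are hypothetical, invented purely for illustration; the scoring rules themselves (bimodal 0/0/1/1 versus Likert 0/1/2/3 across the scale’s 11 items) are as described in the trial papers.

    # Two scorings of the 11-item Chalder Fatigue Questionnaire. Each item is
    # answered on a four-point scale; the responses below are hypothetical.
    BIMODAL = {0: 0, 1: 0, 2: 1, 3: 1}   # protocol scoring; totals range 0-11
    LIKERT = {0: 0, 1: 1, 2: 2, 3: 3}    # revised scoring; totals range 0-33

    def score(responses, mapping):
        return sum(mapping[r] for r in responses)

    baseline = [2] * 11            # moderately fatigued on every item
    follow_up = [2] * 8 + [3] * 3  # worse on three items, unchanged on the rest

    for label, answers in (("baseline", baseline), ("follow-up", follow_up)):
        print(label, score(answers, BIMODAL), score(answers, LIKERT))
    # bimodal: 11 and 11 (no change registered); Likert: 22 and 25

The switch does increase the variance of the measure, as the authors say. The issue is that the switch was made mid-trial, and, as the FINE results showed, the choice of scoring can determine whether any benefit appears at all.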

Regarding the abandonment of the original method of assessing the physical function scores, this is what they say in their response: “We decided this composite method [their protocol method] would be hard to interpret clinically, and would not answer our main question of comparing effectiveness between treatment arms. We therefore chose to compare mean scores of each outcome measure between treatment arms instead.” They mention that they received committee approval, and that the changes were made before examining the outcome data.

The authors have presented these arguments previously. However, they have not responded to the questions I raised in my story. Why did they not report any sensitivity analyses for the changes in methods of assessing the primary outcome measures? (Sensitivity analyses can assess how changes in assumptions or variables impact outcomes.) What prompted them to reconsider their assessment methods in the middle of the trial? Were they concerned that a mean-based measure, unlike their original protocol measure, did not provide any information about proportions of participants who improved or got worse? Any information about proportions of participants who got better or worse came from post-hoc analyses—one of which was the perplexing “normal range” analysis.

Moreover, this was an unblinded trial, and researchers generally have an idea of outcome trends before examining outcome data. When the PACE authors made the changes, did they already have an idea of outcome trends? They have not answered that question.

Our interpretation was misleading after changing the criteria for determining recovery

The PACE authors relaxed all four of their criteria for “recovery” in their 2013 paper and cited no committees that approved this overall redefinition of a critical concept. Three of these relaxations involved expanded thresholds; the fourth involved splitting one category into two sub-categories—one less restrictive and one more restrictive. The authors gave the full results only for the less restrictive category of “recovery.”

The PACE authors now say that they changed the “recovery” thresholds on three of the variables “since we believed that the revised thresholds better reflected recovery.” Again, they apparently think that simply stating their belief that the revisions were better justifies making the changes.

Let’s review for a second. The physical function threshold for “recovery” fell from 85 out of 100 in the protocol, to a score of 60 in the 2013 paper. And that “recovery” score of 60 was lower than the entry score of 65 to qualify for the study. The PACE authors have not explained how the lower score of 60 “better reflected recovery”—especially since the entry score of 65 already represented serious disability. Similar problems afflicted the fatigue scale “recovery” threshold.
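
The arithmetic is worth making explicit. Below is a minimal sketch in Python of the two physical function thresholds just described (entry required a score of 65 or less; the 2013 paper’s “recovery” threshold was 60 or more). The SF-36 physical function subscale runs from 0 to 100 in steps of 5.

    ENTRY_MAX = 65     # SF-36 physical function: 65 or less required for trial entry
    RECOVERY_MIN = 60  # 2013 paper's "recovery" threshold: 60 or more

    def eligible(pf):
        return pf <= ENTRY_MAX

    def at_recovery_threshold(pf):
        return pf >= RECOVERY_MIN

    # SF-36 physical function scores run 0-100 in steps of 5:
    overlap = [pf for pf in range(0, 101, 5) if eligible(pf) and at_recovery_threshold(pf)]
    print(overlap)  # [60, 65]

A participant scoring 60 or 65 was, by the study’s own definitions, simultaneously disabled enough to enter the trial and already at the “recovery” threshold on this measure.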

The PACE authors also report that “we included those who felt ‘much’ (and ‘very much’) better in their overall health” as one of the criteria for “recovery.” This is true. They are referring to the Clinical Global Impression scale. In the protocol, participants needed to score a 1 (“very much better”) on this scale to be considered “recovered” on that indicator. In the 2013 paper, participants could score a 1 (“very much better”) or a 2 (“much better”). The PACE authors provided no citations to support this expanded interpretation of the scale. They simply explained in the paper that they now thought “much better” reflected the process of recovery and so those who gave a score of 2 should also be considered to have achieved the scale’s “recovery” threshold.

With the fourth criterion—not meeting any of the three case definitions used to define the illness in the study—the PACE authors gave themselves another option. Those who did not meet the study’s main case definition but still met one or both of the other two were now eligible for a new category called “trial recovery.” They did not explain why or when they made this change.

The PACE authors provided no sensitivity analyses to measure the impact of the significant changes in the four separate criteria for “recovery,” as well as in the overall re-definition. And remember, participants at baseline could already have achieved the “recovery” requirements for one or two of the four criteria—the physical function and fatigue scales. And 13% of them already had.

Requests for data under the freedom of information act were rejected as vexatious

The PACE authors have rejected requests for the results per the protocol and many other requests for documents and data as well—at least two for being “vexatious,” as they now report. In my story, I incorrectly stated that requests for per-protocol data were rejected as “vexatious” [see clarification below]. In fact, earlier requests for per-protocol data were rejected for other reasons.

One recent request rejected as “vexatious” involved the PACE investigators’ 2015 paper in The Lancet Psychiatry. In this paper, they published their last “objective” outcome measure (except for wages, which they still have not published)—a measure of fitness called a “step-test.” But they only published a tiny graph on a page with many other tiny graphs, not the actual numbers from which the graph was drawn.

The graph was too small to extract any data, but it appeared that the cognitive behavior therapy and graded exercise therapy groups did worse than the other two. A request for the step-test data from which they created the graph was rejected as “vexatious.”

However, I apologize to the PACE authors that I made it appear they were using the term “vexatious” more extensively in rejecting requests for information than they actually have been. I also apologize for stating incorrectly that requests for per protocol data specifically had been rejected as “vexatious” [see clarification below].

This is probably a good time to address the PACE authors’ repeated refrain that concerns about patient confidentiality prevent them from releasing raw data and other information from the trial. They state: “The safe-guarding of personal medical data was an undertaking enshrined in the consent procedure and therefore is ethically binding; so we cannot publicly release these data. It is important to remember that simple methods of anonymization does [sic] not always protect the identity of a person, as they may be recognized from personal and medical information.”

This argument against the release of data doesn’t really hold up, given that researchers share data all the time without compromising confidentiality. Really, it’s not that difficult to do!

(It also bears noting that the PACE authors’ dedication to participant protection did not extend to fulfilling their protocol promise to inform participants of their “possible conflicts of interest”—see below.)

Subjective and objective outcomes

The PACE authors included multiple objective measures in their protocol. All of them failed to demonstrate real treatment success or “recovery.” The extremely modest improvements in the exercise therapy arm on the walking test still left participants more severely disabled than people with pacemakers, patients with cystic fibrosis, and relatively healthy women in their 70s.

The authors now write: “We interpreted these data in the light of their context and validity.”

What the PACE team actually did was to dismiss their own objective data as irrelevant or not actually objective after all. In doing so, they cited various reasons they should have considered before including these measures in the study as “objective” outcomes. They provide one example in their response: they selected employment data as an objective measure of function, and then—as they explain in their response, and have explained previously—they decided afterwards that it wasn’t an objective measure of function after all.

The PACE authors call this interpreting data “in the light of their context and validity.” To me, it looks like tossing out data they don’t like.

What they should do, but have not done, is ask whether the failure of all their objective measures might mean they should start questioning the meaning, reliability and validity of their reported subjective results.

There was a bias caused by many investigators’ involvement with insurance companies and a failure not to declare links with insurance companies in information regarding consent

The PACE authors here seriously misstate the concerns I raised in my piece. I did not assert that bias was caused by their involvement with insurance companies. I asserted that they violated an international research ethics document and broke a commitment they made in their protocol to inform participants of “any possible conflicts of interest.” Whether bias actually occurred is not the point.

In their approved protocol, the authors promised to adhere to the Declaration of Helsinki, a foundational human rights document that is explicit on what constitutes legitimate informed consent: Prospective participants must be “adequately informed” of “any possible conflicts of interest.” The PACE authors now suggest this disclosure was unnecessary because 1) the conflicts weren’t really conflicts after all; 2) they disclosed these “non-conflicts” as potential conflicts of interest in the Lancet and other publications; 3) they had a lot of investigators but only three had links with insurers; and 4) they informed participants about who funded the research.

These responses are not serious. They do nothing to explain why the PACE authors broke their own commitment to inform participants about “any possible conflicts of interest.” It is not acceptable to promise to follow a human rights declaration, receive approvals for a study, and then ignore inconvenient provisions. No one is much concerned about PACE investigator #19; people are concerned because the three main PACE investigators have advised disability insurers that cognitive behavior therapy and graded exercise therapy can get claimants off benefits and back to work.

That the PACE authors made the appropriate disclosures to journal editors is irrelevant; it is unclear why they are raising this as a defense. The Declaration of Helsinki is about protecting human research subjects, not about protecting journal editors and journal readers. And providing information to participants about funding sources, however ethical that might be, is not the same as disclosing information about “any possible conflicts of interest.” The PACE authors know this.

Moreover, the PACE authors appear to define “conflict of interest” quite narrowly. Just because the insurers were not involved in the study itself does not mean there is no conflict of interest, and it does not relieve the PACE authors of the promise they made to inform trial participants of these affiliations. No one required them to cite the Declaration of Helsinki in their protocol as part of the process of gaining approvals for their trial.

As it stands, the PACE study appears to have no legitimate informed consent for any of the 641 participants, per the commitments the investigators themselves made in their protocol. This is a serious ethical breach.

I raised other concerns in my story that the authors have not addressed. I will save everyone much grief and not go over them again here.

I want to acknowledge two additional minor errors. In the last section of the piece, I referred to the drug rituximab as an “anti-inflammatory.” While it does have anti-inflammatory effects, rituximab should more properly be referred to as an “immunomodulatory” drug.

Also, in the first section of the story, I wrote that Dr. Chalder and Dr. Sharpe did not return e-mails I sent them last December, seeking interviews. However, during a recent review of e-mails from last December, I found a return e-mail from Dr. Sharpe that I had forgotten about. In the e-mail, Dr. Sharpe declined my request for an interview.

I apologize to Dr. Sharpe for suggesting he hadn’t responded to my e-mail last December.

Clarification: In a decision on a data request, the UK Information Commissioner’s Office noted last year that Queen Mary University of London “has advised that the effect of these requests [for PACE-related material] has been that the team involved in the PACE trial, and in particular the professor involved, now feel harassed and believe that the requests are vexatious in nature.” In other words, whatever the stated reason for denying requests, White and his colleagues regarded them all as “vexatious” by definition. Therefore, the statement that the investigators rejected the requests for data as being “vexatious” is accurate, and I retract my previous apology.

PACE trial investigators respond to David Tuller

Professors Peter White, Trudie Chalder and Michael Sharpe (co-principal investigators of the PACE trial) respond to the three blog posts by David Tuller, published here on 21st, 22nd and 23rd October 2015, about the PACE trial.

Overview

The PACE trial was a randomized controlled trial of four non-pharmacological treatments for 641 patients with chronic fatigue syndrome (CFS) attending secondary care clinics in the United Kingdom (UK) (http://www.wolfson.qmul.ac.uk/current-projects/pace-trial). The trial found that individually delivered cognitive behaviour therapy (CBT) and graded exercise therapy (GET) were more effective than both adaptive pacing therapy (APT), when added to specialist medical care (SMC), and SMC alone. The trial also found that CBT and GET were cost-effective, safe, and were about three times more likely to result in a patient recovering than the other two treatments.

There are a number of published systematic reviews and meta-analyses that support these findings from both before and after the PACE trial results were published (Whiting et al, 2001, Edmonds et al, 2004, Chambers et al, 2006, Malouff et al, 2008, Price et al, 2008, Castell et al, 2011, Larun et al, 2015, Marques et al, 2015, Smith et al, 2015). We have published all the therapist and patient manuals used in the trial, which can be down-loaded from the trial website (http://www.wolfson.qmul.ac.uk/current-projects/pace-trial).

We will only address David Tuller’s main criticisms. Most of these are often repeated criticisms that we have responded to before, and we will argue that they are unjustified.

Main criticisms:

13% of patients had already “recovered” on entry into the trial

Some 13% of patients entering the trial did have scores within normal range (i.e. within one standard deviation of the population means) for either one or both of the primary outcomes of fatigue and physical function – but this is clearly not the same as being recovered; we have published a correction after an editorial, written by others, implied that it was (White et al, 2011a). In order to be considered recovered, patients also had to:

  • Not meet case criteria for CFS
  • Not meet eligibility criteria for either of the primary outcome measures for entry into the trial
  • Rate their overall health (not just CFS) as “much” or “very much” better.

It would therefore be impossible to be recovered and eligible for trial entry (White et al, 2013). 

Bias was caused by a newsletter for patients giving quotes from patients and mentioning UK government guidance on management. A key investigator was on the guideline committee

It is considered good practice to publish newsletters for participants in trials, so that they are kept fully informed both about the trial’s progress and topical news about their illness. We published four such newsletters during the trial, which can all be found at http://www.wolfson.qmul.ac.uk/current-projects/pace-trial. The newsletter referred to is the one found at this link: http://www.wolfson.qmul.ac.uk/images/pdfs/participantsnewsletter3.pdf.

As can be seen no specific treatment or therapy is named in this newsletter and we were careful to print feedback from participants from all four treatment arms. All newsletters were approved by the independent research ethics committee before publication. It seems very unlikely that this newsletter could have biased participants as any influence on their ratings would affect all treatment arms equally.

The same newsletter also mentioned the release of the UK National Institute for Health and Care Excellence guideline for the management of this illness (this institute is independent of the UK government). This came out in 2007 and received much media interest, so most patients would already have been aware of it. Apart from describing its content in summary form we also said “The guidelines emphasize the importance of joint decision making and informed choice and recommended therapies include Cognitive Behavioural Therapy, Graded Exercise Therapy and Activity Management.” These three (not two as David Tuller states) therapies were the ones being tested in the trial, so it is hard to see how this might lead to bias in the direction of one or other of these therapies.

The “key investigator” on the guidelines committee, who was mentioned by David Tuller, helped to write the GET manuals, and provided training and supervision for one of the therapies; however they had left the trial team two years before the newsletter’s publication. 

Bias was caused by changing the two primary outcomes and how they were analyzed

These criticisms were first made four years ago, and have been repeatedly addressed and explained by us (White et al, 2013a, White 2015), including explicit descriptions and justification within the main paper itself (White et al, 2011), the statistical analysis plan (Walwyn et al, 2013), and the trial website section of frequently asked questions, published in 2011 (http://www.wolfson.qmul.ac.uk/images/pdfs/pace/faq2.pdf).

The two primary outcomes for the trial were the SF36 physical function sub-scale and the Chalder fatigue questionnaire, as in the published trial protocol; so there was no change in the outcomes themselves. The only change to the primary outcomes from the original protocol was the use of the Likert scoring method (0, 1, 2, 3) of the fatigue questionnaire. This was used in preference to the binary method of scoring (0, 0, 1, 1). This was done in order to improve the variance of the measure (and thus provide better evidence of any change).

The other change was to drop the originally chosen composite measures (the number of patients who either exceeded a threshold score or who changed by more than 50 per cent). After careful consideration, we decided this composite method would be hard to interpret clinically, and would not answer our main question of comparing effectiveness between treatment arms. We therefore chose to compare mean scores of each outcome measure between treatment arms instead.

All these changes were made before any outcome data were analyzed (i.e. they were pre-specified), and were all approved by the independent Trial Steering Committee and Data Monitoring and Ethics committee.

Our interpretation was misleading after changing the criteria for determining recovery

We addressed this criticism two years ago in correspondence that followed the paper (White et al, 2013b), and the changes were fully described and explained in the paper itself (White et al, 2013). We changed the thresholds for recovery from the original protocol for our secondary analysis paper on recovery for three, not four, of the variables, since we believed that the revised thresholds better reflected recovery. For instance, we included those who felt “much” (and “very much”) better in their overall health as one of the five criteria that defined recovery. This was done before the analysis occurred (i.e. it was pre-specified). In the discussion section of the paper we discussed the limitations and difficulties in measuring recovery, and stated that other ways of defining recovery could produce different results. We also provided the results of different criteria for defining recovery in the paper. The bottom line was that, however we defined recovery, significantly more patients had recovered after receiving CBT and GET than after other treatments (White et al, 2013).

Requests for data under the freedom of information act were rejected as vexatious

 We have received numerous Freedom of Information Act requests over the course of many years. These even included a request to know how many Freedom of Information requests we had received. We have provided these data when we were able to (e.g. the 13% figure mentioned above came from our releasing these data). However, the safe-guarding of personal medical data was an undertaking enshrined in the consent procedure and therefore is ethically binding; so we cannot publicly release these data. It is important to remember that simple methods of anonymization does not always protect the identity of a person, as they may be recognized from personal and medical information. We have only considered two of these many Freedom of Information requests as vexatious, although an Information Tribunal judge considered an earlier request was also vexatious (General Regulation Chamber, 2013).

Subjective and objective outcomes

These issues were first raised seven years ago and have all been addressed before (White et al, 2008, White et al, 2011, White et al, 2013a, White et al, 2013b, Chalder et al, 2015a). We chose (subjective) self-ratings as the primary outcomes, since we considered that the patients themselves were the best people to determine their own state of health. We have also reported the results of a number of objective outcomes, including a walking test, a stepping test, employment status and financial benefits (White et al, 2011a, McCrone et al, 2012, Chalder et al, 2015). The distance participants could walk in six minutes was significantly improved following GET, compared to other treatments. There were no significant differences in fitness, employment or benefits between treatments. We interpreted these data in the light of their context and validity. For instance, we did not use employment status as a measure of recovery or improvement, because patients may not have been in employment before falling ill, or they may have lost their job as a consequence of being ill (White et al, 2013b). Getting better and getting a job are not the same things, and being in employment depends on the prevailing state of the local economy as much as being fit for work.

There was a bias caused by many investigators’ involvement with insurance companies and a failure not to declare links with insurance companies in information regarding consent

No insurance company was involved in any aspect of the trial. There were some 19 investigators, three of whom have done consultancy work at various times for insurance companies. This was not related to the research and was listed as a potential conflict of interest in the relevant papers. The patient information sheet informed all potential participants as to which organizations had funded the research, which is consistent with ethical guidelines.

References

Castell BD et al, 2011. Cognitive Behavioral Therapy and Graded Exercise for Chronic Fatigue Syndrome: A Meta‐Analysis. Clin Psychol Sci Pract 18: 311-324. doi: http://dx.doi.org/10.1111/j.1468-2850.2011.01262.x

Chalder T et al, 2015. Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. Lancet Psychiatry 2: 141-152. doi: http://dx.doi.org/10.1016/S2215-0366(14)00069-8

Chalder T et al, 2015a. Methods and outcome reporting in the PACE trial–Author’s reply. Lancet Psychiatry 2: e10–e11. doi: http://dx.doi.org/10.1016/S2215-0366(15)00114-5

Chambers D et al, 2006. Interventions for the treatment, management and rehabilitation of patients with chronic fatigue syndrome/myalgic encephalomyelitis: an updated systematic review. J R Soc Med 99: 506-520.

Edmonds M et al, 2004. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev 3: CD003200. doi: http://dx.doi.org/10.1002/14651858.CD003200.pub2

General Regulation Chamber (Information Rights) First Tier Tribunal, 2013. Mitchell versus Information Commissioner. EA 2013/0019. www.informationtribunal.gov.uk/DBFiles/Decision/i1069/20130822%20Decision%20EA20130019.pdf

Larun L et al, 2015. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev 2: CD003200. doi: http://dx.doi.org/10.1002/14651858.CD003200.pub3

Malouff JM et al, 2008. Efficacy of cognitive behavioral therapy for chronic fatigue syndrome: a meta-analysis. Clin Psychol Rev 28: 736–45. doi: http://dx.doi.org/10.1016/j.cpr.2007.10.004

Marques MM et al, 2015. Differential effects of behavioral interventions with a graded physical activity component in patients suffering from Chronic Fatigue (Syndrome): An updated systematic review and meta-analysis. Clin Psychol Rev 40: 123–137. doi: http://dx.doi.org/10.1016/j.cpr.2015.05.009

McCrone P et al, 2012. Adaptive pacing, cognitive behaviour therapy, graded exercise, and specialist medical care for chronic fatigue syndrome: a cost effectiveness analysis. PLoS ONE 7: e40808. doi: http://dx.doi.org/10.1371/journal.pone.0040808

Price JR et al, 2008. Cognitive behaviour therapy for chronic fatigue syndrome in adults. Cochrane Database Syst Rev 3: CD001027. doi: http://dx.doi.org/10.1002/14651858.CD001027.pub2

Smith MB et al, 2015. Treatment of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: A Systematic Review for a National Institutes of Health Pathways to Prevention Workshop. Ann Intern Med. 162: 841-850. doi: http://dx.doi.org/10.7326/M15-0114

Walwyn R et al, 2013. A randomised trial of adaptive pacing therapy, cognitive behaviour therapy, graded exercise, and specialist medical care for chronic fatigue syndrome (PACE): statistical analysis plan. Trials 14: 386. http://www.trialsjournal.com/content/14/1/386

White PD et al, 2007. Protocol for the PACE trial: a randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise, as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with the chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurol 7:6. doi: http://dx.doi.org/10.1186/1471-2377-7-6

White PD et al, 2008. Response to comments on “Protocol for the PACE trial”. http://www.biomedcentral.com/1471-2377/7/6/COMMENTS/prepub#306608

White PD et al, 2011. The PACE trial in chronic fatigue syndrome – Authors’ reply. Lancet 377: 1834-35. doi: http://dx.doi.org/10.1016/S0140-6736(11)60651-X

White PD et al, 2011a. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. Lancet 377: 823-36. doi: http://dx.doi.org/10.1016/S0140-6736(11)60096-2

White PD et al, 2013. Recovery from chronic fatigue syndrome after treatments given in the PACE trial. Psychol Med 43: 227-35. doi: http://dx.doi.org/10.1017/S0033291713000020

White PD et al, 2013a. Chronic fatigue treatment trial: PACE trial authors’ reply to letter by Kindlon. BMJ 347: f5963. doi: http://dx.doi.org/10.1136/bmj.f5963

White PD et al, 2013b. Response to correspondence concerning ‘Recovery from chronic fatigue syndrome after treatments in the PACE trial’. Psychol Med 43: 1791-2. doi: http://dx.doi.org/10.1017/S0033291713001311

White PD et al, 2015. The planning, implementation and publication of a complex intervention trial for chronic fatigue syndrome: the PACE trial. Psychiatric Bulletin 39: 24-27. doi: http://dx.doi.org/10.1192/pb.bp.113.045005

Whiting P et al, 2001. Interventions for the Treatment and Management of Chronic Fatigue Syndrome: A Systematic Review. JAMA 286: 1360-68. doi: http://dx.doi.org/10.1001/jama.286.11.1360

TRIAL BY ERROR: The Troubling Case of the PACE Chronic Fatigue Syndrome Study (final installment)

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent masters degree program in public health and journalism at the University of California, Berkeley. 

A few years ago, Dr. Racaniello let me hijack this space for a long piece about the CDC’s persistent incompetence in its efforts to address the devastating illness the agency itself had misnamed “chronic fatigue syndrome.” Now I’m back with an even longer piece about the U.K.’s controversial and highly influential PACE trial. The $8 million study, funded by British government agencies, purportedly proved that patients could “recover” from the illness through treatment with one of two rehabilitative, non-pharmacological interventions: graded exercise therapy, involving a gradual increase in activity, and a specialized form of cognitive behavior therapy. The main authors, a well-established group of British mental health professionals, published their first results in The Lancet in 2011, with additional results in subsequent papers.

Much of what I report here will not be news to the patient and advocacy communities, which have produced a voluminous online archive of critical commentary on the PACE trial. I could not have written this piece without the benefit of that research and the help of a few statistics-savvy sources who talked me through their complicated findings. I am also indebted to colleagues and friends in both public health and journalism, who provided valuable suggestions and advice on earlier drafts. Today’s Virology Blog installment is the final quarter; the first and second installments were published previously. I was originally working on this piece with Retraction Watch, but we could not ultimately agree on the direction and approach.

After this article was posted, the PACE investigators replied, and in turn I responded to their criticisms. All the articles can be found at the ME/CFS page.

SUMMARY

This examination of the PACE trial of chronic fatigue syndrome identified several major flaws:

*The study included a bizarre paradox: participants’ baseline scores for the two primary outcomes of physical function and fatigue could qualify them simultaneously as disabled enough to get into the trial but already “recovered” on those indicators–even before any treatment. In fact, 13 percent of the study sample was already “recovered” on one of these two measures at the start of the study.

*In the middle of the study, the PACE team published a newsletter for participants that included glowing testimonials from earlier trial subjects about how much the “therapy” and “treatment” helped them. The newsletter also included an article informing participants that the two interventions pioneered by the investigators and being tested for efficacy in the trial, graded exercise therapy and cognitive behavior therapy, had been recommended as treatments by a U.K. government committee “based on the best available evidence.” The newsletter article did not mention that a key PACE investigator was also serving on the U.K. government committee that endorsed the PACE therapies.

*The PACE team changed all the methods outlined in its protocol for assessing the primary outcomes of physical function and fatigue, but did not take necessary steps to demonstrate that the revised methods and findings were robust, such as including sensitivity analyses. The researchers also relaxed all four of the criteria outlined in the protocol for defining “recovery.” They have rejected as “vexatious” requests from patients for the findings as originally promised in the protocol.

*The PACE claims of successful treatment and “recovery” were based solely on subjective outcomes. All the objective measures from the trial—a walking test, a step test, and data on employment and the receipt of financial benefits—failed to provide any evidence to support such claims. Afterwards, the PACE authors dismissed their own main objective measures as non-objective, irrelevant, or unreliable.

*In seeking informed consent, the PACE authors violated their own protocol, which included an explicit commitment to tell prospective participants about any possible conflicts of interest. The main investigators have had longstanding financial and consulting ties with disability insurance companies, having advised them for years that cognitive behavior therapy and graded exercise therapy could get claimants off benefits and back to work. Yet prospective participants were not told about any insurance industry links and the information was not included on consent forms. The authors did include the information in the “conflicts of interest” sections of the published papers.

Top researchers who have reviewed the study say it is fraught with indefensible methodological problems. Here is a sampling of their comments:

Dr. Bruce Levin, Columbia University: “To let participants know that interventions have been selected by a government committee ‘based on the best available evidence’ strikes me as the height of clinical trial amateurism.”

Dr. Ronald Davis, Stanford University: “I’m shocked that the Lancet published it…The PACE study has so many flaws and there are so many questions you’d want to ask about it that I don’t understand how it got through any kind of peer review.”

Dr. Arthur Reingold, University of California, Berkeley: “Under the circumstances, an independent review of the trial conducted by experts not involved in the design or conduct of the study would seem to be very much in order.”

Dr. Jonathan Edwards, University College London: “It’s a mass of un-interpretability to me…All the issues with the trial are extremely worrying, making interpretation of the clinical significance of the findings more or less impossible.”

Dr. Leonard Jason, DePaul University: “The PACE authors should have reduced the kind of blatant methodological lapses that can impugn the credibility of the research, such as having overlapping recovery and entry/disability criteria.”

************************************************************************

PART FOUR:

The Publication Aftermath

Publication of the paper triggered what The Lancet described in an editorial as “an outpouring of consternation and condemnation from individuals or groups outside our usual reach.” Patients expressed frustration and dismay that once again they were being told to exercise and seek psychotherapy. They were angry as well that the paper ignored the substantial evidence pointing to patients’ underlying biological abnormalities.

Even Action For ME, the organization that developed the adaptive pacing therapy with the PACE investigators, declared in a statement that it was “surprised and disappointed” at “the exaggerated claims” being made about the rehabilitative therapies. And the findings that the treatments did not cause relapses, noted Peter Spencer, Action For ME’s chief executive officer, in the statement, “contradict the considerable evidence of our own surveys and those of other patient groups.”

Many believed the use of the broad Oxford criteria helped explain some of the reported benefits and lack of adverse effects. Although people with psychosis, bipolar disorder, substance “misuse,” organic brain disorder, or an eating disorder were screened out of the PACE sample, 47 percent of the participants were nonetheless diagnosed with “mood and anxiety disorders,” including depression. As DePaul psychologist Leonard Jason had noted, cognitive and behavioral interventions have proven successful with people suffering from primary depression; by the same token, increased activity was unlikely to harm such participants if they did not also experience the core ME/CFS symptom of post-exertional malaise.

Others, like Tom Kindlon, speculated that many of the patients in the two rehabilitative arms, even if they had reported subjective improvements, might not have significantly increased their levels of exertion. To bolster this argument, he noted the poor results from the six-minute walking test, which suggested little or no improvement in physical functioning.

“If participants did not follow the directives and did not gradually increase their total activity levels, they might not suffer the relapses and flare-ups that patients sometimes report with these approaches,” said Kindlon.

During an Australian radio interview, Lancet editor Richard Horton denounced what he called the “orchestrated response” from patients, based on “the flimsiest and most unfair allegations,” seeking to undermine the credibility of the research and the researchers. “One sees a fairly small, but highly organized, very vocal and very damaging group of individuals who have, I would say, actually hijacked this agenda and distorted the debate so that it actually harms the overwhelming majority of patients,” he said.

In fact, he added, “what the investigators did scrupulously was to look at chronic fatigue syndrome from an utterly impartial perspective.”

In explaining The Lancet’s decision to publish the results, Horton told the interviewer that the paper had undergone “endless rounds of peer review.” Yet the ScienceDirect database version of the article indicated that The Lancet had “fast-tracked” it to publication. According to current Lancet policy, a standard fast-tracked article is published within four weeks of receipt of the manuscript.

Michael Sharpe, one of the lead investigators, also participated in the Australian radio interview. In response to a question from the host, he acknowledged that only one in seven participants received a “clinically important treatment benefit” from the rehabilitative therapies of graded exercise therapy and cognitive behavior therapy—a key data point not mentioned in the Lancet paper.

“What this trial isn’t able to answer is how much better are these treatments than really not having very much treatment at all,” Sharpe told the radio host in what might have been an unguarded moment, given that the U.K. government had spent five million pounds on the PACE study to find out the answer. Sharpe’s statement also appeared to contradict the effusive “recovery” and “back-to-normal” news stories that had greeted the reported findings.

***

In correspondence published three months after the Lancet paper, the PACE authors gave no ground. In response to complaints about changes from the protocol, they wrote that the mid-trial revisions “were made to improve either recruitment or interpretability” and “were approved by the Trial Steering Committee, were fully reported in our paper, and were made before examining outcome data to avoid outcome reporting bias.” They did not mention whether, since it was an unblinded trial, they already had a general sense of outcome trends even before examining the actual outcome data. And they did not explain why they did not conduct sensitivity analyses to measure the impact of the protocol changes.

They defended their post-hoc “normal ranges” for fatigue and physical function as having been calculated through the “conventional” statistical formula of taking the mean plus/minus one standard deviation. As in the Lancet paper itself, however, they did not mention or explain the unusual overlaps between the entry criteria for disability and the outcome criteria for being within the “normal range.” And they did not explain why they used this “conventional” method for determining normal ranges when their two population-based data sources did not have normal distributions, a problem White himself had acknowledged in his 2007 study.
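
The underlying statistical problem is easy to demonstrate with simulated numbers. The sample below is invented for illustration (it is not PACE data); the point is simply that general-population SF-36 physical function scores pile up near the healthy ceiling of 100, so the distribution is skewed.

    import statistics

    # Simulated, illustrative population sample -- not PACE data. Most people
    # score at or near the SF-36 ceiling; a small minority score far lower.
    population = [100] * 60 + [95] * 15 + [90] * 10 + [70] * 8 + [40] * 5 + [10] * 2

    mean = statistics.mean(population)   # about 91
    sd = statistics.pstdev(population)   # about 19, inflated by the low tail
    print("median score:", statistics.median(population))  # 100
    print("mean - 1 sd cutoff:", round(mean - sd, 1))      # about 72

    # The skewed tail inflates the standard deviation, so the "normal range"
    # reaches down to roughly 72 even though the typical (median) score is 100.

With PACE’s own data sources, the same formula pushed the physical function threshold all the way down to 60, below the trial’s own entry criterion of 65.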

The authors clarified that the Lancet paper had not discussed “recovery” at all; they promised to address that issue in a future publication. But they did not explain why Chalder, at the press conference, had declared that patients got “back to normal.”

They also did not explain why they had not objected to the claim in the accompanying commentary, written by their colleagues and discussed with them pre-publication, that 30 percent of participants in the rehabilitative arms had achieved “recovery” based on a “strict criterion” —especially since that “strict criterion” allowed participants to get worse and still be “recovered.” Finally, they did not explain why, if the paper was not about “recovery,” they had not issued public statements to correct the apparently inaccurate news coverage that had reported how study participants in the graded exercise therapy and cognitive behavior therapy arms had “recovered” and gotten “back to normal.”

The authors acknowledged one error. They had described their source for the “normal range” for physical function as a “working-age” population rather than what it actually was–an “adult” population. (Unlike a “working-age” population, an “adult” population includes elderly people and is therefore less healthy. Had the PACE participants’ scores on the SF-36 physical function scale actually been compared to the SF-36 responses of the working-age subset of the adult population used as the source for the “normal range,” the percentages achieving the “normal range” threshold of this healthier group would have been even lower than the reported results.)

Yet The Lancet did not append a correction to the article itself, leaving readers completely unaware that it contained—and still contains–a mistake that involved a primary outcome and made the findings appear better than they actually were. (Lancet policy calls for correcting “any substantial error” and “any numerical error in the results, or any factual error in interpretation of results.”)

***

A 2012 paper in PLoS One, on financial aspects of the illness, included outcomes for some additional objective measures. Instead of a decrease in financial benefits received by those in the rehabilitative therapy arms, as would be expected if disabled people improved enough to increase their ability to work, the paper reported a modest average increase in the receipt of benefits across all the arms of the study. There were also no differences among the groups in days lost from work.

The investigators did not include the promised information on wages. They also had still not published the results of the self-paced step-test, described in the protocol as a measure of fitness.

In another finding, the PLoS One paper argued that the graded exercise and cognitive behavior therapies were the most cost-effective treatments from a societal perspective. In reaching this conclusion, the investigators valued so-called “informal” care—unpaid care provided by family and friends—at the replacement cost of a homecare worker. The PACE statistical analysis plan (approved in 2010 but not published until 2013) had included two additional, lower-cost assumptions. The first valued informal care at minimum wage, the second at zero compensation.

The PLoS One paper itself did not provide these additional findings, noting only that “sensitivity analyses revealed that the results were robust for alternative assumptions.”

Commenters on the PLoS One website, including Tom Kindlon, challenged the claim that the findings would be “robust” under the alternative assumptions for informal care. In fact, they pointed out, the lower-cost conditions would reduce or fully eliminate the reported societal cost-benefit advantages of the cognitive behavior and graded exercise therapies.
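
Stylized numbers make the commenters’ point plain. All figures below are invented for illustration (they are not the trial’s data); what matters is the structure of the calculation, in which informal-care hours are multiplied by an assumed hourly rate.

    # Societal cost = health-care costs + informal-care hours x hourly rate.
    # All figures are invented for illustration, not taken from the trial.
    def societal_cost(health_costs, care_hours, hourly_rate):
        return health_costs + care_hours * hourly_rate

    # Hypothetical per-patient figures: the therapy costs more to deliver but
    # is assumed to reduce the hours of unpaid care a patient needs.
    therapy = {"health_costs": 2000, "care_hours": 300}
    control = {"health_costs": 1000, "care_hours": 500}

    for label, rate in (("replacement cost", 15.0), ("minimum wage", 6.0), ("zero", 0.0)):
        diff = societal_cost(**therapy, hourly_rate=rate) - societal_cost(**control, hourly_rate=rate)
        print(f"{label}: therapy minus control = {diff:+.0f}")
    # replacement cost: -2000 (therapy looks cheaper for society)
    # minimum wage:       -200 (the advantage nearly disappears)
    # zero:              +1000 (therapy now costs society more)

In other words, whether the rehabilitative therapies look cost-saving from a societal perspective can turn entirely on the hourly rate assigned to unpaid care, which is why the omitted sensitivity analyses mattered.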

In a posted response, the paper’s lead author, Paul McCrone, conceded that the commenters were right about the impact that the lower-cost, alternative assumptions would have on the findings. However, McCrone did not explain or even mention the apparently erroneous sensitivity analyses he had cited in the paper, which had found the societal cost-benefit advantages for graded exercise therapy and cognitive behavior therapy to be “robust” under all assumptions. Instead, he argued that the two lower-cost approaches were unfair to caregivers because families deserved more economic consideration for their labor.

“In our opinion, the time spent by families caring for people with CFS/ME has a real value and so to give it a zero cost is controversial,” McCrone wrote. “Likewise, to assume it only has the value of the minimum wage is also very restrictive.”

In a subsequent comment, Kindlon chided McCrone, pointing out that he had still not explained the paper’s claim that the sensitivity analyses showed the findings were “robust” for all assumptions. Kindlon also noted that the alternative, lower-cost assumptions were included in PACE’s own statistical plan.

“Remember it was the investigators themselves that chose the alternative assumptions,” wrote Kindlon. “If it’s ‘controversial’ now to value informal care at zero value, it was similarly ‘controversial’ when they decided before the data was looked at, to analyse the data in this way. There is not much point in publishing a statistical plan if inconvenient results are not reported on and/or findings for them misrepresented.”

***

The journal Psychological Medicine published the long-awaited findings on “recovery” in January, 2013. In the paper, the investigators imposed a serious limitation on their construct of “recovery.” They now defined it as recovery solely from the most recent bout of illness—a health status generally known as “remission,” not “recovery.” The protocol definition included no such limitation.

In a commentary, Fred Friedberg, a psychologist in the psychiatry department at Stony Brook University and an expert on the illness, criticized the PACE authors’ use of the term “recovery” as inaccurate. “Their central construct…refers only to recovery from the current episode, rather than sustained recovery over long periods,” he and a colleague wrote. The term “remission,” they noted, was “less prone to misinterpretation and exaggeration.”

Tom Kindlon was more direct. “No one forced them to use the word ‘recovery’ in the protocol and in the title of the paper,” he said. “If they meant ‘remission,’ they should have said ‘remission.’” As with the release of the Lancet paper, when Chalder spoke of getting “back to normal” and the commentary claimed “recovery” based on a “strict criterion,” Kindlon believed the PACE approach to naming the paper and reporting the results would once again lead to inaccurate news reports touting claims of “recovery.”

In the new paper, the PACE investigators loosened all four of the protocol’s required criteria for “recovery” but did not mention which, if any, oversight committees approved this overall redefinition of the term. Two of the four revised criteria for “recovery” were the Lancet paper’s fatigue and physical function “normal ranges.” Like the Lancet paper, the Psychological Medicine paper did not point out that these “normal ranges”—now re-purposed as “recovery” thresholds–overlapped with the study’s entry criteria for disability, so that participants could already be “recovered” on one or both of these two indicators from the outset.

The four revised “recovery” criteria were:

*For physical function, “recovery” required a score of 60 or more. In the protocol, “recovery” required a score of 85 or more. At entry, a score of 65 or less was required to demonstrate enough disability to be included in the trial. This entry threshold of 65 indicated better health than the new “recovery” threshold of 60.

*For fatigue, “recovery” required a score of 18 or less out of 33 (on the fatigue scale, a higher score indicated more fatigue). In the protocol, “recovery” required a score of 3 or less out of 11 under the original scoring system. At entry, a score of at least 12 on the revised scale was required to demonstrate enough fatigue to be included in the trial. This entry threshold of 12 indicated better health than the new “recovery” threshold of 18.

*A score of 1 (“very much better”) or 2 (“much better”) out of 7 on the Clinical Global Impression scale. In the protocol, “recovery” required a score of 1 (“very much better”) on the Clinical Global Impression scale; a score of 2 (“much better”) was not good enough. The investigators made this change, they wrote, because “we considered that participants rating their overall health as ‘much better’ represented the process of recovery.” They did not cite references to justify their post-protocol reconsideration of the meaning of the Clinical Global Impression scale, nor did they explain when and why they changed their minds about how to interpret it.

*The last protocol requirement for “recovery”—not meeting any of the three case definitions used in the study—was now divided into two sub-categories, one less restrictive and one more restrictive. Presuming participants met the relaxed fatigue, physical function, and Clinical Global Impression thresholds, those who no longer met the Oxford criteria were now defined as having achieved “trial recovery,” even if they still met one of the other two case definitions, the CDC’s chronic fatigue syndrome case definition and the ME definition. Those who fulfilled the protocol’s stricter criteria of not meeting any of the three case definitions were now defined as having achieved “clinical recovery.” The authors did not explain when or why they decided to divide this category into two.
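To put the old and new thresholds side by side, here is a minimal sketch in Python. The cutoffs are the ones reported above, but the function names and the sample participant are hypothetical; this is an illustration, not the investigators’ own analysis code.

```python
# A minimal sketch (not trial code): the protocol's "recovery" thresholds
# versus the revised ones in the Psychological Medicine paper, for the two
# continuous primary measures. The Clinical Global Impression and
# case-definition criteria are omitted.

def protocol_recovered(physical_function, fatigue_bimodal):
    # Protocol: SF-36 physical function of 85 or more, bimodal fatigue of 3 or less (out of 11)
    return physical_function >= 85 and fatigue_bimodal <= 3

def revised_recovered(physical_function, fatigue_continuous):
    # 2013 paper: SF-36 of 60 or more, continuous fatigue of 18 or less (out of 33)
    return physical_function >= 60 and fatigue_continuous <= 18

# A hypothetical participant at the trial's entry boundary: SF-36 of 65 and a
# continuous-scale fatigue score of 12, i.e., just sick enough to enroll.
print(revised_recovered(65, 12))   # True: counts as "recovered" before any treatment
print(protocol_recovered(65, 6))   # False under the original definition
```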

After these multiple relaxations of the protocol definition of “recovery,” the paper reported the full data for the less restrictive category of “trial recovery,” not the more restrictive category of “clinical recovery.” The authors found that the odds of “trial recovery” in the cognitive behavior therapy and graded exercise therapy arms were more than triple those in the adaptive pacing therapy and specialist medical care arms. They did not report having conducted any sensitivity analyses to measure the impact of all the changes to the protocol definition of “recovery.”

They acknowledged that the “trial recovery” rate from the two rehabilitative treatments, at 22 percent in each group, was low. They suggested that increasing the total number of graded exercise therapy and cognitive behavior therapy sessions and/or bundling the two interventions could boost the rates.

***

Like the Lancet paper, the “recovery” findings received uncritical media coverage—and as Tom Kindlon feared, the news accounts did not generally mention “remission.” Nor did they discuss the dramatic changes in all four of the criteria from the original protocol definition of “recovery.” Not surprisingly, the report drew fire from patients and advocacy groups.

Commenters on the journal’s website and on patient and advocacy blogs challenged the revised definition for “recovery,” including the use of the overlapping “normal ranges” for fatigue and physical function as two of the four criteria. They wondered why the PACE authors used the term “recovery” at all, given the serious limitation they had placed on its meaning. They also noted that the investigators were ignoring the Lancet paper’s objective results from the six-minute walking test in assessing whether people had recovered, as well as the employment and benefits data from the PLoS One paper—all of which failed to support the “recovery” claims.

In their response, White and his colleagues defended their use of the term “recovery” by noting that they explained clearly what they meant in the paper itself. “We were careful to give a precise definition of recovery and to emphasize that it applied at one particular point only and to the current episode of illness,” they wrote. But they did not explain why, given that narrow definition, they did not simply use the standard term “remission,” since there was always the possibility that the word “recovery” would lead to misunderstanding of the findings.

Once again, they did not explain why the entry criteria for disability and the outcome criteria for the physical function and fatigue “normal ranges”—now redefined as “recovery” thresholds—overlapped. They again did not explain why they used the statistical formula to find “normal ranges” for normally distributed populations on samples that they knew were skewed. And they now disavowed the significance of objective measures they themselves had selected, starting with the walking test, which had been described as “an objective outcome measure of physical capacity” in the protocol.

“We dispute that in the PACE trial the six-minute walking test offered a better and more ‘objective’ measure of recovery,” they now wrote, citing “practical limitations” with the data.

For one thing, the researchers now explained that during the walking test, in deference to participants’ poor health, they did not verbally encourage them, in contrast to standard practice. For another, they did not have follow-up walking tests for more than a quarter of the sample, a significant data gap that they did not explain. (One possible explanation is that participants were too sick to do the walking test at all, suggesting that the findings might have looked significantly worse if they had included actual results from those missing subjects.)

Finally, the PACE investigators explained, they had only 10 meters of corridor space for conducting the test, rather than the standard of 30 to 50 meters–although they did not explain whether all six of their study centers around the country, or just some of them, suffered from this deficiency. “This meant that participants had to stop and turn around more frequently, slowing them down and thereby vitiating comparisons with other studies,” wrote the investigators.

This explanation raised further questions, however. The investigators had started assessing participants—and administering the walking test—in 2005. Yet two years later, in the protocol published in BMC Neurology, they did not mention any comparison-vitiating problems; instead, they described the walking test as an “objective” measure of physical capacity. While the protocol itself was written before the trial started, the authors posted a comment on the BMC Neurology web page in 2008, in response to patient comments, that reaffirmed the six-minute walking test as one of “several objective outcome measures.”

In their response in the Psychological Medicine correspondence, White and his colleagues did not explain if they had recognized the walking test’s comparison-vitiating limitations by the time they published their protocol in 2007 or their comment on BMC Neurology’s website in 2008–and if not, why not.

In their response, they also dismissed the relevance of their employment and benefits outcomes, which had been described as “another more objective measure of function” in the protocol. “Recovery from illness is a health status, not an economic one, and plenty of working people are unwell, while well people do not necessarily work,” they now wrote. “In addition, follow-up at 6 months after the end of therapy may be too short a period to affect either benefits or employment. We therefore disagree…that such outcomes constitute a useful component of recovery in the PACE trial.”

In conclusion, they wrote in their Psychological Medicine response, cognitive behavior therapy and graded exercise therapy “should now be routinely offered to all those who may benefit from them.”

***

Each published paper fueled new questions. Patients and advocates filed dozens of freedom-of-information requests for PACE-related documents and data with Queen Mary University of London, White’s institutional home and the designated administrator for such matters.

How many PACE participants, patients wanted to know, were “recovered” according to the much stricter criteria in the 2007 protocol? How many participants were already “within the normal range” on fatigue or physical function when they entered the study? When exactly were the changes made to the assessment strategies promised in the protocol, what oversight committees approved them, and why?

Some requests were granted. One response revealed that 85 participants—or 13 percent of the total sample—were already “recovered” or “within the normal range” for fatigue or physical function even as they qualified as disabled enough for the study. (Almost all of these, 78 participants, achieved the threshold for physical function alone; four achieved it for fatigue, and three for both.)

But many other requests have been turned down. Anna Sheridan, a long-time patient with a doctorate in physics, requested data last year on how the patients deemed “recovered” by the investigators in the 2013 Psychological Medicine paper had performed on the six-minute walking test. Queen Mary University rejected the request as “vexatious.”

Sheridan asked for an internal review. “As a scientist, I am seeking to understand the full implications of the research,” she wrote. “As a patient, the distance that I can walk is of incredible concern…When deciding to undertake a treatment such as CBT and GET, it is surely not unreasonable to want to know how far the patients who have recovered using these treatments can now walk.”

The university re-reviewed the request and informed Sheridan that it was not, in fact, “vexatious.” But her request was again being rejected, wrote the university, because the resources needed to locate and retrieve the information “would exceed the appropriate limit” designated by the law. Sheridan appealed the university’s decision to the next level, the U.K. Information Commissioner’s Office, but was recently turned down.

The Information Commissioner’s Office also turned down a request from an applicant seeking meeting minutes for PACE oversight committees to understand when and why outcome measures were changed. The applicant appealed to a higher-level venue, the First-Tier Tribunal. The tribunal panel—a judge and two lay members—upheld the decision, declaring that it was “pellucidly clear” that release of the minutes would threaten academic freedom and jeopardize future research.

The tribunal panel defended the extensive protocol changes as “common to most clinical trials” and asserted that the researchers “did not engineer the results or undermine the integrity of the findings.” The panel framed the many requests for trial documents and data as part of a campaign of harassment against the researchers, and sympathetically cited the heavy time burdens that the patients’ demands placed on White. In conclusion, wrote the panel, the tribunal “has no doubt that properly viewed in its context, this request should have been seen as vexatious–it was not a true request for information–rather its function was largely polemical.”

To date, the PACE investigators have rejected requests to release raw data from the trial for independent analysis. Patients and other critics say the researchers have a particular obligation to release the data because the trial was conducted with public funds.

Since the Lancet publication, much media coverage of the PACE investigators and their colleagues has focused on what The Guardian has called the “campaign of abuse and violence” purportedly being waged by “militants…considered to be as dangerous and uncompromising as animal rights extremists.” In a news account in the BMJ, White portrayed the protestors as hypocrites. “The paradox is that the campaigners want more research into CFS, but if they don’t like the science they campaign to stop it,” he told the publication. While news reports have also repeated the PACE authors’ claims of treatment success and “recovery,” these accounts have not generally examined the study itself in depth or investigated whether patients’ complaints about the trial are valid.

Tom Kindlon has often heard these arguments about patient activists and says they are used to deflect attention away from the PACE trial’s flaws. “They’ve said that the activists are unstable, the activists have illogical reasons and they are unfair or prejudiced against psychiatry, so they’re easy to dismiss,” said Kindlon.

What patients oppose, he and others explain, is not psychiatry or psychiatrists, but being told that their debilitating organic disease requires treatments based on the hypothesis that they have false cognitions about it.

***

In January of this year, the PACE authors published their paper on mediators of improvement in The Lancet Psychiatry. Not surprisingly, they found that reducing participants’ presumed fears of activity was the main mechanism through which the rehabilitative interventions of graded exercise therapy and cognitive behavior therapy delivered their purported benefits. News stories about the findings suggested that patients with ME/CFS could get better if they were able to rid themselves of their fears of activity.

Unmentioned in the media reports was a tiny graph tucked into a page with 13 other tiny graphs: the results of the self-paced step-test, the fitness measure promised in the protocol. The small graph indicated no advantages for the two rehabilitative intervention groups on the step-test. In fact, it appeared to show that those in the other two groups might have performed better. However, the paper did not include the data on which the graph was based, and the graph was too small to extract any useful data from it.

After publication of the study, a patient filed a request to obtain the actual step-test results that were used to create the graph. Queen Mary University rejected the request as “vexatious.”

With the publication of the step-test graph, the study’s key “objective” outcomes—except for the still-unreleased data on wages—had now all failed to support the claims of “recovery” and treatment success from the two rehabilitative therapies. The Lancet Psychiatry paper did not mention this serious lack of support, from all of the study’s key objective measures, for its subjective findings.

Some scientific developments since the 2011 Lancet paper–such as this year’s National Institutes of Health and Institute of Medicine panel reports, the Columbia University findings of distinct immune system signatures, further promising findings from Norwegian research into the immunomodulatory drug [see correction below] pioneered by rheumatoid arthritis expert Jonathan Edwards, and a growing body of evidence documenting patients’ abnormal responses to activity–have helped shift the focus to biomedical factors and away from PACE, at least outside Great Britain.

In the U.K. itself, the Medical Research Council, in a modest shift, has awarded some grants for biomedical research, but the PACE approach remains the dominant framework for treatment within the national health system. Two years ago, the disparate scientific and political factions launched the CFS/ME Research Collaborative, conceived as an umbrella organization representing a range of views. At the collaborative’s inaugural two-day gathering in Bristol in September of 2014, many speakers presented on promising biomedical research. Peter White’s talk, called “PACE: A Trial and Tribulations,” focused on the response to his study from disaffected patients.

According to the conference report, White cited the patient community’s “campaign against the PACE trial” for recruitment delays that forced the investigators to seek more time and money for the study. He spoke about “vexatious complaints” and demands for PACE-related data, and said he had so far fielded 168 freedom-of-information requests. (He’d received a freedom-of-information request asking how many freedom-of-information requests he’d received.) This type of patient activity “damages” research efforts, he said.

Jonathan Edwards, the rheumatoid arthritis expert now working on ME/CFS, filed a separate report on the conference for a popular patient forum. “I think I can only describe Dr. White’s presentation as out of place,” he wrote. After White briefly discussed the trial outcomes, noted Edwards, “he then spent the rest of his talk saying how unreasonable it was that patients did not gratefully accept this conclusion, indicating that this was an attack on science…

“I think it was unfortunate that Dr. White suggested that people were being unreasonable over the interpretation of the PACE study,” concluded Edwards. “Fortunately nobody seemed to take offence.”

Correction: The original text referred to the drug as an anti-inflammatory. 

TRIAL BY ERROR: The Troubling Case of the PACE Chronic Fatigue Syndrome Study (second installment)

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent masters degree program in public health and journalism at the University of California, Berkeley. 

A few years ago, Dr. Racaniello let me hijack this space for a long piece about the CDC’s persistent incompetence in its efforts to address the devastating illness the agency itself had misnamed “chronic fatigue syndrome.” Now I’m back with an even longer piece about the U.K.’s controversial and highly influential PACE trial. The $8 million study, funded by British government agencies, purportedly proved that patients could “recover” from the illness through treatment with one of two rehabilitative, non-pharmacological interventions: graded exercise therapy, involving a gradual increase in activity, and a specialized form of cognitive behavior therapy. The main authors, a well-established group of British mental health professionals, published their first results in The Lancet in 2011, with additional results in subsequent papers.

Much of what I report here will not be news to the patient and advocacy communities, which have produced a voluminous online archive of critical commentary on the PACE trial. I could not have written this piece without the benefit of that research and the help of a few statistics-savvy sources who talked me through their complicated findings. I am also indebted to colleagues and friends in both public health and journalism, who provided valuable suggestions and advice on earlier drafts. Yesterday, Virology Blog posted the first half of the story. Today’s installment was supposed to be the full second half. However, because the two final sections are each 4,000 words long, we decided to make it easier on readers, split the remainder into two posts, and publish them on successive days instead. I was originally working on this piece with Retraction Watch, but we could not ultimately agree on the direction and approach. 

After this article was posted, the PACE investigators replied, and in turn I responded to their criticisms. All the articles can be found at the ME/CFS page.

SUMMARY

This examination of the PACE trial of chronic fatigue syndrome identified several major flaws:

*The study included a bizarre paradox: participants’ baseline scores for the two primary outcomes of physical function and fatigue could qualify them simultaneously as disabled enough to get into the trial but already “recovered” on those indicators–even before any treatment. In fact, 13 percent of the study sample was already “recovered” on one of these two measures at the start of the study.

*In the middle of the study, the PACE team published a newsletter for participants that included glowing testimonials from earlier trial subjects about how much the “therapy” and “treatment” helped them. The newsletter also included an article informing participants that the two interventions pioneered by the investigators and being tested for efficacy in the trial, graded exercise therapy and cognitive behavior therapy, had been recommended as treatments by a U.K. government committee “based on the best available evidence.” The newsletter article did not mention that a key PACE investigator was also serving on the U.K. government committee that endorsed the PACE therapies.

*The PACE team changed all the methods outlined in its protocol for assessing the primary outcomes of physical function and fatigue, but did not take necessary steps to demonstrate that the revised methods and findings were robust, such as including sensitivity analyses. The researchers also relaxed all four of the criteria outlined in the protocol for defining “recovery.” They have rejected as “vexatious” patients’ requests for the findings as originally promised in the protocol.

*The PACE claims of successful treatment and “recovery” were based solely on subjective outcomes. All the objective measures from the trial—a walking test, a step test, and data on employment and the receipt of financial benefits—failed to provide any evidence to support such claims. Afterwards, the PACE authors dismissed their own main objective measures as non-objective, irrelevant, or unreliable.

*In seeking informed consent, the PACE authors violated their own protocol, which included an explicit commitment to tell prospective participants about any possible conflicts of interest. The main investigators have had longstanding financial and consulting ties with disability insurance companies, having advised them for years that cognitive behavior therapy and graded exercise therapy could get claimants off benefits and back to work. Yet prospective participants were not told about any insurance industry links and the information was not included on consent forms. The authors did include the information in the “conflicts of interest” sections of the published papers.

Top researchers who have reviewed the study say it is fraught with indefensible methodological problems. Here is a sampling of their comments:

Dr. Bruce Levin, Columbia University: “To let participants know that interventions have been selected by a government committee ‘based on the best available evidence’ strikes me as the height of clinical trial amateurism.”

Dr. Ronald Davis, Stanford University: “I’m shocked that the Lancet published it…The PACE study has so many flaws and there are so many questions you’d want to ask about it that I don’t understand how it got through any kind of peer review.”

Dr. Arthur Reingold, University of California, Berkeley: “Under the circumstances, an independent review of the trial conducted by experts not involved in the design or conduct of the study would seem to be very much in order.”

Dr. Jonathan Edwards, University College London: “It’s a mass of un-interpretability to me…All the issues with the trial are extremely worrying, making interpretation of the clinical significance of the findings more or less impossible.”

Dr. Leonard Jason, DePaul University: “The PACE authors should have reduced the kind of blatant methodological lapses that can impugn the credibility of the research, such as having overlapping recovery and entry/disability criteria.”

************************************************************************

PART THREE:

The PACE Trial is Published

Trial recruitment and randomization into the four arms began in early 2005. In 2007, the investigators published a short version of their trial protocol in the journal BMC Neurology. They promised to provide the following results for their two primary measures:

*”Positive outcomes” for physical function, defined as achieving either an SF-36 score of 75 or more, or a 50% increase in score from baseline.

*“Positive outcomes” for fatigue, defined as achieving either a Chalder Fatigue Scale score of 3 or less, or a 50% reduction in score from baseline.

*“Overall improvers,” defined as participants who achieved “positive outcomes” for both physical function and fatigue.

The investigators also promised to provide results for what they defined as “recovery,” a secondary outcome that included four components:

*A physical function score of 85 or more.

*A fatigue score of 3 or less.

*A score of 1 (“very much better”) out of 7 on the Clinical Global Impression scale, a self-rated measure of overall health change.

*Not fulfilling any of the three case definitions used in the study (the Oxford criteria, the CDC criteria for chronic fatigue syndrome, and the myalgic encephalomyelitis criteria).

Tom Kindlon scrutinized the protocol for details on the promised objective outcomes. He knew that self-reported questionnaire responses could be influenced by extraneous factors like affection for the therapist or a desire to believe the treatment worked. He also knew that previous studies of rehabilitative treatments for the illness had shown that objective measurements often failed even when a study reported improvements in subjective measures.

“I’d make the analogy that if you’re measuring weight loss, you wouldn’t ask people if they think they’d lost weight, you’d measure them,” he said.

The protocol’s objective measures of physical capacity and function included:

*A six-minute walking test;

*A self-paced step-test (performed on a short stool);

*Data on employment, wages, and the receipt of benefits.

***

On the trial website, the PACE team posted occasional “participants newsletters,” which featured updates on funding, recruitment and related developments. The third newsletter, dated December 2008, included words of praise for the trial from Prime Minister Gordon Brown’s office as well as an article about the government’s release of new clinical treatment guidelines for chronic fatigue syndrome.

The new U.K. clinical guidelines, the newsletter told participants, were “based on the best available evidence” and recommended treatment with cognitive behavior therapy and graded exercise therapy, the two rehabilitative approaches being studied in PACE. The newsletter did not mention that one of the key PACE investigators, physiotherapist Jessica Bavington, had also served on the U.K. government committee that endorsed the PACE therapies.

The same newsletter included a series of testimonials from participants about their positive outcomes from the “therapy” and “treatment,” although it did not mention the trial arms by name. The newsletter did not balance these positive accounts by including any comments from participants with poor outcomes. At that time, about a third of the participants—200 or so out of the final total of 641—still had one or more assessments to undergo, according to a recruitment chart in the same newsletter.

“The therapy was excellent,” wrote one participant. Another was “so happy that this treatment/trial has greatly changed my sleeping!” A third wrote: “Being included in this trial has helped me tremendously. (The treatment) is now a way of life for me.” A fourth noted: “(The therapist) is very helpful and gives me very useful advice and also motivates me.” One participant’s doctor wrote about the “positive changes” in his patient from the “therapy,” declared that the trial “clearly has the potential to transform [the] lives of many people,” and congratulated the PACE team on its “successful programme”—although no trial findings had yet been published.

Arthur Reingold, the head of epidemiology at the University of California, Berkeley, School of Public Health (and a colleague of mine), has reviewed innumerable clinical trials and observational studies in his decades of work and research with state, national and international public health agencies. He said he had never before seen a case in which researchers themselves had disseminated, mid-trial, such testimonials and statements promoting therapies under investigation. The situation raised concerns about the overall integrity of the study findings, he said.

Although specific interventions weren’t named, he added, the testimonials could still have biased responses in all of the arms toward the positive, or exerted some other unpredictable effect—especially since the primary outcomes were self-reported. (He’d also never seen a trial in which participants could be disabled enough for entry and “recovered” on an indicator simultaneously.)

“Given the subjective nature of the primary outcomes, broadcasting testimonials from those who had received interventions under study would seem to violate a basic tenet of research design, and potentially introduce substantial reporting and information bias,” said Reingold. “I am hard-pressed to recall a precedent for such an approach in other therapeutic trials. Under the circumstances, an independent review of the trial conducted by experts not involved in the design or conduct of the study would seem to be very much in order.”

***

As soon as the Lancet article was released, Kindlon began sharing his impressions with others online. “It was like a hive mind,” he said. “Gradually people spotted different problems and would post those points, and you could see the flaws in it.”

In addition to asserting that cognitive behavior therapy and exercise therapy were modestly effective, the Lancet paper declared these treatments to be safe—no signs of serious adverse events, despite patients’ concerns. The pacing therapy proved little or no better than the baseline condition of specialist medical care. And the results for the two subgroups defined by other criteria did not differ significantly from the overall findings.

It didn’t take long for Kindlon and the others to notice something unusual—the investigators had made a great many mid-trial changes, including in both primary measures. Facing lagging recruitment eleven months into the trial, the PACE authors explained in The Lancet, they had decided to raise the physical function entry threshold, from the initial 60 to the healthier threshold of 65. With the fatigue scale, they had decided to abandon the 0 or 1 bimodal scoring system in favor of continuous scoring, with each answer ranging from 0 to 3; the reason, they wrote, was “to more sensitively test our hypotheses.” (As collected, the data allowed for both scoring methods.)

They did not explain why they made the decision about the fatigue scale in the middle of the trial rather than before, nor why they simply didn’t provide the results with both scoring methods. They did not mention that in 2010, the FINE trial—a smaller study for severely disabled and homebound ME/CFS patients that tested a rehabilitative intervention related to those in PACE–reported no significant differences in final outcomes between study arms, using the same physical function and fatigue questionnaires as in PACE.

The Chalder Fatigue Scale responses in the FINE paper were scored bimodally, as promised in the PACE protocol. However, the FINE researchers later reported that a post-hoc analysis, in which they rescored the responses using the continuous scale of 0 to 3, had found modest benefits. The following year, the PACE team adopted the same revised approach in The Lancet.
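For readers unfamiliar with the two scoring systems, the following minimal sketch, with hypothetical answers, shows how the same 11 Chalder Fatigue Scale responses produce different totals under bimodal and continuous scoring. It is an illustration only, not the trial’s analysis code.

```python
# Minimal sketch with made-up answers to the 11-item Chalder Fatigue Scale.
# Each question is answered on a four-point scale scored 0 through 3.

responses = [2, 1, 3, 2, 0, 1, 2, 3, 1, 2, 2]  # hypothetical participant

# Bimodal scoring, as in the PACE protocol: answers of 0-1 score 0, answers of 2-3 score 1.
bimodal_total = sum(1 if r >= 2 else 0 for r in responses)  # range 0-11

# Continuous scoring, as adopted in the Lancet paper: each answer keeps its 0-3 value.
continuous_total = sum(responses)  # range 0-33

print(bimodal_total)     # 7: at or above the bimodal entry threshold of 6
print(continuous_total)  # 19: above the later "normal range" ceiling of 18
```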

The FINE study also received funding in 2003 from the Medical Research Council, and the PACE team referred to it as its “sister” trial. Yet the text of the Lancet paper included nothing about the FINE trial and its negative findings.

***

Besides these changes, the authors did not include the promised protocol data: results for “positive outcomes” for fatigue and physical function, and for the “overall improvers” who achieved “positive outcomes” on both measures. Instead, noting that changes had been approved by oversight committees before outcome data had been examined, they introduced other statistical methods to assess the fatigue and physical function scores. All of their results showed modest advantages for cognitive behavior therapy and graded exercise therapy.

First, they compared the one-year changes in each arm’s average scores for physical function and fatigue. Yet unlike the method outlined in the protocol, this new mean-based measure did not provide information about a key factor of interest—the actual numbers or proportion of participants in each group who reported having gotten better or worse.

In another approach, which they identified as a post-hoc analysis, they determined the proportion of participants in each arm who achieved what they defined as a “clinically useful” benefit–an increase of eight points on the physical function scale and a decrease of two points on the revised fatigue scale. Unlike the first analysis, this post-hoc analysis did provide individual-level rather than aggregate responses. Yet post-hoc results never enjoy the level of confidence granted to pre-specified ones.

Moreover, the improvements required for what the researchers now called a “clinically useful” benefit were smaller than the minimum improvements needed to achieve the protocol’s threshold scores for “positive outcomes”—an increase of ten points on the physical function scale, from the entry threshold of 65 to 75, and a drop of three points on the original fatigue scale, from the entry threshold of 6 to 3.

A third method in the Lancet paper was another post-hoc analysis, this one assessing how many participants in each group achieved what the researchers called the “normal ranges” for fatigue and physical function. They calculated these “normal ranges” from earlier studies that reported the responses of large population samples to the SF-36 and Chalder Fatigue Scale questionnaires. The authors reported that 30 and 28 percent of participants in, respectively, the cognitive behavior therapy and graded exercise therapy arms scored within the “normal ranges” of representative populations for both fatigue and physical function, about double the rate in the other groups.

Of the key objective measures mentioned in the protocol, the Lancet paper only included the results of the six-minute walking test. Those in the exercise arm averaged a modest increase in distance walked of 67 meters, from 312 at baseline to 379 at one year, while those in the other three arms, including cognitive behavior therapy, made no significant improvements, from similar baseline values.

But the exercise arm’s performance was still evidence of serious disability, lagging far behind the mean performances of relatively healthy women from 70 to 79 years (490 meters), people with pacemakers (461 meters), patients with Class II heart failure (558 meters), and cystic fibrosis patients (626 meters). About three-quarters of the PACE participants were women; the average age was 38.

***

In reading the Lancet paper, Kindlon realized that Trudie Chalder was highlighting the post-hoc “normal range” analysis of the two primary outcomes when she spoke at the PACE press conference of “twice as many” participants in the cognitive behavior and exercise therapy arms getting “back to normal.” Yet he knew that “normal range” was a statistical construct, and did not mean the same thing as “back to normal” or “recovered” in medical terms.

The paper itself did not include any results for “recovery” from the illness, as defined using the four criteria outlined in the protocol. Given that, Kindlon believed Chalder had created unneeded confusion in referring to participants as “back to normal.” Moreover, he believed the colleagues of the PACE authors had compounded the problem with their claim in the accompanying commentary of a 30 percent “recovery” rate based on the same “normal range” analysis.

But Kindlon and others also noticed something very peculiar about these “normal ranges”: They overlapped with the criteria for entering the trial. While a physical function score of 65 was considered evidence of sufficient disability to be a study participant, the researchers had now declared that a score of 60 and above was “within the normal range.” Someone could therefore enter the trial with a physical function score of 65, become more disabled, leave with a score of 60, and still be considered within the PACE trial’s “normal range.”

The same bizarre paradox bedeviled the fatigue measure, in which a lower score indicated less fatigue. Under the revised, continuous method of scoring the answers on the Chalder Fatigue Scale, the 6 out of 11 required to demonstrate sufficient fatigue for entry translated into a score of 12 or higher. Yet the PACE trial’s “normal range” for fatigue included any score of 18 or below. A participant could have entered the trial with a revised fatigue score of 12, become more fatigued and scored 18 at the end, and still have been considered within the “normal range.”
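The arithmetic of the overlap is simple enough to check mechanically. Here is a minimal sketch using the published cutoffs; the constant names and the worsening participant are hypothetical illustrations.

```python
# Entry and "normal range" thresholds as reported in the Lancet paper.
PF_ENTRY_MAX = 65        # SF-36 physical function: 65 or less required for entry
PF_NORMAL_MIN = 60       # 60 or more counted as "within normal range"
FATIGUE_ENTRY_MIN = 12   # revised Chalder scoring: 12 or more required for entry
FATIGUE_NORMAL_MAX = 18  # 18 or less counted as "within normal range"

def eligible(pf, fatigue):
    return pf <= PF_ENTRY_MAX and fatigue >= FATIGUE_ENTRY_MIN

def within_normal_range(pf, fatigue):
    return pf >= PF_NORMAL_MIN and fatigue <= FATIGUE_NORMAL_MAX

# A hypothetical participant who enters at the boundary and then deteriorates
# on both measures during the trial:
print(eligible(65, 12))              # True: disabled enough to enroll
print(within_normal_range(60, 18))   # True: worse on both scales, yet "normal"
```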

“It was absurd that the criteria for ‘normal’ fatigue and physical functioning were lower than the entry criteria,” said Kindlon.

That meant, Kindlon realized, that some of the participants whom Chalder described as having gotten “back to normal” because they met the “normal range” threshold might have actually gotten worse during the study. And the same was true of the Lancet commentary accompanying the PACE paper, in which participants who met the peculiar “normal range” threshold were said to have achieved “recovery” according to a “strict criterion”—a definition of “recovery” that apparently survived the PACE authors’ pre-publication discussion of the commentary’s content.

Tom Kindlon wasn’t surprised when these “back to normal” and “recovery” claims became the focus of much of the news coverage. Yet it bothered him tremendously that Chalder and the commentary authors were able to generate such positive publicity from what was, after all, a post-hoc analysis that allowed participants to be severely disabled and “back to normal” or “recovered” simultaneously.

***

Perplexed at the findings, members of the online network checked out the population-based studies cited in PACE as the sources of the “normal ranges.” They discovered a serious problem. In those earlier studies, the responses to both the fatigue and physical function questionnaires did not form the symmetrical, bell-shaped curve known as a normal distribution. Instead, the responses were highly skewed, with many values clustered toward the healthier end of the scales—a frequent phenomenon in population-based health surveys. However, to calculate the PACE “normal ranges,” the authors used a standard statistical method—taking the mean value, plus or minus one standard deviation, which identifies a range that includes 68% of the values in a normally distributed sample.

A 2007 paper co-authored by White noted that this formula for determining normal ranges “assumed a normal distribution of scores” and yielded different results given “a violation of the assumptions of normality”—that is, when the data did not fall into a normal distribution. White’s 2007 paper also noted that the population-based responses to the SF-36 physical function questionnaire were not normally distributed and that using statistical methods specifically designed for such skewed populations would therefore yield different results.

To determine the fatigue “normal range,” the PACE team used a 2010 paper co-authored by Chalder, which provided population-based responses to the Chalder Fatigue Scale. Like the population-based responses to the SF-36 questionnaire, the responses on the fatigue scale were also not normally distributed but skewed toward the healthy end, as the Chalder paper noted.

Despite White’s caveats in his 2007 paper about “a violation of the assumption of normality,” the PACE paper itself included no similar warnings about this major source of distortion in calculating both the physical function and fatigue “normal ranges” using the formula for normally distributed data. The Lancet paper also did not mention or discuss the implications of the head-scratching results: having outcome criteria that indicated worse health than the entry criteria for disability.

Bruce Levin, the Columbia biostatistician, said there are simple statistical formulas for calculating ranges that would include 68 percent of the values when the data are skewed and not normally distributed, as with the population-based data sources used by PACE for both the fatigue and physical function “normal ranges.” To apply the standard formula to data sources that have highly skewed distributions, said Levin, can lead to “very misleading” results.
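Levin’s point can be illustrated with a toy example. The sketch below uses made-up, skewed SF-36-style scores, not the population datasets PACE cited, to compare the mean-minus-one-standard-deviation bound with a percentile range that actually brackets the central 68 percent of a skewed sample.

```python
# Minimal sketch with fabricated, skewed scores (0-100), piled up at the
# healthy end as in population surveys of SF-36 physical function.
import statistics

scores = [100] * 40 + [95] * 20 + [90] * 12 + [85] * 8 + [70] * 6 \
       + [50] * 6 + [30] * 4 + [10] * 4   # n = 100, hypothetical

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)
print(f"mean - 1 SD = {mean - sd:.1f}")   # about 62.0: the formula PACE applied

# A skew-aware alternative: take the 16th and 84th percentiles directly.
ordered = sorted(scores)
lo, hi = ordered[16], ordered[83]         # index positions for n = 100
print(f"central 68% runs from {lo} to {hi}")   # 70 to 100
```

In this fabricated sample, the standard formula puts the lower bound at about 62 even though only 14 of the 100 scores fall below 70, the kind of distortion Levin warns about when the normality assumption is violated.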

***

Raising tough questions about the changes to the PACE protocol certainly conformed to the philosophy of the journal that published it. BioMed Central, the publisher of BMC Neurology, notes on its site that a major goal of publishing trial protocols is “enabling readers to compare what was originally intended with what was actually done, thus preventing both ‘data dredging’ and post-hoc revisions of study aims.” The BMC Neurology “editor’s comment” linked to the PACE protocol reinforced the message that the investigators should be held to account.

Unplanned changes to protocols are never advisable, and they present particular problems in unblinded trials like PACE, said Levin, the Columbia biostatistician. Investigators in such trials might easily sense the outcome trends long before examining the actual outcome data, and that knowledge could influence how they revise the measures from the protocol, he said.

And even when changes are approved by appropriate oversight committees, added Levin, researchers must take steps to address concerns about the impacts on results. These steps might include reporting the findings under both the initial and the revised methods in sensitivity analyses, which can assess whether different assumptions or conditions would cause significant differences in the results, he said.

“And where substantive differences in results occur, the investigators need to explain why those differences arise and convince an appropriately skeptical audience why the revised findings should be given greater weight than those using the a priori measures,” said Levin, noting that the PACE authors did not take these steps.

***

Some PACE trial participants were unpleasantly surprised to learn only after the trial of the researchers’ financial and consulting ties to insurance companies. The researchers disclosed these links in the “conflicts of interest” section of the Lancet article. Yet the authors had promised to adhere to the Declaration of Helsinki, an international human research ethics code mandating that prospective trial participants be informed about “any possible conflicts of interest” and “institutional affiliations of the researcher.”

The sample participant information and consent forms in the final approved protocol did not include any of this information. Four trial participants interviewed, three in person and one by telephone, all said they were not informed before or during the study about the PACE investigators’ ties to insurance companies, especially those in the disability sector. Two said they would have agreed to be in the trial anyway because they lacked other options; two said it would have affected their decision to participate.

Rhiannon Chaffer said she would likely have refused to be in the trial, had she known beforehand. “I’m skeptical of anything that’s backed by insurance, so it would have made a difference to me because it would have felt like the trial wasn’t independent,” said Chaffer, in her mid-30s, who became ill in 2006 and attended a PACE trial center in Bristol.

Another of the four withdrew her consent retroactively and forbade the researchers from using her data in the published results. “I wasn’t given the option of being informed, quite honestly,” she said, requesting anonymity because of ongoing legal matters related to her illness. “I felt quite pissed off and betrayed. I felt like they lied by omission.”

(None of the participants, including three in the cognitive behavior therapy arm, felt the trial had reversed their illness. I will describe these participants’ experiences at a later point.)

Tomorrow: The Aftermath

TRIAL BY ERROR: The Troubling Case of the PACE Chronic Fatigue Syndrome Study

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent masters degree program in public health and journalism at the University of California, Berkeley. 

Much of what I report here will not be news to the patient and advocacy communities, which have produced a voluminous online archive of critical commentary on the PACE trial. I could not have written this piece without the benefit of that research and the help of a few statistics-savvy sources who talked me through their complicated findings. I am also indebted to colleagues and friends in both public health and journalism, who provided valuable suggestions and advice on earlier drafts. Today’s Virology Blog installment is the first half; the second half will be posted in two parts, tomorrow and the next day. I was originally working on this piece with Retraction Watch, but we could not ultimately agree on the direction and approach. 

************************************************************************

PART ONE:

The PACE Trial, Deconstructed

On February 17, 2011, at a press conference in London, psychiatrist Michael Sharpe and behavioral psychologist Trudie Chalder, members of the British medical and academic establishments, unveiled the results of a controversial clinical trial of more than 600 people diagnosed with chronic fatigue syndrome. The findings were being published in The Lancet. As with many things about the illness, the news was expected to cause a stir.

The study, known as the PACE trial, was the largest ever of treatments for chronic fatigue syndrome. The authors were among a prominent group of British mental health professionals who had long argued that the devastating symptoms were caused by severe physical deconditioning. They recognized that many people experienced an acute viral infection or other illness as an initial trigger. However, they believed that the syndrome was perpetuated by patients’ “unhelpful” and “dysfunctional” notion that they continued to suffer from an organic disease—and that exertion would make them worse. According to the experts’ theory, patients’ decision to remain sedentary for prolonged periods led to muscle atrophy and other negative systemic physiological impacts, which then caused even more fatigue and other symptoms in a self-perpetuating cycle.

An estimated one million to 2.5 million Americans, a quarter of a million Britons, and an unknown number of others around the world suffer from chronic fatigue syndrome. The illness often leaves patients too sick to work, attend school, or take care of their children, with a significant minority home-bound for months or years. It is a terrible public health burden, costing society billions of dollars a year in medical care and lost productivity. But what causes it and what to do about it have been fiercely debated for decades.

Patients and many leading scientists view the debilitating ailment as caused by pathological disease processes, not by physical deconditioning. Studies have shown that the illness is characterized by immunological and neurological dysfunctions, and many academic and government scientists say that the search for organic causes, diagnostic tests and drug interventions is paramount. Some recent research has generated excitement. In February, for example, a Columbia-led team reported distinct patterns of immune system response in early-stage patients—findings that could ultimately lead to a biomarker able to identify the presence of the illness.

In contrast, the British mental health experts have focused on non-pharmacological rehabilitative therapies, aimed at improving patients’ physical capacities and altering their perceptions of their condition through behavioral and psychological approaches. The PACE trial was designed to be a definitive test of two such treatments they had pioneered to help patients recover and get back to work. British government agencies, eager to stem health and disability costs related to the illness, had committed five million pounds—close to $8 million at current exchange rates—to support the research.

At the press conference, Sharpe and Chalder touted the two treatments—an incremental increase in activity known as “graded exercise therapy,” and a specialized form of cognitive behavior therapy—as effective in reversing the illness. Citing participant responses on questionnaires about fatigue and physical function, Chalder declared that, compared to other study subjects, “twice as many people on graded exercise therapy and cognitive behaviour therapy got back to normal.”

A Lancet guest commentary, whose contents were discussed in advance with the PACE authors, amplified the positive news, stating that about 30 percent of patients in the two rehabilitative treatment arms had achieved “recovery.” Headlines and stories around the world trumpeted the results.

“Fatigued patients who go out and exercise have best hope of recovery, finds study,” declared The Daily Mail. “Psychotherapy Eases Chronic Fatigue Syndrome, Study Finds,” stated The New York Times headline (I wrote the accompanying story.) According to BMJ’s report about the trial, some PACE participants were “cured” of the illness.

***

Some 300 miles to the northwest in Castleknock, a middle-class suburb of Dublin, Tom Kindlon read and re-read the Lancet paper and reviewed the upbeat news coverage. The more he scrutinized everything, the more frustrated and angry he became. The investigators, he observed, were spinning the study as a success in one of the world’s preeminent scientific publications, and the press was lapping it up.

“The paper was pure hype for graded exercise therapy and cognitive behavior therapy,” said Kindlon in a recent interview. “But it was a huge trial, getting huge coverage, and they were getting a very influential base to push their views.”

Kindlon had struggled with the illness for more than two decades. In 1993, his health problems forced him to drop his math studies at Dublin’s prestigious Trinity College; he’d been largely homebound since. With his acumen for statistics, Kindlon was known in the advocacy community for his nuanced understanding of the research.

He shared this passion with a small group of other science-savvy patients he’d met through online networks. Kindlon and the others were particularly worried about the PACE trial, first announced in 2003. They knew the results would wield great influence on government health policies, public attitudes, and future research—not only in Great Britain, but in the U.S. and elsewhere as well.

Like others in the patient and advocacy communities, they believed the evidence clearly pointed to an ongoing biological disease, not physical debility caused by deconditioning. They bristled with offense at the suggestion they would get better if only they could change their perceptions about their condition. And pushing themselves to be more active not only wasn’t helpful, they insisted, but could trigger a serious and extended relapse.

In the four years since the Lancet publication, Kindlon and others have pressed for an independent review of the trial data. They have produced a sprawling online literature deconstructing the trial’s methodology, submitted dozens of freedom-of-information requests for PACE-related documents and data, and published their criticisms on the websites and letters columns of leading medical journals. Their concerns, if true, would raise serious questions about the study’s findings.

***

For their part, the PACE investigators have released additional results from the trial. These have included a 2012 paper on economic aspects in PLoS One, a 2013 paper on “recovery” in Psychological Medicine, and a “mediation analysis” paper last January in The Lancet Psychiatry suggesting that reducing patients’ purported fears of activity mediated improvement.

But this investigation—based on many dozens of interviews and a review of thousands of pages of documents—has confirmed that some of the major criticisms of the trial are accurate. (The documents reviewed included, among others, the trial protocol, the manuals for the trial’s four arms, participant information and consent forms, meeting minutes of oversight committees, critical reports written by patients and advocates, transcripts of parliamentary hearings, and many dozens of peer-reviewed studies. Some documents were obtained by patients under freedom-of-information requests and either posted online or provided to me.)

Among the findings:

*The trial included a bizarre paradox: Participants’ baseline scores for physical function and fatigue could qualify them simultaneously as sick enough to get into the trial and as already “recovered” on those indicators—even before any treatment. In other words, the thresholds for being “recovered” represented worse health than the scores required to demonstrate the severe disability needed to enter the trial in the first place. This anomaly meant that some participants could get worse on physical function and fatigue during the trial and still be counted in the results as “recovered.” Data obtained by a patient through a freedom-of-information request indicated that 13 percent of the participants were already “recovered” for physical function or fatigue, or both, when they joined the study—a fact not mentioned in any of the published papers. (In the 2011 Lancet paper, participants who met these unusual thresholds were referred to not as having “recovered” but as being “within normal range.” In the 2013 Psychological Medicine paper, the same thresholds were re-purposed as indicators of “recovery.” A sketch of the threshold arithmetic appears after this list.)

*During the study, the PACE team published a newsletter for participants that included glowing testimonials from earlier trial subjects about how the “therapy” and “treatment” had improved their lives. An article in the same newsletter also reported that the U.K. government’s newly released clinical guidelines for the illness recommended the two rehabilitative treatments under investigation, cognitive behavior therapy and graded exercise therapy, “based on the best available evidence.” (The article didn’t mention that a key PACE investigator also served on the U.K. government committee that endorsed the two PACE therapies.) The testimonials and the statements promoting the two therapies could have biased the responses of the 200 or so remaining participants, about a third of the total study sample.

*The investigators abandoned all the criteria outlined in their protocol for assessing their two primary measures of fatigue and physical function, adopting new ones in the 2011 Lancet paper. They also significantly relaxed all four of their criteria for defining “recovery” in the 2013 Psychological Medicine paper. They did not report having taken the necessary steps to assess the impacts of these changes, such as conducting sensitivity analyses. Such protocol changes contradicted the ethos of BMC Neurology, the journal that published the PACE protocol in 2007. An “editor’s comment” linked to the protocol urged readers to review the published results and to contact the authors “to ensure that no deviations from the protocol occurred during the study.” The PACE team has rejected as “vexatious” freedom-of-information requests for the results as promised in the protocol.

*The study’s two primary outcomes were subjective, but in the 2007 published protocol the investigators also included several “objective” secondary outcomes to assess physical capacity, fitness and function; these measures included a six-minute walking test, a self-paced step test, and data on employment, wages and financial benefits. These findings utterly failed to support the subjective reports that the authors had interpreted as demonstrating successful treatment and “recovery.” In subsequently published comments, the authors disputed the relevance, reliability and “objectivity” of the main objective measures they themselves had selected.

*In seeking informed consent, the investigators violated a major international research ethics code that they promised, in their protocol, to observe. A key provision of the Declaration of Helsinki, developed after World War II to protect human research subjects, requires that study participants be “adequately informed” of researchers’ “possible conflicts of interest” and “institutional affiliations.” The key PACE authors have longstanding financial and consulting ties to the disability insurance industry; they have advised insurers for years that cognitive behavior therapy and graded exercise therapy can get patients off benefits and back to work. In the papers published in The Lancet and other journals, the PACE authors disclosed their industry ties, yet they did not reveal this information to prospective trial subjects. Of four participants interviewed, two said the knowledge would have affected their decision to participate; one retroactively withdrew her consent and barred the researchers from including her data.
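To make the threshold paradox in the first finding above concrete, here is a minimal Python sketch, for illustration only. It uses the trial’s published SF-36 physical function thresholds: entry required a score of 65 or less (a mid-trial amendment raised the protocol’s original cutoff of 60), while the 2011 Lancet paper counted 60 or more as “within normal range,” the same threshold later used for “recovery.” The function names and code are mine, not the investigators’.

# Overlap between PACE's entry and "normal range" thresholds on the
# SF-36 physical function scale (0-100, higher scores = better health).

ENTRY_MAX = 65         # disabled enough to enter the trial (amended cutoff)
NORMAL_RANGE_MIN = 60  # "within normal range" (2011); "recovered" (2013)

def eligible(sf36):
    """Sick enough to enroll."""
    return sf36 <= ENTRY_MAX

def within_normal_range(sf36):
    """Meets the outcome threshold."""
    return sf36 >= NORMAL_RANGE_MIN

# Scores that qualify a participant as simultaneously disabled and
# "within normal range" -- before any treatment at all:
overlap = [s for s in range(101) if eligible(s) and within_normal_range(s)]
print(overlap)  # [60, 61, 62, 63, 64, 65]

# A participant who entered at 65 could decline to 60 during the trial
# and still be counted as "within normal range" at follow-up.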

***

I did not interview Chalder, Sharpe, and Peter White, also a psychiatrist and the lead PACE investigator, for this story. Chalder did not respond to an e-mail last December seeking interviews. Sharpe and White both e-mailed back, declining to be interviewed [see correction below]. In his message, White wrote that, after consulting with his colleagues and reviewing my past reporting on the illness, “I have concluded that it would not be worthwhile our having a conversation…We think our work speaks for itself.” A second request for interviews, sent last week to the three investigators, also proved unsuccessful.

(I did have a telephone conversation with Chalder in January of this year, organized as part of the media campaign for the Lancet Psychiatry paper published that month by the PACE team. In Chalder’s memory of the conversation, we talked at length about some of the major concerns examined here. In my memory, she mostly declined to talk about concerns related to the 2011 Lancet paper, pleading poor recall of the details.)

Richard Horton, the editor of The Lancet, was also not interviewed for this story. Last December, his office declined an e-mail request for an interview. A second e-mail seeking comment, sent to Horton last week, was not answered.

***

Experts who have examined the PACE study say it is fraught with problems.

“I’m shocked that the Lancet published it,” said Ronald Davis, a well-known geneticist at Stanford University and the director of the scientific advisory board of the Open Medicine Foundation. The foundation, whose board also includes three Nobel laureates, supports research on ME/CFS and is currently focused on identifying an accurate biomarker for the illness.

“The PACE study has so many flaws and there are so many questions you’d want to ask about it that I don’t understand how it got through any kind of peer review,” added Davis, who became involved in the field after his son became severely ill. “Maybe The Lancet picked reviewers who agreed with the authors and raved about the paper, and the journal went along without digging into the details.”

In an e-mail interview, DePaul University psychology professor Leonard Jason, an expert on the illness, said the study’s statistical anomalies were hard to overlook. “The PACE authors should have reduced the kind of blatant methodological lapses that can impugn the credibility of the research, such as having overlapping recovery and entry/disability criteria,” wrote Jason, a prolific researcher widely respected among scientists, health officials and patients.

Jason, who was himself diagnosed with the illness in the early 1990s, also noted that researchers cannot simply ignore their own assurances that they will follow specific ethical guidelines. “If you’ve promised to disclose conflicts of interest by promising to follow a protocol, you can’t just decide not to do it,” he said.

Jonathan Edwards, a professor emeritus of connective tissue medicine from University College London, pioneered a novel rheumatoid arthritis treatment in a large clinical trial published in the New England Journal of Medicine in 2004. For the last couple of years, he has been involved in organizing clinical trial research to test the same drug, rituximab, for chronic fatigue syndrome, which shares traits with rheumatoid arthritis and other autoimmune disorders.

When he first read the Lancet paper, Edwards was taken aback: Not only did the trial rely on subjective measures, but participants and therapists all knew which treatment was being administered, unlike in a double-blinded trial. This unblinded design made PACE particularly vulnerable to generating biased results, said Edwards in a phone interview, adding that the newsletter testimonials and other methodological flaws only made things worse.

“It’s a mass of un-interpretability to me,” said Edwards, who last year called the PACE results “valueless” in publicly posted comments. “Within the circle who are involved in this field, it seems there were a group who were prepared to all sing by the hymn sheet and agree that PACE was wonderful. But all the issues with the trial are extremely worrying, making interpretation of the clinical significance of the findings more or less impossible.”

Bruce Levin, a professor of biostatistics at Columbia University and an expert in clinical trial design, said that unplanned, post-protocol changes in primary outcomes should be made only when absolutely necessary, and that any such changes inevitably raised questions about interpretation of the results. In any event, he added, it would never be acceptable for such revisions to include “normal range” or “recovery” thresholds that overlapped with the study’s entry criteria.

“I have never seen a trial design where eligibility requirements for a disease alone would qualify some patients for having had a successful treatment,” said Levin, who has been involved in research on the illness and has reviewed the PACE study. “It calls into question the diagnosis of an illness whose patients already rate as ‘recovered’ or ‘within normal range.’ I find it nearly inconceivable that a trial’s data monitoring committee would have approved such a protocol problem if they were aware of it.”

Levin also said the mid-trial publication of the newsletter featuring participant testimonials and positive news about interventions under investigation created legitimate concerns that subsequent responses might have been biased, especially in an unblinded study with subjective outcomes like PACE.

“It is highly inappropriate to publish anything during an ongoing clinical trial,” said Levin. “To let participants know that interventions have been selected by a government committee ‘based on the best available evidence’ strikes me as the height of clinical trial amateurism.”

At the least, he added, the PACE researchers should have compared participants’ responses from before and after the newsletter’s publication to assess any resulting bias.

***

Recent U.S. government reports have raised further challenges for the PACE approach. In June, a panel convened by the National Institutes of Health recommended that researchers abandon a core aspect of the PACE trial design—its method of identifying participants through the single symptom of prolonged fatigue, rather than a more detailed set of criteria. This method, the panel’s report noted, could “impair progress and cause harm” because it identifies people with many fatiguing conditions, making it hard to interpret the findings.

Last February, the Institute of Medicine released its own study, commissioned by several health agencies and based on an extensive literature review, which described the illness as a serious organic disease, not a cognitive or behavioral disorder characterized by “unhelpful beliefs” that lead to sedentary behavior. Two members of the IOM panel, in discussing their report with Medscape, cast sharp doubt on the central argument advanced for years by the British mental health professionals: that physical deconditioning alone perpetuates the devastating symptoms.

Ellen Wright Clayton, the panel chair and a professor of pediatrics and law at Vanderbilt University, said lack of activity could not possibly explain the scope and severity of patients’ symptoms. “The level of response is much more than would be seen with deconditioning,” she told Medscape. Peter Rowe, a pediatrician at Johns Hopkins and an expert on the disease, called the deconditioning hypothesis “flawed” and “untenable.”

The PACE investigators have strongly defended the integrity of their research and say that patients and advocacy groups have harassed and vilified them for years without justification. In 2011, The Guardian reported that Sharpe had been stalked by a woman who brought a knife to one of his lectures. A 2013 report in The Sunday Times noted that psychiatrist Simon Wessely, a senior colleague and adviser to the PACE authors, had received death threats, and that “one person rang him up and threatened to castrate him.”

No one is known to have been charged in these and other cases of reported threats or harassment.

 

Correction: The original text indicated that Sharpe did not respond to the December e-mail at all.

************************************************************************

PART TWO:

The Origins of the PACE Trial

Tom Kindlon, six feet tall and bulky, can only stand up for half a minute before dizziness and balance problems force him back down. He has a round face, wire-rimmed glasses, an engaging smile, and beard scruff. Direct light hurts his eyes. He wears a baseball cap to shield them.

Kindlon, 43, still lives with his parents in the two-story, four-bedroom house where he grew up. His mum, Vera, is his primary caretaker. He remains close with his three younger siblings: Ali, 40, and twins David and Deirdre, who are 35. All live nearby and help out when needed.

For the last 15 years, Kindlon has harnessed his limited energy for what he perceives as his primary mission: reviewing, and responding to, the literature on the illness. He has published more than a dozen peer-reviewed letters in scientific publications and regularly posts on the public forums and “rapid response” sections of journal websites, politely debating, dissecting and debunking questionable research claims.

“I haven’t read a fiction book in 20 years,” he noted, during a series of conversations ranging across Skype, Facebook, Twitter, and e-mail. “I need to be blinkered in what I do and don’t read, to concentrate and use my mental energy for this material.”

As a teenager, Kindlon loved playing rugby, cricket, tennis and soccer. When he was 16, he spent five days in western Ireland on a hiking and sailing trip with high school classmates. It was February, damp and chilly, and he was already suffering from a cold or some other bug; back in Dublin, he felt worse and stayed home for several days.

When he returned to school, he discovered something weird: After a round of sports, he now experienced muscle pains and a paralyzing exhaustion unlike anything he’d previously encountered. “I’d be totally whacked by the end of the day,” he recalled.

He saw a physiotherapist and then an orthopedic surgeon, who told him to exercise more. He tried swimming, but that also left him depleted. In 1991, despite his health struggles, he entered Trinity College. He slogged through two years of math studies but suffered more and more from problems with memory and concentration. “I was forgetting things, making silly errors,” he said.

Toward the end of the second year, he could no longer hold a pen in his hand. He developed tendonitis, first in one arm, then in the other. When he drove, pushing the pedals caused severe ankle pain. “Everything was magnified now,” he said. “I was just breaking down.” He took a leave from Trinity. His health continued to slide.

***

Then Kindlon read something about myalgic encephalomyelitis, or ME—an alternate name for chronic fatigue syndrome frequently used in the U.K., meaning “inflammation of the brain and spinal cord, with muscle pain.” A specialist confirmed the diagnosis.

Since there are no approved diagnostic tests, the illness has generally been diagnosed based on symptoms, after other possibilities have been excluded. A major clue in Kindlon’s case was his experience of a prolonged collapse after sports. Almost all patients report this unusual symptom, called “post-exertional malaise”: a sustained relapse or worsening after a minimal amount of exertion or activity.

It was September, 1994. Tom Kindlon was 22 years old. He could just about drag himself to the toilet a few times a day. He could hold a brief conversation, though he often couldn’t remember what he or anyone else had said.

Soon after his diagnosis, he heard about a local support group called the Irish ME Association. Vera attended a meeting to learn more. She became a fixture at the monthly gatherings, and soon was voted chair of the group; her son was appointed assistant chair. Though his condition gradually stabilized and sometimes even seemed to improve a little, he never felt well enough to attend meetings and worked instead from home.

At the time, the organization only had a few dozen members. “I felt the group could get bigger than just people sitting in circles,” Kindlon said. “We needed to raise awareness. I wanted people’s stories to be told.”

On May 12, 1996, designated by U.K. advocates as International ME Day, the small Irish group held a public event. Vera spoke on national radio. The Kindlons, mother and son, publicized the group’s work, and by 2000 the membership list topped 400.

Through a leadership listserv, Kindlon maintained contact with dozens of patient support and advocacy groups around the U.K. and elsewhere; the network kept him abreast of the major scientific, public health, and political developments related to the illness. Then he learned about the PACE trial.

***

In the mid-1980s, several outbreaks of a disabling and prolonged flu-like illness popped up across the U.S. Although clinicians treating some of the patients believed it was associated with the Epstein-Barr virus, which causes mononucleosis, CDC investigators were unable to identify a link with that or other pathogens.

The CDC team called the mysterious condition “chronic fatigue syndrome” after rejecting the name “myalgic encephalomyelitis,” coined after a similar outbreak at a London hospital in the 1950s. The key symptom of myalgic encephalomyelitis had been identified as extreme muscle fatigue after minimal exertion, with delayed recovery—essentially, a description of post-exertional malaise, Tom Kindlon’s main symptom. The CDC also rejected “post-viral fatigue syndrome,” another common name. In contrast, the World Health Organization, which had years earlier classified “benign myalgic encephalomyelitis” as a neurological disorder, deemed both post-viral fatigue syndrome and chronic fatigue syndrome to be synonyms. (The word “benign” eventually fell out of common use.)

In the U.S., the disease is now often called ME/CFS by government agencies; the recent report from the Institute of Medicine suggested renaming it “systemic exertion intolerance disease,” or SEID. In the U.K., it is often called CFS/ME.

Patients have always hated the name chronic fatigue syndrome. For one thing, the word “fatigue” does not come close to describing the profound depletion of energy that marks the illness. Best-selling author and long-time patient Laura Hillenbrand (Unbroken; Seabiscuit) once told The New York Times: “This disease leaves people bedridden. I’ve gone through phases where I couldn’t roll over in bed. I couldn’t speak. To have it called ‘fatigue’ is a gross misnomer.”

Patients, clinicians and scientists say the name is also inaccurate because the hallmark is not fatigue itself but more specifically what Tom Kindlon experienced—the relapses known as post-exertional malaise. (Patients also criticize the word ‘malaise,’ like ‘fatigue,’ as inaccurate and inadequate, and many prefer to call the symptom ‘post-exertional relapse.’) Other core symptoms are cognitive and neurological problems, sleep disorders, and in many cases muscle pain.

Researchers have not been able to identify a specific cause—at least in part because investigators have used many different criteria to define the illness and identify study subjects, making it hard to compare results. In many cases, as in the 1980s outbreaks, ME/CFS appears to be triggered by a viral or other infection from which people never recover. Since patients often don’t seek treatment and are not diagnosed until they have been sick for a long time, research on triggering events has often been based on self-reports of an initial infection rather than laboratory confirmation. However, a prospective 2006 study from Australian researchers and the CDC found that 11 percent of more than 250 patients who were followed after acute cases of mononucleosis, Q fever, and Ross River virus met diagnostic criteria for chronic fatigue syndrome six months later.

Although in some cases patients report a gradual start to the illness, a 2011 definition of myalgic encephalomyelitis developed by an international expert committee noted that “most patients have an acute infectious onset with flu-like and/or respiratory symptoms.” In fact, many experts believe ME/CFS is likely a cluster of related illnesses, in which one or more infections, or exposures to toxins, mold, stress, trauma or other physiological insults, spark the immune system into a persistent state of hyper-activation, with the resulting inflammation and other systemic effects causing the symptoms. Like the varying methods for defining the illness, the heterogeneity of potential triggering events among chronic fatigue syndrome populations has also complicated research. Without accurate sub-grouping, the findings from such samples can undermine rather than promote the search for causes, biomarkers and treatments.

The illness can fluctuate over time. Long-term patients sometimes experience periods of moderate remission, but few appear to recover completely. Most treatment has involved symptomatic relief.

Although research has been hampered by limited government support, studies over the years have documented a wide range of biological abnormalities as well as associations with a host of pathogens. But some promising leads have not panned out, most spectacularly several years ago when an apparent association with mouse retroviruses turned out to be the result of lab contamination—a devastating blow to patients.

***

For their part, the PACE investigators have collectively published hundreds of studies and reports about the illness, which they prefer to call chronic fatigue syndrome. In their model, the syndrome starts when people become sick—often from a virus, sometimes from other causes. This short-term illness leaves them exhausted; when the infection or other cause passes and they try to resume normal activity, they feel weakened and symptomatic again. This response is expected given their deconditioned state, according to the model, yet patients become fearful that they are still sick and decide they need more rest.

Then, instead of undergoing a normal recovery, they develop what the PACE authors have called “unhelpful beliefs” or “dysfunctional cognitions”–more specifically, the unhelpful belief that they continue to suffer from an infection or some other medical disease that will get worse if they exert themselves. Patients guided by these faulty cognitions further reduce their activity and, per the theory, become even more deconditioned, ultimately leading to “a chronic fatigue state in which symptoms are perpetuated by a cycle of inactivity, deterioration in exercise tolerance and further symptoms,” noted a 1989 article whose authors included Chalder and Simon Wessely, the PACE investigators’ longtime colleague.

The two rehabilitative therapies were designed to interrupt this downward spiral and restore patients’ sense of control over their health, in part through positive reinforcement and encouragement that recovery was possible. The course of cognitive behavior therapy, known as CBT, was specifically designed and structured to help chronic fatigue syndrome patients rid themselves of the “unhelpful beliefs” that purportedly kept them sedentary, and to encourage them to re-engage with daily life. (Standard forms of cognitive behavior therapy are recommended for helping people deal with all kinds of adversity, including major illness, yet doctors do not suggest that the therapy is an actual treatment for cancer, multiple sclerosis, or renal failure.) The increase in activity known as graded exercise therapy, or GET, sought to counteract the deconditioning by getting people moving again in planned, incremental steps.

Through their extensive writings and their consulting roles with government agencies, Sharpe, Chalder, White, and their colleagues have long exerted a major impact on treatment. In the U.K., the National Health Service has primarily favored cognitive behavior therapy and graded exercise therapy, or related approaches, even in specialized clinics.

In the U.S., the Centers for Disease Control and Prevention has collaborated with White, Sharpe and some of their colleagues for decades. The agency recommends the two treatments on its website and in its now-archived CFS Toolkit, a guide for health professionals on treating the illness. The toolkit recommends contacting St. Bartholomew’s—the venerable London hospital that is one of White’s professional homes—for more information about graded exercise therapy.

White, the lead author of the Lancet paper, is a professor of psychological medicine at Queen Mary University of London and co-leads the chronic fatigue syndrome service at St. Bartholomew’s. Sharpe is a professor of psychological medicine at Oxford University, and Chalder is a professor of cognitive behavioral psychotherapy at King’s College London. Their faculty webpages currently credit them with, respectively, 90, 366 and 205 publications.

The PACE authors have been referred to as members of the “Wessely school”—or, less politely, the “Wessely cabal”—because of Simon Wessely’s prominence as a pioneer of this treatment approach for chronic fatigue syndrome. Wessely, a professor of psychological medicine at King’s College London, has published more than 700 papers, was knighted in 2013, and is the current president of the Royal College of Psychiatrists.

Over the years, members of the PACE team developed close consulting and financial relationships with insurance companies; they have acknowledged these ties in “conflict of interest” statements in published papers. They have advised insurers that rehabilitative, non-pharmacological therapies can help claimants with chronic fatigue syndrome return to work—as Sharpe noted in a 2002 UNUMProvident report on disability insurance trends.

In his article for the UNUMProvident report, Sharpe also criticized the “ME lobby” for playing a negative role in influencing patients’ self-perceptions of their condition, noting that “the patient’s beliefs may become entrenched and be driven by anger and the need to explain continuing disability.” Sharpe noted that economic and social factors, like receiving financial benefits or accepting the physiological illness claims made by patient groups, also represented roadblocks to both clinical improvement and the resolution of disability insurance claims.

“A strong belief and preoccupation that one has a ‘medical disease’ and a helpless and passive attitude to coping is associated with persistent disability,” Sharpe warned readers of the disability insurance report. “The current system of state benefits, insurance payments and litigation remain potentially major obstacles to effective rehabilitation…If the claimant becomes hostile toward employer or insurer the position is likely to be difficult to retrieve.”

***

Given the medical and social costs of the illness, the government wanted solid evidence from a large trial about treatments that could help people get better. In 2003, the U.K. Medical Research Council announced that it would fund the PACE trial—more formally known as “Comparison of adaptive pacing therapy, cognitive behavior therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome: a randomized trial.”

Three other government agencies—Scotland’s Chief Scientist Office, England’s Department of Health, and the U.K. Department for Work and Pensions—chipped in. The West Midlands Multicentre Research Ethics Committee approved the final study protocol.

The investigators selected two self-reported measures, for physical function and fatigue, as their primary outcomes. For physical function, they chose a section of a widely used questionnaire called the Medical Outcomes Study 36-Item Short Form Health Survey, or SF-36; with this physical function scale, they designated a score of 60 or less out of 100 as representing sufficient disability for trial entry.

For fatigue, they selected the Chalder Fatigue Scale, developed by one of the PACE investigators, on which higher scores represented greater fatigue. The response to each of the scale’s 11 questions would be scored as 0 or 1, and a score of 6 or more was deemed sufficient evidence of disability for trial entry.
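To see how these two screens fit together, here is a minimal Python sketch of the entry logic as the protocol describes it; it is an illustration, not the investigators’ code. The four response options per Chalder item (scored 0-3) follow the standard published instrument, with the 0/1 collapsing known as “bimodal” scoring; the function names are mine.

# The trial's two entry screens, per the protocol.
# Chalder Fatigue Scale: 11 items; each item's four response options
# (0-3) are collapsed to 0 or 1, and a total of 6 or more counts as
# disabling fatigue. SF-36 physical function: 0-100, higher = better;
# the protocol set 60 or less as sufficient disability.

def chalder_bimodal(item_responses):
    """Collapse responses 0/1 to 0 and 2/3 to 1, then sum across items."""
    assert len(item_responses) == 11
    return sum(1 for r in item_responses if r >= 2)

def meets_entry_criteria(sf36, chalder_items):
    return sf36 <= 60 and chalder_bimodal(chalder_items) >= 6

# Example: a candidate scoring 55 on SF-36 who answers "more than usual"
# (2) on seven fatigue items and "no more than usual" (1) on the other four:
items = [2] * 7 + [1] * 4
print(meets_entry_criteria(55, items))  # True: bimodal score 7, SF-36 55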

In the proposed trial, participants would be randomized into four arms. All would be offered a few meetings with a specialist—the baseline condition ultimately called “specialist medical care.” Participants in three of the arms would receive additional interventions, of up to 14 sessions over six months, with a booster session three months later. Everyone would be assessed one year after entering the trial—that is, six months after the end of the main period of treatment. Home-bound patients were not eligible, since participation required attendance at multiple clinic sessions.

Besides the two rehabilitative treatments of cognitive behavior therapy and graded exercise therapy, the investigators planned to include an intervention based on a popular self-help strategy known as “pacing.” While the first two approaches challenged patients to adjust their thinking and push themselves beyond what they believed they could do, pacing involved accepting and adapting to the physical constraints of the illness, paying attention to symptoms, and not exceeding personal energy reserves to avoid triggering a relapse.

***

Previous studies conducted by the authors and other researchers, although smaller than PACE, had found that graded exercise therapy and cognitive behavior therapy led to modest improvements in self-reported outcomes, as a 2001 review in JAMA noted. But the same review also warned that the positive results on subjective measures in these studies did not mean that participants had actually improved their physical capacities.

“The person may feel better able to cope with daily activities because they have reduced their expectations of what they should achieve, rather than because they have made any recovery as a result of the intervention,” stated the review. “A more objective measure of the effect of any intervention would be whether participants have increased their working hours, returned to work or school, or increased their physical activities.”

Aware of such concerns, the PACE investigators planned to include some measures of physical function and fitness not dependent on subjective impressions.

Beyond the question of how to measure the effects of the intervention, the therapies themselves remained highly controversial among patients. Many understood that cognitive behavior therapy could be a useful tool for coping with a serious condition but resented and dismissed the PACE authors’ suggestion that it could treat the underlying illness. Encouraging an increase in exercise or exertion was even more controversial. Patients considered it dangerous because of the possibility of relapse from post-exertional malaise. In surveys, patients who had received graded exercise therapy were more likely to report that it had made them worse rather than better.

The psychiatrists and other mental health experts acknowledged that patients often felt worse after starting an activity program. To them, the resurgence of symptoms reflected the deconditioned body’s natural response to renewed exertion, not an underlying disease process—a point strongly conveyed to patients. According to the PACE manual for clinicians administering graded exercise therapy, “Participants are encouraged to see symptoms as temporary and reversible, as a result of their current physical weakness, and not as signs of progressive pathology.”

***

Patients and advocates, aware of the previous work of the PACE team, responded to the Medical Research Council’s announcement with alarm. Fearing the research would lead to calls for more funding for cognitive behavior therapy and exercise therapy and nothing else, patient groups demanded that the agency instead back research into biological causes and treatments of ME/CFS—something it was not doing.

“We believe that the money being allocated to the PACE trial is a scandalous way of prioritising the very limited research funding that the MRC [Medical Research Council] have decided to make available for ME/CFS,” declared the ME Association, a major advocacy organization, in a statement widely disseminated on social media. The statement demanded that the trial be halted and the money “held in reserve for research that is likely to be of real benefit to people with ME/CFS.”

Despite the anger in the patient community, the investigators were able to enlist Action For ME, another major advocacy group, to help design the pacing intervention. They called their operationalization of the strategy “adaptive pacing therapy,” or APT.

The trial protocol described the pacing therapy as “essentially an energy management approach, which involves assessment of the link between activity and subsequent symptoms and disability, establishing a stable baseline of activity using a daily diary, with advice to plan and pace activity in order to avoid exacerbations.” But many patients argued that pacing was an inherently personal, flexible approach. Packaging it as a structured “treatment” administered by a “therapist,” with a focus on daily diaries and advance planning, would inevitably alter its effect, they said.

***

Patients and other researchers also objected to the PACE study’s choice of “case definition”—a set of research or diagnostic criteria designed to include everyone with an illness and exclude those without it. Many challenged the decision to identify participants using the single-symptom case definition of chronic fatigue syndrome called the Oxford criteria—the same broad case definition that last June’s NIH report recommended for retirement because it could “impair progress and cause harm.”

Over the years, there have been many definitions proposed for both chronic fatigue syndrome and myalgic encephalomyelitis, for both clinical and research use. The most widely used has been the CDC’s 1994 definition for chronic fatigue syndrome, which required six months of fatigue, plus any four of eight other symptoms: cognitive problems, muscle pain, joint pain, headache, tender lymph nodes, sore throat, post-exertional malaise, and sleep disturbances.

Many patients, researchers and clinicians experienced in treating the illness prefer more recent and restrictive definitions that seek to reduce misdiagnoses by requiring the presence of the core symptom of post-exertional malaise as well as neurological and cognitive dysfunctions, unlike the more flexible CDC definition. In contrast, the Oxford criteria, published in 1991 by PACE investigator Michael Sharpe and colleagues, required only one symptom: six months of medically unexplained, disabling fatigue. Proponents argued that this broad scope ensured that research results could be applied to the largest number of people potentially suffering from the illness. If other symptoms were present, as often happened, the criteria required that fatigue be the primary complaint.
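The practical difference between the two research definitions is easy to see in code. Here is a simplified Python sketch contrasting the Oxford and CDC criteria as described above; the real definitions include exclusionary conditions and other details not modeled here, and the function names are mine.

# Simplified contrast of the two case definitions discussed above.

FUKUDA_SYMPTOMS = {
    "cognitive problems", "muscle pain", "joint pain", "headache",
    "tender lymph nodes", "sore throat", "post-exertional malaise",
    "sleep disturbances",
}

def meets_oxford(months_of_fatigue, fatigue_is_primary, fatigue_unexplained):
    # Oxford (1991): six months of medically unexplained, disabling
    # fatigue, with fatigue as the principal symptom.
    return months_of_fatigue >= 6 and fatigue_is_primary and fatigue_unexplained

def meets_fukuda(months_of_fatigue, other_symptoms):
    # CDC (1994): six months of fatigue plus any four of eight symptoms.
    return months_of_fatigue >= 6 and len(other_symptoms & FUKUDA_SYMPTOMS) >= 4

# Six months of unexplained fatigue and nothing else meets Oxford but
# not Fukuda:
print(meets_oxford(6, True, True), meets_fukuda(6, set()))  # True False

# A patient whose chief complaint is post-exertional malaise, with fatigue
# present but secondary, can meet Fukuda while failing Oxford:
symptoms = {"post-exertional malaise", "muscle pain", "headache",
            "sleep disturbances"}
print(meets_oxford(6, False, True), meets_fukuda(6, symptoms))  # False True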

According to DePaul psychologist Leonard Jason, the Oxford criteria blurred the boundaries between “chronic fatigue,” a symptom of many conditions, and the distinct illness known as “chronic fatigue syndrome.” In particular, he said, an Oxford criteria sample would likely include many people with primary depression, which can cause prolonged fatigue and often responds to interventions like those being tested in PACE. (In contrast, many people with ME/CFS get depressed as a secondary result of their illness experience.)

“The Oxford criteria clearly select for a lot of patients with primary depression, and people who are depressed do react very well to CBT and exercise,” said Jason, who has published widely on the ME/CFS case definition problem. Positive outcomes in the sample among depressed patients without ME/CFS could therefore lead to the unwarranted conclusion that the therapies worked for people with the disease, he added.

***

The PACE investigators were aware of these concerns, and they promised to study as well two subgroups of participants from their Oxford criteria sample who met additional case definitions: an updated 2003 version of the CDC’s 1994 definition for chronic fatigue syndrome, and a separate definition for myalgic encephalomyelitis. That way, they hoped to be able to draw conclusions about whether the therapies worked, no matter how the illness was defined.

Yet this approach presented its own challenges. Neither of the two other definitions required fatigue to be the primary symptom, as did the Oxford criteria. The myalgic encephalomyelitis definition did not even include fatigue per se as a symptom at all; post-exertional malaise, not fatigue, was the core symptom. And under the CDC definition, patients could present with any of the other symptoms as their primary complaint, as long as they also experienced fatigue.

Given these major differences in the case definitions, an unknown number of patients might have been screened out of the sample by the Oxford criteria but still met one of the other sets of criteria, making it hard to interpret the subgroup findings, according to other researchers. (The PACE investigators and I debated this methodological issue in an exchange of letters in The New York Times in 2011, after an article I wrote about case definition and the PACE trial.)

Bruce Levin, the Columbia University biostatistician, said the PACE investigators should not have assumed that the experience of a subgroup within an already defined population would match the experience of a group that hadn’t been pre-screened. “I would not accept an extrapolation to people diagnosed with alternative criteria from a subgroup comprising people satisfying both sets of criteria rather than just the alternative set of criteria,” he said, adding that reviewers should catch such questionable assumptions before publication.

Tomorrow: Publication of the PACE trial

TWiV 345: How a vaccine got the nod

On episode #345 of the science show This Week in Virology, the TWiVonauts review how the weather affects West Nile virus disease in the US, benefit of B cell depletion for ME/CFS patients, and an autoimmune reaction induced by influenza virus vaccine that leads to narcolepsy.

You can find TWiV #345 at www.microbe.tv/twiv.

B cell depletion benefits ME/CFS patients

Patients with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) showed clinical improvement after extended treatment with the anti-B-cell monoclonal antibody rituximab. This result suggests that in a subset of patients, ME/CFS might be an autoimmune disease.

Rituximab is a monoclonal antibody against CD20, a protein on the surface of B cells. When the antibody is given to patients, it destroys B cells, which produce antibodies, the proteins made by the immune system to counter infections. The drug has been approved by the U.S. Food and Drug Administration to treat B cell diseases such as lymphomas and leukemias, as well as autoimmune conditions.

ME/CFS is a disease of unknown etiology and mechanism, marked by severe fatigue, post-exertional malaise, pain, and cognitive and sleep problems; it affects 0.1-0.2% of the population. A previous randomized, phase II trial of rituximab treatment showed clinical benefit in 20 of 30 patients. The improvements were evident 2-8 months after treatment, leading the study authors to suggest that remission requires elimination of long-lived antibodies after depletion of B cells.

The current study was done to determine the effects of sustained treatment with rituximab. The 29 patients included were 18-66 years of age and diagnosed with ME/CFS according to Fukuda 1994 criteria. All were given rituximab infusions two weeks apart, then at 3, 6, 10, and 15 months, and followed up for 36 months. Self-reported symptoms were recorded every second week and used to calculate scores for fatigue (comprising post-exertional malaise, need for rest, and daily functioning), pain (muscle, joint, and cutaneous pain, and headache), and cognition (concentration ability, memory disturbance, and mental tiredness).

Clinically significant responses were found in 18 of 29 patients (62%), with a lag of 8-66 weeks. At the end of the 36-month follow-up, 11 of the 18 responding patients were still in clinical remission. Nine patients from the placebo group in the previous study were included in this trial; of these, six had clinical improvement.

These results show that some ME/CFS patients benefit from ablating B cells. The delayed response, coupled with the relapse after cessation of treatment and B cell regeneration, suggests that antibodies are involved in the pathogenesis of the disease. Because the onset of ME/CFS in many patients correlates with a viral infection, antibodies to viral proteins may cross-react with self proteins, leading to autoimmune reactions that cause disease. Treatment with rituximab would reduce levels of such antibodies, thereby reducing symptoms.

These results warrant trials with larger numbers of ME/CFS patients in other countries (this study was carried out in Norway) to determine whether ablation of B cells has a similar effect elsewhere. It would also be useful to determine the total repertoire of antiviral antibodies produced by ME/CFS patients. Such antibodies can be identified with the newly developed VirScan assay, which requires a small amount of blood and is relatively inexpensive. The results would indicate whether certain viral infections predispose to the illness in a large population of ME/CFS patients. They might also guide efforts to determine whether such antibodies react with human cellular proteins; a similar approach was used to show that antibodies to an influenza virus protein cross-react with a neuropeptide receptor, leading to narcolepsy.

While these findings are promising, they also show that not all cases of ME/CFS may involve autoimmune pathogenesis. Other creative approaches will be needed to determine the cause of disease in individuals who do not respond to rituximab.

TWiV 331: Why is this outbreak different from all other outbreaks?

On episode #331 of the science show This Week in Virology, the TWiV team discusses the possible association of the respiratory pathogen enterovirus D68 with neurological disease.

You can find TWiV #331 at www.microbe.tv/twiv.