On 17 December 2015, Ron Davis, Bruce Levin, David Tuller and I requested trial data from the PACE study of treatments for ME/CFS published in The Lancet in 2011. Below is the response to our request from the Records & Compliance Manager of Queen Mary University of London. The bolded portion of our request, noted in the letter, is the following: “we would like the raw data for all four arms of the trial for the following measures: the two primary outcomes of physical function and fatigue (both bimodal and Likert-style scoring), and the multiple criteria for “recovery” as defined in the protocol published in 2007 in BMC Neurology, not as defined in the 2013 paper published in Psychological Medicine. The anonymized, individual-level data for “recovery” should be linked across the four criteria so it is possible to determine how many people achieved “recovery” according to the protocol definition.”


Dear Prof. Racaniello

Thank you for your email of 17th December 2015. I have bolded your request below, made under the Freedom of Information Act 2000.

You have requested raw data, linked at an individual level, from the PACE trial. I can confirm that QMUL holds this data but I am afraid that I cannot supply it. Over the last five years QMUL has received a number of similar requests for data relating to the PACE trial. One of the resultant refusals, relating to Decision Notice FS50565190, is due to be tested at the First-tier Tribunal (Information Rights) during 2016. We believe that the information requested is similarly exempt from release in to the public domain. At this time, we are not in a position to speculate when this ongoing legal action will be concluded.

Any release of information under FOIA is a release to the world at large without limits. The data consists of (sensitive) personal data which was disclosed in the context of a confidential relationship, under a clear obligation of confidence. This is not only in the form of explicit guarantees to participants but also since this is data provided in the context of medical treatment, under the traditional obligation of confidence imposed on medical practitioners. See generally, General Medical Council, ‘Confidentiality’ (2009) available at http://www.gmc-uk.org/guidance/ethical_guidance/confidentiality.asp The information has the necessary quality of confidence and release to the public would lead to an actionable breach.

As such, we believe it is exempt from disclosure under s.41 of FOIA. This is an absolute exemption.

The primary outcomes requested are also exempt under s.22A of FOIA in that these data form part of an ongoing programme of research.

This exemption is subject to the public interest test. While there is a public interest in public authorities being transparent generally and we acknowledge that there is ongoing debate around PACE and research in to CFS/ME, which might favour disclosure, this is outweighed at this time by the prejudice to the programme of research and the interests of participants. This is because participants may be less willing to participate in a planned feasibility follow up study, since we have promised to keep their data confidential and planned papers from PACE, whether from QMUL or other collaborators, may be affected.

On balance we believe that the public interest in withholding this information outweighs the public interest in disclosing it.

In accordance with s.17, please accept this as a refusal notice.

For your information, the PACE PIs and their associated organisations are currently reviewing a data sharing policy.

If you are dissatisfied with this response, you may ask QMUL to conduct a review of this decision.  To do this, please contact the College in writing (including by fax, letter or email), describe the original request, explain your grounds for dissatisfaction, and include an address for correspondence.  You have 40 working days from receipt of this communication to submit a review request.  When the review process has been completed, if you are still dissatisfied, you may ask the Information Commissioner to intervene. Please see www.ico.org.uk for details.

Yours sincerely

Paul Smallcombe
Records & Information Compliance Manager

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent master’s degree program in public health and journalism at the University of California, Berkeley.

 

The PACE authors have long demonstrated great facility in evading questions they don’t want to answer. They did this in their response to correspondence about the original 2011 Lancet paper. They did it again in the correspondence about the 2013 recovery paper, and in their response to my Virology Blog series. Now they have done it in their answer to critics of their most recent paper on follow-up data, published last October in The Lancet Psychiatry.

(They published the paper just a week after my investigation ran. Wasn’t that a lucky coincidence?)

The Lancet Psychiatry follow-up had null findings: Two years or more after randomization,  there were no differences in reported levels of fatigue and physical function between those assigned to any of the groups. The results showed that cognitive behavior therapy and graded exercise therapy provided no long-term benefits because those in the other two groups reported improvement during the year or more after the trial was over. Yet the authors, once again, attempted to spin this mess as a success.

In their letters, James Coyne, Keith Laws, Frank Twist, and Charles Shepherd all provide sharp and effective critiques of the follow-up study. I’ll let others tackle the PACE team’s counter-claims about study design and statistical analysis. I want to focus once more on the issue of the PACE participant newsletter, which they again defend in their Lancet Psychiatry response.

Here’s what they write: “One of these newsletters included positive quotes from participants. Since these participants were from all four treatment arms (which were not named) these quotes were [not]…a source of bias.”

Let’s recap what I wrote about this newsletter in my investigation. The newsletter was published in December 2008, with at least a third of the study’s sample still undergoing assessment. The newsletter included six glowing testimonials from participants about their positive experiences with the trial, as well as a seventh statement from one participant’s primary care doctor. None of the seven statements recounted any negative outcomes, presumably conveying to remaining participants that the trial was producing a 100% satisfaction rate. The authors argue that the absence of the specific names of the study arms means that these quotes could not be “a source of bias.”

This is a preposterous claim. The PACE authors apparently believe that it is not a problem to influence all of your participants in a positive direction, and that this does not constitute bias. They have repeated this argument multiple times. I find it hard to believe they take it seriously, but perhaps they actually do. In any case, no one else should. As I have written before, they have no idea how the testimonials might have affected anyone in any of the four groups—so they have no basis for claiming that this uncontrolled co-intervention did not alter their results.

Moreover, the authors now ignore the other significant effort in that newsletter to influence participant opinion: publication of an article noting that a federal clinical guidelines committee had selected cognitive behavior therapy and graded exercise therapy as effective treatments “based on the best available evidence.” Given that the trial itself was supposed to be assessing the efficacy of these treatments, informing participants that they have already been deemed to be effective would appear likely to impact participants’ responses. The PACE authors apparently disagree.

It is worth remembering what top experts have said about the publication of this newsletter and its impact on the trial results. “To let participants know that interventions have been selected by a government committee ‘based on the best available evidence’ strikes me as the height of clinical trial amateurism,” Bruce Levin, a biostatistician at Columbia University, told me.

My Berkeley colleague, epidemiologist Arthur Reingold, said he was flabbergasted to see that the researchers had distributed material promoting the interventions being investigated, whether they were named or not. This fact alone, he noted, made him wonder if other aspects of the trial would also raise methodological or ethical concerns.

“Given the subjective nature of the primary outcomes, broadcasting testimonials from those who had received interventions under study would seem to violate a basic tenet of research design, and potentially introduce substantial reporting and information bias,” he said. “I am hard-pressed to recall a precedent for such an approach in other therapeutic trials. Under the circumstances, an independent review of the trial conducted by experts not involved in the design or conduct of the study would seem to be very much in order.”

On episode #372 of the science show This Week in Virology, the TWiV-osphere introduces influenza D virus, virus-like particles encoded in the wasp genome which protect its eggs from caterpillar immunity, and a cytomegalovirus protein which counters a host restriction protein that prevents establishment of latency.

You can find TWiV #372 at microbe.tv/twiv

In four months, 155 countries will together switch from using trivalent to bivalent oral poliovirus vaccine. Will this change lead to more cases of poliomyelitis?

There are three serotypes of poliovirus, each of which can cause paralytic poliomyelitis. The Sabin oral poliovirus vaccine (OPV), which has been used globally by WHO in the eradication effort, is a trivalent vaccine that contains all three serotypes.

In September 2015 WHO declared that wild poliovirus type 2 had been eradicated from the planet – no cases caused by this serotype had been detected since November 1999. However, in 2015, there were 9 cases of poliomyelitis caused by the type 2 vaccine virus. For these reasons WHO decided to remove the type 2 Sabin strain from OPV, and switch from trivalent to bivalent vaccine in April 2016.

After OPV is ingested, the viruses replicate in the intestinal tract, providing immunity to subsequent infection. During replication in the intestine, the vaccine viruses lose the mutations that prevent them from causing paralysis. Everyone who receives OPV sheds these revertant viruses in the feces. In rare cases (about one in 1.5 million) the revertant viruses cause poliomyelitis in the vaccine recipient (these cases are called VAPP for vaccine-associated paralytic poliomyelitis). Vaccine-derived polioviruses can also circulate in the human population, and in under-vaccinated populations, they can cause poliomyelitis.

There were 26 reported cases of poliomyelitis caused by the type 1 or type 2 vaccine viruses in 2015. Nine cases of type 2 vaccine-associated polio were detected in four countries: Pakistan, Guinea, Lao People’s Democratic Republic, and Myanmar. Removing the type 2 strain from OPV will eliminate vaccine-associated poliomyelitis in recipients caused by this serotype. When the US switched from OPV to the inactivated poliovirus vaccine (IPV) in 2000, VAPP was eliminated.

The problem with the trivalent to bivalent switch is that vaccine-derived type 2 poliovirus is likely still circulating somewhere on Earth. The last two reported cases of type 2 vaccine-associated polio in 2015 were reported in Myanmar in October. The viruses isolated from these cases were genetically related to strains that had been circulating in the same village in April of that year. In other words, type 2 vaccine-derived strains have been circulating for an extended period of time in Myanmar; they have been known to persist for years elsewhere. If these viruses continue to circulate past the time that immunization against type 2 virus stops, they could pose a threat to the growing numbers of infants and children who have not been immunized against this serotype.

Eventually, as type 3 and then type 1 polioviruses are eradicated, it will also be necessary to stop immunizing with the respective Sabin vaccine strains. The switch from trivalent to bivalent vaccine in April 2016 is essentially an experiment to determine whether it is possible to stop immunizing with OPV without placing newborns at risk from circulating vaccine-derived strains.

Over 18 years ago Alan Dove and I argued that the presence of circulating vaccine-derived polioviruses made stopping immunization with OPV a bad idea. We suggested instead a switch from OPV to IPV until circulating vaccine-derived viruses disappeared. At the time, WHO disagreed, but now they recommend that all countries deliver at least one dose of IPV as part of their immunization program. Instead of simply removing the Sabin type 2 strain from the immunization programs of 155 countries, WHO should replace it with the inactivated type 2 vaccine. This change would maintain immunity to this virus in children born after April 2016. Such a synchronized replacement is currently not in the WHO’s polio eradication plans. I hope that their strategy is the right one.

MicrobeTV

I started my first podcast, This Week in Virology, in September 2008, together with Dickson Despommier, father of the Vertical Farm. Although I viewed the creation of a science podcast as an experiment, I was surprised when people began to listen. Since then I have created five other podcasts, scattered at different websites. Now you can find all of them at MicrobeTV.

MicrobeTV is a podcast network for people who are interested in the life sciences. More specifically, the podcasts of MicrobeTV use conversations among scientists as teaching tools. Although I have been a research scientist my entire career, I have also had opportunities to teach graduate students, medical students, and undergraduate students. A long time ago I realized that I love to teach, and my podcasts are the outside-the-classroom expression of that sentiment.

My original idea behind TWiV was to teach virology to the broader public by recording conversations among scientists. The success of this approach led me to create This Week in Parasitism, This Week in Microbiology, Urban Agriculture, and This Week in Evolution, all of which can now be found at MicrobeTV.

You may ask why I do so many podcasts. The answer is simple – because I love talking about science and teaching others about this amazing field that makes our lives better. I could not do all these podcasts without my terrific co-hosts. I am also grateful to the American Society for Microbiology for their assistance and support for many years, especially Chris Condayan and Ray Ortega and the Communications Department.

MicrobeTV is the home for all of the podcasts that I have produced (and there are more to come!). But I’d also like to use MicrobeTV as a platform to showcase other science shows. The requirements are few: you should be passionate about your subject, you should have a great relationship with your audience, and your podcast audio must be excellent. If you are interested in joining MicrobeTV, send a note to shows@microbe.tv.

MicrobeTV – Science Shows by Scientists.

On episode #371 of the science show This Week in Virology, the TWiVologists discuss the finding of a second transmissible cancer in Tasmanian devils, and development of new poliovirus strains for the production of inactivated vaccine in the post-eradication era.

You can find TWiV #371 at www.microbe.tv/twiv.

I have worked on poliovirus for over thirty-six years, first as a postdoctoral fellow with David Baltimore in 1979, and then in my laboratory at Columbia University. That research begins to come to an end this year with the destruction of my stocks of polioviruses.

In 2015 there were 70 reported cases of poliomyelitis caused by wild type 1 poliovirus, and 26 cases of poliomyelitis caused by circulating vaccine-derived polioviruses (cVDPV) types 1 and 2. The last case caused by wild type 2 poliovirus occurred in India in 1999, and the virus was declared eradicated in 2015. Consequently, the World Health Organization has decided that all remaining stocks of wild type 2 poliovirus should be destroyed by the end of 2015.

My laboratory has worked extensively with type 2 polioviruses. Before we produced transgenic mice susceptible to poliovirus, we had studied the Lansing strain of type 2 poliovirus because it had the unusual ability to infect wild type mice (polioviruses normally infect only certain primates). We determined the nucleotide sequence of the viral genome, identified the capsid as a determinant of the ability of the virus to infect wild type mice, and showed that replacing an eight-amino-acid sequence of capsid protein VP1 in a type 1 strain with the corresponding sequence from the Lansing strain conferred the ability to infect non-transgenic mice. These findings indicate that the ability of the Lansing strain of poliovirus to infect mice is likely due to recognition by the viral capsid of a receptor in the mouse central nervous system. In the past year we took advantage of the ability to produce mouse neurons from stem cells to attempt to identify the murine cellular receptor for the Lansing virus.

To prevent further cases of poliomyelitis caused by cVDPVs, WHO has decided that there will be a synchronized, global switch from trivalent OPV to bivalent OPV in April 2016. By July of 2016 all remaining stocks of the Sabin type 2 poliovirus strains, which are used to produce OPV, will also be destroyed.

No wild type 3 poliovirus has been detected since November 2012, and it is likely that this virus will be declared eradicated within the next several years. At that time we will have to destroy our stocks of type 3 poliovirus. That leaves wild poliovirus type 1, which circulates only in Pakistan and Afghanistan. Given the small number of cases of paralysis caused by this type, it is reasonable to believe that eradication will occur within the next five years. If this timeline is correct, it means that I will be destroying my last vials of poliovirus around 2020.

It is of course necessary to destroy stocks of wild and vaccine polioviruses to prevent reintroduction of the virus and the disease that it causes. The 1978 release of smallpox virus from a laboratory in the United Kingdom, which caused one death, led to requests for reducing the number of laboratories that retained the virus. Today there are just two official repositories of smallpox virus, in the United States and Russia.

It is rare for an investigator to be told to destroy stocks of the virus that is the subject of his or her research. Over the years we have published 81 papers on poliovirus replication, vaccines, and pathogenesis. While I realize that it is absolutely essential to stop working on this virus, I do so with a certain amount of sadness. What other emotion could I have for a virus on which I have expended so much thought and effort?

Image: Poliovirus by Jason Roberts

Correction: The synchronized switch in April 2016 is from trivalent to bivalent OPV, not OPV to IPV. Consequently I have removed comments related to an OPV-IPV switch.

By Julie Rehmeyer and David Tuller, DrPH

Julie Rehmeyer is a journalist and Ted Scripps Environmental Journalism Fellow at the University of Colorado, Boulder, who has written extensively about ME/CFS.

David Tuller is academic coordinator of the concurrent master’s degree program in public health and journalism at the University of California, Berkeley.

Joining me for this episode of our ongoing saga is my friend and colleague Julie Rehmeyer. In my initial series, I only briefly touched on the PACE trial’s blanket claim of safety. Here we examine this key aspect of the study in more detail; the issue is complicated and requires a deep dive into technicalities. Sorry about that, but the claim is too consequential to ignore.

 

One of the most important and controversial claims from the PACE Trial was that graded exercise therapy is safe for patients with chronic fatigue syndrome (or ME/CFS, as U.S. government agencies now call it).

“If this treatment is done by skilled people in an appropriate way, it actually is safe and can stand a very good chance of benefiting [patients],” Michael Sharpe, one of the principal PACE investigators, told National Public Radio in 2011, shortly after The Lancet published the first results.

But to many in the ME/CFS community, this safety claim goes against the very essence of the disease. The hallmark of chronic fatigue syndrome, despite the name, is not actually fatigue but the body’s inability to tolerate too much exertion — a phenomenon that has been documented in exercise studies. All other symptoms, like sleep disorders, cognitive impairment, blood pressure regulation problems, and muscle pain, are exacerbated by physical or mental activity. An Institute of Medicine report this year even recommended that the illness be renamed to emphasize this central problem, choosing the name “systemic exertion intolerance disease,” or SEID. [see correction below]

A careful analysis shows that the PACE researchers’ attempts to prove safety were as flawed as their attempts to prove efficacy. However, while the trial reports gave enough information to establish that the treatments were not effective (in spite of the claims of success and “recovery”), they did not give enough information to establish whether they were safe (also in spite of their claims). We simply do not know.

“I would be very skeptical in recommending a blanket statement that GET is safe,” says Bruce Levin, a biostatistician at Columbia University, who has reviewed the PACE trial and found other methodological aspects indefensible. “The aphorism that absence of evidence is not evidence of absence applies here. There is real difficulty interpreting these results.”

*          *          *          *          *          *

Assessing the PACE team’s safety claims is critical, because the belief that graded exercise is safe has had enormous consequences for patients. In the UK, graded exercise therapy is recommended for all mild to moderate ME/CFS patients by the National Institute for Health and Care Excellence, which strongly influences treatment across the country. In the US, the Centers for Disease Control and Prevention also recommends graded exercise.

Exertion intolerance—also called “post-exertional malaise”—presents ME/CFS patients with a quandary: They want to do as much as they can when they’re able, while not doing so much that they make themselves sicker later. Among themselves, they’ve worked out a strategy to accomplish that, which they call “pacing.” Because their energy levels fluctuate, they carefully monitor how they are feeling and adapt their activities to stay within the day’s “energy envelope.”  This requires sensitive attunement to their symptoms in order to pick up on early signs of exacerbation and avoid exceeding their limits.

But according to the hypothesis behind the PACE study, this approach is all wrong. Because the investigators believed that physical deconditioning rather than an organic disease perpetuated the many symptoms, they theorized that the key to getting better was to deliberately exceed current limits, gradually training the body to adapt to greater levels of activity. Rather than being sensitively attuned to symptoms, patients should ignore them, on the grounds that they have become obsessed about sensations most people would consider normal. Any increase in symptoms from exertion was explained as expected, transient and unimportant—the result of the body’s current state of weakness, not an underlying disease.

Many patients in the UK have tested this theory, since graded exercise therapy, or GET, is one of the few therapies available to patients there. And patient reports on the approach are very, very bad. In May 2015, the ME Association, a British charity, released a survey of patients’ experiences with GET, cognitive behavioral therapy, and pacing. The results suggested that GET was far and away the most dangerous. Of patients who received GET, 74 percent said that it had made them worse. In contrast, 18 percent said they were worse after cognitive behavior therapy and only 14 percent after pacing.

The survey is filled with reports similar to this one: “My condition deteriorated significantly, becoming virtually housebound, spending most of my day in bed in significant pain and with extreme fatigue.”

Anecdotal reports, however, don’t provide the proof of a randomized clinical trial. So this was one of the central issues at stake in the PACE study: Is it safe for patients to increase their activity on a set schedule while ignoring their symptoms?

*          *          *          *          *          *

In the 2011 Lancet article with the first PACE results, the researchers reported that eight percent of all participants experienced a “serious deterioration” and less than two percent experienced a “serious adverse reaction” over the course of the year, without significant differences between the arms of the trial.

For patients to have a “serious deterioration,” their physical function score needed to drop by at least 20 points and they needed to report that their overall health was “much worse” or “very much worse” at two consecutive assessment periods (out of a total of three).
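
To see how demanding that conjunction is, here is a minimal sketch of the criterion in Python. The data layout, the variable names, and the choice of baseline as the reference point for the 20-point drop are our own assumptions for illustration, not the trial’s own analysis code.

```python
# Minimal sketch of the published "serious deterioration" criterion as we read
# it. Hypothetical data layout; the reference point for the 20-point drop is
# assumed to be the baseline score.

MUCH_WORSE = {"much worse", "very much worse"}

def serious_deterioration(baseline_pf, pf_scores, health_ratings, drop=20):
    """pf_scores and health_ratings hold the three follow-up assessments in
    order. Returns True only if a drop of at least `drop` points in SF-36
    physical function AND a 'much worse'/'very much worse' overall-health
    rating occur together at two consecutive assessments."""
    for i in range(len(pf_scores) - 1):
        if all(
            baseline_pf - pf_scores[j] >= drop and health_ratings[j] in MUCH_WORSE
            for j in (i, i + 1)
        ):
            return True
    return False

# A participant who falls 25 points and feels "much worse" at the second and
# third assessments is counted; one who crashes badly at a single assessment
# and then partially rebounds is not.
print(serious_deterioration(65, [60, 40, 38], ["no change", "much worse", "very much worse"]))  # True
print(serious_deterioration(65, [30, 60, 62], ["very much worse", "no change", "no change"]))   # False
```

On this reading, a participant had to be substantially worse on both measures at the same two consecutive assessments to be counted at all.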

To have a “serious adverse reaction,” the patient needed to experience a persistent “severe, i.e. significant deterioration,” which was not defined, or to experience a major health-related event, such as a hospitalization or even death. Furthermore, a doctor needed to determine that the event was directly caused by the treatment—a decision that was made after the doctor was told which arm of the trial the patient was in.

Subsequent “safety” results were published in a 2014 article in the Journal of Psychosomatic Research. And this paper revealed a critical detail unmentioned in the Lancet paper: the six centers around England participating in the study appear to have applied the methods for assessing safety differently. That raises questions about how to interpret the results and whether the overall claims of “safety” can be taken at face value.

Beyond that issue, a major problem with the PACE investigators’ reporting on harms from exercise is that it looks as though participants might not have actually done much exercise. While the researchers stated the ambitious goal that participants would exercise for at least 30 minutes five times a week, they gave no information on how much exercise participants in fact did.

The trial’s objective outcomes suggest it may not have been much. The exercise patients were only able to walk 11 percent further in a walking test at the end of the trial than patients who hadn’t exercised. Even with this minimal improvement, participants were still severely disabled, with a poorer performance than patients with chronic heart failure, severe multiple sclerosis, or chronic obstructive pulmonary disease.

On top of that, almost a third of those in the exercise arm who finished other aspects of the trial never completed the final walking test; if they couldn’t because they were too sick, that would skew the results. In addition, the participants in GET showed no improvement at all on a step test designed to measure fitness. Presumably, if the trial’s theory that patients suffered from deconditioning was correct, participants who had managed to exercise should have become more fit and performed better on these tests.

Tom Kindlon, a long-time patient and an expert on the clinical research, suggests that even if those in the exercise arm performed more graded exercise under the guidance of trial therapists, they may have simply cut back on other activities to compensate, as has been found in other studies of graded activity. He also notes that the therapists in the trial were possibly more cautious than therapists in everyday practice.

“In the PACE Trial, there was a much greater focus on the issue of safety [than in previous studies of graded activity], with much greater monitoring of adverse events,” says Kindlon, who published an analysis of the reporting of harms from trials of graded activity in ME/CFS, including PACE. “In this scenario, it seems quite plausible that those running the trial and the clinicians would be very cautious about pushing participants to keep exercising when they had increased symptoms, as this could increase the chances the patients would say such therapies caused adverse events.”

*          *          *          *          *          *

Had the investigators stuck to their original plan, we would have more evidence to evaluate participants’ activity levels.  Originally, participants were going to wear a wristwatch-sized ankle band called an actometer, similar to a FitBit, that would measure how many steps they took for a week at the beginning of the trial and for a week at the end.

A substantial increase in the number of steps over the course of the trial would have definitively established both that participants were exercising and that they weren’t decreasing other activity in order to do so.

But in reviewing the PACE Trial protocol, which was published in 2007, Kindlon noticed, to his surprise, that the researchers had abandoned this plan. Instead, they were asking participants to wear the actometers only at the beginning of the trial, but not at the end. Kindlon posted a comment on the journal’s website questioning this decision. He pointed out that in previous studies of graded activity, actometer measurements showed that patients were not moving more, even if they reported feeling better. Hence, the “exercise program” in that case in fact did not raise their overall activity levels.

In a posted response, White and his colleagues explained that they “decided that a test that required participants to wear an actometer around their ankle for a week was too great a burden at the end of the trial.” However, they had retained the actometer as a baseline measure, they wrote, to test as “a moderator of outcome”—that is, to determine factors that predicted which participants improved. The investigators also noted that the trial contained other objective outcome measures. (They subsequently dismissed the relevance of these objective measures after they failed to demonstrate efficacy.)

That answer didn’t make sense to Kindlon. “They clearly don’t find it that great a burden that they drop it altogether as it is being used on patients before the start,” he wrote in a follow-up comment. “If they feel it was that big of a burden, it should probably have been dropped altogether.”

*          *          *          *          *

The other major flaws that make it impossible to assess the validity of their safety claims are related to those that affected the PACE trial as a whole.  In particular, problems related to four issues affected their methods for reporting harms: the case definition, changes in outcome measures after the trial began, lack of blinding, and encouraging participants to discount symptoms in a trial that relied on subjective endpoints.

First, the study’s primary case definition for identifying participants, called the Oxford criteria, was extremely broad; it required only six months of medically unexplained fatigue, with no other symptoms necessary. Indeed, 16% of the participants didn’t even have exercise intolerance—now recognized as the primary symptom of ME/CFS—and hence would not be expected to suffer serious exacerbations from exercise. The trial did use two additional case definitions to conduct sub-group analyses, but they didn’t break down the results on harms by the definition used. So we don’t know if the participants who met one of the more stringent definitions suffered more setbacks due to exercise.

Second, after the trial began, the researchers tightened their definition of harms, just as they had relaxed their methods of assessing improvement. In the protocol, for example, a steep drop in physical function since the previous assessment, or a slight decline in reported overall health, both qualified as a “serious deterioration.” However, as reported in The Lancet, the steep drop in physical function had to be sustained across two out of the trial’s three assessments rather than just since the previous one. And reported overall health had to be “much worse” or “very much worse,” not just slightly worse. The researchers also changed their protocol definition of a “serious adverse reaction,” making it more stringent.

The third problem was that the study was unblinded, so both participants and therapists knew the treatment being administered. Many participants were probably aware that the researchers themselves favored graded exercise therapy and another treatment, cognitive behavior therapy, which also involved increasing activity levels. Such information has been shown in other studies to lead to efforts to cooperate, which in this case could lead to lowered reporting of harms.

And finally, therapists were explicitly instructed to urge patients in the graded exercise and cognitive behavioral therapy arms to “consider increased symptoms as a natural response to increased activity”—a direct encouragement to downplay potential signals of physiological deterioration. Since the researchers were relying on self-reports about changes in functionality to assess harms, these therapeutic suggestions could have influenced the outcomes.

“Clinicians or patients cannot take from this trial that it is safe to undertake graded exercise programs,” Kindlon says. “We simply do not know how much activity was performed by individual participants in this trial and under what circumstances; nor do we know what was the effect on those that did try to stick to the programs.”

Correction: The original text stated that the Institute of Medicine report came out “this” year; that was accurate when it was written in late December but inaccurate by the time of publication.

By David Tuller, DrPH

David Tuller is academic coordinator of the concurrent master’s degree program in public health and journalism at the University of California, Berkeley.

I have been seeking answers from the PACE researchers for more than a year. At the end of this post, I have included the list of questions I’d compiled by last September, when my investigation was nearing publication. Most of these questions remain unanswered.

The PACE researchers are currently under intense criticism for having rejected as “vexatious” a request for trial data from psychologist James Coyne—an action called “unforgivable” by Columbia statistician Andrew Gelman and “absurd” by Retraction Watch. Several colleagues and I have filed a subsequent request for the main PACE results, including data for the primary outcomes of fatigue and physical function and for “recovery” as defined in the trial protocol. The PACE team has two more weeks to release this data, or explain why it won’t.

Any data from the PACE trial will likely confirm what my Virology Blog series has already revealed: The results cannot stand up to serious scrutiny. But the numbers will not provide answers to the questions I find most compelling. Only the researchers themselves can explain why they made so many ill-advised choices during the trial.

In December, 2014, after months of research, I e-mailed Peter White, Trudie Chalder and Michael Sharpe—the lead PACE researcher and his two main colleagues–and offered to fly to London to meet them. They declined to talk with me. In an email, Dr. White cited my previous coverage of the illness as a reason. (The investigators and I had already engaged in an exchange of letters in The New York Times in 2011, involving a PACE-related story I had written.) “I have concluded that it would not be worthwhile our having a conversation,” Dr. White wrote in his e-mail.

I decided to postpone further attempts to contact them for the story until it was near completion. (Dr. Chalder and I did speak in January 2015 about a new study from the PACE data, and I previously described our differing memories of the conversation.) In the meantime, I wrote and rewrote the piece and tweaked it and trimmed it and then pasted back in stuff that I’d already cut out. Last June, I sent a very long draft to Retraction Watch, which had agreed to review it for possible publication.

I still hoped Dr. White would relent and decide to talk with me. Over the summer, I drew up a list of dozens of questions that covered every single issue addressed in my investigation.

I had noticed the kinds of non-responsive responses Dr. White and his colleagues provided in journal correspondence and other venues whenever patients made cogent and incontrovertible points. They appeared to excel at avoiding hard questions, ignoring inconvenient facts, and misstating key details. I was surprised and perplexed that smart journal editors, public health officials, reporters and others accepted their replies without pointing out glaring methodological problems—such as the bizarre fact that the study’s outcome thresholds for improvement on its primary measures indicated worse health status than the entry criteria required to demonstrate serious disability.

So my list of questions included lots of follow-ups that would help me push past the PACE team’s standard portfolio of evasions. And if, as I suspected, I wouldn’t get the chance to pose the questions myself, I hoped the list would be a useful guide for anyone who wanted to conduct a rigorous interview with Dr. White or his colleagues about the trial’s methodological problems. (Dr. White never agreed to talk with me; I sent my questions to Retraction Watch as part of the fact-checking process.)

In September, Retraction Watch interviewed Dr. White in connection with my piece, as noted in a recent post about Dr. Coyne’s data request. Retraction Watch and I subsequently determined that we differed on the best approach and direction for the story. On October 21st to 23rd, Virology Blog ran my 14,000-word investigation.

But I still don’t have the answers to my questions.

*****************

List of Questions, September 1, 2015:

I am posting this list verbatim, although if I were pulling it together today I would add, subtract and rephrase some questions. (I might have misstated a statistical concept or two.) The list is by no means exhaustive. Patients and researchers could easily come up with a host of additional items. The PACE team seems to have a lot to answer for.

1) In June, a report commissioned by the National Institutes of Health declared that the Oxford criteria should be “retired” because the case definition impeded progress and possibly caused harm. As you know, the concern is that it is so non-specific that it leads to heterogeneous study samples that include people with many illnesses besides ME/CFS. How do you respond to that concern?

2) In published remarks after Dr. White’s presentation in Bristol last fall, Dr. Jonathan Edwards wrote: “What Dr White seemed not to understand is that a simple reason for not accepting the conclusion is that an unblinded trial in a situation where endpoints are subjective is valueless.” What is your response to Dr. Edwards’ position?

3) The December 2008 PACE participants’ newsletter included an article about the UK NICE guidelines. The article noted that the recommended treatments, “based on the best available evidence,” included two of the interventions being studied–CBT and GET. (The article didn’t mention that PACE investigator Jessica Bavington also served on the NICE guidelines committee.) The same newsletter included glowing testimonials from satisfied participants about their positive outcomes from the trial “therapy” and “treatment” but included no statements from participants with negative outcomes. According to the graph illustrating recruitment statistics in the same newsletter, about 200 or so participants were still slated to undergo one or more of their assessments after publication of the newsletter.

Were you concerned that publishing such statements would bias the remaining study subjects? If not, why not? A biostatistics professor from Columbia told me that for investigators to publish such information during a trial was “the height of clinical trial amateurism,” and that at the very least you should have assessed responses before and after disseminating the newsletter to ensure that there was no bias resulting from the statements. What is your response? Also, should the article about the NICE guidelines have disclosed that Jessica Bavington was on the committee and therefore playing a dual role?

4) In your protocol, you promised to abide by the Declaration of Helsinki. The declaration mandates that obtaining informed consent requires that prospective participants be “adequately informed” about “any possible conflicts of interest” and “institutional affiliations of the researcher.” In the Lancet and other papers, you disclosed financial and consulting ties with insurance companies as “conflicts of interest.” But trial participants I have interviewed said they did not find out about these “conflicts of interest” until after they completed the trial. They felt this violated their rights as participants to informed consent. One demanded her data be removed from the study after the fact. I have reviewed participant information and consent forms, including those from version 5.0 of the protocol, and none contain the disclosures mandated by the Declaration of Helsinki.

Why did you decide not to inform prospective participants about your “conflicts of interest” and “institutional affiliations” as part of the informed consent process? Do you believe this omission violates the Declaration of Helsinki’s provisions on disclosure to participants? Can you document that any PACE participants were told of your “possible conflicts of interest” and “institutional affiliations” during the informed consent process?

5) For both fatigue and physical function, your thresholds for “normal range” (Lancet) and “recovery” (Psych Med) indicated a greater level of disability than the entry criteria, meaning participants could be fatigued or physically disabled enough for entry but “recovered” at the same time. Thirteen percent of the sample was already “within normal range” on physical function, fatigue or both at baseline, according to information obtained under a freedom-of-information request.

Can you explain the logic of that overlap? Why did the Lancet and Psych Med papers not specifically mention or discuss the implication of the overlaps, or disclose that 13 percent of the study sample were already “within normal range” on an indicator at baseline? Do you believe that such overlaps affect the interpretation of the results? If not, why not? What oversight committee specifically approved this outcome measure? Or was it not approved by any committee, since it was a post-hoc analysis?
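
To make the overlap in question 5 concrete, here is a small worked example. The thresholds used (an SF-36 physical function score of 65 or less to enter the trial, 60 or more to count as “within normal range”) are the commonly cited PACE figures, quoted here purely for illustration; the point holds for any pair of thresholds in which the “normal range” floor sits below the entry ceiling.

```python
# Worked example of the entry/"normal range" overlap on the SF-36 physical
# function scale (0-100, higher scores = better function). Thresholds below
# are the commonly cited PACE values, used here purely for illustration.

ENTRY_MAX = 65    # disabled enough to be enrolled
NORMAL_MIN = 60   # post-hoc "within normal range" threshold

for score in (55, 60, 63, 65, 70):
    eligible = score <= ENTRY_MAX
    in_normal_range = score >= NORMAL_MIN
    print(f"SF-36 PF {score:3d}: eligible for trial = {eligible}, "
          f"'within normal range' = {in_normal_range}")

# Scores from 60 to 65 satisfy both conditions, so a participant could enter
# the trial as seriously disabled and be counted within the "normal range"
# without improving by a single point.
```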

6) You have explained these “normal ranges” as the product of taking the mean value +/- 1 SD of the scores of  representative populations–the standard approach to obtaining normal ranges when data are normally distributed. Yet the values in both those referenced source populations (Bowling for physical function, Chalder for fatigue) are clustered toward the healthier ends, as both papers make clear, so the conventional formula does not provide an accurate normal range. In a 2007 paper, Dr. White mentioned this problem of skewed populations and the challenge they posed to calculation of normal ranges.

Why did you not use other methods for determining normal ranges from your clustered data sets from Bowling and Chalder, such as basing them on percentiles? Why did you not mention the concern or limitation about using conventional methods in the PACE papers, as Dr. White did in the 2007 paper? Is this application of conventional statistical methods for non-normally distributed data the reason why you had such broad normal ranges that ended up overlapping with the fatigue and physical function entry criteria?
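
The statistical point in question 6 can be illustrated with synthetic data (this is not the Bowling or Chalder data, just a sketch of the general problem): on a bounded scale where most of the population scores near the top, mean − 1 SD no longer marks the roughly 16th percentile it would mark for normally distributed scores, and the resulting “normal range” boundary can sit well below a rank-based one.

```python
# Synthetic illustration of why mean - 1 SD misbehaves on a skewed, bounded
# scale. None of these numbers come from the PACE source populations.
import numpy as np

rng = np.random.default_rng(1)
# A 0-100 scale with a heavy ceiling effect: ~40% score 100, the rest trail off.
ceiling = np.full(4000, 100.0)
tail = np.clip(100 - rng.exponential(scale=30, size=6000), 0, 100)
scores = np.concatenate([ceiling, tail])

mean, sd, median = scores.mean(), scores.std(), np.median(scores)
parametric_bound = mean - sd            # the convention used in the PACE papers
rank_bound = np.percentile(scores, 16)  # rank-based analogue of "1 SD below"

print(f"median = {median:.0f}, mean = {mean:.1f}, SD = {sd:.1f}")
print(f"mean - 1 SD bound: {parametric_bound:.1f}")
print(f"16th percentile  : {rank_bound:.1f}")
print(f"share of sample below mean - 1 SD: {(scores < parametric_bound).mean():.1%}")

# Here the parametric bound lands several points below the rank-based one and
# flags fewer scores than the ~16% it would flag under normality, which is
# exactly how an overly generous "normal range" arises from skewed data.
```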

7) According to the protocol, the main finding from the primary measures would be rates of “positive outcomes”/”overall improvers,” which would have allowed for assessment of outcomes at the individual level. Instead, the main finding was a comparison of the mean performances of the groups–aggregate results that did not provide important information about how many got better or worse. Who approved this specific change? Were you concerned about losing the individual-level assessments?

8) The other two methods of assessing the primary outcomes were both post-hoc analyses. Do you agree that post-hoc analyses carry significantly less weight than pre-specified results? Did any PACE oversight committees specifically approve the post-hoc analyses?

9) The improvement required to achieve a “clinically useful benefit” was defined as 8 points on the SF-36 scale and 2 points on the continuous scoring for the fatigue scale. In the protocol, categorical thresholds for a “positive outcome” were designated as 75 on the SF-36 and 3 on the Chalder fatigue scale, so achieving that would have required an increase of at least 10 points on the SF-36 and 3 points (bimodal) for fatigue. Do you agree that the protocol measure required participants to demonstrate greater improvements to achieve the “positive outcome” scores than the post-hoc “clinically useful benefit”?

10) When you published your protocol in BMC Neurology in 2007, the journal appended an “editor’s comment” that urged readers to compare the published papers with the protocol “to ensure that no deviations from the protocol occurred during the study.” The comment urged readers to “contact the authors” in the event of such changes. In asking for the results per the protocol, patients and others followed the suggestion in the editor’s comment appended to your protocol. Why have you declined to release the data upon request? Can you explain why Queen Mary has considered requests for results per the original protocol “vexatious”?

11) In cases when protocol changes are absolutely necessary, researchers often conduct sensitivity analyses to assess the impact of the changes, and/or publish the findings from both the original and changed sets of assumptions. Why did you decide not to take either of these standard approaches?

12) You made it clear, in your response to correspondence in the Lancet, that the 2011 paper was not addressing “recovery.” Why, then, did Dr. Chalder refer at the 2011 press conference to the “normal range” data as indicating that patients got “back to normal”–i.e. they “recovered”? And since you had input into the accompanying commentary in the Lancet before publication, according to the Press Complaints Commission, why did you not dissuade the writers from declaring a 30 percent “recovery” rate? Do you agree with the commentary that PACE used “a strict criterion for recovery,” given that in both of the primary outcomes participants could get worse and be counted as “recovered,” or “back to normal” in Dr. Chalder’s words?

13) Much of the press coverage focused on “recovery,” even though the paper was making no such claim. Were you at all concerned that the media was mis-interpreting or over-interpreting the results, and did you feel some responsibility for that, given that Dr. Chalder’s statement of “back to normal” and the commentary claim of a 30 percent “recovery” rate were prime sources of those claims?

14) You changed your fatigue outcome scoring method from bimodal to continuous mid-trial, but cited no references in support of this that might have caused you to change your mind since the protocol. Specifically, you did not explain that the FINE trial reported benefits for its intervention only in a post-hoc re-analysis of its fatigue data using continuous scoring.

Were the FINE findings the impetus for the change in scoring in your paper? If so, why was this reason not mentioned or cited? If not, what specific change prompted your mid-trial decision to alter the protocol in this way? And given that the FINE trial was promoted as the “sister study” to PACE, why were that trial and its negative findings not mentioned in the text of the Lancet paper? Do you believe those findings are irrelevant to PACE? Moreover, since the Likert-style analysis of fatigue was already a secondary outcome in PACE, why did you not simply provide both bimodal and continuous analyses rather than drop the bimodal scoring altogether?

15)  The “number needed to treat” (NNT) for CBT and GET was 7, as Dr. Sharpe indicated in an Australian radio interview after the Lancet publication. But based on the “normal range” data, the NNT for SMC was also 7, since those participants achieved a 15% rate of “being within normal range,” accounting for half of the rate experienced under the rehabilitative interventions.

Is that what Dr. Sharpe meant in the radio interview when he said: “What this trial wasn’t able to answer is how much better are these treatments and really not having very much treatment at all”? If not, what did Dr. Sharpe mean? Wasn’t the trial designed to answer the very question Dr. Sharpe cited? Since each of the rehabilitative intervention arms as well as the SMC arm had an NNT of 7, would it be accurate to interpret the “normal range” findings as demonstrating that CBT and GET worked as well as SMC, but not any better?
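
The arithmetic behind the figures in question 15 is simple, and a short sketch may help readers follow it. The 30 percent rate for the rehabilitative arms is inferred from the statement that SMC’s 15 percent was half of it, and the second calculation assumes, purely for illustration, a hypothetical untreated comparison group in which no one reaches the “normal range”; PACE had no such arm.

```python
# Sketch of the NNT arithmetic discussed in question 15. Rates are approximate
# and the 0% "no treatment" rate is a hypothetical comparator, since PACE had
# no untreated arm.

def nnt(rate_treatment, rate_comparator):
    """Number needed to treat = 1 / absolute difference in response rates."""
    return 1.0 / (rate_treatment - rate_comparator)

rate_cbt_get = 0.30   # "within normal range" under CBT or GET (approximate)
rate_smc = 0.15       # "within normal range" under SMC alone

print(f"NNT, CBT or GET vs SMC  : {nnt(rate_cbt_get, rate_smc):.1f}")  # about 7
print(f"NNT, SMC vs no treatment: {nnt(rate_smc, 0.0):.1f}")           # about 7
```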

16) The PACE paper was widely interpreted, based on your findings and statements, as demonstrating that “pacing” isn’t effective. Yet patients describe “pacing” as an individual, flexible, self-help method for adapting to the illness. Would packaging and operationalizing it as a “treatment” to be administered by a “therapist” alter its nature and therefore its impact? If not, why not? Why do you think the evidence from APT can be extrapolated to what patients themselves call “pacing”? Also, given your partnership with Action4ME in developing APT, how do you explain the organization’s rejection of the findings in the statement issued after the study was published?

17) In your response to correspondence in the Lancet, you acknowledged a mistake in describing the Bowling sample as a “working age” rather than “adult” population–a mistake that changes the interpretation of the findings. Comparing the PACE participants to a sicker group but mislabeling it a healthier one makes the PACE results look better than they were; the percentage of participants scoring “within normal range” would clearly have been even lower had they actually been compared to the real “working age” population rather than the larger and more debilitated “adult” population. Yet the Lancet paper itself has not been corrected, so current readers are provided with misinformation about the measurement and interpretation of one of the study’s two primary outcomes.

Why hasn’t the paper been corrected? Do you believe that everyone who reads the paper also reads the correspondence, making it unnecessary to correct the paper itself? Or do you think the mistake is insignificant and so does not warrant a correction in the paper itself? Lancet policy calls for corrections–not mentions in correspondence–for mistakes that affect interpretation or replicability. Do you disagree that this mistake affects interpretation or replicability?

18) In our exchange of letters in the NYTimes four years ago, you argued that PACE provided “robust” evidence for treatment with CBT and GET “no matter how the illness is defined,” based on the two sub-group analyses. Yet Oxford requires that fatigue be the primary complaint–a requirement that is not a part of either of your other two sub-group case definitions. (“Fatigue” per se is not part of the ME definition at all, since post-exertional malaise is the core symptom; the CDC obviously requires “fatigue,” but not that it be the primary symptom, and patients can present with post-exertional malaise or cognitive problems as being their “primary” complaint.)

Given that discrepancy, why do you believe the PACE findings can be extrapolated to others “no matter how the illness is defined,” as you wrote in the NYTimes? Is it your assumption that everyone who met the other two criteria would automatically be screened in by the Oxford criteria, despite the discrepancies in the case definitions?

19) None of the multiple outcomes you cited as “objective” in the protocol supported the subjective outcomes suggesting improvement (excluding the extremely modest increase in the six-minute walking test for the GET group). Does this lack of objective support for improvement and recovery concern you? Should the failure of the objective measures raise questions about whether people have achieved any actual benefits or improvements in performance?

20) If wearing the actometer was considered too much of a burden for patients to wear at the end of the trial, when presumably many of them would have been improved, why wasn’t it too much of a burden for patients at the beginning of the trial? In retrospect, given that your other objective findings failed, do you regret having made that decision?

21) In your response to correspondence after publication of the Psych Med paper, you mentioned multiple problems with the “objectivity” of the six-minute walking test that invalidated comparisons with other studies. Yet PACE started assessing people using this test when the trial began recruitment in 2005, and the serious limitations–the short corridors requiring patients to turn around more than was standard, the decision not to encourage patients during the test, etc.–presumably become apparent quickly.

Why then, in the published protocol in 2007, did you describe the walking test as an “objective” measure of function? Given that the study had been assessing patients for two years already, why had you not already recognized the limitations of the test and realized that it was apparently useless as an objective measure? When did you actually recognize these limitations?

22) In the Psych Med paper, you described “recovery” as recovery only from the current episode of illness–a limitation of the term not mentioned in the protocol. Since this definition describes what most people would refer to as “remission,” not “recovery,” why did you choose to use the word “recovery”–in the protocol and in the paper–in the first place? Would the term “remission” have been more accurate and less misleading? Not surprisingly, the media coverage focused on “recovery,” not on “remission.” Were you concerned that this coverage gave readers and viewers an inaccurate impression of the findings, since few readers or viewers would understand that what the Psych Med paper examined was in fact “remission” and not “recovery,” as most people would understand the terms?

23) In the Psychological Medicine definition of “recovery,” you relaxed all four of the criteria. For the first two, you adopted the “normal range” scores for fatigue and physical function from the Lancet paper, with “recovery” thresholds lower than the entry criteria. For the Clinical Global Impression scale, “recovery” in the Psych Med paper required a 1 or 2, rather than just a 1, as in the protocol. For the fourth element, you split the single category of not meeting any of the three case definitions into two separate categories–one less restrictive (‘trial recovery’) than the original proposed in the protocol (now renamed ‘clinical recovery’).

What oversight committee approved the changes in the overall definition of recovery from the protocol, including the relaxation of all four elements of the definition? Can you cite any references for your reconsideration of the CGI scale, and explain what new information prompted this reconsideration after the trial? Can you provide any references for the decision to split the final “recovery” element into two categories, and explain what new information prompted this change after the trial?

24) The Psychological Medicine paper, in dismissing the original “recovery” threshold of 85 on the SF-36, asserted that 50 percent of the population would score below this mean value and that it was therefore not an appropriate cut-off. But that statement conflates the mean and median values; given that this is not a normally distributed sample and that the median value is much higher than the mean in this population, the statement about 50 percent performing below 85 is clearly wrong.

Since the source populations were skewed and not normally distributed, can you explain this claim that 50 percent of the population would perform below the mean? And since this reasoning for dismissing the threshold of 85 is wrong, can you provide another explanation for why that threshold needed to be revised downward so significantly? Why has this erroneous claim not been corrected?
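
The error described in question 24 is easy to demonstrate with a synthetic example (again, illustrative data, not the actual Bowling sample): in a left-skewed, ceiling-limited sample the mean sits well below the median, so far fewer than 50 percent of scores fall below the mean.

```python
# Synthetic check of the mean/median point in question 24; illustrative data
# only, not the Bowling population sample.
import numpy as np

rng = np.random.default_rng(2)
scores = np.clip(100 - rng.exponential(scale=15, size=10_000), 0, 100)

mean, median = scores.mean(), np.median(scores)
print(f"mean   = {mean:.1f}")                                   # ~85
print(f"median = {median:.1f}")                                 # ~90
print(f"share below the mean = {(scores < mean).mean():.1%}")   # well under 50%
```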

25) What are the results, per the protocol definition of “recovery”?

26) The PLoS One paper reported that a sensitivity analysis found that the findings of the societal cost-effectiveness of CBT and GET would be “robust” even when informal care was measured not by replacement cost of a health-care worker but using alternative assumptions of minimum wage or zero pay. When readers challenged this claim that the findings would be “robust” under these alternative assumptions, the lead author, Paul McCrone, agreed in his responses that changing the value for informal care would, in fact, change the outcomes. He then criticized the alternative assumptions because they did not adequately value the family’s caregiving work, even though they had been included in the PACE statistical plan.

Why did the PLoS One paper include an apparently inaccurate sensitivity analysis that claimed the societal cost-effectiveness findings for CBT and GET were “robust” under the alternative assumptions, even though that wasn’t the case? And if the alternative assumptions were “controversial” and “restrictive,” as the lead author wrote in one of his posted responses, then why did the PACE team include them in the statistical plan in the first place?

TWiV 370: Ten out of 15

On episode #370 of the science show This Week in Virology, the TWiVomics review ten captivating virology stories from 2015.

You can find TWiV #370 at www.microbe.tv/twiv.