Trial By Error: More on that Norwegian CBT/Music Therapy Study

UPDATE, MAY 16: As I mentioned, the trial registration did not cite “recovery” as an outcome. However, the various study documents include a number of different statements about the status of physical activity, fatigue, and recovery as endpoints. Of four relevant documents besides the trial registration, one included the definition of recovery used in reporting the study results–three, like the trial registration, did not.

I will outline these discrepancies in more detail when time permits. Those too impatient to wait can access the documents here.

**********

By David Tuller, DrPH

After the debacle with the Lightning Process study, you would think that BMJ would have learned an important lesson: editors and peer-reviewers should scrutinize the background materials for the trials they publish. That’s the best way to prevent selective outcome reporting and ensure that findings are reported as described in the trial registration and/or protocol.

To recap: In 2017, Archives of Disease in Childhood, a BMJ journal, published a study of the Lightning Process as a pediatric treatment for what the investigators called chronic fatigue syndrome. The Bristol University investigators recruited more than half the participants before trial registration and swapped outcome measures midway through. Then they failed to disclose these salient details in the published paper. Although these actions appeared to meet standard definitions of research misconduct and violated BMJ’s own strict guidelines, the journal decided last year to let the reported findings stand–albeit with a 3,000-word correction notice.

Now BMJ has published another paper that significantly diverges from the metrics outlined in the study documentation, which in this case is the trial registration. The study, posted last month by BMJ Paediatrics Open, is called “Cognitive-behavioural therapy combined with music therapy for chronic fatigue following Epstein-Barr virus infection in adolescents: a feasibility study.”

From March 2015 until November 2016, the investigators recruited participants in the three counties of Oslo, Akershus and Buskerud. The senior author, Vegard Bruun Wyller, is a professor at the University of Oslo’s Institute of Clinical Medicine and an adherent of the cognitive behavior therapy/graded exercise therapy (CBT/GET) treatment paradigm.

Here’s the short take. Scores for the predesignated primary outcome, mean steps per day at three months, dropped in both the intervention and comparison groups. Not only that, participants in the intervention group performed more poorly, taking on average more than 1000 fewer steps per day. Moreover, two outcomes reported as demonstrating “tendencies” toward the positive, post-exertional malaise and recovery, were not even mentioned in the trial registration. How they came to be added as outcomes in the first place is not explained, an omission that undermines the credibility and integrity of the research.

(Nothing here should be construed as criticizing music therapy as a treatment modality. Our bodies evolved to respond to sound patterns in all sorts of ways that we don’t understand. The issue here is whether the research and the reported results are solid.)

**********

Is CBT plus music better than CBT alone?

The main goal of a feasibility study is to gather some preliminary data and decide whether it makes sense to conduct a full trial of whatever it is. (Of course, it’s important to remember that the Lightning Process study began as a feasibility trial.) Feasibility trials are by definition small. They aren’t designed to deliver authoritative and actionable findings about interventions. Oddly, the registration for the Norwegian trial does not appear to identify it as a feasibility study. Perhaps at some point the investigators decided to change gears; if so, it would be interesting to learn what prompted that shift.

The study’s premise, as I understand it, is something like this: CBT has been shown to work as a treatment for CF and CFS, but only modestly, and multidisciplinary approaches are a good thing, so why not add something extra like music and see if that amplifies the effects? This premise is obviously flawed. The investigators cite both the PACE trial and the Dutch FITNET study for the claim that CBT is effective. But they do not mention that these studies have been shown to feature serious flaws that undermine their claims. (I analyzed PACE here and FITNET here.) Nor do they mention that the CBT/GET approach has lost its status as the undisputed international standard-of-care, as manifested in the 2017 decision by the US Centers for Disease Control and Prevention to stop recommending it.

Beyond that issue, the investigators appear confused about whether they were studying adolescents with chronic fatigue or adolescents with chronic fatigue syndrome. The trial registration described the intervention as “mental training for CFS following EBV infection in adolescents.” The title of the published paper refers to chronic fatigue. In the paper, the investigators note that people with symptoms in addition to fatigue might meet case definitions of CFS, but they lump everyone together in the analysis.

They appear to believe that the two conditions are more or less the same, except for those extra symptoms. Or that they exist on a continuum. Or maybe that’s not what they believe. I actually found it hard to tell. Furthering the confusion, at the start of the study the participants averaged around 8,000 steps a day or more, while the paper cites previous research in which adolescents with CFS averaged much less–around 4500 steps a day. In other words, the chronically fatigued adolescents in this sample appear to have been much more physically robust than would often be expected if someone had chronic fatigue syndrome.

The intervention consisted of ten sessions of a “mental training programme merging elements from music therapy with elements from CBT,” with homework assignments. The first session included the adolescent, parents or guardians, both therapists, and a researcher. On top of that, “personal experiences were also shared by a young adult voluntary patient who had himself recovered from CFS.” Interesting. It is not usually considered appropriate in a clinical trial to tell participants that the intervention they are receiving will lead to recovery, since that could bias the results–especially on subjective measures.

Much of the onus for success seems to have been on the patients. “The treatment programme assumes active participation from the patient between the sessions, and the therapists tried to communicate the necessity of individual effort,” the study noted. It makes sense to suggest that patients who want to get better should seek to engage in efforts to get better. However, this framing can also provide investigators with an easy way to blame participants if the intervention fails to deliver the hoped-for benefits.

Out of 91 eligible study candidates who remained fatigued six months after acute Epstein-Barr virus infection, only 43, less than half, agreed to participate. Of those, 21 were randomized to the intervention group and 22 to a group receiving treatment as usual (TAU), which essentially meant no treatment at all. The intervention group suffered major attrition, with six drop-outs by the three-month assessment period compared to just one for the TAU group. According to the investigators, adolescents who declined to participate in the research or who dropped out reported that they were concerned about missing school.

OK, but so what if adolescents gave investigators that reason? It would likely be easier for many adolescents to tell investigators they were concerned about missing school than to say they thought the intervention wasn’t worth their time. In any event, more than half declined the offer and almost a third of the intervention group dropped out near the start. Those facts would seem to raise questions about the acceptability of the intervention and the feasibility of the trial. (On the other hand, those who did not drop out had high compliance with their scheduled therapy sessions, a favorable indicator.)
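For readers who want to check the arithmetic behind "more than half" and "almost a third," here is a minimal sketch. The counts are the ones quoted above from the paper; the variable names are mine, and the drop-out share is computed per study arm at the three-month assessment.

```python
# Counts as quoted above from the paper; variable names are illustrative.
eligible = 91
enrolled = 43
randomized = {"intervention": 21, "TAU": 22}
dropouts_3m = {"intervention": 6, "TAU": 1}

# Share of eligible candidates who did not take part.
declined_share = (eligible - enrolled) / eligible

# Share of each arm lost by the three-month assessment.
dropout_share = {arm: dropouts_3m[arm] / randomized[arm] for arm in randomized}

print(f"Declined to participate: {declined_share:.1%}")
for arm, share in dropout_share.items():
    print(f"Drop-outs by 3 months, {arm}: {share:.1%}")
```

On these figures, the "almost a third" applies to the intervention arm specifically; attrition in the TAU arm was far lower.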

**********

A zealous over-interpretation of the data

Despite their preliminary nature, feasibility trials are still subject to basic scientific standards. And that means presenting the outcomes as promised in trial documents, unless investigators can provide excellent reasons for making changes. Adherence to this principle is critical for preventing selective reporting of results, an unfortunately popular violation of methodological principles.

In the current case, the trial registration listed one primary outcome: the mean number of steps per day at 12 weeks, assessed by an accelerometer worn for a week. After that, the investigators listed 41 secondary measures assessed at 12 and 64 weeks, including “symptoms (fatigue, pain, insomnia), cognitive function (executive functions) and markers of disease mechanisms (autonomic, endocrine, and immune responses).” Mean number of steps per day at 64 weeks was also a secondary measure. (In the paper, these times were rendered in months rather than weeks.)

Given the investigators’ expectation that the intervention would boost activity levels, the results for the primary outcome were disappointing. At three months, both groups showed measurable declines in their activity levels, and the score for participants in the intervention group was even lower than for those who received TAU. In other words, the treatment not only failed to increase activity levels, it actually led to worse outcomes.

Here’s the study’s abstract on the findings: “Endpoints included physical activity (steps/day), symptom scores, recovery rate…In intention-to-treat analyses, number of steps/day tended to decrease (difference=−1158, 95% CI −2642 to 325), whereas post-exertional malaise tended to improve (difference=−0.4, 95% CI −0.9 to 0.1) in the intervention group at 3 months. At 15 months’ follow-up, there was a trend towards higher recovery rate in the intervention group (62% vs 37%)”

And here’s the conclusion: “An intervention study of combined CBT and music therapy in postinfectious CF is feasible, and appears acceptable to the participants. The tendencies towards positive effects on patients’ symptoms and recovery might justify a full-scale clinical trial.”

This is a zealous over-interpretation, even with the hedging language (“tendencies toward,” “might justify”). In any study, the predesignated primary outcome is the most important metric and is highlighted as such by honest and transparent investigators. In this case, a strength of the primary outcome was that it was an objective measure, not a subjective assessment easily influenced by multiple kinds of bias. In comparing the intervention to TAU, the study found no benefits for its primary outcome. Moreover, both groups took fewer steps at three months than at the start, and the intervention group did worse.
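One way to see how tentative the abstract's "tendencies" are is to check whether each reported 95% confidence interval includes zero, meaning the data are compatible with no difference between the arms. The sketch below uses only the figures quoted from the abstract; the helper function and labels are my own.

```python
def interval_includes_zero(lower, upper):
    """Return True if a confidence interval spans zero."""
    return lower <= 0 <= upper

# (difference, CI lower bound, CI upper bound), as quoted from the abstract.
reported = {
    "steps/day at 3 months": (-1158, -2642, 325),
    "post-exertional malaise at 3 months": (-0.4, -0.9, 0.1),
}

results = {
    name: interval_includes_zero(lower, upper)
    for name, (_diff, lower, upper) in reported.items()
}

for name, spans_zero in results.items():
    print(f"{name}: 95% CI includes zero -> {spans_zero}")
```

Both intervals span zero. Note that the sign means opposite things for the two measures: a negative difference in steps is a worse result, while a negative difference in the PEM score is an improvement.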

These unfortunate findings cannot be airbrushed away. Yet the abstract seems written to create the impression that the physical activity measure is one of multiple endpoints of equal status. The abstract’s conclusion doesn’t even mention the unfortunate results for the primary outcome. This omission is unacceptable. (In the full text, the investigators appropriately mention that the mean number of steps per day was the primary outcome–but they ignore the implications of that inconvenient detail.)

Moreover, the trial registration did not mention PEM and recovery among the 41 secondary outcomes. These two items were apparently post-designated as outcomes–that is, at some point after trial registration but before production of the final draft of the paper. The paper does not explain the reason for introducing these new outcome measures. However, it is worth noting that the abstract largely rests its argument for the “positive” possibilities of the intervention on the PEM and recovery results while ignoring the disappointing results for the predesignated primary outcome.

Hm. This isn’t how scientific research is supposed to be reported. First-year epidemiology students at Berkeley know better than to pull an amateurish stunt like this.

**********

A closer look at PEM and recovery

Now let’s look at the outcomes for which the investigators claimed a “tendency” toward positive effects–PEM and recovery.

Here’s what they wrote about how they tracked PEM: “The symptom of postexertional malaise, often considered a hallmark of CFS, was charted with one single item (‘How often do you experience more fatigue the day after an exertion?’).” That single question is a very crude way to measure PEM. Even so, both groups reported a reduction of this symptom, with minimal differences between the two.

This is not surprising. Participants were taking many fewer steps, so it is understandable they would report less PEM. Since the intervention was expected to increase activity levels, it seems questionable and unjustified in this context to interpret reduced PEM as an indication of potential success rather than as a marker of reduced activity levels.

In the paper, recovery was defined as a score of three or less on the Chalder Fatigue Scale, on which lower numbers represent less fatigue. A score of four or above on the scale was the threshold for trial entry. Since recovery as an outcome was not included in the trial registration, the acceptable way to present the results would have been as a reduction in reported fatigue without reference to recovery at all. Clearly, “recovery” sounds better than “reduced fatigue.”

The investigators could perhaps argue that the intervention was seeking to alleviate “chronic fatigue,” so it would be fair to consider a reduction to three on the fatigue scale as demonstrating recovery. In that case, they needed to make that point when they predesignated their outcomes, not at some unspecified later date when the choice could have been biased by emerging developments during the trial.

Moreover, it is important to note that the statistics provided for recovery–62% in the intervention group vs 37% in the TAU group–are from a per-protocol analysis, not the intention-to-treat analyses provided for the other outcomes. That is, the investigators have simply divided the number of people who met the recovery threshold by the number of participants remaining in that study arm, overlooking those who dropped out. In contrast, an intention-to-treat analysis takes into account the fact that some have dropped out of each arm and that their outcomes are unknown.

Intention-to-treat analyses are generally viewed as more conservative and a better reflection of real-world experience. The intention-to-treat analysis of the scores for the fatigue scale shows very little difference between the intervention and the TAU groups. Transforming the same scores into a recovery outcome and then providing a per-protocol analysis is a clever way of making the same findings look much better. It is fair to assume that the decision to conduct this recovery analysis was made after trial registration. (Also, Table #4, which includes the recovery data, seems to be missing information about one person in each of the study arms; the totals do not add up.)
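To illustrate why the choice of denominator matters, here is a minimal sketch comparing a per-protocol rate with one conservative intention-to-treat-style rate. The counts are entirely hypothetical and are not taken from the trial, whose underlying recovery counts I am not reproducing here. Counting every drop-out as "not recovered" is only one of several ways to handle missing outcomes, and it is an assumption of this sketch, not a description of what the investigators did.

```python
def recovery_rates(randomized, completers, recovered):
    """Return (per_protocol_rate, conservative_itt_rate).

    per_protocol_rate: recovered / participants still in the arm.
    conservative_itt_rate: recovered / everyone randomized, treating
    drop-outs as not recovered (an assumption of this sketch).
    """
    if not (0 <= recovered <= completers <= randomized) or completers == 0:
        raise ValueError("need 0 <= recovered <= completers <= randomized, completers > 0")
    return recovered / completers, recovered / randomized

# Hypothetical arm: 20 randomized, 10 remaining at follow-up, 6 recovered.
pp_rate, itt_rate = recovery_rates(randomized=20, completers=10, recovered=6)
print(f"Per-protocol: {pp_rate:.0%}")
print(f"ITT-style, drop-outs counted as not recovered: {itt_rate:.0%}")
```

The more drop-outs an arm has, the wider the gap between the two figures, which is why a per-protocol recovery rate from an arm with heavy attrition deserves extra caution.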

And another thing…It is hard to understand a definition of recovery that ignores a trial’s primary outcome. This trial documented that people were taking fewer steps per day than before, not more, which undermines the argument for the effectiveness of the intervention. When people take fewer steps, it should not be surprising if they report less fatigue, or less PEM. It takes a certain kind of hubris to argue for a “tendency” toward recovery when patients who received the intervention performed worse on a predesignated–and objectively measured–primary outcome.

In summary, this published paper is deficient in multiple ways. BMJ Paediatrics Open should not have accepted it without insisting that the results be reported according to the predesignated measures in the trial registration. BMJ professes to maintain a rigid stance against selective outcome reporting, but its journals can’t seem to stop publishing papers that indulge in this unfortunate practice.

This post is already very long, and I haven’t even mentioned the study’s peer reviews and treatment manual, which make for interesting reading. Hopefully I’ll get to that soon.
