The draft systematic evidence review on the Diagnosis and Treatment of ME/CFS was published online last week. It’s a monster – 416 pages in total. I know many ME/CFS patients may not be able to read this report, so in this post I’m going to focus on three things: the purpose of the report, the lumping of multiple case definitions, and the high quality rating given to the PACE trial. If you read nothing else about this systematic review, then these are the biggest takeaway messages.
The Purpose of the Systematic Review
NIH requested the review for the purposes of the P2P Workshop, and the Agency for Healthcare Research and Quality (AHRQ) contracted with the Oregon Health & Science University to perform the review for about $350,000.
The primary purpose of the review is to serve as the cornerstone of knowledge for the P2P Panel. The Panel will be made up entirely of non-ME/CFS experts. In order to give them some knowledge base for the Workshop presentations, the Panel will receive this review and a presentation by the review authors (behind closed doors). Until the Workshop itself, this review will be the Panel’s largest source of information about ME/CFS.
But that is not the only use for this report. AHRQ systematic reviews are frequently published in summary form in peer reviewed journals, as was the 2001 CFS review. The report will be available online, and will be given great credence simply because it is an AHRQ systematic review. The conclusions of this review – including the quality rating of the PACE trial – will be entrenched for years to come.
You can expect to see this review again and again and again. In the short term, this review will be the education given to the P2P Panel of non-ME/CFS experts in advance of the Workshop. But the review will also be published, cited, and relied upon by others as a definitive summary of the state of the science on diagnosing and treating ME/CFS.
Case Definition: I Told You So
When the protocol for this systematic review was published in May 2014, I warned that the review was going to lump all case definitions together, including the Oxford definition. After analyzing the review protocol and the Workshop agenda, Mary Dimmock and I wrote that the entire P2P enterprise was based on the assumption that all the case definitions described the same single disease, albeit in different ways, and that this assumption put the entire effort at risk. Some people may have hoped that a systematic review would uncover how different Oxford and Canadian Consensus Criteria patients were, and would lead to a statement to that effect.
Unfortunately, Mary and I were correct.
The systematic review considered eight case definitions, including Oxford, Fukuda, Canadian, Reeves Empirical, and the International Consensus Criteria, and treated them as describing a single patient population. The authors lumped all these patient cohorts together, and then tried to determine what was effective in diagnosing and treating this diverse group. The review offers no evidence to support this assumption, beyond a focus on the unifying feature of fatigue.
What I find particularly disturbing is that the review did acknowledge that maybe Oxford didn’t belong in the group:
We elected to include trials using any pre-defined case definition but recognize that some of the earlier criteria, in particular the Oxford (Sharpe, 1991) criteria, could include patients with 6 months of unexplained fatigue and no other features of ME/CFS. This has the potential of inappropriately including patients that would not otherwise be diagnosed with ME/CFS and may provide misleading results. (p. ES-29, emphasis added)
But then they did it anyway.
This is inexplicably bad science. How can they acknowledge that Oxford patients may not have ME/CFS and acknowledge that including them may provide misleading results, and then include them anyway? Is it just because Oxford papers claim to be about CFS and include people with medically unexplained fatigue? The systematic review authors clearly believed that this was a sufficient minimum standard for inclusion in analysis, despite the acknowledged risk that it could produce misleading results.
I will have a lot more to say on this topic and the problems in the review’s analysis. For now, the bottom line takeaway message is that the systematic review combined all the case definitions, including Oxford, and declared them to represent a single disease entity based on medically unexplained fatigue.
PACE is Ace
One of the dangers of the review’s inclusion of the Oxford definition and related studies was the risk that PACE would be highly regarded. And that is exactly what happened.
The PACE trial is one of seven treatment studies (out of a total of thirty-six) to receive the “Good” rating, which has a specific technical meaning in this context (Appendix E). In the systematic review, a randomized controlled trial is “Good” if it includes comparable groups, uses reliable and valid measurement instruments, considers important outcomes, and uses an intention-to-treat analysis. I’m certainly no expert in these issues, but I can spot a couple of problems.
First of all, the PACE trial may have used comparable groups within the study, but that internal consistency is different from whether the PACE cohort was comparable to other ME/CFS patients. The systematic review already acknowledged that the Oxford cohort may include people who do not actually have ME/CFS, and in my opinion that is the comparable group that matters.
In terms of important outcomes, the systematic review focused on patient-centered outcomes related to overall function, quality of life, ability to work and measures of fatigue. Yet there is no discussion or acknowledgement that patient performance on a 6-minute walking test at the end of PACE showed that they remained severely impaired. There is also no acknowledgement that a patient could enter PACE with an SF-36 score of 65, leave the trial with a score of 60, and be counted as recovered. That is possible because so many changes were made to the study in post-hoc analysis, including a change to the measures of recovery. Incredibly, the paper in which the PACE authors admit to those post-hoc changes is not cited in the systematic review. It is also important to point out that much of the discussion of the PACE flaws has occurred in Letters to the Editor and other types of publications, many of which were wholly excluded from the systematic review.
Again, I will have a lot more to say about how the systematic review assessed treatment trials, particularly trials like PACE. For now, the takeaway message is that the systematic review gave PACE its highest quality rating, willfully ignoring all the evidence to the contrary.
Final Equation
Where does this leave us, at the most basic and simple level?
- The review lumped eight case definitions together.
- The review acknowledged that the Oxford definition could include patients without ME/CFS, but forged ahead and included those patients anyway.
- The review included nine treatment studies based on the Oxford definition.
- The review rated the PACE trial and two other Oxford CBT/GET/counseling studies as good.
- The review concluded that it had moderate confidence in the finding that CBT/GET are effective for ME/CFS patients, regardless of definition.
If that does not make sense to you, join the club. I do not understand how it can be scientifically acceptable to generalize treatment trial results from patients who have fatigue but not ME/CFS to patients who do have ME/CFS. Can anyone imagine generalizing treatment results from a group of patients with one disorder to patients with another disease? For example, would the results of a high cholesterol medicine trial be generalized to patients with high blood pressure? No, even though some patients with high blood pressure may have elevated cholesterol, we would not assume the risk of generalizing results from one patient population to another.
But the systematic review’s conclusion is the predictable output of an equation that begins with treating all the case definitions as a single disease entity.
I will be submitting a detailed comment on the systematic evidence review. I encourage everyone to do the same because the report authors must publicly respond to all comments. More detailed info will be forthcoming this week on possible points to consider in commenting.
This review is going to be with us for a long time. I think it is fair and reasonable to ask the authors to address the multitude of mistakes they have made in their analysis.
Edited to add: Erica Verillo posted a great summary of problems with the review, as well.

With no announcement or fanfare, the CFS Advisory Committee has posted a response from HHS to the June 2014 recommendations. My information is that – inexplicably – even CFSAC members were not notified when the response was posted. I urge you to read 


They Know What They’re Doing (Not)
This post comes via Mary Dimmock, with assistance from Claudia Goodell, Denise Lopez-Majano, and myself. You are welcome to publish it on your site with attribution to Mary Dimmock.
Last week, Jennie Spotila and Erica Verillo posted summaries of just some of the issues with AHRQ’s Draft Systematic Evidence Review, conducted for P2P.
Jennie and Erica highlighted serious and sometimes insurmountable flaws with this Review, including:
In this post, I will describe several additional key problems with the AHRQ Evidence Review.
Keep in mind that comments must be submitted by October 20, 2014. Directions for doing so are at the end of this post.
We Don’t Need No Stinking Diagnostic Gold Standard
Best practices for diagnostic method reviews state that a diagnostic gold standard is required as the benchmark. But there is no agreed-upon diagnostic gold standard for this disease, and the Review acknowledges this. So what did the Evidence Review do? The Review allowed any of 8 disparate CFS or ME definitions to be used as the gold standard and then evaluated diagnostic methods against and across the 8 definitions. But when a definition does not accurately reflect the disease being studied, that definition cannot be used as the standard. And when the 8 disparate definitions do not describe the same disease, you cannot draw conclusions about diagnostic methods across them.
What makes this worse is that the reviewers recognized the importance of PEM but failed to consider the implications of Fukuda’s and Oxford’s failure to require it. The reviewers also excluded, ignored or downplayed substantial evidence demonstrating that some of these definitions could not be applied consistently, as CDC’s Dr. Reeves demonstrated about Fukuda.
Beyond this, some diagnostic studies were excluded because they did not use the “right” statistics or because the reviewer judged the studies to be “etiological” studies, not diagnostic methods studies. Was NK-Cell function eliminated because it was an etiological study? Was Dr. Snell’s study on the discriminative value of CPET excluded because it used the wrong statistics? And all studies before 1988 were excluded. These inclusion/exclusion choices shaped what evidence was considered and what conclusions were drawn.
Erica pointed out that the Review misinterpreted some of the papers expressing harms associated with a diagnosis. The Review failed to acknowledge the relief and value of finally getting a diagnosis, particularly from a supportive doctor. The harm is not from receiving the diagnostic label, but rather from the subsequent reactions of most healthcare providers. At the same time, the Review did not consider other harms like Dr. Newton’s study of patients with other diseases being diagnosed with “CFS” or another study finding some MS patients were first misdiagnosed with CFS. The Review also failed to acknowledge the harm that patients face if they are given harmful treatments out of a belief that CFS is really a psychological or behavioral problem.
The Review is rife with problems: it failed to ask whether all definitions represent the same disease, it used any definition as the diagnostic gold standard against which to assess any diagnostic method, and it excluded some of the most important ME studies. It is no surprise, then, that the Review concluded that no definition had proven superior and that there are no accepted diagnostic methods.
But remarkably, reviewers felt that there was sufficient evidence to state that those patients who meet CCC and ME-ICC criteria were not a separate group but rather a subgroup with more severe symptoms and functional limitations. By starting with the assumption that all 8 definitions encompass the same disease, this characterization of CCC and ICC patients was a foregone conclusion.
But Don’t Worry, These Treatment Trials Look Fine
You would think that at this point in the process, someone would stand up and ask about the scientific validity of comparing treatments across these definitions. After all, the Review acknowledged that Oxford can include patients with other causes of the symptom of chronic fatigue. But no, the Evidence Review continued on to compare treatments across definitions regardless of the patient population selected. Would we ever evaluate treatments for cancer patients by first throwing in studies with fatigued patients? The assessment of treatments was flawed from the start.
But the problems were then compounded by how the Review was conducted. The Review focused on subjective measures like general function, quality of life and fatigue, not objective measures like physical performance or activity levels. In addition, the Review explicitly decided to focus on changes in the symptom of fatigue, not PEM, pain or any other symptom. Quality issues with individual studies were either not considered or ignored. Counseling and CBT studies were all lumped into one treatment group, without consideration of the dramatic difference in therapeutic intent of the two. Some important studies like Rituxan were not considered because the treatment duration was considered too short, regardless of whether it was therapeutically appropriate.
And finally, the Review never questioned whether the disease theories underlying these treatments were applicable across all definitions. Is it really reasonable to expect that a disease that responds to Rituxan or Ampligen is going to also respond to therapies that reverse the patient’s “false illness beliefs” and deconditioning? Of course not.
If their own conclusions on the diagnostic methods and the problems with the Oxford definition were not enough to make them stop, the vast differences in disease theories and therapeutic mechanism of action should have made the reviewers step back and raise red flags.
At the Root of It All
This Review brings into sharp relief the widespread confusion on the nature of ME and the inappropriateness of having non-experts attempt to unravel a controversial and conflicting evidence base about which they know nothing.
But just as importantly, this Review speaks volumes about the paltry funding and institutional neglect of ME, reflected in the fact that the study could find only 28 diagnostic studies and 9 medication studies to consider from the last 26 years. This Review speaks volumes about the institutional mishandling that fostered the proliferation of disparate and sometimes overly broad definitions, all branded with the same “CFS” label. The Review speaks volumes about the institutional bias that resulted in the biggest and most expensive treatment trials, and the greatest number of them, being those that studied behavioral and psychological pathology for a disease long proven to be the result of organic pathology.
This institutional neglect, mishandling and bias have brought us to where we are today. That the Evidence Review failed to recognize and acknowledge those issues is stunning.
Shout Out Your Protest!
This Evidence Review is due to be published in final format before the P2P workshop and it will affect our lives for years to come. Make your concerns known now.
The following information provides additional background to prepare your comments:
However you choose to protest, make your concerns known!