The controversy around COVID-19 research from the Surgisphere database certainly got my attention. Not because I have any expertise regarding treatments for COVID-19, but because I wondered how that controversy would affect trust in research like ours that uses “big data” from health records.
In case you didn’t follow the story, here’s my brief summary: Surgisphere is a data analytics company that claims to gather health records data from 169 hospitals across the world. In early May, Surgisphere researchers and academic collaborators published a paper in NEJM showing that drugs used to treat cardiovascular disease did not increase severity of COVID-19. That drew some attention. In late May, a second paper in Lancet reported that hydroxychloroquine treatment of COVID-19 had no benefit and could increase risk of abnormal heart rhythms and death. Given the scientific/political controversy around hydroxychloroquine, that finding drew lots of attention. Intense scrutiny across the academic internet and twitterverse revealed that many of the numbers in both the Lancet paper and the earlier NEJM paper didn’t add up. After several days of controversy and “expressions of concern”, Surgisphere’s academic collaborators retracted the article, reporting that they were not able to access the original data to examine discrepancies and replicate the main results.
Subsequent discussions questioned why journal reviewers and editors didn’t recognize the signs that Surgisphere data were not credible. Spotting the discrepancies, however, would have required detailed knowledge regarding the original health system records and the processes for translating those records data into research-ready datasets. I suspect most reviewers and editors rely more on the reputation of the research team than on detailed inspection of the technical processes.
Our MHRN research also gathers records data from hundreds – or even thousands – of facilities across 15 health systems. We double-check the quality of those data constantly, looking for oddities that might indicate some technical problem in health system databases or some error in our processes. We can’t expect that journal reviewers and editors will triple-check all of our double-checking.
I hope MHRN has established a reputation for trustworthiness, but I’m ambivalent about the scientific community relying too much on individual or institutional reputation. I do believe that the quality of MHRN’s prior work is relevant when evaluating both the integrity of MHRN data and the capability of MHRN researchers. Past performance does give some indication of future performance. But relying on reputation to assess new research will lead to the same voices being amplified. And those loudest voices will tend to be older white men (insert my picture here).
I’d prefer to rely on transparency rather than reputation, but there are limits to what mental health researchers can share. Our MHRN projects certainly share detailed information about the health systems that contribute data to our research. The methods we use to process original health system records into research-ready databases are well-documented and widely imitated. Our individual research projects publish the detailed computer code that connects those research databases to our published results. But we often cannot share patient-level data for others to replicate our findings. Our research nearly always considers sensitive topics like mental health diagnoses, substance use, and suicidal behavior. The more we learn about risk of re-identifying individual patients, the more careful we get about sharing sensitive data. At some point, people who rely on our research findings will need to trust the veracity of our data and the integrity of our analyses.
We can, however, be transparent regarding shortcomings of our data and mistakes in our interpretation. I can remember some of our own “incredible” research that could have turned into Surgisphere-style embarrassments. For example: We found an alarmingly high rate of death by suicide in the first few weeks after people lost health insurance coverage. But then we discovered we were using a database that only registered months of complete insurance coverage, so anyone who died partway through a month appeared to have no coverage for that month and was counted as uninsured at the time of death. I’m grateful we didn’t publish that dramatic news. Another example: We found that a disturbingly high number of people seen in emergency departments for a suicide attempt had another visit with a diagnosis of self-harm within the next few days. That looked alarming and disturbing. But then we discovered that the bad news was actually good news: most of those visits that looked like repeat suicide attempts turned out to be timely follow-up visits after the original attempt. Another embarrassing error avoided.
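For readers who like to see the mechanics, here is a minimal sketch of that first pitfall. The data and column names (person_id, coverage_month, death_date) are invented for illustration and are not drawn from MHRN’s actual databases; it simply shows how a month-level “complete coverage” flag makes anyone who dies mid-month look uninsured at death, and one way to guard against that.

```python
# Hypothetical illustration of the insurance-coverage pitfall described above.
# All data and column names are invented; real enrollment files differ.
import pandas as pd

# Monthly enrollment table: a row appears only if the person had
# complete coverage for that calendar month.
enrollment = pd.DataFrame({
    "person_id": [1, 1, 1, 2, 2, 2],
    "coverage_month": pd.to_datetime(
        ["2019-01-01", "2019-02-01", "2019-03-01",
         "2019-01-01", "2019-02-01", "2019-03-01"]
    ),
})

# Person 1 died in mid-April, while still covered -- but April never
# appears as a "complete" coverage month.
deaths = pd.DataFrame({
    "person_id": [1],
    "death_date": pd.to_datetime(["2019-04-15"]),
})

# Naive check: was the person "insured" during the month of death?
deaths["death_month"] = deaths["death_date"].dt.to_period("M").dt.to_timestamp()
naive = deaths.merge(
    enrollment,
    left_on=["person_id", "death_month"],
    right_on=["person_id", "coverage_month"],
    how="left",
)
# coverage_month comes back NaT, so this person looks uninsured at death:
# exactly the artifact that produced the "alarming" finding.
print(naive[["person_id", "death_date", "coverage_month"]])

# Safer check: treat someone as recently insured if they had any complete
# coverage month within (say) 90 days before death, or use actual
# enrollment span dates when the source system provides them.
last_covered = (
    enrollment.groupby("person_id")["coverage_month"]
    .max()
    .rename("last_complete_month")
    .reset_index()
)
safer = deaths.merge(last_covered, on="person_id", how="left")
safer["recently_insured"] = (
    safer["death_date"] - safer["last_complete_month"]
).dt.days <= 90
print(safer[["person_id", "death_date", "last_complete_month", "recently_insured"]])
```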
While we try to publicize those lessons, the audience for those stories is narrow. There is no Journal of Silly Mistakes We Almost Made, but there ought to be. If that journal existed, research groups could establish their credibility by publishing detailed accounts of their mistakes and near-mistakes. As I’ve often said in our research team meetings: If we’re trying anything new, we are bound to get some things wrong. Let’s try to find our mistakes before other people point them out to us.
Greg Simon