Delivering on the Real Promise of Virtual Mental Healthcare

Many of our MHRN investigators were early boosters of telehealth and virtual mental health care.  Beginning over 20 years ago, research in our health systems demonstrated the effectiveness and value of telehealth follow-up care for depression and bipolar disorder, telephone cognitive-behavioral or behavioral activation psychotherapy, depression follow-up by online messaging, online peer support for people with bipolar disorder, and telephone or email support for online mindfulness-based psychotherapy.

Even in our MHRN health systems, however, actual uptake of telehealth and virtual care lagged far behind the research.  Reimbursement policies were partly to blame; telephone visits and online messaging were usually not “billable” services.  But economic barriers were only part of the problem.  Even after video visits were permitted as “billable” substitutes for in-person visits, they accounted for only 5 to 10 percent of mental health visits for either psychotherapy or medication management.  Only a few of our members seemed to prefer video visits, and our clinicians didn’t seem to be promoting them.

Then the COVID-19 pandemic changed everything almost overnight.  At Kaiser Permanente Washington, virtual visits accounted for fewer than 10% of all mental health visits during the week of March 9th.  That increased to nearly 60% the following week and over 95% the week after that.  I’ve certainly never seen anything change that quickly in mental health care.  Discussing this with our colleague Rinad Beidas from U Penn, I asked if the implementation science literature recognizes an implementation strategy called “My Hair is on Fire!”  Whatever we call it, it certainly worked in this case.

My anecdotal experience was that the benefits of telehealth or virtual care included the expected and the unexpected.  As expected, even my patients who had been reluctant to schedule video visits enjoyed the convenience of avoiding a trip to our clinic through Seattle traffic (or the way Seattle traffic used to be).  Also as expected, there were no barriers to serving patients in Eastern Washington, where psychiatric care is scarce to nonexistent.  It was as if the Cascade mountains had disappeared.  One unexpected bonus was meeting some of the beloved pets I’d only heard about during in-person visits.  And I could sometimes see the signs of those positive activities we mental health clinicians try to encourage, like guitars and artwork hanging on the walls.  It’s good to ask, “Have you been making any art?”  But it’s even better to ask, “Could you show me some of that art you’ve been making?”

As is often the case in health care, the possible negative consequences of this transformation showed up more in data than in anecdotes.  My clinic hours still seemed busy, but the data showed that our overall number of mental health visits had definitely decreased.  Virtual visits had not replaced all of the in-person visits.  That’s concerning, since we certainly don’t expect that the COVID-19 pandemic has decreased the need for mental health care.  So we’re now digging deeper into those data to understand who might be left behind in the sudden transition to virtual care.  Some of our questions:  Is the decrease in overall visits greater in some racial or ethnic groups?  Are we now seeing fewer people with less severe symptoms or problems – or are the people with more severe problems more likely to be left behind?

As our research group was starting to investigate these questions, I got a message from one of our clinical leaders asking about a new use for our suicide risk prediction models.  They were also thinking about people with the greatest need being left behind.  And they were thinking about remedies.  Before the COVID-19 outbreak, we were testing use of suicide risk prediction scores in our clinics, prompting our clinicians to assess and address risk of self-harm during visits.  But some people at high risk might not be making visits, even telephone or video visits, during these pandemic times.  So our clinical leaders proposed reaching out to our members at highest risk rather than waiting for them to appear (virtually) in our clinics. 

Collaborating with our clinical leaders on this new outreach program prompted me to look again at our series of studies on telehealth and virtual care.  Those studies weren’t about simply offering telehealth as an option.  Persistent outreach was a central element of each of those programs.  Telehealth is about more than avoiding Seattle traffic or coronavirus infection.  Done right, telehealth and virtual care enable a fundamental shift from passive response to active outreach.

Outreach is especially important in these chaotic times.  During March and April, video and telephone visits were an urgent work-around for doing the same work we’ve always done.  Now in May, we’re designing and implementing the new work that these times call for.

Greg Simon

Pragmatic Trials for Common Clinical Questions: Now More than Ever

Adrian Hernandez, Rich Platt, and I recently published a Perspective in the New England Journal of Medicine about the pressing need for pragmatic clinical trials to answer common clinical questions.  We started writing that piece last summer, long before any hint of the COVID-19 pandemic.  But the need for high-quality evidence to address common clinical decisions is now more urgent than we could have imagined.

Leaving aside heated debates regarding the effectiveness of hydroxychloroquine or azithromycin, we can point to other practical questions regarding use of common treatments by people at risk for COVID-19.  Laboratory studies suggest that ibuprofen could increase virus binding sites.  Should we avoid ibuprofen and recommend acetaminophen for anyone with fever and respiratory symptoms?   Acetaminophen toxicity is not benign.  Laboratory studies suggest that ACE inhibitors, among the most common medications for hypertension, could also increase virus binding sites.  Should we recommend against ACE inhibitors for the 20% of older Americans now using them daily?  Stopping or changing medication for hypertension certainly has risks.  Laboratory studies can raise those questions, but we need clinical trials to answer them with any certainty.

Pragmatic or real-world clinical trials are usually the best method for answering those practical questions.  Pragmatic trials are embedded in everyday practice, involve typical patients and typical clinicians, and study the way treatments work under real-world conditions.  Compared to traditional clinical trials, done in specialized research centers under highly controlled conditions, pragmatic trials are both more efficient (we can get answers faster and cheaper) and more generalizable (the answers apply to real-world practice). 

Pragmatic trials are especially helpful when alternative interventions could have different balances of benefits and risks for different people – like ibuprofen and acetaminophen for reducing fever.  Laboratory studies – or even highly controlled traditional clinical trials – can’t sort out how that balance plays out in the real world.

In our perspective piece, we pointed out financial, regulatory, and logistical barriers to faster and more efficient pragmatic clinical trials.  But the most important barrier is cultural.  It’s unsettling to acknowledge our lack of evidence to guide common and consequential clinical decisions.  Clinicians want to inspire hope and confidence.  Patients and families making everyday decisions about healthcare might be dismayed to learn that we lack clear answers to important clinical questions.  We all must do the best we can with whatever evidence we have, but we should certainly not be satisfied with current knowledge.  If we hope to activate our entire health care system to generate better evidence, we’ll probably need to provoke more discomfort with the quality of evidence we have now.

Inadequate evidence can also lead to endless conflict.  My colleague Michael Von Korff used to use the term “German Argument” to describe people preferring to argue about a question when the answer is readily available to those willing to look.  Michael was fully entitled to use that expression, since his last name starts with “Von.”  Germany, however, now stands out for success in mitigating the impact of the COVID-19 pandemic.  Even if Michael’s ethnic joke no longer applies, the practice of arguing rather than examining evidence is widespread.  The best way to end those arguments is to say “I really don’t know the answer.  How could we find out as quickly as possible?” 

Greg Simon

“H1-H0, H1-H0” is a song we seldom hear in the real world

Arne Beck and I were recently revising the description of one of our Mental Health Research Network projects. We really tried to use the traditional scientific format, specifying H1 (our hypothesis) and H0 (the null hypothesis). But our research just didn’t fit into that mold. We eventually gave up and just used a plain-language description of our question: For women at risk for relapse of depression in pregnancy, how do the benefits and costs of a peer coaching program compare to those of coaching from traditional clinicians? 


Our real-world research often doesn’t fit into that H1-H0 format. We aim to answer practical questions of interest to patients, clinicians, and health systems. Those practical questions typically involve estimation (How much?), classification (For whom?), comparison (How much more or less?) or interaction (How much more or less for whom?). None of those are yes/no questions. While we certainly care whether any patterns or differences we observe might be due to chance, a p-value for rejecting a null hypothesis does not answer the practical questions we hope to address.
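
To make that concrete, here is a minimal sketch (in Python, with purely hypothetical numbers) of the kind of “How much more or less?” summary we would rather report: an estimated difference with a confidence interval, not just a verdict on a null hypothesis.

```python
# A minimal sketch with hypothetical numbers: estimate "how much more or less"
# rather than test a yes/no null hypothesis.
import numpy as np
from scipy import stats

# Hypothetical counts: people offered an intervention vs. usual care,
# and how many in each group engaged in treatment.
engaged = np.array([220, 180])
total = np.array([1000, 1000])

p = engaged / total                          # engagement proportions
diff = p[0] - p[1]                           # estimated difference ("How much?")
se = np.sqrt(np.sum(p * (1 - p) / total))    # standard error of that difference
z = stats.norm.ppf(0.975)                    # ~1.96 for a 95% interval
low, high = diff - z * se, diff + z * se

print(f"Estimated difference in engagement: {diff:.3f} "
      f"(95% CI {low:.3f} to {high:.3f})")
```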


When we talk about our research with patients, clinicians, and health system leaders, we never hear questions expressed in terms of H1 or H0. I imagine starting a presentation to health system leaders with a slide showing H1 and H0—and then hearing them all break into that song from the Walt Disney classic Snow White. Out in the real world of patients, clinicians, and health system leaders, “H1-H0” most likely stands for that catchy tune sung by the Seven Dwarfs.

As someone who tries to do practical research, my dissatisfaction with the H1-H0 format is about more than just language or appearance. There are concrete reasons why that orientation doesn’t fit with the research our network does.

Pragmatic or real-world research often involves multiple outcomes and competing priorities.  For example, we hope our Suicide Prevention Outreach Trial will show that vigorous outreach reduces risk of suicidal behavior.  But we already know some people will be upset or offended by our outreach.  Our task is to accurately estimate and report both the beneficial effects on risk of suicidal behavior and the proportion of people who object or feel harmed.  We may have opinions about the relative importance of those competing effects, but different people will value those outcomes differently.  It’s not a yes/no question with a single answer.  Our research aims to answer questions about “How much?”  Each user of our research has to consider “How important to me or the people I serve?”

The H1-H0 approach becomes even less helpful as sample sizes grow very large.  For example, our work on prediction of suicide risk used records data for millions of patients. With a sample that large, we can resoundingly reject hundreds of null hypotheses while learning nothing that’s useful to patients or clinicians. In fact, our analytic methods are designed to “shrink” or suppress most of those “statistically significant” findings. We hope to create a useful tool that can reliably and accurately identify people at highest risk of self-harm. For that task, too many “significant” p-values are really just a distraction.
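
As an illustration only (simulated data in Python, not our actual MHRN risk models), here is a minimal sketch of how a penalized regression “shrinks” most candidate predictors to exactly zero, even when the sample is large enough to make many tiny associations statistically significant.

```python
# A minimal sketch with simulated data, not the MHRN risk models themselves:
# an L1 ("lasso") penalty shrinks most coefficients to exactly zero, keeping
# only the predictors strong enough to be useful for prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, k = 50_000, 100                    # large sample, many candidate predictors
X = rng.normal(size=(n, k))

# A few strong predictors plus many real-but-tiny effects, the kind a large
# sample can flag as "statistically significant" without being useful.
beta = np.full(k, 0.05)
beta[:5] = 0.5
logits = X @ beta - 3.0               # roughly 5% outcome prevalence
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.005)
model.fit(X, y)
print("Candidate predictors:", k)
print("Kept (nonzero) coefficients:", int(np.count_nonzero(model.coef_)))
```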

I’m very fond of how the Patient-Centered Outcomes Research Institute describes the central questions of patient-centered research: What are my options? What are the potential benefits and harms of those options? Those patient-centered questions focus on choices that are actually available and the range of outcomes (both positive and negative) that people care about. I think those are the questions our research should aim to answer, and they can’t be reduced to H1-H0 terms.

I am certainly not arguing that real-world research should be less rigorous than traditional hypothesis-testing or “explanatory” research. Pragmatic or real-world researchers are absolutely obligated to clearly specify research questions, outcome measures, and analytic methods—and declare or register those things before the research starts. Presentations of results should stick to what was registered, and include clear presentations of confidence limits or precision. In my book, that’s actually more rigorous than just reporting a p-value for rejecting H0. There’s no conflict between being rigorous and being practical or relevant.

To be fair, I must admit that the Seven Dwarfs song is more real-world than I gave it credit for. We often misremember the lyrics “Hi-Ho, Hi-Ho, it’s off to work we go.” But the Seven Dwarfs were actually singing “Hi-Ho, Hi-Ho, it’s home from work we go.” That’s the sort of person-centered song we might actually hear in the real world.

Greg Simon

Read Marsha Linehan’s Book!

If the Mental Health Research Network had a book club, we’d start with Marsha Linehan’s memoir, Building a Life Worth Living.

Marsha is the creator of Dialectical Behavior Therapy or DBT, a treatment approach once seen as heretical that’s now the standard of care for people at risk of self-harm or suicide. Her memoir is much more than an academic autobiography describing her groundbreaking research at the University of Washington. It is her personal story of recovery and spiritual evolution. The history of DBT doesn’t begin with a post-doctoral fellowship or a research grant for a pilot study. Instead, it begins with Marsha’s descent into severe depression and relentless suicidal ideation, leading to a two-year inpatient psychiatric stay as “one of the most disturbed patients in the hospital.”

Marsha’s remarkable story pushed me to think about all sorts of questions regarding mental health research and mental health treatment: Why does our clinical research often focus on reconfirming the modest benefits of treatments that are so disappointing? How could the scientific peer review system welcome true innovation rather than comfortable confirmation? Do mental health clinicians’ traditional “boundaries” really serve to protect patients from harm – or more to protect clinicians from upset or inconvenience? How central are spirituality and religion to recovery from mental illness? And – where would we be today if Marsha Linehan had chosen a traditional religious order over psychology graduate school?

For me, the book’s most valuable lesson was understanding the dialectical center of DBT. Dialectical thinking – holding two seemingly opposite ideas at the same time – is central to Marsha’s treatment approach and her life story. Following the first rule of good writing (“Show, don’t tell”), Marsha generously describes her own intellectual and spiritual journey to dialectically embracing the tension between radical acceptance and hunger for change. Her message to her clients is “I accept you as you are; how you feel and what you do make perfect sense.” And her message is also “There is a more effective way to be who you are and feel what you feel. Wouldn’t you like to learn about it?” Both are completely true, at the very same time. The tension between acceptance and change is not a problem to be solved but a creative space to inhabit – or even dance inside of. Marsha also has some important things to say about dancing!

Marsha reveals her dialectical approach most clearly in describing her own mental health treatment. She endured over-medication, forcible restraint, and weeks spent in a seclusion room. Her descriptions of those traumas are vivid, but they include not the slightest tinge of blame or resentment. Instead, she gracefully expresses gratitude and compassion for her caregivers, knowing they were doing the best they could with the knowledge and skills they had at the time. That is truly radical acceptance. At the same time, Marsha was passionate for change. She vowed that “I would get myself out of hell – and that once I did, I would find a way to get others out of hell, too.” It seems that the mental health care system is Marsha’s last client. And I think she is practicing a little DBT on mental health clinicians like me – compassionately accepting all of our failings and flailings while showing us a better way.

Greg Simon

Let’s not join the Chickens**t Club!

Long before he became famous or infamous (depending on your politics) as FBI Director, James Comey served as US Attorney for the Southern District of New York.  That’s the office responsible for prosecuting most major financial crimes in the US.  Jesse Eisinger’s book, The Chickens**t Club, recounts a speech Comey made to his staff after assuming that high-profile post.  He asked which of his prosecutors had never lost a case at trial, and many proudly raised their hands.  Comey then said, “You are all members of what we like to call The Chickens**t Club.”  By that he meant:  You are too timid to take a case to trial unless you already know you will win.

I worry that our clinical trials too often follow the same pattern as those white-collar criminal trials.  When we evaluate new treatments or programs, we may only pursue the trials likely to give the answer we hope for.  That might mean testing an intervention only slightly different from one already proven effective.  Or testing a treatment in an environment where it’s almost certain to succeed.  Our wishes come through in our language.  Trials showing that a new treatment is superior are “positive”, while trials finding no advantage for a new treatment or program are “negative.”

Our preference for trials with “positive” results reflects how researchers are rewarded or reinforced.  We know that positive trials are more likely to be published than so-called negative trials.  Grant review panels prefer positive research that yields good news and gives the appearance of continuous progress.

Looking back on my career, I can see some trials that might have put me in The Chickens**t Club.  For example, we probably didn’t need to do several clinical trials of collaborative care for depression in one health system before taking that idea to scale.

But here’s where the analogy between investigators and prosecutors does not apply:  A prosecutor in a criminal trial is supposed to take sides.  An investigator in a clinical trial shouldn’t have a strong preference for one result or the other.  In fact, clinical investigators have the greatest obligation to pursue trials when the outcome is uncertain.  “Equipoise” is the term to describe that balanced position.

Greg Simon

From 17 years to 17 months (or maybe 14)

Most papers and presentations about improving the quality of health care begin by lamenting the apocryphal 17-year delay from research to implementation.   The original evidence for that much-repeated 17-year statistic is pretty thin: a single table published in a relatively obscure journal.  But the 17-year statistic is frequently cited because it quantifies a widely recognized problem.  Long delays in the implementation of research evidence are the norm throughout health care.

Given that history, I’ve been especially proud of our MHRN health systems for their rapid and energetic implementation of suicide prevention programs.  This month, Kaiser Permanente Washington started implementing MHRN-developed suicide risk prediction models.  Therapists and psychiatrists at our Capitol Hill Mental Health clinic in Seattle now see pre-visit alerts in the electronic health record for patients who are at high risk of a suicide attempt over the next 90 days.  Implementation started only 17 months – not 17 years – after we published evidence that those models accurately predict short-term risk of suicide attempt or suicide death.

I certainly have experience developing other evidence-based interventions that are still waiting for wide implementation – some for longer than 17 years.  For those interventions, our published papers, PowerPoint presentations, and impressive p-values never prompted much action.  Our experience with suicide risk prediction models has been almost the opposite: health system leaders are pushing for faster implementation.  For months we’ve received frequent queries from regional and national Kaiser Permanente leaders: “When are you rolling out those risk prediction scores in our clinics?” 

In the middle of celebrating Kaiser Permanente’s rapid implementation of risk prediction models, I learned that our colleagues at HealthPartners were three months ahead of us.  I hadn’t realized that health plan care managers at HealthPartners have been using MHRN risk prediction scores to identify people at high risk for suicidal behavior since July.  In true Minnesota fashion, they just did the right thing with no boasting or fanfare.  Reducing the publication-to-implementation gap from 17 years to 14 months doesn’t have quite the same poetic symmetry as reducing it to 17 months.  But I’ll take faster progress over poetic symmetry.

Why did we see so little delay in our health systems’ implementation of suicide risk prediction models and other suicide prevention programs?  We didn’t need to do any marketing to create demand for better suicide prevention.  Health system leaders were clear about priorities and problems needing solutions.  The specific demand regarding risk prediction models was clear:  The questionnaires providers were using to identify suicide risk were better than nothing, but far short of satisfactory.  Knowing the limitations of the tools they were using, clinicians and health system leaders asked:  Can you build us something better?  We weren’t telling health system leaders what we thought they needed.  We were trying to build what they told us they needed.

When I now hear researchers or intervention developers lament the 17-year delay, I ask myself how that complaint might sound in some other industry.  I imagine giving this report to a board of directors: “We developed and tested an innovative product that our customers really should want.  We’ve spent 17 years telling them about it, but our market share is still trivial.  What’s wrong with our customers?”  I doubt the board of directors would say, “Let’s keep doing the same thing!”  Instead, they’d probably ask, “What are our customers trying to tell us about what they need?”

Greg Simon

Outreach is Meant for the People Left Outside!

Several years ago, Evette Ludman and I undertook a focus group study to learn about early dropout from psychotherapy.  We invited health system members who had attended a first therapy visit for depression and then did not return.  Only about one-third of people we invited agreed to speak with us, but that’s a pretty good success rate for focus group recruitment.

We soon learned, however, that the one-third of people who joined our focus group were not the people we needed to hear from.  Many were veterans of long-term psychotherapy who had returned for a single “refresher” visit.  Some had been seeing therapists in private practice (using other insurance) and scheduled a visit with a new therapist to explore other options.  None were people starting treatment for depression who gave up after a single visit.  Our first focus group turned into a hypothetical discussion of why some other people might give up on therapy after just one visit. 

In retrospect, we should have realized that we wouldn’t learn much about giving up on psychotherapy from people who volunteer to join a focus group about psychotherapy.  People living with depression who are frustrated or discouraged about treatment don’t tend to become motivated research volunteers.  We probably should have published something about that experience, but I’m still waiting for someone to establish The Journal of Instructive Failure.

That instructive failure, however, did shape our subsequent research about outreach to increase  engagement in mental health treatment.  Outreach and engagement interventions have been a major focus of our research, but we don’t study engagement interventions among people who are already engaged.  We aim to reach people who are disconnected, discouraged, and convinced that treatment has nothing to offer.  Volunteering to participate in research to increase engagement in treatment should probably make someone ineligible for research on that topic.  For example:  If we hope to learn whether outreach to people at risk can reduce suicide attempts, we certainly shouldn’t limit our research to people who volunteer for a study of outreach to prevent suicide attempt.

If we hope to find those who have been lost, we’ll have to look outside of the bright light under the lamppost.  So our studies of outreach or engagement interventions follow a “randomized encouragement” design.  We identify people who appear to need services but are not receiving them.  We randomly assign some people to receive extra outreach, such as messages and phone calls to offer support and problem-solve barriers to getting mental health care.  The rest continue to receive their usual care. 

That real-world research design answers the question we care about:  Among people who appear to have unmet need, will implementing an outreach intervention increase engagement in treatment – and ultimately lead to better outcomes?  That’s the design of our MHRN Suicide Prevention Outreach Trial, testing two outreach interventions for people at risk of suicidal behavior.  And it’s the design of our MHRN pilot study of Automated Outreach to Prevent Depression Treatment Dropout, testing systematic outreach to people who appear to have discontinued medication or dropped out of psychotherapy.

That real-world randomized encouragement design does impose some requirements, but I think they are features rather than bugs.  First, we must be able to identify people with unmet need before they ask us for help.  That’s been a central focus of our MHRN research, including our recent research on predicting suicidal behavior.  Second, we must be able to use health system records to assess any impact or benefit.  Relying on traditional research interviews or surveys would take us back to the problem of assessing outreach or engagement among people who volunteer to participate in research interviews or surveys.  Third, any benefit of an outreach or engagement intervention is diluted by absence of benefit in those who do not participate.  But that diluted effect is the true effect, if what we care about is the real-world effect of an outreach program.
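
A back-of-the-envelope sketch (hypothetical numbers, not results from any of our trials) shows how that dilution works when everyone offered an outreach intervention is counted, whether or not they participate:

```python
# Hypothetical numbers only: how non-participation dilutes the measured effect
# of an outreach program when everyone offered outreach is counted.
engagement_rate = 0.40        # fraction of people offered outreach who engage
effect_if_engaged = 0.30      # assumed relative risk reduction among engagers
effect_if_not_engaged = 0.00  # no benefit for people who never participate

# Intention-to-treat effect: the average over everyone offered outreach.
itt_effect = (engagement_rate * effect_if_engaged
              + (1 - engagement_rate) * effect_if_not_engaged)

print(f"Risk reduction among engagers: {effect_if_engaged:.0%}")    # 30%
print(f"Diluted, real-world risk reduction: {itt_effect:.0%}")      # 12%
```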

Outreach interventions might seem to work well right under the lamppost, but that’s not where people get lost or left out.

Greg Simon

Marianne Williamson vs. the DSM-5

Psychiatric epidemiology has become a Presidential campaign issue!  Marianne Williamson has taken some heat for her past claim that diagnosis of clinical depression is “such a scam.”  She has backed away from that statement, but she’s stood by her point that “There is a normal spectrum of human despair; it is a spiritual, not a medical issue.” And she’s stood by her claim that antidepressants are over-prescribed when “people are simply sad.”  Is there really any difference between depression and ordinary sadness?  Are antidepressants being prescribed inappropriately for “normal human despair”?

The American Psychiatric Association’s Diagnostic and Statistical Manual (or DSM-5) attempts to draw a line between depressive disorders and “periods of sadness that are inherent aspects of the human experience”.  But the line DSM-5 draws can seem artificial.  Why is a major depressive episode defined by five of nine specified symptoms rather than by four or six?  And why must that episode last at least two weeks rather than 13 days or 15 days?  There are no sharp boundaries, and Williamson’s critique about a “normal spectrum of human despair” seems reasonable.

But I think Marianne Williamson’s dismissal of depression diagnoses is built on a false premise: that real “medical” disorders are defined by sharp boundaries or discontinuities.  In fact, most chronic diseases lie at one end of a normal spectrum.  Diabetes, for instance, lies at the high end of the fasting glucose spectrum.  Hypertension, to offer another example, lies at the high end of the blood pressure spectrum.  If we require a sharp boundary or discontinuity to define a chronic illness, then most major causes of disability and premature death would be re-classified as part of a normal spectrum.  In general, chronic illnesses are not caused by a single genetic error or by unbounded growth of a single rogue cell.  Instead, chronic illnesses typically result from faulty regulation of complex systems.  Whether or not faulty regulation leads to illness and disability depends on social environment, life events, health behaviors, and (as Williamson has noticed) spiritual practices.  That’s true of depression, but it’s also true of heart disease or diabetes.

Williamson has pointed out that “There is no blood test” to diagnose depression.  But the same is true of Parkinson’s Disease.  The brain is more complicated than the liver or kidneys, and blood tests are just not useful to assess how it is functioning. 

While I think Williamson’s arguments regarding diagnosis of depression are fundamentally flawed, I also see fundamental problems with the DSM-5 (or ICD-10) scheme for separating depressive disorders from ordinary human experience.  DSM-5 criteria for diagnosing depression use one set of rules to address two very different, albeit related, questions. 

The first question is about the essential features of depression.  While the borders of depression may not be sharp, the core is well defined.  The central psychological, behavioral, and physiological features of depression are remarkably consistent across language, culture, and social environment.  Depression is not a social construction or a modern spiritual malaise.  Instead, it’s a well-described disorder and the leading cause of disability worldwide.  The DSM-5 gets that question right.

The second question is about drawing the line where depression becomes an illness or a disorder.  Not only are there no sharp boundaries, but the answer depends on the situation.  When is depression severe enough to warrant treatment?  That depends on the potential benefits and risks of the treatment.  The DSM-5 can’t get that question right, because the answer is “It depends.”  The same question about a severity threshold for diagnosis or treatment applies to hypertension, diabetes, or nearly any chronic illness.  And the answer is “It depends.”

Regarding Williamson’s claim that antidepressants are over-prescribed for “normal sadness”, we can point to some relevant epidemiology research.  Data from our MHRN health systems show very little prescribing of antidepressants for minimal or mild symptoms of depression.  While experts may disagree on the specific threshold above which benefits of antidepressants are clear, most people starting antidepressant medication have moderate or severe symptoms by our standard measures.

I’m a clinician and a researcher, so I’m reluctant to wade into the intersection of politics and spirituality.  That’s risky territory, so I’ll step into it cautiously.  I’ll stay in that territory just long enough to pull the diagnosis of depression back where it belongs – into the boring terrain of epidemiology.  For better or worse, the upcoming Presidential debates will probably not include any epidemiology questions.

Greg Simon

Who Decides when Science is Junk?

During the last month, I found myself in several conversations about promoting open science.  We hope that sharing data, research methods, and early results will make research more rigorous and reproducible.  But those conversations all turned to the fear that data or preliminary results could be misinterpreted or even deliberately misused.  How can we protect the public from misleading “junk science”?

Some examples: The NIH All of Us Research Program will create an online data enclave where researchers from around the world can analyze questionnaires, medical records, and genomic data from hundreds of thousands of Americans.  What will protect against dredging through millions of possible genetic associations to find spurious evidence for racist theories?  The medRxiv online platform aims to accelerate scientific discovery and collaboration by posting research results prior to peer review.  What will protect against posting and publicizing flawed or fraudulent research that peer review would filter out?  Our Mental Health Research Network hopes to share data so other researchers can develop new methods for identifying risk of suicidal behavior.  What will protect against naïve or intentional conflation of correlation with causation?  People who get more mental health treatment will be more likely to attempt suicide, but describing the correlation in that order is misleading. 

Will open science inevitably lead to more junk science?

Our potential protections against junk science include both prevention and remediation.  Prevention depends on gatekeeping by institutions and designated experts.  We hope to prevent creation of junk science by peer review of research funding.  We hope to prevent dissemination of junk science by peer and editor review of journal manuscripts.  Remediation or repair happens after the fact and depends on the collective wisdom of scientists, science journalists, and policymakers.  We hope that the wisdom of the scientifically informed crowd will elevate credible research and ignore or discredit the junk. 

I am generally skeptical about intellectual or scientific gatekeeping, especially when it involves confidential decisions by insiders.  Our academic gatekeeping processes intend to identify and exclude junk science, but they often fall short in both sensitivity and specificity.  Junk certainly does get through; the peer-reviewed medical literature includes plenty of biased or seriously flawed research.  If you want me to cite examples, that would have to be an off-the-record conversation!  And peer review sometimes excludes science that’s not junk, especially if methods are unconventional or findings are unwelcome.  Those who created conventional wisdom tend to reject or delay challenges to it.

But abandoning gatekeeping and relying on after-the-fact remediation seems even more problematic.  Recent disinformation disasters don’t inspire confidence in the scientific wisdom of crowds or our ability to elevate good science over newsworthy junk.  Medical journals are certainly not immune to the lure of clickbait titles and social media impact metrics.  Media reporting of medical research often depends more on dramatic conclusions than on rigorous research methods.  Systematic comparisons of social media reporting with scientific sources often find misleading or overstated claims.  Plenty of discredited or even fraudulent research (e.g. vaccines causing autism) lives forever in the dark corners of the internet.  To paraphrase a quotation sometimes attributed to Mark Twain: Junk science will be tweeted halfway around the world before nerds like me start clucking about confounding by indication and immortal time bias.

I think some gatekeeping by designated experts will certainly be necessary.  But we can propose some strategies to improve the quality of gatekeeping and reduce bias or favoritism.  Whenever possible, gatekeeping decisions should be subject to public scrutiny.  Some journals now use an open peer review process, publishing reviewers’ comments and authors’ responses.  Could that open process become the norm for peer review of research proposals and publications?  Whenever possible, gatekeepers should evaluate quality of ideas rather than the reputations of people who propose them.  Should all journal and manuscript reviewers be blinded to authors’ identities and institutional affiliations?  Whenever possible, gatekeeping decisions should be based on quality of methods rather than comfort with results.  Could journals conduct reviews and make publication decisions based on the importance of the study question and rigor of the research methods – before viewing the results?  Most important, gatekeepers should cultivate skepticism regarding conventional wisdom.  Perhaps the criteria for scoring NIH grant applications could include “This research might contradict things I now believe.”  To be clear – that would be a point in favor rather than a point against!

Greg Simon

Who Owns the Future of Suicide Risk Prediction?

On my plane rides to and from a recent meeting about big data in suicide prevention, I finally read Jaron Lanier’s 2013 book Who Owns the Future?  If you’ve read it – or read much about it – you can skip the rest of this paragraph.  Lanier is a legendary computer scientist often cited as the creator of virtual reality.  The first half of his book (my eastbound reading) is a jeremiad about the modern information economy devaluing human work and hollowing out the middle class.  Jobs are disappearing, income inequality is increasing, and personal information is devoured and monetized by big tech companies.  While Lanier gives a free pass to scientific users of big data, I think some of his criticisms still apply to our work.  The second half of the book (my westbound reading) proposes a solution.  It’s a new economic model rather than a new regulatory structure.  Lanier argues that those who create useful information should be paid by those who profit from it.  If you discover a secret shortcut around rush-hour traffic, then Google Maps may route other people to follow you.  Your shortcut might be ruined tomorrow, but your bank account would show a micro-payment from Google for “selling” your discovery to other drivers.  If you are an expert discoverer of efficient driving routes (like my wife), then Google might pay you over and over for the valuable driving data you create.

Of course, individuals can have different preferences about sharing information.  In Lanier’s scheme, each person could own their data and set their own price.  Some people could choose to donate data for public good – not expecting any compensation.  That would be their perfect right.  At the other extreme, some people could set a price so high that no one would be willing to pay.  That would also be their perfect right.

In contrast to many big data skeptics, Lanier does not argue for requiring “opt-in” consent for data use.  In fact, he argues that the “Terms and Conditions” and “End User License Agreements” we click through so frequently are the antithesis of informed consent.  If we are asked for permission more and more often, we will just pay less and less attention to each request.  He proposes fair compensation as a just and more respectful alternative.

Lanier does admit that micro-payments for data use could be more of a thought experiment than a practical solution.  Even if our thousands of daily decisions create value for someone else, we don’t have time to send out all of those tiny invoices.  But Lanier thinks the internet could manage all that accounting for us – if we decided that it should.  And he certainly knows more than I do about what the internet can do.

Even if Lanier’s proposal is just a thought experiment, it did get me thinking.  His central point is that most so-called artificial intelligence is simply re-packaging of decisions by large numbers of old-fashioned intelligent humans.  People discover and create.  Machines just copy and aggregate.  Our current economic rules reward the copying and aggregating rather than the original discovery and creation.  If humans created the original value, shouldn’t they share the eventual profits? 

At first, I couldn’t see how Lanier’s proposed solution would apply to our use of health records data for suicide prevention or other public health research.  We don’t sell or profit from health information.  Answering scientific or public health questions won’t destroy middle-class jobs in health care.  But I eventually saw how his proposal applies (one of the few good things about a 5-hour plane ride).  Even if no money changes hands when we use health data, we could do a better job acknowledging patients as the true creators or authors of health knowledge.  My friends at the Depression and Bipolar Support Alliance helped me understand that people who live with mood disorders are “experts by experience.” That expertise often goes unrecognized.

If we are using health records data to predict or understand suicidal behavior, then data from every member of our health systems could be useful.  But data from people who attempted suicide or died by suicide would be especially valuable.  We’d also highly value data from people at high risk who did not attempt or die by suicide.  Living with suicidal thoughts is hard work. 

Lanier’s proposed economic model reminds us who created the expertise that lives in health data.  Since it’s close to the 4th of July, I can’t resist a reference to “of the people, by the people, and for the people.” 

Greg Simon