Can you see me now?

As our health systems prepare to implement statistical models predicting risk of suicidal behavior, we’ve certainly heard concerns about how that information could be misused.  Well-intentioned outreach programs could stray into being intrusive or even coercive.  Information about risk of self-harm or overdose could be misused in decisions about employment or life insurance coverage.  But some concerns about predictions from health records are unrelated to specific consequences like inappropriate disclosure or intrusive outreach.  Instead, there is a fundamental concern about being observed or exposed.  When we reach out to people based on risk information in medical records, some are grateful for our concern, and some are offended that we know about such sensitive things.  And some people have both reactions at the same time.  The response is along the lines of “I appreciate that you care, but it’s wrong that you know.”  It’s being observed or known that’s the problem, even if nothing is ever said or done about it.

Our instinctual ambivalence about being known or observed was not created by big data or artificial intelligence.  In fact, that ambivalence is central to the oldest story that many of us have ever heard:  Eve and Adam’s fall from being caringly understood to being shamefully exposed.  Both the upside and the downside of being observed are also central to many of our iconic modern stories.  The benevolent portrayal of continuous observation and risk prediction is Clarence, the bumbling guardian angel who interrupts George Bailey’s suicide attempt in It’s a Wonderful Life.  The contrasting malevolent portrayal of observation and behavioral prediction includes Big Brother in George Orwell’s 1984 and the “precogs” in the 2002 film Minority Report (based on a science fiction short story from the 1950s).

Big data and artificial intelligence do, however, introduce a new paradox: The one who observes us and predicts our behavior is actually just a machine.  And that fact could make us feel better or worse.  An algorithm to predict suicide attempt or opioid overdose might consider all sorts of sensitive information and stigmatized behaviors to identify people at increased risk.  But the machine applying that algorithm would not need to reveal, or even retain, any of that sensitive information.  The predicting machine would only need to alert providers that a specific patient is at elevated risk at a specific time.  So we might feel reassured that predictive analytics can improve quality and safety while protecting privacy.  But the reassurance that “it’s just a machine” may not leave us feeling safe or cared for.  Ironically, we may be more likely to welcome the caring attention of Clarence the guardian angel exactly because he is very human – prone to error and unable to keep a secret.

Privacy is complicated, and we often have conflicting feelings about being known or understood.  Moreover, our feelings may not match an “objective” assessment of exposure or risk.  But feelings about privacy matter, especially when we hope to maintain the trust of the members our health systems serve.  We are in for some interesting conversations.

MHRN Blog World Cup Edition: What Soccer Referees Know about Causal Inference

When Nico Lodeiro falls down in the penalty area, I hold my breath waiting for the referee’s call.  Was it really a foul – or just Nico simulating a foul?  The stakes are high.  If the referee calls it a foul, it’s a penalty kick and a likely goal for my Seattle Sounders.  If she calls it a simulation or a dive, then it’s a yellow card warning for Nico.  After the call, we all watch the slow-motion replay.  I used to be surprised at how often the refs got it right, until a referee friend of mine explained what they are looking for.

It goes back to Newton’s first law from high school physics: a body in motion will remain in motion unless acted on by an external force.  If Nico was actually shoved, then his speed or direction will change.  If instead he just threw himself to the ground, then his overall momentum won’t change.  If he throws his shoulders in one direction to simulate a foul, then his arms or his hips have to move in the other direction.  With no external force, his center of gravity continues on the same path.  That’s how the experienced referee distinguishes a foul from a dive.

When we use health records data to infer causation (e.g., did a specific treatment make things better or worse?), we also depend on Newton’s first law.  We believe that an individual’s mental health path will maintain the same momentum unless it’s acted on by some external force.  That force could be a positive or negative effect of treatment or some positive or negative life event.  We try to predict an individual’s mental health path or trajectory so we can detect a change in speed or direction.

But mental health is more complicated than high school physics.  Any individual’s center of gravity is harder to pin down.  Our measures of mental health speed and direction are much less accurate.  Mental health paths are often not straight lines.  And many forces are often acting at the same time.  So we would be foolish to infer causation from a single case.  For any individual, our calls about causal inference will never be as accurate as a good soccer referee’s.  But we hope to make better calls by improving the accuracy of our measurement and by aggregating data across many, many cases.

We’ve been surprised by some of our recent efforts to predict mental health paths or trajectories.  Risk of suicide attempt or suicide death after a mental health visit is surprisingly predictable.  So we expect to be able to detect the effects of external forces – like positive or negative effects of treatments – on risk of suicide attempt.  In contrast, the probability of significant improvement after starting psychotherapy for depression is surprisingly unpredictable.  If we hope to detect effects of more or less effective psychotherapy, we’ll have to first make better predictions.  Going back to the soccer story, we would need a clearer view of exactly how Nico was moving before he fell.

Greg Simon

Suicide Risk Prediction Models: I Like the Warning Light, but I’ll Keep My Hands on the Wheel

Our recent paper about predicting risk of suicidal behavior following mental health visits prompted questions from clinicians and health system leaders about the practical utility of risk predictions.  Our clinical partners asked, “Are machine learning algorithms accurate enough to replace clinicians’ judgment?”  My answer was, “No, but they are accurate enough to direct clinicians’ attention.”

Our 12-year-old Subaru has now been passed down to my son, and we bought a 2018 model.  The new car has a “blind spot” warning system built in.  When it senses another vehicle in my likely blind spot, a warning light appears on the outside rearview mirror.  If I start to merge in that direction anyway, the light starts blinking, and a warning bell chimes.  I like this feature a lot.  Even if I already know about the other car behind and to my right, the warning light isn’t too annoying or distracting.  The warning light may go on when there isn’t a car in my blind spot – a false positive.  Or it may not light up when there is a car in that dangerous area – a false negative.  But the warning system doesn’t fall asleep or get distracted by something up ahead.  It keeps its “eye” on one thing, all the time.

I hope that machine learning-based suicide risk prediction scores could work the same way.  A high-risk score would alert me and my clinical colleagues to a potentially unsafe situation that we might not have noticed.  If we’re already aware of the risk, the extra notice doesn’t hurt.  If we’re not aware of risk, then we’ll be prompted to pay attention and ask some additional questions.  There will be false positives and false negatives.  But risk prediction scores don’t get distracted or forget relevant facts.

We are clear that suicide risk prediction models are not the mental health equivalent of a driverless car.  We anticipate that alerts or warnings based on calculated risk scores would always be delivered to actual humans.  Those humans might include receptionists who would notice when a high-risk patient cancels an appointment, nurses who would call when a high-risk patient fails to attend a scheduled visit, or treating clinicians who would be alerted before a visit that a more detailed clinical assessment is probably indicated.  A calculated risk score alone would not be enough to prompt any “automated” clinical action – especially a clinical action that might be intrusive or coercive.  I hope and expect that our risk prediction methods will improve.  But I am skeptical they will ever replace a human driver.

My new car’s blind spot warning system does not have control of the steering wheel, the accelerator, or the brake pedal.  If the warning is a false positive, I can choose to ignore or override it.  But that blinking light and warning chime will get my attention.  If that alarm goes off, I’ll take a close look behind me before deciding that it’s actually safe to change lanes.

Greg Simon

Why I’ll Join the All of Us Research Program

NIH’s All of Us Research Program officially launched on Sunday, May 6th. It’s an ambitious national effort to bring together at least one million people from across the U.S. in a long-term study of health across the lifespan.  All of Us is not just a biobank or a genetic study. It’s a 360-degree view of health and disease, with just as much attention to environment as genetics and just as much attention to resilience as to vulnerability.

Our Mental Health Research Network has several connections to the All of Us program.  Henry Ford Health System and Baylor Scott & White Health are participating as health system partners.  Brian Ahmedani co-leads one of the health system networks.  I serve on the external advisory panel.

All of Us has been building toward this launch for more than a year.  A wide range of patient and community organizations have participated in every stage of planning.  Innovative communication tools were developed to make sure that informed consent is truly informed (and not just “a form”). Online surveys and mobile apps have been tested and refined to make them both accessible and secure. The pilot phase enrolled over 27,000 participants, and over two-thirds of them were from groups traditionally under-represented in biomedical research.

With all of that preparation complete, it’s an unfortunate coincidence that All of Us launches after a series of news stories about threats to privacy, including harvesting of Facebook data and law enforcement use of genealogy websites. Of course, none of those events is directly relevant to participating in All of Us.

This last year of preparation has focused on avoiding events like those.  That time was spent on extensive testing of information security, learning to communicate clearly and effectively about data privacy, and listening to potential participants about appropriate and inappropriate uses of their health and genetic information. Participants’ private information has special legal protection through a Certificate of Confidentiality. The program’s leaders are not naïve, and they have been very thorough.

Knowing what I know, those recent news stories won’t deter me from joining the All of Us program as soon as I can. I will complete the baseline survey and share my medical records. I’ll contribute my blood samples when there’s a collection site close to me, and I’ll download the All of Us mobile app. I may not directly gain by participating, but I’m confident I have nothing to lose. I might actually have something useful to share, but no one will know unless I share it with All of Us.

Greg Simon

Advice To Young Researchers: Don’t Find Your Niche

University-based researchers often contact our network, hoping to do research in our health systems to evaluate new interventions or programs.  Those new interventions or programs are usually specific adaptations of treatments already proven to work.   Typical examples (slightly anonymized) include:  a care management program for people with depression and rheumatoid arthritis, a mobile health intervention teaching mindfulness skills after psychiatric hospital discharge, or a training program to help clinicians provide structured psychotherapy specific to bereavement.

When we bring these very specific ideas to leaders of our healthcare systems, they are not enthusiastic.  While they are very interested in care management for depression, they are less interested in a program specific to depression and arthritis.  They are certainly interested in mobile health programs and in mindfulness skills, but less interested in a program limited to the month after psychiatric hospitalization.  And they are definitely interested in training to deliver empirically supported psychotherapies, but less interested in condition-specific training packages. To summarize: they are interested in innovation, but only if that innovation can be affordable, widely acceptable, and broadly available across our health systems.

Researchers focused on narrower questions are following the standard advice to all young mental health or health services researchers:  Find your niche.

The traditional academic career path certainly encourages specialization.  During graduate school, an aspiring researcher might join a research team studying collaborative care interventions for depression in people with chronic medical illness.  During her postdoctoral fellowship, she works in the medical center’s Rheumatology clinic and customizes collaborative care materials for people with arthritis.  She writes a career development application to refine and pilot-test collaborative care interventions for people with depression and arthritis.  NIH reviewers point out that different types of arthritis are too heterogeneous, so she revises her approach to focus on rheumatoid arthritis specifically.  And then she contacts us about doing this research in MHRN health systems.  She has found her niche…. But leaders of our health systems are just not interested.

If you look at the career paths of our MHRN investigators, you’d think that none of us have found a niche.  You’d find a health psychologist who began studying school-based health promotion and now studies implementation and health disparities.  And you’d find a social worker who leads a large precision medicine consortium.  And a geriatric psychiatrist studying EHR decision support and psychosocial interventions to prevent suicide attempt.  Over time, they have traveled across diverse clinical topics, research methods, and care settings.  I expect they will keep traveling.  They are opportunists in the most positive sense of the word, always asking “What’s the most important problem that I could actually help with in the next few years?”

Maybe our niche is just to be as useful as we can.  Not all who wander are lost. And sometimes they find useful things.

Greg Simon

When does caring cross over into creepy?

A recent news article about the European Union’s new privacy rules prompted me to think more about population-based suicide prevention programs. Caring outreach that respects privacy is a difficult balance.

Our health systems are in various stages of implementing systematic programs to identify people at high risk of self-harm or suicide.  These programs are triggered by the standard depression questionnaires our patients complete before clinic visits. Whenever a patient reports frequent thoughts of death or self-harm, the treating clinician is expected to ask follow-up questions about suicide risk. That seems only natural; most patients who report thoughts of suicide would be unpleasantly surprised if no one bothered to ask. 

Our Suicide Prevention Outreach Trial extends that caring outreach, but with an added layer of separation.  Outreach specialists follow up on those visit questionnaires with an online message or phone call during the following week.  Most people are grateful for that extra outreach, but a few respond, “Who are you? How do you know about what I told my doctor in private?” 

Our health systems are considering new programs with one more level of separation.  Outreach messages or calls could be triggered by an algorithm that identifies risk from health records data. In other words, the program could be triggered by “something a computer found that I never even told my doctor.”  The outreach calls or messages would come from a stranger (albeit a kind stranger) representing the health system. I suspect we’ll hear a few more complaints about invasions of privacy.  But should that stop us from trying?

Of course, serving up personalized invitations based on machine learning algorithms is the core business model of social media companies. While invitations from social media apps usually ask us to buy products and services, social media superpowers can also be used for good.  For example, Facebook (with help from our colleague Ursula Whiteside) has developed caring outreach interventions for people at risk for suicide. Facebook’s caring outreach interventions were originally directed at people identified by other human Facebook users.  That program can now be activated by algorithms that continuously monitor a range of data types: text or photos in Facebook posts, comments from friends, and even Facebook Live audio and video streams.

Except in Europe. European data protection rules strictly limit how personal data can be used or shared without explicit consent. Facebook worries that mining social media data to identify people at risk for self-harm would violate those rules. So Facebook’s algorithm-driven suicide prevention outreach won’t be implemented for European users. I think that’s unfortunate. But I have to acknowledge the legitimate view that it’s intrusive or creepy to mine social media data and flag people at risk for self-harm.

Whether outreach is supportive or intrusive depends, of course, on what we do when we reach out. In our outreach programs, we are clear about the boundaries. We reach out to express concern, offer support, suggest resources, and facilitate connection with care. If people ask to be left alone, we stop.  

Nevertheless, some people are bothered by the fact that strangers are drawing conclusions based on information that’s usually private – especially conclusions about things (like suicidal thoughts) that are often stigmatized.  Some people will be offended that we know (or think we know) they are at risk, regardless of what we do or don’t do about it.

If you have followed the discussion about European privacy regulations, you may recognize one rallying cry of the privacy advocates: “The right to be forgotten.” That slogan asserts a right to control – or even erase –  any private information.  But that slogan feels a bit eerie when we apply it to suicide risk. Being forgotten sounds too much like the isolation or disconnectedness that increase risk for suicide. When it comes to suicide prevention, I’d prefer not to acknowledge that absolute European right to be forgotten. I’ll remain a perky (and possibly creepy) American, saying, “Excuse us, but we won’t forget about you!”

Greg Simon

Alexa, should I increase my dose of Celexa?

The evolution of depression care management programs can be described in terms of task shifting. Initial Collaborative Care programs actually shifted some tasks up to specialty providers.  Psychiatrists and psychologists joined the primary care team and assumed responsibility for routine follow-up of antidepressant treatment.

After that, the task shifting was all downhill. Care managers, either nurses or master’s-prepared mental health clinicians, took on responsibility for outreach and care coordination. Then those tasks (including actual psychotherapy) shifted from in-person visits to briefer telephone contacts. Finally, follow-up of antidepressant treatment shifted to online messaging, relying on human care managers supported by simple decision rules.

Our MHRN Automated Outreach pilot project will take that task shifting one step further. People overdue to refill an initial antidepressant prescription will receive an automated outreach message, including an assessment of depressive symptoms, current medication use, and side effects. Assessment responses pass through a simple algorithm (including 29 possible clinical scenarios) to generate an automated response – with advice ranging from, “We are happy you’re doing well, and we will check with you again in a few weeks” to, “It sounds like you are still having significant problems with depression.  We recommend you should contact your doctor about trying some different treatment.”
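The project’s actual decision rules are not reproduced here. As a rough illustration of how an assessment response can be mapped to an automated reply, here is a hypothetical four-branch sketch in Python. The field names, the PHQ-9 total cut-point of 10, and the two middle messages are assumptions for illustration only; this is not the project’s 29-scenario algorithm.

```python
from dataclasses import dataclass


@dataclass
class OutreachResponse:
    """Hypothetical summary of one patient's reply to the outreach message."""
    phq9_total: int                # PHQ-9 total score, 0-27
    taking_medication: bool        # still taking the antidepressant?
    bothersome_side_effects: bool  # reports side effects that are a problem?


IMPROVED_BELOW = 10  # assumed PHQ-9 cut-point for "doing well" (illustrative only)


def automated_reply(r: OutreachResponse) -> str:
    """Toy rule set with four branches; NOT the project's 29-scenario algorithm."""
    if not 0 <= r.phq9_total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    # Side effects and stopped medication are checked before the symptom score.
    if r.bothersome_side_effects:
        return "Please contact your doctor about the side effects you described."
    if not r.taking_medication:
        return "Please contact your doctor to talk about whether to continue your medication."
    if r.phq9_total < IMPROVED_BELOW:
        return ("We are happy you're doing well, and we will check "
                "with you again in a few weeks.")
    return ("It sounds like you are still having significant problems with depression. "
            "We recommend you should contact your doctor about trying some different treatment.")
```

Ordering the checks this way lets medication problems take precedence over the symptom score; a real rule set would enumerate many more scenarios than these four.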

Last week, Amazon announced a new collaboration with JP Morgan and Berkshire Hathaway to develop new delivery models to lower healthcare costs. The discussion about care processes ripe for disruption specifically cited the traditional requirement to see a physician in order to renew a prescription. I immediately thought of our Automated Outreach project. We already know a good bit about convenient, efficient, and effective models for routine follow-up of new antidepressant prescriptions.  I suspect that Alexa might improve on the tools we already have.  For example, more people might respond to Alexa’s spoken voice than to our plain text messages sent through the EHR patient portal. And Alexa can hear tone of voice when people respond. Amazon’s predictive analytics could probably improve on our 29-scenario algorithm for treatment adjustment.

Maybe it is time for some more radical task-shifting: turning routine follow-up of a first antidepressant prescription over to Alexa (or Siri or Cortana or OK Google).  Simple automated outreach programs would often be adequate, and would certainly be more convenient.  We could then reserve expert clinicians for more severe or complicated mental health problems. Unfortunately, there are more than enough of those complicated problems to go around.

Greg Simon

There might be no fish. But again, well, there might!

In the MHRN Suicide Prevention Outreach Trial, our coaches patiently and persistently reach out to people at risk for suicide attempt.  Coaching messages offer support and encouragement, with specific reminders to use our online program teaching skills for emotion regulation.

The coaches’ work can be discouraging.  Many people do respond to outreach messages with gratitude and enthusiasm.  A few angrily request to be left alone.  But often, there is no response at all.  Months of repeated outreach messages can yield nothing but silence.

After a year of outreach, coaches send a closing message explaining that the program will end soon.  That closing message sometimes does prompt a response.   After months of silence, a return message will sometimes describe how important and sustaining those many outreach messages were.  As we share those stories in our coaches’ meetings, we often say “You never can tell how much those outreach messages might help!” 

Those discussions reminded me of Dr. Seuss’s McElligot’s Pool.   I was the only person in our group old enough to remember the book, and I could only recall a few lines.  So I ordered a copy.  (Recommendation:  If you order Dr. Seuss on Amazon, the suggestions to your account get happier!)

When I re-read McElligot’s Pool after many years, the parallels to our outreach work were even clearer than I remembered.  The book opens with Young Marco fishing in a tiny pond.  He receives some tough-sounding advice:

“Young man,” laughed the farmer, “You’re sort of a fool!
You’ll never catch fish in McElligot’s Pool!
The pool is too small.  And, you might as well know it.
When people have junk, here’s the place that they throw it.”

That advice perfectly captures the discouragement we can feel after repeated outreach with no response.  Our work often focuses on people whose lives first appear small and filled with junk. 

Marco is thoughtful, but not discouraged.  He knows that he cannot always understand or predict what he cannot see:

“Hmm…” answered Marco.  “Maybe you’re right.
I’ve been here three hours without one single bite.
There might be no fish. But again, well, there might!
Cause you never can tell what goes on down below.
This pool might be bigger than you or I know!”

Marco then imagines all of the marvelous things that might be moving beneath the surface.  He allows that the small pool he can see might actually connect to the wide ocean.  Fish might be heading his way from the tropics or even the North Pole.  He imagines marvelous and exotic creatures never before seen or even described.  And, since this is Dr. Seuss, they all rhyme.

McElligot’s Pool ends just as it should.  Marco still hasn’t had one single bite.  But he is smiling confidently and not at all discouraged.

Oh, the sea is so full of a number of fish.
If a fellow is patient he might get his wish!
And that’s why I think that I’m not such a fool
When I sit here and fish in McElligot’s Pool!

Greg Simon

But My Patients Really Are More Difficult!

Comparisons of the quality or outcomes of care across providers or facilities often meet the objection: “But my patients really are more difficult!”  If we hope to improve the quality and outcomes of mental health care, we must address that concern.  However, that automatic objection shouldn’t invalidate comparisons or excuse all variations. Instead, it should prompt careful thinking.

Several of our MHRN projects include – or even focus on – comparisons of quality or outcomes across providers, facilities, or health systems.  For example, our Practice Variation project examined how adherence and outcomes of depression treatment (medication and psychotherapy) vary across providers.  A supplement to that project focused specifically on racial and ethnic disparities in care, examining whether those disparities more likely indicated differences in patient preference or provider performance. 

Our new project, examining implementation of Zero Suicide care improvement programs, will include numerous comparisons across clinics and health systems – incorporating comparisons of how well Zero Suicide strategies were implemented as well as comparisons of changes in actual suicidal behavior. 

Another new project will support health systems’ efforts to implement measurement-based care or feedback-informed care for depression.  That project aims to address the trade-off between transparency (simple comparisons using simple outcomes) and accuracy (adjusted comparisons based on statistical models).

Each of these projects deals with the same two questions:

  1. Are any differences between providers or facilities “real”? 
  2. Are comparisons across providers and facilities “fair”? 

The first question is a quantitative one that should be answered using the right math.  The second question is more complex, and reasonable people may disagree regarding the answer.

The quantitative question divides into two pieces.  We first ask how much any observed difference between providers or facilities exceeds what might be expected by chance.  If we observe more-than-chance variation, we can then ask how much of that variation is explained by measurable pre-existing differences.  This is usually accomplished in some sort of hierarchical or random effects model, in which we estimate a random effect for each provider or facility (accounting for that provider’s or facility’s number of patients) and then adjust those random effect estimates for any differences in baseline characteristics.  For example, our MHRN Practice Variation project found that differences between physicians in patients’ early adherence to antidepressant medication (an NCQA/HEDIS indicator of quality of depression care) were actually trivial after accounting for random variation.  Most of the apparent difference between “high-performing” and “low-performing” providers was an illusion, due to the small sample sizes for providers near the high and low ends of performance.  In contrast, similar analyses found much greater “true” variation among psychotherapists in patients’ early dropout from psychotherapy.  But some of that difference was accounted for by differences in the racial/ethnic composition of each provider’s caseload.
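A minimal sketch of the “more than chance” question, assuming a simple empirical Bayes (method-of-moments) shrinkage estimator as a stand-in for a full hierarchical or random effects model. It estimates how much provider-to-provider spread remains after subtracting the spread expected from binomial sampling alone, then pulls each provider’s raw rate toward the pooled rate, more strongly for providers with fewer patients. It does not adjust for baseline characteristics, and the function name and example counts are hypothetical.

```python
from statistics import mean, pvariance


def shrink_provider_rates(events, patients):
    """Empirical Bayes (method-of-moments) shrinkage of per-provider rates.

    events[i]   -- number of provider i's patients with the outcome
    patients[i] -- total number of provider i's patients
    Returns (raw_rates, shrunken_rates, tau2), where tau2 estimates the
    between-provider ("true") variance in rates.
    """
    if len(events) != len(patients) or len(events) < 2:
        raise ValueError("need matching lists for at least two providers")
    raw = [e / n for e, n in zip(events, patients)]
    p_bar = sum(events) / sum(patients)  # pooled rate across all providers
    # Binomial sampling variance expected for each provider's raw rate
    within = [p_bar * (1 - p_bar) / n for n in patients]
    # Observed spread of raw rates minus the spread expected by chance alone
    tau2 = max(0.0, pvariance(raw) - mean(within))
    shrunk = []
    for r, w in zip(raw, within):
        # Weight near 1 for large caseloads, near 0 when chance dominates
        weight = tau2 / (tau2 + w) if tau2 + w > 0 else 0.0
        shrunk.append(p_bar + weight * (r - p_bar))
    return raw, shrunk, tau2
```

For example, with `events=[2, 1, 50, 48]` and `patients=[4, 5, 100, 100]`, the raw rates are 50%, 20%, 50%, and 48%, but the observed spread is no larger than chance would produce. The between-provider variance is estimated as zero, so every provider, including the apparently “low-performing” one at 20%, is pulled back to the pooled rate.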

When we ask whether comparisons are “fair”, we are asking whether baseline or pre-existing differences really should be adjusted away.  And math cannot usually answer that question.  For example, we have reported that much of the difference between health systems in patients’ early adherence to antidepressant medication is explained by differences in patients’ race and ethnicity across systems.  We argued that unadjusted comparisons of health systems using this NCQA/HEDIS measure are not fair to systems serving higher proportions of patients from traditionally under-served racial and ethnic groups.   Others have argued against adjusting for racial/ethnic differences, claiming that adjusting away racial/ethnic disparities would excuse or condone lower-quality care for the disadvantaged.

We anticipate that questions regarding fairness and racial/ethnic disparities will arise repeatedly in our evaluation of Zero Suicide programs across MHRN health systems.  Whether fair comparison requires adjustment for race and ethnicity will depend on the specific situation.  In general, we’d be more likely to adjust comparison of outcomes and less likely to adjust comparison of care processes.  For example:  rates of suicide attempt and suicide death are markedly lower for Hispanics, African Americans, and Asians than for Native Americans and Non-Hispanic Whites.  Our MHRN health systems (and facilities within those health systems) differ markedly in the racial/ethnic distribution of patients they serve.  Any unadjusted comparison of suicide mortality or suicide attempt rates would likely tell us more about the racial/ethnic composition of patient populations than about effectiveness of suicide prevention efforts. 
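One common way to keep case mix from driving an outcome comparison is indirect standardization: compute how many events each facility would be expected to have if its patients experienced reference group-specific rates, then compare observed with expected. The Python sketch below illustrates that technique under stated assumptions; it is not the MHRN analytic plan, and the groups, rates, and counts are invented for illustration only.

```python
def observed_to_expected(observed_events, patients_by_group, reference_rate_per_1000):
    """Indirectly standardized (observed / expected) ratio for one facility.

    patients_by_group: {group: number of patients in this facility}
    reference_rate_per_1000: {group: events per 1,000 patients in a reference population}
    Returns (observed / expected, expected).
    """
    expected = sum(
        n * reference_rate_per_1000[group] / 1000
        for group, n in patients_by_group.items()
    )
    if expected <= 0:
        raise ValueError("expected event count must be positive")
    return observed_events / expected, expected


# Hypothetical reference rates (per 1,000) for two hypothetical patient groups
REFERENCE = {"group_a": 4.0, "group_b": 1.5}

facility_x = {"group_a": 8000, "group_b": 2000}  # mostly higher-rate group
facility_y = {"group_a": 2000, "group_b": 8000}  # mostly lower-rate group

ratio_x, expected_x = observed_to_expected(35, facility_x, REFERENCE)
ratio_y, expected_y = observed_to_expected(20, facility_y, REFERENCE)
```

With these made-up numbers, facility X has a crude rate of 3.5 per 1,000 and facility Y has 2.0 per 1,000, yet both have an observed-to-expected ratio of 1.0: the crude gap reflects only the mix of patients each facility serves.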

A fairer comparison would be to compare each system (or facility) with its geographic area or its own historical performance.  In contrast, some care processes (such as scheduling follow-up care after an emergency department visit for self-harm) are clearly indicated regardless of race, ethnicity, income, age, gender, etc.

The rationale for this approach is: if my patients really are more difficult, it may not be fair to hold me accountable for less favorable outcomes.  But it is fair to hold me accountable for offering everyone high-quality treatment.

Greg Simon

Do You Believe An Algorithm Or Your Own Lying Eyes?

Our work on machine learning to predict suicide attempts and suicide deaths is moving rapidly to implementation. Among mental health specialty visits, we are now able to accurately identify the 5% of patients with the highest predicted risk, who have a 5% likelihood of a treated suicide attempt in the following 90 days.  Our health system partners believe those predictions are accurate enough to be actionable.  For example, clinicians might be expected to conduct and record a structured suicide risk assessment in any visit with a predicted risk of greater than 5%.  And clinics or health systems might be expected to reach out more assertively when a high-risk patient cancels or fails to attend a mental health appointment.  With interventions like this in mind, mental health leaders at Kaiser Permanente Washington are planning to import computed risk scores into our electronic health record.  We will use those predictions to prompt systematic risk assessment and outreach in our mental health specialty clinics.  We expect that other MHRN health systems will soon follow.

As MHRN researchers have discussed this implementation with mental health system leaders, one question comes up over and over:  How should we integrate risk predictions based on machine learning models with risk predictions based on direct clinical assessment?  In many of our health systems, clinicians use responses to Item 9 of the PHQ9 (regarding thoughts of death or self-harm) to identify patients in need of additional risk assessment.  This standard work is now built into many of our electronic health record systems.

We have no doubt that machine learning algorithms outperform our current clinical assessment tools.  Among mental health specialty visits, a response of “more than half the days” or “nearly every day” to Item 9 of the PHQ9 identifies approximately 6% of visits.  That group has a 2% likelihood of treated suicide attempt in the following 90 days.  This compares to a 5% likelihood following visits with risk scores in the top 5%.  No matter where we set the threshold, machine learning risk scores are both more sensitive (i.e., miss fewer subsequent suicide attempts) and more efficient (i.e., accurately predict a larger number of suicide attempts in a smaller number of people).
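The efficiency comparison above is simple arithmetic. This Python sketch assumes a hypothetical block of 10,000 specialty visits and uses only the percentages quoted in the text. Because the overall attempt rate is not given here, it computes efficiency (flagged visits per expected attempt), not sensitivity.

```python
def flagged_yield(visits, pct_flagged, pct_risk_in_flagged):
    """Return (flagged visits, expected attempts among them, flagged visits per attempt)."""
    flagged = visits * pct_flagged / 100
    expected_attempts = flagged * pct_risk_in_flagged / 100
    return flagged, expected_attempts, flagged / expected_attempts


VISITS = 10_000  # hypothetical number of mental health specialty visits

# PHQ9 Item 9 "more than half the days" or "nearly every day": ~6% of visits, 2% risk
phq9 = flagged_yield(VISITS, 6, 2)
# Machine learning risk score in the top 5%: 5% of visits, 5% risk
model = flagged_yield(VISITS, 5, 5)

print(f"PHQ9 Item 9: {phq9[0]:.0f} flagged, {phq9[1]:.0f} expected attempts, "
      f"{phq9[2]:.0f} visits per attempt")
print(f"Risk score:  {model[0]:.0f} flagged, {model[1]:.0f} expected attempts, "
      f"{model[2]:.0f} visits per attempt")
```

With those figures, the risk score flags 100 fewer visits (500 versus 600) yet is expected to capture roughly twice as many subsequent attempts (25 versus 12), or about 20 flagged visits per attempt rather than 50.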

Nevertheless, it’s difficult to advise providers to now ignore those PHQ9 responses regarding thoughts of death or self-harm.  Our mental health leaders have invested significant time and energy in training providers to respond systematically to self-reported risk.  Reversing course now certainly could be confusing.  Our clinicians might wonder how an algorithm could outperform a face-to-face conversation.  Out in the wider world, trust in algorithms and big data is not having a good year.

I jokingly refer to this as “The Richard Pryor Problem,” after the quote sometimes attributed to the comedian Richard Pryor and sometimes to an even older comedian, Chico Marx:  “Are you going to believe me, or your own lying eyes?”  In each case, that punch line is delivered by a cheating man hoping to confuse and undermine his suspicious partner.  That makes for an uncomfortable analogy to our suggestion that clinicians believe an algorithm over direct observation.

But the analogy does not fit in one important respect.  Our suicide risk prediction models depend entirely on what clinicians have seen with their own eyes.  The strongest predictors of suicide attempts and suicide deaths include prior mental health diagnoses and mental health treatments determined by treating clinicians in traditional clinical encounters.  Our algorithms do not discover anything that treating clinicians could not see; they simply average across millions of clinical judgments to make more reliable predictions.

For now, our solution to the Richard Pryor Problem is a compromise.  When we import risk prediction scores into the electronic health record, we will advise providers to use structured risk assessment tools whenever a patient has a computed risk score in the top 5% OR directly reports frequent thoughts of death or self-harm on the PHQ9.  In other words: You should believe the algorithm, but you don’t have to ignore what your own eyes are telling you.
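The compromise rule amounts to a single OR condition. This Python sketch assumes the risk score has already been converted to a percentile among visits (a hypothetical input) and uses the standard PHQ9 item scoring (0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day). Treating a missing PHQ9 as “rely on the score alone” is an assumption, not a stated policy.

```python
from typing import Optional

# PHQ9 items are scored 0-3; 2 = "more than half the days", 3 = "nearly every day"
ITEM9_FREQUENT = 2
TOP_PERCENT = 5.0  # flag risk scores in the top 5% of visits


def needs_structured_assessment(risk_percentile: float, phq9_item9: Optional[int]) -> bool:
    """Return True if the computed risk score is in the top 5% OR the patient
    reports frequent thoughts of death or self-harm on PHQ9 Item 9.

    risk_percentile: hypothetical percentile (0-100) of this visit's risk score.
    phq9_item9: Item 9 response (0-3), or None if no PHQ9 was completed
    (assumption: the risk score alone then decides).
    """
    if not 0.0 <= risk_percentile <= 100.0:
        raise ValueError("risk_percentile must be between 0 and 100")
    if phq9_item9 is not None and phq9_item9 not in (0, 1, 2, 3):
        raise ValueError("PHQ9 Item 9 must be scored 0-3")
    high_score = risk_percentile >= 100.0 - TOP_PERCENT
    frequent_thoughts = phq9_item9 is not None and phq9_item9 >= ITEM9_FREQUENT
    return high_score or frequent_thoughts
```

Either signal alone triggers the assessment, so a patient with a modest computed score who reports thoughts of self-harm “nearly every day” is still flagged.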

Greg Simon