Return of the Repressed

Last month I moved to a temporary office to make way for painting and installing new carpet.  Being forced to pack up every book and file folder was actually a good thing.  I recycled stacks and stacks of paper journals; saving those was a habit left over from the last century.  I cleaned out file drawers full of old paper grant proposals and IRB applications, relics from the days before electronic submission.  I purged piles of paper reprints (remember those?) from the 1990s.  I recycled books I hadn’t opened in 15 years.  Tidying up did feel good, but my purging frenzy paused when I came to a stack of books from my residency days.

Those old books were a greatest hits collection from some of the legends of psychoanalysis and psychoanalytic psychotherapy.  Some of the authors were my own teachers (Jerry Adler, Terry Maltsberger).  Some were the heroes of my teachers (Heinz Kohut, Michael Balint, Otto Kernberg, Elvin Semrad).  Thirty years ago, those books were central to my thinking, or even my identity.  But I hadn’t opened any of them since my last office move in 1999.

I paged through a few of them, trying to remember what had been so valuable back then.  From my current perspective, I can’t say that psychoanalysis, or even psychoanalytic psychotherapy as practiced in 1980s Boston, can really succeed as a public mental health strategy.  As a care delivery model, psychoanalysis scores poorly on all three of my “three As” tests.  It’s not really affordable.  It’s unlikely to ever be widely available.  And it’s not always acceptable to the diverse range of people who live with mental health conditions.

But there is still some valuable wisdom in those old books.  Studying Kohut certainly helped me to understand how a fragile sense of self disrupts emotion regulation and how persistent empathy can help to rebuild self-regulating capacity.  And studying Kernberg helped me to understand the nonlinear relationship between the internal mental world and external reality.  The persistent outreach and care management interventions that we now develop and evaluate do owe a significant debt to those psychoanalytic legends – and most especially to Kohut and Semrad.

So my psychoanalytic books survived this office move.  To borrow some psychoanalytic terms, they continue to serve as healthy introjects.  I’ll keep holding on to them as transitional objects.

Greg Simon

Machine learning and Clever Hans, the Calculating Horse

Clever Hans, the Calculating Horse, was a sensation of the early 1900s.  He appeared to be able to count, spell, and solve math problems – including fractions!  Only after careful investigation did everyone learn that Hans was just responding to unconscious nonverbal cues from his trainer.  Hans couldn’t actually calculate, but he could sense the answer his trainer was hoping for. 

But even if Hans could actually do arithmetic, a true calculating horse would still be no more than a novelty act.  Even in the early 1900s, we had much more accurate tools for calculating.  And we had (and still have) much more impressive and useful work for horses.

Some applications of machine learning in mental health seem (at least to me) similar to a performance by Clever Hans.  They may involve “predicting” something we already believe and want to have confirmed.  Or they may develop and apply a complex tool when existing simple tools are more than adequate.  We can use big data and neural networks to mine social media posts or search engine queries and discover that depression increases in the winter.  But that’s not news – at least for those of us living in Seattle in November. 

As with calculating horses, we also have more impressive and useful work for machine learning tools to accomplish.  I think that our work on prediction of suicidal behavior following outpatient mental health visits is a good example.  Traditional clinical assessments are hardly better than chance.  Simple questionnaires are certainly helpful, but they still fall short in sensitivity and in identifying those at very high risk.  There’s clearly an unmet need.  Prediction models built from hundreds of candidate predictors can accurately identify patients who are at high or very high near-term risk for suicidal behavior.  Our health systems have decided that those predictions are accurate enough to prompt specific clinical actions.

I think we could list some general characteristics of problems suitable for machine learning tools.  In practical terms, we are interested in problems that are both difficult (i.e. we can’t predict well now) and important (i.e. we can do something useful with the prediction).  In technical terms, prediction models will probably be most useful when the outcome is relatively rare, the true predictors are diverse, and the available data include many potential indicators of those true predictors.  In those situations, complex models will usually outperform simple decision rules fit for human (or equine) calculation.
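For readers who want to see that claim in miniature, here is a small simulated sketch in Python.  The data, coefficients, and sample sizes are entirely made up for illustration – this is not our modeling code – but it shows the general pattern: when the outcome is rare and the signal is spread across many individually weak indicators, a penalized model over all of them tends to beat a simple one-variable rule.

```python
# Illustrative simulation only: made-up data, made-up effect sizes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n, p = 50_000, 200                       # many people, many candidate indicators
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:50] = 0.15                    # diverse, individually weak true predictors
logits = X @ true_coef - 5.0             # intercept keeps the outcome rare (roughly 1-2%)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# "Simple rule": a model using just one of the informative indicators.
simple = LogisticRegression().fit(X_tr[:, [0]], y_tr)
auc_simple = roc_auc_score(y_te, simple.predict_proba(X_te[:, [0]])[:, 1])

# Penalized model using all candidate indicators.
full = LogisticRegression(penalty="l2", C=1.0, max_iter=2000).fit(X_tr, y_tr)
auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])

print(f"Outcome rate: {y.mean():.3f}")
print(f"AUC using one indicator:  {auc_simple:.2f}")   # typically little better than chance
print(f"AUC using all indicators: {auc_full:.2f}")     # typically much higher
```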

Horses are capable of many marvelous things.  I’d pay to see performances by racing horses or jumping horses.  And especially mini-horses.  But counting horses or spelling horses… not so much.

Greg Simon

Gold Standard or Golden Calf?

[Image: The Adoration of the Golden Calf by Nicolas Poussin, painted between 1633 and 1634]

Most of our measures and measurement tools were created in conference rooms or conference calls dominated by older white men – like me, I guess.  Over time, those “expert opinion” measures acquire a patina of authority, and we start to equate familiarity or habit with accuracy or validity.  As in the biblical story of worshipping a false idol that we created ourselves, we begin to see a gold standard rather than just a statue of a golden calf.

Our experiences with NCQA/HEDIS measures regarding antidepressant medication adherence illustrate the tendency to over-value the familiar.  Development of those measures in the 1990s reflected increasing awareness of the prevalence and negative consequences of early dropout from antidepressant treatment.  Availability of a simple, transparent, and widely accepted quality measure helped to build the momentum for implementation of effective collaborative care programs.  But time has revealed some significant concerns.  Increasing use of 90- or 100-day prescriptions now leads to over-estimating early adherence and over-estimating improvements in depression care.  And we now appreciate that early dropout from antidepressant treatment is more common in traditionally under-served racial and ethnic groups.  Failing to account for that bias penalizes health systems that care for a larger proportion of under-served patients.  That’s certainly a perverse incentive.  We didn’t anticipate either of those issues in the 1990s.
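To make the 90-day-fill problem concrete, here is a toy sketch of the days-supply arithmetic.  It is not the actual HEDIS specification; the 84-day acute-phase threshold and the two patient scenarios are purely illustrative.

```python
# Toy illustration of days-supply adherence arithmetic (not the HEDIS spec).
from dataclasses import dataclass

@dataclass
class Fill:
    day: int          # days since the first antidepressant dispensing
    days_supply: int  # days of medication dispensed

def days_covered(fills, window=84):
    """Count days within the first `window` days covered by dispensed supply."""
    covered = set()
    for f in fills:
        covered.update(range(f.day, f.day + f.days_supply))
    return sum(1 for d in covered if d < window)

# Patient A: one 30-day fill, never refills. Early dropout is visible to the measure.
patient_a = [Fill(day=0, days_supply=30)]

# Patient B: one 90-day fill, but stops taking it after a few weeks.
# The dispensing data cannot see that, so the measure counts B as adherent.
patient_b = [Fill(day=0, days_supply=90)]

for name, fills in [("A", patient_a), ("B", patient_b)]:
    covered = days_covered(fills)
    status = "meets threshold" if covered >= 84 else "does not meet threshold"
    print(f"Patient {name}: {covered}/84 days covered -> {status}")
```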

When we have questioned some specifics of the Antidepressant Medication Management measures, we sometimes hear that these measures are well-validated and time-tested.  I’m tempted to respond, “But I was one of the people who made them up in the first place!  And now I know better!”

We could tell a similar story about nearly any of our measures or metrics in mental health:  diagnostic criteria, assessment questionnaires, structured interviews, quality metrics.  We created them using everything we knew at the time.  And (Good news!) now we know more.

We shouldn’t forget the value of continuity in measuring the quality or outcomes of mental health care.  Tracking improvement in care processes over time is a defining characteristic of learning health systems, and changing measures can interfere with our ability to accurately measure improvement.  But rather than holding on to measures based on what we knew back then, our improvement efforts might be better served by adopting improved measures.  Where possible, we should apply the new standards to old data when comparing past and current performance.

While I try to maintain a healthy skepticism about any specific measure, I am not at all skeptical about the importance of measurement.  Challenging the specifics of any measure should be intended to improve accountability, not to avoid it.  I firmly believe that we can only improve what we choose to measure.  And our measures should be one of the first things we aim to improve.

Greg Simon

What’s so Funny About Dimensionality Reduction?

My wife handed me a recent issue of The New Yorker and recommended the Shouts and Murmurs column.  It parodied a whistle-blowing data scientist testifying before Parliament about modern analytic methods.  He grows increasingly frustrated as legislators can’t follow his explanations of eigenvectors and dimensionality reduction.

At first read, I didn’t think it was very funny.  Then I realized:  If you don’t think Shouts and Murmurs is very funny, then it’s probably about you.

Much of our work is dimensionality reduction, even if we don’t call it that.  Models to predict suicidal behavior or outcomes of depression treatment are all about reducing tens or hundreds of characteristics to a single probability.  Old-fashioned regression models are also a tool for dimensionality reduction; they just typically consider a smaller number of dimensions.  Moving from the statistical to the clinical, diagnoses are also dimensionality reducers.  For example, DSM criteria for the diagnosis of a depressive disorder take nine diverse characteristics and summarize them as a single classification.  Going farther back in our psychological history, Sigmund Freud’s The Interpretation of Dreams was all about reducing dimensionality – explaining the wild diversity of human mental life in terms of a few basic instincts and countervailing defenses.
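If an example helps, here is a minimal sketch of both flavors – a fitted prediction model and classic principal components – each taking many columns per person and handing back one number (or two).  The data and dimensions are simulated and purely illustrative.

```python
# Two flavors of dimensionality reduction on made-up data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 120))              # 120 characteristics per person
y = rng.binomial(1, 0.1, size=1000)           # some outcome of interest

# Supervised reduction: 120 dimensions -> a single predicted probability per person.
prob = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
print(prob.shape)    # (1000,) -- one number per person

# Unsupervised reduction: 120 dimensions -> 2 component scores per person.
scores = PCA(n_components=2).fit_transform(X)
print(scores.shape)  # (1000, 2)
```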

So it should not surprise us that objections to modern statistical dimensionality reduction echo older objections to diagnostic or psychoanalytic dimensionality reduction.  Human beings are not one-dimensional.  Or even two-dimensional.  Our wide experience of joy, pain, hope, loneliness, passion, fear, and tenderness just cannot be contained in a single mathematical model or even a few instinctual drives.

Still, I put more stock in statistical dimensionality reduction than diagnostic or psychoanalytic dimensionality reduction.  To generalize, I’m skeptical about any reductionism determined by human “experts”.  When we humans try to simplify complex reality, we too often over-reach.  Statistical dimensionality reduction tends to be more realistic, or even humble.  Our mathematical models only claim to explain or predict a single dependent variable.  A statistical model to predict likelihood of psychiatric hospitalization makes no claims to predict success in relationships or finding meaning in life.  Predicting risk of hospitalization is practically useful – in a one-dimensional way.  Let’s leave it at that.

You may be about to ask, “Then what is an eigenvector?”  I’ll pass on that one.

Greg Simon

Can you see me now?

As our health systems prepare to implement statistical models predicting risk of suicidal behavior, we’ve certainly heard concerns about how that information could be misused.  Well-intentioned outreach programs could stray into being intrusive or even coercive.  Information about risk of self-harm or overdose could be misused in decisions about employment or life insurance coverage.  But some concerns about predictions from health records are unrelated to specific consequences like inappropriate disclosure or intrusive outreach.  Instead, there is a fundamental concern about being observed or exposed.  When we reach out to people based on risk information in medical records, some are grateful for our concern, and some are offended that we know about such sensitive things.  And some people have both reactions at the same time.  The response is along the lines of “I appreciate that you care, but it’s wrong that you know.”  It’s being observed or known that’s the problem, even if nothing is ever said or done about it.
 
Our instinctual ambivalence about being known or observed was not created by big data or artificial intelligence.  In fact, that ambivalence is central to the oldest story that many of us have ever heard:  Eve and Adam’s fall from being caringly understood to being shamefully exposed.  Both the upside and the downside of being observed are also central to many of our iconic modern stories.  The benevolent portrayal of continuous observation and risk prediction is Clarence, the bumbling guardian angel who interrupts George Bailey’s suicide attempt in It’s a Wonderful Life.  The contrasting malevolent portrayals of observation and behavioral prediction include Big Brother in George Orwell’s 1984 and the “precogs” in the 2002 film Minority Report (based on a 1950s science fiction short story).

Big data and artificial intelligence do, however, introduce a new paradox: The one who observes us and predicts our behavior is actually just a machine.  And that fact could make us feel better or worse.  An algorithm to predict suicide attempt or opioid overdose might consider all sorts of sensitive information and stigmatized behaviors to identify people at increased risk.  But the machine applying that algorithm would not need to reveal, or even retain, any of that sensitive information.  The predicting machine would only need to alert providers that a specific patient is at elevated risk at a specific time.  So we might feel reassured that predictive analytics can improve quality and safety while protecting privacy.  But the reassurance that “it’s just a machine” may not leave us feeling safe or cared for.  Ironically, we may be more likely to welcome the caring attention of Clarence the guardian angel exactly because he is very human – prone to error and unable to keep a secret.
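A small sketch of that design point, with hypothetical names and thresholds (this is not any health system’s actual pipeline): the scoring step can consume sensitive inputs and pass along nothing but a risk tier.

```python
# Illustrative sketch: the alert record carries a flag, not the sensitive inputs.
from typing import Callable, Mapping, Optional

def risk_alert(patient_id: str,
               features: Mapping[str, float],
               score_fn: Callable[[Mapping[str, float]], float]) -> Optional[dict]:
    """Return only an alert tier; never echo or store the sensitive inputs."""
    score = score_fn(features)        # the model sees everything...
    if score >= 0.05:                 # illustrative cut points, not real thresholds
        level = "very high"
    elif score >= 0.02:
        level = "high"
    else:
        return None                   # below threshold: no alert at all
    # ...but the record handed downstream contains only the flag.
    return {"patient_id": patient_id, "alert_level": level}

# Example with a stand-in scoring function:
print(risk_alert("12345", {"prior_attempt": 1.0, "phq9_item9": 2.0},
                 score_fn=lambda f: 0.03))
# {'patient_id': '12345', 'alert_level': 'high'}
```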

Privacy is complicated, and we often have conflicting feelings about being known or understood.  Moreover, our feelings may not match an “objective” assessment of exposure or risk.  But feelings about privacy matter, especially when we hope to maintain the trust of the members our health systems serve.  We are in for some interesting conversations.

Greg Simon

MHRN Blog World Cup Edition: What Soccer Referees Know about Causal Inference

When Nico Lodeiro falls down in the penalty area, I hold my breath waiting for the referee’s call.  Was it really a foul – or just Nico simulating a foul?  The stakes are high.  If the referee calls it a foul, it’s a penalty kick and a likely goal for my Seattle Sounders.  If she calls it a simulation or a dive, then it’s a yellow card warning for Nico.  After the call, we all watch the slow-motion replay.  I used to be surprised at how often the refs got it right until a referee friend of mine explained what the refs are looking for.

It goes back to Newton’s first law from high school physics: a body in motion will remain in motion unless acted on by an external force.  If Nico was actually shoved, then his speed or direction will change.  If instead he just threw himself to the ground, then his overall momentum won’t change.  If he throws his shoulders in one direction to simulate a foul, then his arms or his hips have to move in the other direction.  With no external force, his center of gravity continues on the same path.  That’s how the experienced referee distinguishes a foul from a dive.

When we use health records data to infer causation (e.g. Did a specific treatment make things better or worse?), we also depend on Newton’s first law.  We believe that an individual’s mental health path will maintain the same momentum unless it’s acted on by some external force.  That force could be a positive or negative effect of treatment or some positive or negative life event.  We try to predict an individual’s mental health path or trajectory so we can detect a change in speed or direction.

But mental health is more complicated than high school physics.  Any individual’s center of gravity is harder to pin down.  Our measures of mental health speed and direction are much less accurate.  Mental health paths are often not straight lines.  And many forces are often acting at the same time.  So we would be foolish to infer causation from a single case.  For any individual, our calls about causal inference will never be as accurate as a good soccer referee.  But we hope to make better calls by improving the accuracy of our measurement and by aggregating data across many, many cases.  
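Here is a cartoon of that idea in code – entirely simulated, with made-up numbers: project each person’s prior momentum forward with a simple per-person fit, measure how far the observed follow-up falls from that projection, and trust only the average deviation across many cases.

```python
# Simulated illustration: per-person trajectory projection and average deviation.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_prior = 500, 4
weeks_prior = np.arange(n_prior)          # weeks 0..3 before the "external force"
follow_up_week = 6

# Simulated symptom scores: individual straight-line paths plus noise,
# plus a true "force" (e.g., treatment effect) of -3 points at follow-up.
intercepts = rng.normal(15, 4, n_people)
slopes = rng.normal(-0.2, 0.5, n_people)
prior = (intercepts[:, None] + slopes[:, None] * weeks_prior
         + rng.normal(0, 2, (n_people, n_prior)))
observed = intercepts + slopes * follow_up_week - 3.0 + rng.normal(0, 2, n_people)

# Project each person's prior momentum forward with a per-person linear fit.
deviations = []
for i in range(n_people):
    slope_hat, intercept_hat = np.polyfit(weeks_prior, prior[i], deg=1)
    expected = intercept_hat + slope_hat * follow_up_week
    deviations.append(observed[i] - expected)
deviations = np.array(deviations)

print(f"One person's deviation: {deviations[0]:+.1f}  (too noisy to interpret alone)")
print(f"Average deviation across {n_people} people: {deviations.mean():+.1f}  "
      f"(close to the true -3.0 'force')")
```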

We’ve been surprised by some of our recent efforts to predict mental health paths or trajectories.  Risk of suicide attempt or suicide death after a mental health visit is surprisingly predictable.  So we expect to be able to detect the effects of external forces – like positive or negative effects of treatments – on risk of suicide attempt.  In contrast, probability of significant improvement after starting psychotherapy for depression is surprisingly unpredictable.  If we hope to detect effects of more or less effective psychotherapy, we’ll have to first make better predictions.  Going back to the soccer story, we would need a clearer view of exactly how Nico was moving before he fell.

Greg Simon

Suicide Risk Prediction Models: I Like the Warning Light, but I’ll Keep My Hands on the Wheel

Our recent paper about predicting risk of suicidal behavior following mental health visits prompted questions from clinicians and health system leaders about the practical utility of risk predictions.  Our clinical partners asked, “Are machine learning algorithms accurate enough to replace clinicians’ judgment?”  My answer was, “No, but they are accurate enough to direct clinicians’ attention.”

Our 12-year-old Subaru has been passed down to my son, and we bought a 2018 model.  The new car has a “blind spot” warning system built in.  When it senses another vehicle in my likely blind spot, a warning light appears on the outside rearview mirror.  If I start to merge in that direction anyway, the light starts blinking, and a warning bell chimes.  I like this feature a lot.  Even if I already know about the other car behind and to my right, the warning light isn’t too annoying or distracting.  The warning light may go on when there isn’t a car in my blind spot – a false positive.  Or it may not light up when there is a car in that dangerous area – a false negative.  But the warning system doesn’t fall asleep or get distracted by something up ahead.  It keeps its “eye” on one thing, all the time.

I hope that machine learning-based suicide risk prediction scores could work the same way.  A high-risk score would alert me and my clinical colleagues to a potentially unsafe situation that we might not have noticed.  If we’re already aware of the risk, the extra notice doesn’t hurt.  If we’re not aware of risk, then we’ll be prompted to pay attention and ask some additional questions.  There will be false positives and false negatives.  But risk prediction scores don’t get distracted or forget relevant facts.

We are clear that suicide risk prediction models are not the mental health equivalent of a driverless car.  We anticipate that alerts or warnings based on calculated risk scores would always be delivered to actual humans.  Those humans might include receptionists who would notice when a high-risk patient cancels an appointment, nurses who would call when a high-risk patient fails to attend a scheduled visit, or treating clinicians who would be alerted before a visit that a more detailed clinical assessment is probably indicated.  A calculated risk score alone would not be enough to prompt any “automated” clinical action – especially a clinical action that might be intrusive or coercive.  I hope and expect that our risk prediction methods will improve.  But I am skeptical they will ever replace a human driver.
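As a sketch of that division of labor (the role names and events are illustrative, not our health systems’ actual workflow), the only decision the code makes is which human to tell:

```python
# Illustrative routing of a risk flag to a person; no automated clinical action.
from typing import Optional

def route_alert(risk_level: str, event: str) -> Optional[str]:
    """Suggest which human should hear about an event involving a high-risk patient."""
    if risk_level not in ("high", "very high"):
        return None                      # no flag, no alert
    suggestions = {
        "appointment_cancelled": "receptionist: notice and offer to reschedule",
        "missed_visit": "nurse: place an outreach call",
        "upcoming_visit": "clinician: consider a more detailed risk assessment",
    }
    return suggestions.get(event)        # a suggestion to a person, nothing more

print(route_alert("very high", "missed_visit"))
# -> nurse: place an outreach call
```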

My new car’s blind spot warning system does not have control of the steering wheel, the accelerator, or the brake pedal.  If the warning is a false positive, I can choose to ignore or override it.  But that blinking light and warning chime will get my attention.  If that alarm goes off, I’ll take a close look behind me before deciding that it’s actually safe to change lanes.

Greg Simon

Why I’ll Join the All of Us Research Program

NIH’s All of Us Research Program officially launched on Sunday, May 6th.  It’s an ambitious national effort to bring together at least one million people from across the U.S. in a long-term study of health across the lifespan.  All of Us is not just a biobank or a genetic study.  It’s a 360-degree view of health and disease, with as much attention to environment as to genetics and as much attention to resilience as to vulnerability.

Our Mental Health Research Network has several connections to the All of Us program.  Henry Ford Health System and Baylor Scott & White Health are participating as health system partners.  Brian Ahmedani co-leads one of the health system networks.  I serve on the external advisory panel.

All of Us has been building toward this launch for more than a year.  A wide range of patient and community organizations have participated in every stage of planning.  Innovative communication tools were developed to make sure that informed consent is truly informed (and not just “a form”). Online surveys and mobile apps have been tested and refined to make them both accessible and secure. The pilot phase enrolled over 27,000 participants, and over two-thirds of them were from groups traditionally under-represented in biomedical research.

With all of that preparation complete, it’s an unfortunate coincidence that All of Us launches following a series of news stories about threats to privacy, including the harvesting of Facebook data and law enforcement use of genealogy websites.  Of course, none of those events are directly relevant to participating in All of Us.

This last year of preparation has focused on avoiding events like those.  That time was spent on extensive testing of information security, learning to communicate clearly and effectively about data privacy, and listening to potential participants about appropriate and inappropriate uses of their health and genetic information.  Participants’ private information has special legal protection through a Certificate of Confidentiality.  The program’s leaders are not naïve; they have been very thorough.

Knowing what I know, those recent news stories won’t deter me from joining the All of Us program as soon as I can. I will complete the baseline survey and share my medical records. I’ll contribute my blood samples when there’s a collection site close to me, and I’ll download the All of Us mobile app. I may not directly gain by participating, but I’m confident I have nothing to lose. I might actually have something useful to share, but no one will know unless I share it with All of Us.

Greg Simon

Advice To Young Researchers: Don’t Find Your Niche

University-based researchers often contact our network, hoping to do research in our health systems to evaluate new interventions or programs.  Those new interventions or programs are usually specific adaptations of treatments already proven to work.   Typical examples (slightly anonymized) include:  a care management program for people with depression and rheumatoid arthritis, a mobile health intervention teaching mindfulness skills after psychiatric hospital discharge, or a training program to help clinicians provide structured psychotherapy specific to bereavement.

When we bring these very specific ideas to leaders of our healthcare systems, they are not enthusiastic.  While they are very interested in care management for depression, they are less interested in a program specific to depression and arthritis.  They are certainly interested in mobile health programs and in mindfulness skills, but less interested in a program limited to the month after psychiatric hospitalization.  And they are definitely interested in training to deliver empirically supported psychotherapies, but less interested in condition-specific training packages. To summarize: they are interested in innovation, but only if that innovation can be affordable, widely acceptable, and broadly available across our health systems.

Researchers focused on narrower questions are following the standard advice to all young mental health or health services researchers:  Find your niche.

The traditional academic career path certainly encourages specialization.  During graduate school, an aspiring researcher might join a research team studying collaborative care interventions for depression in people with chronic medical illness.  During her postdoctoral fellowship, she works in the medical center’s rheumatology clinic and customizes collaborative care materials for people with arthritis.  She writes a career development application to refine and pilot-test collaborative care interventions for people with depression and arthritis.  NIH reviewers point out that different types of arthritis are too heterogeneous, so she revises her approach to focus on rheumatoid arthritis specifically.  And then she contacts us about doing this research in MHRN health systems.  She has found her niche… but leaders of our health systems are just not interested.

If you look at the career paths of our MHRN investigators, you’d think that none of us have found a niche.  You’d find a health psychologist who began studying school-based health promotion and now studies implementation and health disparities.  And you’d find a social worker who leads a large precision medicine consortium.  And a geriatric psychiatrist studying EHR decision support and psychosocial interventions to prevent suicide attempts.  Over time, they have traveled across diverse clinical topics, research methods, and care settings.  I expect they will keep traveling.  They are opportunists in the most positive sense of the word, always asking “What’s the most important problem that I could actually help with in the next few years?”

Maybe our niche is just to be as useful as we can.  Not all who wander are lost. And sometimes they find useful things.

Greg Simon

When does caring cross over into creepy?

A recent news article about the European Union’s new privacy rules prompted me to think more about population-based suicide prevention programs.  Striking the right balance between caring outreach and respect for privacy is difficult.

Our health systems are in various stages of implementing systematic programs to identify people at high risk of self-harm or suicide.  These programs are triggered by the standard depression questionnaires our patients complete before clinic visits. Whenever a patient reports frequent thoughts of death or self-harm, the treating clinician is expected to ask follow-up questions about suicide risk. That seems only natural; most patients who report thoughts of suicide would be unpleasantly surprised if no one bothered to ask. 

Our Suicide Prevention Outreach Trial extends that caring outreach, but with an added layer of separation.  Outreach specialists follow up on those visit questionnaires with an online message or phone call during the following week.  Most people are grateful for that extra outreach, but a few respond, “Who are you? How do you know about what I told my doctor in private?” 

Our health systems are considering new programs with one more level of separation.  Outreach messages or calls could be triggered by an algorithm that identifies risk from health records data.  In other words, the program could be triggered by “something a computer found that I never even told my doctor.”  The outreach calls or messages would come from a stranger (albeit a kind stranger) representing the health system.  I suspect we’ll hear a few more complaints about invasions of privacy.  But should that stop us from trying?

Of course, serving up personalized invitations based on machine learning algorithms is the core business model of social media companies.  While invitations from social media apps usually ask us to buy products and services, social media superpowers can also be used for good.  For example, Facebook (with help from our colleague Ursula Whiteside) has developed caring outreach interventions for people at risk for suicide.  Facebook’s caring outreach interventions were originally directed at people identified by other human Facebook users.  That program can now be activated by algorithms that continuously monitor a range of data types: text or photos in Facebook posts, comments from friends, and even Facebook Live audio and video streams.

Except in Europe. European data protection rules strictly limit how personal data can be used or shared without explicit consent. Facebook worries that mining social media data to identify people at risk for self-harm would violate those rules. So Facebook’s algorithm-driven suicide prevention outreach won’t be implemented for European users. I think that’s unfortunate. But I have to acknowledge the legitimate view that it’s intrusive or creepy to mine social media data and flag people at risk for self-harm.

Whether outreach is supportive or intrusive depends, of course, on what we do when we reach out. In our outreach programs, we are clear about the boundaries. We reach out to express concern, offer support, suggest resources, and facilitate connection with care. If people ask to be left alone, we stop.  

Nevertheless, some people are bothered by the fact that strangers are drawing conclusions based on information that’s usually private – especially conclusions about things (like suicidal thoughts) that are often stigmatized.  Some people will be offended that we know (or think we know) they are at risk, regardless of what we do or don’t do about it.

If you have followed the discussion about European privacy regulations, you may recognize one rallying cry of the privacy advocates: “The right to be forgotten.” That slogan asserts a right to control – or even erase –  any private information.  But that slogan feels a bit eerie when we apply it to suicide risk. Being forgotten sounds too much like the isolation or disconnectedness that increase risk for suicide. When it comes to suicide prevention, I’d prefer not to acknowledge that absolute European right to be forgotten. I’ll remain a perky (and possibly creepy) American, saying, “Excuse us, but we won’t forget about you!”

Greg Simon