Gold Standard or Golden Calf?

The Adoration of the Golden Calf by Nicolas Poussin, produced between 1633 and 1634

Most of our measures and measurement tools were created in conference rooms or conference calls dominated by older white men – like me, I guess.  Over time, those “expert opinion” measures acquire a patina of authority, and we can start to equate familiarity or habit with accuracy or validity.  As in the biblical story about worshipping a false idol that people made themselves, we start to see a gold standard rather than just a statue of a golden calf.

Our experiences with NCQA/HEDIS measures regarding antidepressant medication adherence illustrate the tendency to over-value the familiar.  Development of those measures in the 1990s reflected increasing awareness of the prevalence and negative consequences of early dropout from antidepressant treatment.  Availability of a simple, transparent, and widely accepted quality measure helped to build the momentum for implementation of effective collaborative care programs.  But time has revealed some significant concerns.  Increasing use of 90- or 100-day prescriptions now leads to over-estimating early adherence and over-estimating improvements in depression care.  And we now appreciate that early dropout from antidepressant treatment is more common in traditionally under-served racial and ethnic groups.  Failing to account for that potential bias disadvantages health systems caring for a larger proportion of disadvantaged patients.  That’s certainly a perverse incentive.  We didn’t anticipate either of those issues in the 1990s.
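
To make the prescription-length problem concrete, here is a minimal sketch of how a claims-based adherence measure can count days of medication supplied.  The 84-day threshold and 114-day window are illustrative assumptions chosen to resemble an acute-phase adherence measure, not the official specification, and the function names are hypothetical.  The sketch also simplifies by ignoring how overlapping fills would be carried forward.

```python
# Illustrative only: threshold, window, and names are assumptions,
# not the official measure specification.

def covered_days(fills, window_days=114):
    """Count distinct days in the window covered by dispensed supply.

    fills: list of (start_day, days_supply) tuples, with day 0 being
    the date of the first antidepressant prescription fill.
    """
    covered = set()
    for start_day, days_supply in fills:
        for day in range(start_day, start_day + days_supply):
            if 0 <= day < window_days:
                covered.add(day)
    return len(covered)


def meets_threshold(fills, threshold_days=84, window_days=114):
    """True if dispensed supply covers at least threshold_days."""
    return covered_days(fills, window_days) >= threshold_days


# Two hypothetical patients who both stop taking medication after a month.
# Claims data record only what was dispensed, not what was taken.
one_30_day_fill = [(0, 30)]   # never refilled
one_90_day_fill = [(0, 90)]   # a single long prescription

print(meets_threshold(one_30_day_fill))  # counted as early dropout
print(meets_threshold(one_90_day_fill))  # counted as adherent
```

Under these assumptions, the two patients behave identically but only the one with a 30-day supply is counted as dropping out.  A health system that shifts toward longer prescriptions would appear to improve on the measure without any change in what patients actually take.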

When we have questioned some specifics of the Antidepressant Medication Management measures, we sometimes hear that these measures are well-validated and time-tested.  I’m tempted to respond, “But I was one of the people who made them up in the first place!  And now I know better!”

We could tell a similar story about nearly any of our measures or metrics in mental health:  diagnostic criteria, assessment questionnaires, structured interviews, quality metrics.  We created them using everything we knew at the time.  And (Good news!) now we know more.

We shouldn’t forget the value of continuity in measuring the quality or outcomes of mental health care.  Tracking improvement in care processes over time is a defining characteristic of learning health systems, and changing measures can interfere with our ability to accurately measure improvement.  But rather than hold on to measures based on what we knew back then, our improvement efforts might be better served by using improved new measures.  When possible, we should apply new standards to old data when comparing our past and current performance.

While I try to maintain a healthy skepticism about any specific measure, I am not at all skeptical about the importance of measurement.  Challenging the specifics of any measure should be intended to improve accountability rather than to avoid it.  I firmly believe that we can only improve what we choose to measure.  And our measures should be one of the first things we aim to improve.

Greg Simon
