Comparisons of the quality or outcomes of care across providers or facilities often meet the objection: “But my patients really are more difficult!” If we hope to improve the quality and outcomes of mental health care, we must address that concern. However, that automatic objection shouldn’t invalidate comparisons or excuse all variations. Instead, it should prompt careful thinking.
Several of our MHRN projects include – or even focus on – comparisons of quality or outcomes across providers, facilities, or health systems. For example, our Practice Variation project examined how adherence and outcomes of depression treatment (medication and psychotherapy) vary across providers. A supplement to that project focused specifically on racial and ethnic disparities in care, examining whether those disparities more likely indicated differences in patient preference or provider performance.
Our new project, examining implementation of Zero Suicide care improvement programs, will include numerous comparisons across clinics and health systems – incorporating comparisons of how well Zero Suicide strategies were implemented as well as comparisons of changes in actual suicidal behavior.
Another new project will support health systems’ efforts to implement measurement-based care or feedback-informed care for depression. That project aims to address the trade-off between transparency (simple comparisons using simple outcomes) and accuracy (adjusted comparisons based on statistical models).
Each of these projects deals with the same two questions:
- Are any differences between providers or facilities “real”?
- Are comparisons across providers and facilities “fair”?
The first question is a quantitative one that should be answered using the right math. The second question is more complex, and reasonable people may disagree regarding the answer.
The quantitative question divides into two pieces. We first ask how much any observed difference between providers or facilities exceeds what might be expected by chance. If we observe more-than-chance variation, we can then ask how much of that variation is explained by measurable pre-existing differences. This is usually accomplished using some sort of hierarchical or random-effects model, in which we estimate a random effect for each provider or facility (accounting for that provider’s or facility’s number of patients) and then adjust those random effect estimates for any differences in baseline characteristics. For example, our MHRN Practice Variation project found that differences between physicians in patients’ early adherence to antidepressant medication (an NCQA/HEDIS indicator of quality of depression care) were actually trivial after accounting for random variation. Most of the apparent difference between “high-performing” and “low-performing” providers was an illusion, due to the small sample sizes for providers near the high and low ends of performance. In contrast, similar analyses found much greater “true” variation among psychotherapists in patients’ early dropout from psychotherapy. But some of that difference was accounted for by differences in the racial/ethnic composition of each provider’s caseload.
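To make that two-step logic concrete, here is a minimal sketch using simulated data in Python with statsmodels. The variable names, the simulated caseloads, and the choice of a simple linear random-intercept model (rather than the logistic mixed models more typical for a yes/no adherence measure) are all illustrative assumptions, not the actual MHRN analysis.

```python
# Sketch (not MHRN code): a random-intercept model that
# (1) estimates between-provider variation beyond chance, and
# (2) adjusts provider effects for a measured patient characteristic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 200 providers with varying caseload sizes and varying "true" quality.
rows = []
for provider in range(200):
    n_patients = rng.integers(5, 80)       # small caseloads -> noisy raw rates
    true_effect = rng.normal(0, 0.05)      # real between-provider variation
    for _ in range(n_patients):
        age = rng.normal(45, 15)
        p = np.clip(0.6 + true_effect + 0.002 * (age - 45), 0, 1)
        rows.append({"provider_id": provider,
                     "patient_age": age,
                     "adherent": rng.binomial(1, p)})
df = pd.DataFrame(rows)

# Step 1: how much between-provider variation remains after allowing for chance?
# The estimated variance of the random intercepts is the "true" variation.
unadjusted = smf.mixedlm("adherent ~ 1", df, groups=df["provider_id"]).fit()
print(unadjusted.cov_re)   # between-provider variance, unadjusted

# Step 2: adjust for a measurable pre-existing difference (here, patient age).
adjusted = smf.mixedlm("adherent ~ patient_age", df, groups=df["provider_id"]).fit()
print(adjusted.cov_re)     # between-provider variance after adjustment

# Shrunken provider-level estimates (random effects) rather than raw rates:
provider_effects = adjusted.random_effects
```

The shrinkage built into those random-effect estimates is what pulls noisy results from small caseloads back toward the overall mean, and it is why apparent “high” and “low” performers can turn out to be illusory.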
When we ask whether comparisons are “fair”, we are asking whether baseline or pre-existing differences really should be adjusted away. And math cannot usually answer that question. For example, we have reported that much of the difference between health systems in patients’ early adherence to antidepressant medication is explained by differences in patients’ race and ethnicity across systems. We argued that unadjusted comparisons of health systems using this NCQA/HEDIS measure are not fair to systems serving higher proportions of patients from traditionally under-served racial and ethnic groups. Others have argued against adjusting for racial/ethnic differences, claiming that adjusting away racial/ethnic disparities would excuse or condone lower-quality care for the disadvantaged.
We anticipate that questions regarding fairness and racial/ethnic disparities will arise repeatedly in our evaluation of Zero Suicide programs across MHRN health systems. Whether fair comparison requires adjustment for race and ethnicity will depend on the specific situation. In general, we’d be more likely to adjust comparisons of outcomes and less likely to adjust comparisons of care processes. For example: rates of suicide attempt and suicide death are markedly lower for Hispanics, African Americans, and Asians than for Native Americans and Non-Hispanic Whites. Our MHRN health systems (and facilities within those health systems) differ markedly in the racial/ethnic distribution of patients they serve. Any unadjusted comparison of suicide mortality or suicide attempt rates would likely tell us more about the racial/ethnic composition of patient populations than about effectiveness of suicide prevention efforts.
A fairer approach would be to compare each system (or facility) with its own geographic area or with its own historical performance. In contrast, some care processes (such as scheduling follow-up care after an emergency department visit for self-harm) are clearly indicated regardless of race, ethnicity, income, age, gender, etc.
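For readers who want a concrete picture of what comparing a system with its geographic area could look like, the sketch below shows one common technique, indirect standardization, using entirely made-up numbers. The rates, caseload counts, and observed total are hypothetical, and this is not how the Zero Suicide evaluation is specified.

```python
# Sketch of indirect standardization (illustrative numbers, not MHRN data):
# compare a system's observed suicide-attempt count with the count expected
# if its patients experienced the race/ethnicity-specific rates of its
# geographic area.
area_rates_per_1000 = {        # hypothetical area-wide annual rates
    "Non-Hispanic White": 4.0,
    "Native American": 6.0,
    "Hispanic": 2.0,
    "African American": 2.5,
    "Asian": 1.5,
}

system_caseload = {            # hypothetical patient counts in one system
    "Non-Hispanic White": 40000,
    "Native American": 1000,
    "Hispanic": 25000,
    "African American": 15000,
    "Asian": 19000,
}

observed_attempts = 280        # hypothetical observed count for this system

expected_attempts = sum(
    system_caseload[group] * area_rates_per_1000[group] / 1000
    for group in system_caseload
)

standardized_ratio = observed_attempts / expected_attempts
print(f"Expected: {expected_attempts:.0f}, observed/expected: {standardized_ratio:.2f}")
```

An observed-to-expected ratio near 1.0 means a system is doing about as well as its area’s race/ethnicity-specific rates would predict, so the comparison reflects prevention efforts rather than the mix of patients served.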
The rationale for this approach is: if my patients really are more difficult, it may not be fair to hold me accountable for less favorable outcomes. But it is fair to hold me accountable for offering everyone high-quality treatment.
Greg Simon