Arne Beck and I were recently revising the description of one of our Mental Health Research Network projects. We really tried to use the traditional scientific format, specifying H1 (our hypothesis) and H0 (the null hypothesis). But our research just didn’t fit into that mold. We eventually gave up and just used a plain-language description of our question: For women at risk for relapse of depression in pregnancy, how do the benefits and costs of a peer coaching program compare to those of coaching from traditional clinicians?
Our real-world research often doesn’t fit into that H1-H0 format. We aim to answer practical questions of interest to patients, clinicians, and health systems. Those practical questions typically involve estimation (How much?), classification (For whom?), comparison (How much more or less?), or interaction (How much more or less for whom?). None of those are yes/no questions. While we certainly care whether any patterns or differences we observe might be due to chance, a p-value for rejecting a null hypothesis does not answer the practical questions we hope to address.
When we talk about our research with patients, clinicians, and health system leaders, we never hear questions expressed in terms of H1 or H0. I imagine starting a presentation to health system leaders with a slide showing H1 and H0—and then hearing them all break into that song from the Walt Disney classic Snow White. Out in the real world of patients, clinicians, and health system leaders, “H1-H0” most likely stands for that catchy tune sung by the Seven Dwarfs.
As someone who tries to do practical research, my dissatisfaction with the H1-H0 format is about more than just language or appearance. There are concrete reasons why that orientation doesn’t fit with the research our network does.
Pragmatic or real-world research often involves multiple outcomes and competing priorities. For example, we hope our Suicide Prevention Outreach Trial will show that vigorous outreach reduces risk of suicidal behavior. But we already know some people will be upset or offended by our outreach. Our task is to accurately estimate and report both the beneficial effects on risk of suicidal behavior and the proportion of people who object or feel harmed. We may have opinions about the relative importance of those competing effects, but different people will value those outcomes differently. It’s not a yes/no question with a single answer. Our research aims to answer the question “How much?” Each user of our research then has to ask “How important is that to me or the people I serve?”
The H1-H0 approach becomes even less helpful as sample sizes grow very large. For example, our work on prediction of suicide risk used records data for millions of patients. With a sample that large, we can resoundingly reject hundreds of null hypotheses while learning nothing that’s useful to patients or clinicians. In fact, our analytic methods are designed to “shrink” or suppress most of those “statistically significant” findings. We hope to create a useful tool that can reliably and accurately identify people at highest risk of self-harm. For that task, too many “significant” p-values are really just a distraction.
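A minimal sketch of the sample-size point, in Python. The counts below are invented for illustration and are not data from the suicide risk work; the function name `two_proportion_summary` is also hypothetical. It compares the same tiny difference in event rates (10.1% versus 10.0%) at two sample sizes, using an ordinary two-proportion z-test and a normal-approximation 95% confidence interval.

```python
import math


def two_proportion_summary(events_a, n_a, events_b, n_b):
    """Return (risk difference, 95% CI low, 95% CI high, two-sided p-value).

    Illustrative only: normal-approximation interval and z-test.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b

    # Unpooled standard error, used for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

    # Pooled standard error, used for the test of "no difference".
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))

    z = diff / se_null
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return diff, diff - 1.96 * se, diff + 1.96 * se, p_value


# Hypothetical counts: same event rates, very different sample sizes.
scenarios = {
    "1,000 per group": (101, 1_000, 100, 1_000),
    "2,000,000 per group": (202_000, 2_000_000, 200_000, 2_000_000),
}

for label, counts in scenarios.items():
    diff, low, high, p = two_proportion_summary(*counts)
    print(f"{label}: difference={diff:.4f}, "
          f"95% CI=({low:.4f}, {high:.4f}), p={p:.4g}")
```

In this made-up example the estimated difference is identical in both scenarios; only the p-value changes with sample size. The estimate and its interval speak to “How much?”, and whether a difference of a tenth of a percentage point matters is a separate judgment that no p-value can make.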
I’m very fond of how the Patient-Centered Outcomes Research Institute describes the central questions of patient-centered research: What are my options? What are the potential benefits and harms of those options? Those patient-centered questions focus on choices that are actually available and the range of outcomes (both positive and negative) that people care about. I think those are the questions our research should aim to answer, and they can’t be reduced to H1-H0 terms.
I am certainly not arguing that real-world research should be less rigorous than traditional hypothesis-testing or “explanatory” research. Pragmatic or real-world researchers are absolutely obligated to clearly specify research questions, outcome measures, and analytic methods—and declare or register those things before the research starts. Presentations of results should stick to what was registered, and include clear presentations of confidence limits or precision. In my book, that’s actually more rigorous than just reporting a p-value for rejecting H0. There’s no conflict between being rigorous and being practical or relevant.
To be fair, I must admit that the Seven Dwarfs song is more real-world than I gave it credit for. We often misremember the lyrics “Hi-Ho, Hi-Ho, it’s off to work we go.” But the Seven Dwarfs were actually singing “Hi-Ho, Hi-Ho, it’s home from work we go.” That’s the sort of person-centered song we might actually hear in the real world.