Project Name: Computational Modeling of Suicide Risk |
Principal Investigator: Gregory Simon MD MPH |
Principal Investigator Contact Information: gregory.e.simon@kp.org |
Principal Investigator Institution: KP Washington Health Research Institute |
Funder: National Institute of Mental Health (NIMH) |
Funding Period: 7/1/2017 to 6/30/2019 |
Abstract: A previous supplement to the MHRN cooperative agreement supported development of a population-based suicide risk calculator, predicting risk of suicide attempt or suicide death following an outpatient visit using both responses to PHQ9 item 9 and discrete data extracted from health system electronic health records. Using a database of approximately 20 million visits by 3 million patients aged 13 and over, we developed and validated machine learning logistic regression models predicting risk of suicide attempt and suicide death within 90 days of an outpatient visit, either a visit to a specialty mental health provider or a primary care visit in which a mental health diagnosis was recorded. Potential predictors included demographic and clinical data extracted from health system records for the 5 years prior to each visit: prior suicidal behavior, mental health and substance use diagnoses, general medical diagnoses, prescriptions for psychiatric medications, inpatient or emergency department mental health care, and responses to routinely administered PHQ9 depression questionnaires. Models were developed in a 65% random sample of visits and validated in the remaining 35%. Variable selection models considered 150 discrete predictors and 164 potential interactions. In the validation sample, the 5% of mental health specialty visits with highest risk scores accounted for 43% of subsequent suicide attempts and 47% of suicide deaths. Areas under the receiver operating characteristic curves (AUCs) for prediction of suicide attempt and suicide death were 0.85 and 0.86. In the validation sample, the 5% of primary care visits with highest risk scores accounted for 48% of subsequent suicide attempts and 43% of suicide deaths. AUCs for prediction of suicide attempt and suicide death were 0.85 and 0.83. While these models represent a substantial advance over existing risk prediction or risk stratification tools, we identify several significant limitations. 
Fixed limits of our computational methods (penalized LASSO logistic regression in the R computing environment) forced us to limit both our sample size and the number of potential predictors and interaction terms. Those methods also limit our ability to appropriately account for clustering of observations within patients and for the sparse and skewed distributions of predictor data. Finally, we now recognize the need to extend these methods to predict risk following acute-care (inpatient and emergency department) encounters. We now propose a next stage of work to address these limitations. Specific aims of this next stage include: Expand and enhance the risk prediction dataset to: include larger numbers of observations with data regarding self-reported suicidal ideation (PHQ9 Item 9), include additional encounters and events following the transition from ICD9 to ICD10 diagnoses, and allow more detailed consideration of the timing of predictor events (diagnoses, encounters, prescription fills). Expand sampling to include emergency department and inpatient encounters. Evaluate alternative modeling approaches, including classification- or tree-based approaches such as Classification and Regression Trees (CART), Mixed Effects Regression Trees (MERT), and Random Forest. Rapidly disseminate all methods, tools and results to a wide range of stakeholders including health systems, researchers, and EHR vendors. |
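The two validation measures quoted in the abstract, the AUC and the share of subsequent events falling among the visits with the highest 5% of risk scores, can be illustrated with a minimal Python sketch. It is not the project's code (the abstract states that the original modeling was done in R); the function names `auc` and `share_of_events_in_top` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen
    event visit scores higher than a randomly chosen non-event visit
    (ties count one half). Quadratic in sample size; fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def share_of_events_in_top(scores: Sequence[float], labels: Sequence[int],
                           fraction: float = 0.05) -> float:
    """Fraction of all events that occur among the visits whose risk
    scores fall in the top `fraction` of the sample."""
    total_events = sum(labels)
    if total_events == 0:
        raise ValueError("no events in sample")
    n_top = max(1, math.ceil(fraction * len(scores)))
    # Rank visit indices from highest to lowest risk score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:n_top]) / total_events


# Toy data (hypothetical): 8 visits, 2 followed by an event.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
print(auc(toy_scores, toy_labels))
print(share_of_events_in_top(toy_scores, toy_labels, fraction=0.25))
```

In this reading, a statement such as "the 5% of visits with highest risk scores accounted for 43% of subsequent suicide attempts" corresponds to `share_of_events_in_top(..., fraction=0.05)` returning 0.43 on the validation sample.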
Grant Number: MH 092201 (supplement) |
Participating Sites: Kaiser Permanente Washington; Kaiser Permanente Northwest; Kaiser Permanente Southern California; Kaiser Permanente Hawaii; HealthPartners; Henry Ford Health System; Kaiser Permanente Colorado |
Investigators: Gregory Simon MD MPH; Susan Shortreed PhD; Yates Coley PhD; Frances Lynch PhD; Jean Lawrence ScD; Beth Waitzfelder PhD; Rebecca Rossom MD MS; Brian Ahmedani PhD; Arne Beck PhD |
Major Goals: Expand and enhance the risk prediction dataset to: include larger numbers of observations with data regarding self-reported suicidal ideation (PHQ9 Item 9), include additional encounters and events following the transition from ICD9 to ICD10 diagnoses, and allow more detailed consideration of the timing of predictor events (diagnoses, encounters, prescription fills). Expand sampling to include emergency department and inpatient encounters. Evaluate alternative modeling approaches, including classification- or tree-based approaches such as Classification and Regression Trees (CART), Mixed Effects Regression Trees (MERT), and Random Forest. Rapidly disseminate all methods, tools and results to a wide range of stakeholders including health systems, researchers, and EHR vendors. |
Description of study sample: Expected to include approximately 30 million encounters by approximately 4 million members in seven health systems. |
Current Status: As of 7/1/2019: Data harvest, data quality control, and preliminary analyses are complete for the mental health and primary care visit cohorts. Data harvest and data quality control are underway for inpatient and emergency department cohorts. |
Study Registration: N/A |
Publications: N/A |
Resources: N/A |
Lessons Learned: Preliminary analyses indicate that: Models developed prior to October 2015 to predict ICD-9 self-harm diagnoses appear to perform as well when used after October 2015 to predict ICD-10 self-harm diagnoses. More complex ensemble-based model development methods (such as Random Forests) do not appear superior to parametric methods (such as penalized logistic regression) when predictors are primarily dichotomous. Inclusion of multiple visits per patient does not appear to contribute to over-fitting with parametric model development methods. |
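One common way to check whether multiple visits per patient contribute to over-fitting is to hold out whole patients rather than individual visits, so that no patient appears in both the development and validation partitions. The Python sketch below illustrates that idea only; this record does not state that the project used it (the abstract describes a 65% random sample of visits). The function names, the `patient_id` field, and the salt value are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def assign_partition(patient_id: str, dev_fraction: float = 0.65,
                     salt: str = "demo-salt") -> str:
    """Deterministically map a patient to 'development' or 'validation'
    by hashing the (salted) identifier to a number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2 ** 64
    return "development" if u < dev_fraction else "validation"


def split_visits(visits: List[Dict[str, str]],
                 dev_fraction: float = 0.65
                 ) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split visit records so that every visit by a given patient lands
    in the same partition."""
    dev, val = [], []
    for visit in visits:
        part = assign_partition(visit["patient_id"], dev_fraction)
        (dev if part == "development" else val).append(visit)
    return dev, val


# Toy data (hypothetical): 10 patients with 3 visits each.
toy_visits = [{"patient_id": f"P{p:03d}", "visit_id": f"V{p:03d}-{v}"}
              for p in range(10) for v in range(3)]
dev_visits, val_visits = split_visits(toy_visits)
print(len(dev_visits), len(val_visits))
```

Comparing validation performance under a visit-level split and a patient-level split like this one gives a rough indication of how much within-patient clustering inflates apparent accuracy.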
What’s next? N/A |