Using NLP to Increase Identification of Child Maltreatment in EHR

Grant Details

Funder: NIMH (MHRN III Feasibility Pilot Program)

Grant Number: U19MH121738

Project Period: 7/1/2020 – 6/30/2021


Background: Child maltreatment is a critical public health issue, and health care systems play an important role in identifying and treating children who experience maltreatment. To date, few studies of child maltreatment have used data from large health systems to understand how these systems identify and manage youth who experience maltreatment. Preliminary analyses of the number of children identified as having experienced maltreatment in the most recent MHRN quarterly descriptive analyses (2018) indicate that child maltreatment is likely substantially under-reported in the MHRN health systems: epidemiologic studies suggest that many more youth would be expected to have experienced maltreatment than were identified. One reason for this potential under-reporting is that providers may not use ICD codes consistently to document child maltreatment. Some maltreatment may be discussed in chart notes but not documented using ICD codes. Better identification of maltreatment could aid both research and practice within health care systems. Natural language processing (NLP) may help to identify additional youth with maltreatment. If NLP identifies cases that are not documented through ICD codes, this could indicate the need for health system efforts to develop new ways of consistently documenting child maltreatment. NLP might also help to identify any groups (e.g., by age, gender, or race/ethnicity) that may be particularly likely to have insufficient documentation of child maltreatment.

This work aligns with NIMH’s strategies to increase research and improve outcomes of mental health services in diverse and vulnerable populations, and to conduct research that helps health systems to base care decisions on the best possible data.   

Research Question: The overarching question is whether NLP allows us to obtain estimates of the number of children who experience maltreatment that are more comparable to national epidemiologic data. Does NLP of chart notes identify new cases of child maltreatment that are not already documented with ICD codes? What is the overlap between the two methods? Are there differences by age group or race/ethnicity? Does NLP allow us to differentiate between new/current maltreatment and a history of maltreatment?

Methods: We propose to use simple NLP queries (e.g., terms such as “physical abuse” and “maltreatment”) at one MHRN site to search chart notes, and to compare the cases identified through NLP with those identified through ICD codes. We will also conduct analyses to see whether identification varies by age group, gender, and race/ethnicity.
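
The comparison described in the Methods can be illustrated with a minimal sketch. It is not the project's actual pipeline: the term list contains only the two example terms named above, the ICD-10-CM prefixes (T74, T76) are an illustrative assumption rather than the study's code list, and the function names and the toy records are hypothetical.

```python
import re

# Example terms only; the project's full term list is not given in this summary.
TERMS = ["physical abuse", "maltreatment"]
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)

# Assumption: illustrative ICD-10-CM prefixes, not the study's actual code list.
ICD_PREFIXES = ("T74", "T76")


def nlp_cases(notes):
    """Patient IDs with at least one chart note containing a search term."""
    return {pid for pid, text in notes if PATTERN.search(text)}


def icd_cases(diagnoses):
    """Patient IDs with at least one diagnosis code matching a prefix."""
    return {pid for pid, code in diagnoses if code.upper().startswith(ICD_PREFIXES)}


def compare(nlp_ids, icd_ids):
    """Count cases found by NLP only, by ICD codes only, and by both."""
    return {
        "nlp_only": len(nlp_ids - icd_ids),
        "icd_only": len(icd_ids - nlp_ids),
        "both": len(nlp_ids & icd_ids),
    }


# Synthetic toy records: (patient_id, note_text) and (patient_id, icd_code).
notes = [
    ("p1", "Mother reports concern for physical abuse by caregiver."),
    ("p2", "Well-child visit, no concerns."),
    ("p3", "History of maltreatment documented by prior provider."),
]
diagnoses = [("p3", "T74.12XA"), ("p4", "T76.02XA"), ("p2", "J06.9")]

result = compare(nlp_cases(notes), icd_cases(diagnoses))
print(result)
```

A plain keyword match like this also flags negated or historical mentions, so each hit would still need manual review before being counted as a case. Grouping the same sets by age group, gender, or race/ethnicity would give the stratified comparison.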

Planned Product: We plan to write a paper documenting our findings. We also plan to write a grant proposal related to child maltreatment using these data.

  • Lead Site: KPWA (PI Rob Penfold)
  • Participating Sites:
      • KPSC (Co-I Sonya Negriff)
      • KPNW (Co-I Frances Lynch)

Current Status

  • NLP pipeline created
  • Manual adjudication of NLP “hits” complete
  • Descriptive statistics complete

Summary of Findings

The prevalence of child maltreatment, as measured by adjudicated occurrences of terms and phrases discovered by NLP, is much higher than the prevalence measured via discrete data elements such as ICD codes.

