Context-Aware Filtering of Unstructured Radiology Reports by Anatomical Region
Machine Learning for Health
Zakk Heile, Pranav Manjunath, Brian Lerner, Samuel Berchuck, Monica Agrawal, Timothy W. Dunn

Summary
Radiology reports contain essential clinical information but often remain in unstructured, free-text formats. Notably, multiple imaging examinations performed simultaneously (such as CT head, facial bones, and cervical spine in trauma cases) may be bundled into a single report that consolidates findings from all studies into one jointly written free-text document. Because individual sentences may reference ambiguous or overlapping anatomy (e.g., “there is a fracture”), sentence-level anatomic classification—filtering a report to retain only findings relevant to a specific anatomical region—is essential for downstream tasks such as structured label extraction and for creating clean, bijective training data for radiology report generation models. While formatting differs across reports, the clinical language remains precise. Exploiting this precision, we develop context-aware classical models with feature engineering that surpass trained neural networks and pre-trained language models. We show that the learned model weights generalize effectively to MIMIC-IV radiology reports and that our approach achieves near-optimal performance with only a small amount of labeled training data. Together, these results make our approach practical and reproducible in new settings.
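
The filtering task described above can be illustrated with a minimal sketch. This is not the paper's actual model, whose engineered features are richer; it only shows why context matters: a hypothetical baseline tracks the most recently seen study header in a bundled report and propagates that region to anatomically ambiguous sentences. The lexicon and function names here are illustrative assumptions.

```python
import re

# Hypothetical header lexicon; the paper's engineered features are far richer.
REGION_HEADERS = {
    "head": re.compile(r"\bCT HEAD\b", re.I),
    "face": re.compile(r"\bFACIAL BONES\b", re.I),
    "cspine": re.compile(r"\bCERVICAL SPINE\b", re.I),
}

def label_sentences(sentences):
    """Assign each sentence the region of the most recent study header,
    so ambiguous findings inherit their surrounding context."""
    current = None
    labels = []
    for sent in sentences:
        for region, pattern in REGION_HEADERS.items():
            if pattern.search(sent):
                current = region
                break
        labels.append(current)
    return labels

report = [
    "CT HEAD WITHOUT CONTRAST:",
    "No acute intracranial hemorrhage.",
    "CT CERVICAL SPINE WITHOUT CONTRAST:",
    "There is a fracture.",  # ambiguous in isolation; context resolves it
]
print(label_sentences(report))  # → ['head', 'head', 'cspine', 'cspine']
```

Filtering the report to one region then reduces to keeping the sentences whose label matches, e.g. `[s for s, r in zip(report, label_sentences(report)) if r == "cspine"]`.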
Citation
Heile, Zakk, et al. “Context-Aware Filtering of Unstructured Radiology Reports by Anatomical Region.” Machine Learning for Health (ML4H), 2025.
