An evaluation framework for ambient digital scribing tools in clinical applications
npj Digital Medicine
Haoyuan Wang, Rui Yang, Mahmoud Alwakeel, Ankit Kayastha, Anand Chowdhury, Joshua M. Biro, Anthony D. Sorrentino, Jessica L. Handley, Sarah Hantzmon, Sophia Bessias, Nicoleta J. Economou-Zavlanos, Armando Bedoya, Monica Agrawal, Raj M. Ratwani, Eric G. Poon, Michael J. Pencina, Kathryn I. Pollak & Chuan Hong

Summary
Ambient digital scribing (ADS) tools alleviate clinician documentation burden, reducing burnout and enhancing efficiency. As AI-driven ADS tools integrate into clinical workflows, robust governance is essential for ethical and secure deployment. This study proposes a comprehensive ADS evaluation framework incorporating human evaluation, automated metrics, simulation testing, and large language models (LLMs) as evaluators. Our framework assesses transcription, diarization, and medical note generation across criteria such as fluency, completeness, and factuality. To demonstrate its effectiveness, we developed an ADS tool and applied our framework to evaluate the tool’s performance on 40 real clinical visit recordings. Our evaluation revealed strengths, such as fluency and clarity, but also highlighted weaknesses in factual accuracy and the ability to capture new medications. These findings underscore the value of structured ADS evaluation in improving healthcare delivery while emphasizing the need for strong governance to ensure safe, ethical integration.
Citation
Wang, Haoyuan, et al. “An evaluation framework for ambient digital scribing tools in clinical applications.” npj Digital Medicine 8.1 (2025): 1-13.