An evaluation framework for ambient digital scribing tools in clinical applications

npj Digital Medicine

Haoyuan Wang, Rui Yang, Mahmoud Alwakeel, Ankit Kayastha, Anand Chowdhury, Joshua M. Biro, Anthony D. Sorrentino, Jessica L. Handley, Sarah Hantzmon, Sophia Bessias, Nicoleta J. Economou-Zavlanos, Armando Bedoya, Monica Agrawal, Raj M. Ratwani, Eric G. Poon, Michael J. Pencina, Kathryn I. Pollak & Chuan Hong

The radar plot presents the scores of GPT-generated notes as rated by three different assessors: human evaluators (green), LLM evaluators (red), and a trained auto evaluator (blue).

Summary

Ambient digital scribing (ADS) tools alleviate clinician documentation burden, reducing burnout and enhancing efficiency. As AI-driven ADS tools integrate into clinical workflows, robust governance is essential for ethical and secure deployment. This study proposes a comprehensive ADS evaluation framework incorporating human evaluation, automated metrics, simulation testing, and large language models (LLMs) as evaluators. Our framework assesses transcription, diarization, and medical note generation across criteria such as fluency, completeness, and factuality. To demonstrate its effectiveness, we developed an ADS tool and applied our framework to evaluate the tool’s performance on 40 real clinical visit recordings. Our evaluation revealed strengths, such as fluency and clarity, but also highlighted weaknesses in factual accuracy and the ability to capture new medications. These findings underscore the value of structured ADS evaluation in improving healthcare delivery while emphasizing the need for strong governance to ensure safe, ethical integration.
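The summary does not name the automated metrics the framework uses for transcription. Purely as an illustration, word error rate (WER) is a widely used automated metric for transcription quality. The sketch below computes it as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It assumes whitespace tokenization and no text normalization. The function name `word_error_rate` and the example utterance are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N over whitespace-split words.

    Assumes no normalization (case and punctuation are compared as-is).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Invented example: one substituted word out of five reference words.
    print(word_error_rate("start metformin 500 mg daily",
                          "start metformin 50 mg daily"))  # 0.2
```

A metric like this only captures transcription errors. Criteria such as completeness and factuality of the generated note need separate measures, which the framework addresses through human, LLM, and simulation-based evaluation.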

Citation

Wang, Haoyuan, et al. “An evaluation framework for ambient digital scribing tools in clinical applications.” npj Digital Medicine 8.1 (2025): 1-13.

BibTex

@article{wang2025evaluation,
  title={An evaluation framework for ambient digital scribing tools in clinical applications},
  author={Wang, Haoyuan and Yang, Rui and Alwakeel, Mahmoud and Kayastha, Ankit and Chowdhury, Anand and Biro, Joshua M and Sorrentino, Anthony D and Handley, Jessica L and Hantzmon, Sarah and Bessias, Sophia and others},
  journal={npj Digital Medicine},
  volume={8},
  number={1},
  pages={1--13},
  year={2025},
  publisher={Nature Publishing Group}
}

Collaborators:

Monica Agrawal
