The challenges of integrating diverse data sources: A case study in major depression

Health Services and Outcomes Research Methodology

Carly L. Brantner, Wenshan Yu, Congwen Zhao, Kyungeun Jeon, Grace V. Ringlein, Qiao Wang, Elaona Lemoto, Trang Quynh Nguyen, Jane P. Gagliardi, Peter P. Zandi, Benjamin A. Goldstein, Elizabeth A. Stuart & Hwanhee Hong

Summary

Combining data from diverse sources including randomized controlled trials (RCTs) and observational datasets holds the potential to increase sample size, improve external validity, and gain a well-rounded view of the question under study. However, the practical implementation of integrating different data sources can be complicated, particularly when considering data collected across sites and institutions. In this paper, we use a case study of data from four RCTs and two electronic health record (EHR) systems to illustrate some of the challenges that can arise when combining these various sources of data. We group the challenges into cohort- and variable-related challenges, and for each challenge, we provide descriptive statistics and visuals from our case study to show the decisions that must be made and the subsequent implications. We provide guidance for researchers on the most important considerations and emphasize the necessity for careful, documented decision-making done through an interdisciplinary team. Through this case study and associated reflections, we highlight the dangers of naively combining data and advocate for a discussion and clear communication of the decisions made at each step in the data combination process, as well as the limitations and implications of those decisions.

Citation

Brantner, Carly L., et al. “The challenges of integrating diverse data sources: A case study in major depression.” Health Services and Outcomes Research Methodology (2025): 1-23.

BibTeX

@article{brantner2025challenges,
  title={The challenges of integrating diverse data sources: A case study in major depression},
  author={Brantner, Carly L and Yu, Wenshan and Zhao, Congwen and Jeon, Kyungeun and Ringlein, Grace V and Wang, Qiao and Lemoto, Elaona and Nguyen, Trang Quynh and Gagliardi, Jane P and Zandi, Peter P and others},
  journal={Health Services and Outcomes Research Methodology},
  pages={1--23},
  year={2025},
  publisher={Springer}
}
