Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation

Proceedings of Machine Learning Research

M Yan, M Xia, WA Huang, C Hong, BA Goldstein, MM Engelhard

Illustration of the proposed framework with three alignment components: overall alignment, partial alignment, and conditional alignment. The source encoder and classifier are from the pre-training step.

Summary

Predicting long-term clinical outcomes often requires large-scale training data with sufficiently long follow-up. However, in electronic health records (EHR) data, long-term labels may not be available for contemporary patient cohorts. Given the dynamic nature of clinical practice, models that rely on historical training data may not perform optimally. In this work, we frame the problem as a positive-unlabeled domain adaptation task, where we seek to adapt from a fully labeled source domain (e.g., historical data) to a partially labeled target domain (e.g., contemporary data). We propose an adversarial framework that includes three core components: (1) Overall Alignment, to match feature distributions between source and target domains; (2) Partial Alignment, to map source negatives to unlabeled target samples; and (3) Conditional Alignment, to address conditional shift using available positive labels in the target domain. We evaluate our method on a benchmark digit classification task (SVHN→MNIST) and two real-world EHR applications: prediction of one-year mortality after COVID-19, and long-term prediction of neurodevelopmental conditions (NDC) in children. In all settings, our approach consistently outperforms baseline models and, in most cases, achieves performance close to an oracle model trained with fully observed labels.
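To make the setup above concrete, the following is a minimal Python sketch of the data arrangement only, not the authors' implementation. It shows one possible reading of which samples each of the three alignment terms would compare, together with a standard domain-discriminator cross-entropy. The function names (`make_pu_target`, `alignment_groups`, `domain_bce`), the `label_frequency` parameter, and the pairing of source positives with observed target positives for conditional alignment are all assumptions made for illustration.

```python
import math
import random


def make_pu_target(labels, label_frequency, seed=0):
    """Hide labels to mimic a positive-unlabeled target domain (hypothetical helper).

    Returns 0/1 flags: 1 = observed positive, 0 = unlabeled, meaning either
    a true negative or a positive whose outcome has not been observed yet.
    """
    rng = random.Random(seed)
    return [1 if y == 1 and rng.random() < label_frequency else 0 for y in labels]


def alignment_groups(source_labels, target_observed):
    """Index sets each alignment term would compare (an assumed reading)."""
    src_all = list(range(len(source_labels)))
    tgt_all = list(range(len(target_observed)))
    return {
        # Overall: all source samples vs. all target samples.
        "overall": (src_all, tgt_all),
        # Partial: source negatives vs. unlabeled target samples.
        "partial": (
            [i for i in src_all if source_labels[i] == 0],
            [j for j in tgt_all if target_observed[j] == 0],
        ),
        # Conditional: source positives vs. observed target positives.
        "conditional": (
            [i for i in src_all if source_labels[i] == 1],
            [j for j in tgt_all if target_observed[j] == 1],
        ),
    }


def domain_bce(p_source, p_target, eps=1e-12):
    """Cross-entropy of a domain discriminator.

    p_source and p_target hold the discriminator's predicted probability
    that each sample comes from the source domain.
    """
    loss_s = -sum(math.log(p + eps) for p in p_source) / len(p_source)
    loss_t = -sum(math.log(1.0 - p + eps) for p in p_target) / len(p_target)
    return loss_s + loss_t


if __name__ == "__main__":
    source_labels = [0, 1, 0, 1]                      # fully labeled source
    target_observed = make_pu_target([1, 0, 1], 1.0)  # every positive observed
    for name, (src, tgt) in alignment_groups(source_labels, target_observed).items():
        print(name, src, tgt)
```

In an adversarial setup of this kind, a discriminator would typically be trained to minimize a loss like `domain_bce` on each pairing while the target encoder is trained to increase it, so that the paired feature distributions become harder to tell apart. How the paper weights or combines the three terms is not stated in this summary.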

Data and Code Availability: This paper uses the publicly available image datasets SVHN (Netzer et al., 2011) and MNIST (LeCun et al., 1998). This paper also uses EHR data from the Duke University Health System (DUHS) that cannot be made publicly available. Program code is publicly available and included in the supplementary material.

Citation

Yan, Mengying, et al. “Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation.” Proceedings of Machine Learning Research 287 (2025): 1-15.

BibTeX

@article{yan2025predicting,
  title={Predicting Partially Observed Long-Term Outcomes with Adversarial Positive-Unlabeled Domain Adaptation},
  author={Yan, Mengying and Xia, Meng and Huang, Wei A and Hong, Chuan and Goldstein, Benjamin A and Engelhard, Matthew M},
  journal={Proceedings of Machine Learning Research},
  volume={287},
  pages={1--15},
  year={2025}
}
