Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction

Authors: Haresh Rengaraj Rajamohan, Xiang Gao, Weicheng Zhu, Shih-Lun Huang, Long Chen, Gabe Schulman, Huizhen Jin, Shengduo Li, Yixuan Wang, Huidi Yang, Kyunghyun Cho, Cem M. Deniz, Narges Razavian
Date: 2026-03-25
Paper ID: arxiv:2603.24562

Summary

This paper introduces RAVEN, a generative pretraining strategy for sequential Electronic Health Record (EHR) data built on a Recurrence-Aware next-Visit EveNt prediction objective. The model, trained on a cohort of over one million individuals, autoregressively generates tokenized clinical events conditioned on patient history. Two methodological contributions stand out: regularization that discourages the model from simply predicting repeated events, and the identification of an evaluation pitfall in which metrics are inflated when new onsets are not distinguished from subsequent occurrences of the same event. Empirically, RAVEN shows strong zero-shot generalization for disease incidence forecasting, matching fine-tuned models while remaining robust to code-mapping discrepancies in external cohorts.

Key Contributions

Introduction of RAVEN, a novel autoregressive generative pretraining strategy tailored for sequential Electronic Health Record (EHR) data based on next-visit event prediction.
Development of an evaluation principle for EHR foundation models that prevents performance metrics from being inflated by repeated event tokens that are not accounted for.
Empirical investigation of scaling laws in a data-constrained regime, showing that increasing model size is suboptimal without a proportional increase in data volume.
Demonstration of RAVEN achieving zero-shot prediction performance on disease incidence forecasting that rivals fine-tuned Transformer models and surpasses simulation-based baselines.
Demonstration that RAVEN generalizes to external patient cohorts without further fine-tuning, despite lossy clinical code mappings and feature coverage gaps.
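The evaluation pitfall named in the contributions can be illustrated with a small sketch. It assumes each visit is a set of code strings and treats a code as a new onset when it has not appeared in any earlier visit. The function names, the example codes, and the "copy the history" baseline are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import List, Set, Tuple


def split_next_visit(history: List[Set[str]], next_visit: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split next-visit codes into (new onsets, recurrences) relative to the history."""
    seen = set().union(*history)  # every code observed in any earlier visit
    return next_visit - seen, next_visit & seen


def recall(predicted: Set[str], targets: Set[str]) -> float:
    """Fraction of target codes that were predicted; 0.0 when there are no targets."""
    return len(predicted & targets) / len(targets) if targets else 0.0


# Hypothetical patient: two past visits and the visit to be predicted.
history = [{"E11", "I10"}, {"I10", "E78"}]
next_visit = {"I10", "E11", "N18"}

new_onsets, recurrences = split_next_visit(history, next_visit)

# A trivial baseline that predicts every previously seen code again.
copy_history_prediction = set().union(*history)

overall = recall(copy_history_prediction, next_visit)     # counts repeated codes
onset_only = recall(copy_history_prediction, new_onsets)  # counts first occurrences only

print(f"overall recall: {overall:.2f}, new-onset recall: {onset_only:.2f}")
```

In this toy case the baseline recovers two of the three next-visit codes without forecasting anything new, while its recall on new onsets is zero. Scoring repeated codes together with first occurrences would make such a baseline look far stronger than it is.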

Limitations

The study primarily focuses on next-visit prediction and its immediate forecasting utility; broader clinical utility beyond specific disease incidence prediction remains an open area. The analysis of scaling laws is constrained to a “data-constrained, compute-saturated regime.”

Open Questions & Future Work

Key Concepts

Recurrence-Aware next-Visit EveNt prediction: A generative pretraining strategy for sequential Electronic Health Record (EHR) data that explicitly models and regularizes the prediction of repeated clinical events.
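One way to picture regularizing against repeated events is a next-visit loss that down-weights codes already present in the patient's history. The sketch below is only a guess at the general idea: the summary does not give the paper's actual formulation, and the function name, the `repeat_weight` parameter, and the probabilities are hypothetical.

```python
import math
from typing import Dict, List, Set


def recurrence_weighted_nll(
    pred_probs: Dict[str, float],
    targets: List[str],
    seen: Set[str],
    repeat_weight: float = 0.5,
) -> float:
    """Weighted mean negative log-likelihood over the target codes.

    Codes already in `seen` (repeats) contribute with weight `repeat_weight`;
    first occurrences contribute with weight 1.0. This is an assumed scheme,
    not the objective described in the paper.
    """
    total, weight_sum = 0.0, 0.0
    for code in targets:
        weight = repeat_weight if code in seen else 1.0
        total += -weight * math.log(pred_probs[code])
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical model probabilities for two codes in the next visit.
pred_probs = {"I10": 0.5, "N18": 0.25}
seen = {"I10"}  # I10 is a repeat, N18 is a new onset

weighted = recurrence_weighted_nll(pred_probs, ["I10", "N18"], seen, repeat_weight=0.5)
unweighted = recurrence_weighted_nll(pred_probs, ["I10", "N18"], seen, repeat_weight=1.0)
print(f"weighted: {weighted:.4f}, unweighted: {unweighted:.4f}")
```

With `repeat_weight` below 1.0, the poorly predicted new onset dominates the loss, so the training signal leans toward first occurrences instead of easy repeats.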

Datasets

clinical records (EHR)
