Accommodating Population Differences in Risk Prediction Model Validation
Speaker: Ruth Pfeiffer, National Institutes of Health
Title: Accommodating Population Differences in Risk Prediction Model Validation
Abstract:
Statistical models that predict cancer incidence, recurrence, or mortality following a cancer diagnosis have broad public health and clinical applications. Validation of risk prediction models in independent data provides a rigorous assessment of model performance. However, several differences between the populations that gave rise to the training and the validation data can lead to seemingly poor performance of a risk model. We formalize the notions of “similarity” of the training and validation data and define reproducibility and transportability. We address the impact of different predictor distributions and differences in verifying the outcome on model calibration, accuracy and discrimination. When individual-level data from both the training and validation data sets are available, we propose and study weighted versions of the validation metrics that adjust for differences in the predictor distributions and in outcome verification to provide a more comprehensive assessment of model performance. We give conditions on the model and the training and validation populations that ensure a model's reproducibility or transportability and show how to check them. As an illustration, we develop and validate a prostate cancer risk model using data from two large North American prostate cancer prevention trials, the Selenium and Vitamin E Cancer Prevention Trial (SELECT) and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial.
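
To make the idea of weighted validation metrics concrete, here is a minimal sketch. It is not the speaker's method. It assumes a single categorical predictor, uses density-ratio weights w(x) = P_train(x) / P_val(x) as one plausible way to adjust for differing predictor distributions, and does not address differences in outcome verification. All function names are hypothetical.

```python
import numpy as np

def density_ratio_weights(x_train, x_val):
    """Assumed weighting scheme: w(x) = P_train(x) / P_val(x) for one categorical predictor."""
    t_cats, t_counts = np.unique(x_train, return_counts=True)
    v_cats, v_counts = np.unique(x_val, return_counts=True)
    train_freq = dict(zip(t_cats, t_counts / len(x_train)))
    val_freq = dict(zip(v_cats, v_counts / len(x_val)))
    # Categories never seen in training get weight 0.
    return np.array([train_freq.get(v, 0.0) / val_freq[v] for v in x_val])

def weighted_oe_ratio(y, p, w):
    """Weighted observed/expected ratio; values near 1 suggest good overall calibration."""
    y, p, w = (np.asarray(a, dtype=float) for a in (y, p, w))
    return np.sum(w * y) / np.sum(w * p)

def weighted_auc(y, p, w):
    """Weighted share of case/non-case pairs in which the case has the higher risk (ties count 1/2)."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    cases, noncases = (y == 1), (y == 0)
    pair_w = np.outer(w[cases], w[noncases])          # weight of each case/non-case pair
    diff = p[cases][:, None] - p[noncases][None, :]   # risk difference within each pair
    score = (diff > 0) + 0.5 * (diff == 0)
    return np.sum(pair_w * score) / np.sum(pair_w)

# Toy data (made up for illustration only).
x_train = [0, 0, 0, 1]
x_val = [0, 1, 1, 1]
y_val = [1, 0, 1, 0]            # observed outcomes in the validation data
p_val = [0.8, 0.2, 0.6, 0.4]    # model-predicted risks in the validation data

w = density_ratio_weights(x_train, x_val)
print(weighted_oe_ratio(y_val, p_val, w), weighted_auc(y_val, p_val, w))
```

Setting every weight to 1 recovers the usual unweighted observed/expected ratio and AUC, so the weighted and unweighted versions can be compared side by side on the same validation data.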