Statistical Methods for Phenotype Estimation and Analysis Using Electronic Health Records

Project Summary

Background and Significance: Electronic health records (EHR) provide extensive information on disease risk factors that can be studied to improve our understanding of health outcomes. However, medical assessments are performed at irregular intervals in response to patients’ medical needs, which makes these data difficult to use for research. This project will develop new statistical methods that combine the unique set of measures available for each individual to estimate a “latent phenotype.” The latent phenotype is a patient’s underlying, true disease profile, which may be only partially reflected in the series of medical tests recorded in the EHR. By efficiently combining all available information for each individual, these methods will leverage the richness and complexity of EHR data to characterize patients more accurately.

To demonstrate the potential of our new statistical methods, we will use them to identify children and adolescents with type II diabetes. Using EHR data from eight children’s hospital health systems participating in the PEDSnet federation, we will develop a pediatric diabetes latent phenotype. This phenotype can be used in subsequent research to identify study participants or to assess the risk of other health outcomes that may be elevated in children with type II diabetes. We will work with clinician, patient, and parent partners from PEDSnet to identify the downstream health consequences most important for further study, and we will analyze associations between the newly developed diabetes latent phenotype and these outcomes. These analyses will illustrate the performance of the latent phenotype approach in a real-world context where information on risk factors and outcomes for type II diabetes is urgently needed.

Study Aims: The study aims 1) to develop statistical methods for estimating latent phenotypes, 2) to develop methods for incorporating latent phenotypes into analyses of health outcomes that account for uncertainty in phenotypes and other patient covariates, and 3) to estimate a type II diabetes phenotype for patients in the PEDSnet federation and its associations with downstream health outcomes. The long-term objective of this research is to provide better statistical methods for combining inconsistently collected measures derived from the EHR.

Study Description: We will develop statistical methods and software for estimating latent phenotypes and their associations with health outcomes. Through simulation studies, we will evaluate the predictive accuracy, bias, and efficiency of these methods relative to standard approaches. Using data from PEDSnet, we will estimate a latent type II diabetes phenotype. To assess the added value of our new methods, we will compare their performance with that of previously developed phenotypes.
