Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research


dc.contributor.author Zhou Y
dc.contributor.author Shi J
dc.contributor.author Stein R
dc.contributor.author Liu X
dc.contributor.author Baldassano RN
dc.contributor.author Forrest CB
dc.contributor.author Chen Y
dc.contributor.author Huang J
dc.contributor.other University of Pennsylvania
dc.contributor.other Children's Hospital of Philadelphia
dc.date.accessioned 2026-01-22T16:09:40Z
dc.date.created 2023-06-20
dc.date.issued 2023-06-20
dc.description.abstract **Objectives** The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods. **Materials and Methods** We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing methods in handling missing data. **Results** When missing data depended on the stochastic progression of disease and on medical practice patterns, the spline smoothing method produced results close to those obtained when there were no missing data. Compared with multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, e.g., when missing data did not depend on the stochastic process of disease progression. **Discussion and conclusion** Missing data in EHRs could lead to biased estimates of treatment effects and false-negative findings in CER even after the missing data were imputed. When using EHRs as a data resource for CER, it is important to leverage the temporal information of the disease trajectory to impute missing values, and to consider the missing rate and the effect size when choosing an imputation method.
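The abstract's central contrast is between spline smoothing, which fills a gap from a patient's own trajectory over time, and approaches that ignore the temporal ordering of visits. The Python sketch here illustrates that idea under stated assumptions and is not the authors' implementation. It assumes a penalized cubic regression spline (truncated-power basis with a ridge penalty on the knot terms) fitted separately to each patient's series; the helpers `spline_basis` and `spline_impute`, the knot count, the penalty `lam`, and the example trajectory are all hypothetical. Mean imputation appears only as a time-ignoring baseline, not as the multiple imputation evaluated in the study.

```python
import numpy as np


def spline_basis(u, knots):
    """Cubic truncated-power basis: 1, u, u^2, u^3, then (u - k)_+^3 per knot."""
    u = np.asarray(u, dtype=float)
    cols = [np.ones_like(u), u, u**2, u**3]
    cols += [np.maximum(u - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)


def spline_impute(t, y, n_knots=3, lam=1.0):
    """Fill NaN entries of one patient's series y(t) from a penalized cubic spline.

    Hypothetical helper (an assumption, not the paper's method): knots sit at
    interior quantiles of the observed times, and a ridge penalty `lam` shrinks
    only the knot coefficients, so the cubic polynomial part is unpenalized.
    Needs at least 4 distinct observed time points.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    lo, hi = t[obs].min(), t[obs].max()
    u = (t - lo) / (hi - lo)  # rescale time to [0, 1] for numerical stability
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior quantiles only
    knots = np.quantile(u[obs], probs)
    X = spline_basis(u, knots)
    penalty = np.diag([0.0] * 4 + [1.0] * n_knots)
    Xo = X[obs]
    # Penalized least squares: (Xo'Xo + lam * P) beta = Xo' y_obs
    beta = np.linalg.solve(Xo.T @ Xo + lam * penalty, Xo.T @ y[obs])
    filled = y.copy()
    filled[~obs] = X[~obs] @ beta  # predict only at the missing visits
    return filled


# Usage: a hypothetical biomarker measured at 11 visits, three of them missing.
t = np.arange(11.0)
y = 1.0 + 0.3 * t
y_obs = y.copy()
y_obs[[2, 5, 8]] = np.nan

y_spline = spline_impute(t, y_obs)
# Mean imputation ignores time; shown only as a contrast.
y_mean = np.where(np.isnan(y_obs), np.nanmean(y_obs), y_obs)

print("spline max error:", np.abs(y_spline - y).max())
print("mean   max error:", np.abs(y_mean - y).max())
```

Leaving the cubic polynomial unpenalized means a smooth trend such as the linear one above is recovered exactly, while a larger `lam` flattens the knot terms and yields a smoother curve. This is a single imputation; a multiple-imputation approach would instead draw several plausible values per gap to reflect uncertainty.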
dc.identifier.citation Zhou Y, Shi J, Stein R, Liu X, Baldassano RN, Forrest CB, Chen Y, Huang J. "Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research." _J Am Med Inform Assoc_. 2023 Jun 20;30(7):1246-1256. <br>DOI: [10.1093/jamia/ocad066](https://doi.org/10.1093/jamia/ocad066) <br>PMID: 37337922; PMCID: PMC10280351
dc.identifier.doi 10.1093/jamia/ocad066
dc.identifier.uri https://hdl.handle.net/20.500.14642/1325
dc.language.iso en_US
dc.publisher JAMIA
dc.relation.uri https://pubmed.ncbi.nlm.nih.gov/37337922/
dc.rights © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email journals.permissions@oup.com.
dc.subject.mesh Comparative Effectiveness Research
dc.subject.mesh Data Interpretation, Statistical
dc.subject.mesh Computer Simulation
dc.subject.mesh Bias
dc.subject.mesh Propensity Score
dc.subject.mesh Research Design
dc.title Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research
dspace.entity.type Publication
relation.isStudyOfPublication ea30d83b-595a-4e07-9c71-11b64acdd7aa
relation.isStudyOfPublication f7b4cb51-33bf-4740-9ddf-d7fcab31c61c
relation.isStudyOfPublication 19eb63d9-6057-4272-8235-8b604b1a8558
relation.isStudyOfPublication.latestForDiscovery ea30d83b-595a-4e07-9c71-11b64acdd7aa
