Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research


Publisher

JAMIA

Rights

© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Abstract

Objectives
The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods.

Materials and Methods
We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing as methods for handling missing data.
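
The idea of using a patient's own timeline to fill gaps can be illustrated with a toy simulation. The sketch below is not the study's simulation design, and it does not implement spline smoothing or multiple imputation: it swaps in plain linear interpolation along each patient's visits as a simple stand-in for a temporal method, and compares it with filling every gap with the overall observed mean. All function names, trajectory parameters, and missingness probabilities are hypothetical choices made for illustration.

```python
import random


def simulate_trajectory(n_visits, rng):
    """Hypothetical disease-activity score: linear trend plus small noise."""
    start = rng.uniform(2.0, 8.0)
    slope = rng.uniform(-0.5, 0.5)
    return [start + slope * t + rng.gauss(0.0, 0.1) for t in range(n_visits)]


def mask_visits(values, rng):
    """Hide interior visits; lower scores are more likely to go unmeasured.

    Assumption for illustration: missingness depends on the disease state.
    The first and last visits always stay observed.
    """
    observed = list(values)
    for t in range(1, len(values) - 1):
        p_missing = 0.6 if values[t] < 5.0 else 0.2
        if rng.random() < p_missing:
            observed[t] = None
    return observed


def interpolate_in_time(observed):
    """Fill each gap linearly from the same patient's nearest observed visits."""
    known = [t for t, v in enumerate(observed) if v is not None]
    filled = list(observed)
    for t, v in enumerate(observed):
        if v is None:
            left = max(k for k in known if k < t)
            right = min(k for k in known if k > t)
            weight = (t - left) / (right - left)
            filled[t] = (1 - weight) * observed[left] + weight * observed[right]
    return filled


def impute_overall_mean(observed, overall_mean):
    """Fill each gap with the mean of all observed values, ignoring time."""
    return [overall_mean if v is None else v for v in observed]


def mean_abs_error(truths, masks, fills):
    """Mean absolute error over the visits that were hidden."""
    errors = [
        abs(f - t)
        for truth, mask, fill in zip(truths, masks, fills)
        for t, m, f in zip(truth, mask, fill)
        if m is None
    ]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
truths = [simulate_trajectory(10, rng) for _ in range(200)]
masks = [mask_visits(values, rng) for values in truths]

seen = [v for mask in masks for v in mask if v is not None]
overall_mean = sum(seen) / len(seen)

temporal_error = mean_abs_error(
    truths, masks, [interpolate_in_time(m) for m in masks]
)
mean_error = mean_abs_error(
    truths, masks, [impute_overall_mean(m, overall_mean) for m in masks]
)

print(f"temporal interpolation MAE: {temporal_error:.3f}")
print(f"overall-mean imputation MAE: {mean_error:.3f}")
```

In this toy setting the temporal fill recovers hidden values more closely because each trajectory is smooth by construction; it says nothing about treatment-effect bias or power, which are the quantities the study evaluates.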

Results
When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results that were close to those obtained when there were no missing data. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, eg, when missing data do not depend on the stochastic process of disease progression.

Discussion and conclusion
Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.

Cite this Publication

Zhou Y, Shi J, Stein R, Liu X, Baldassano RN, Forrest CB, Chen Y, Huang J. “Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research.” J Am Med Inform Assoc. 2023 Jun 20;30(7):1246-1256.
DOI: 10.1093/jamia/ocad066
PMID: 37337922; PMCID: PMC10280351

PEDSnet Project

Characterizing Disease Trajectory for Improving Treatment in Pediatric Crohn's Disease
Affiliation: Children's Hospital of Philadelphia
Study to characterize disease trajectory and identify unique patterns that explain heterogeneity of disease trajectory using EHR data, and to identify pediatric Crohn's disease (PCD) subgroups and build dynamic prediction for long-term PCD outcomes by incorporating information in disease trajectory.
An Efficient Distributed Learning Framework for Integrating Evidence in Clinical Research Networks
Affiliation: Hospital of the University of Pennsylvania
Study to develop a framework of distributed algorithms that efficiently synthesize evidence in large clinical data research networks for studying impacts of risk factors for rare adverse events.
A Two-Stage Meta-Regression Framework for Precision Medicine Using Data from Clinical Data Research Networks
Affiliation: Hospital of the University of Pennsylvania
Study to develop evidence synthesis methods to improve distributed analyses in distributed research networks (DRNs) and advance our understanding of the benefits and risks of different treatment options.
