Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm
| dc.contributor.author | Duan R |
| dc.contributor.author | Boland MR |
| dc.contributor.author | Liu Z |
| dc.contributor.author | Liu Y |
| dc.contributor.author | Chang HH |
| dc.contributor.author | Xu H |
| dc.contributor.author | Chu H |
| dc.contributor.author | Schmid CH |
| dc.contributor.author | Forrest CB |
| dc.contributor.author | Holmes JH |
| dc.contributor.author | Schuemie MJ |
| dc.contributor.author | Berlin JA |
| dc.contributor.author | Moore JH |
| dc.contributor.author | Chen Y |
| dc.contributor.other | University of Pennsylvania |
| dc.contributor.other | Stanford University |
| dc.contributor.other | Harvard University |
| dc.contributor.other | Emory University |
| dc.contributor.other | The University of Texas Health Science Center at Houston |
| dc.contributor.other | University of Minnesota |
| dc.contributor.other | Brown University |
| dc.contributor.other | Children's Hospital of Philadelphia |
| dc.contributor.other | Janssen (United States) |
| dc.date.accessioned | 2026-02-17T17:53:16Z |
| dc.date.created | 2020-03-01 |
| dc.date.issued | 2020-03-01 |
| dc.description.abstract | **Objectives:** <br>We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites. **Materials and Methods:** <br>ODAL effectively utilizes the information from the local site (where the patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication across sites or transferring patient-level data. We evaluated ODAL via extensive simulation studies and an application to a dataset from the University of Pennsylvania Health System. The estimation accuracy was evaluated by comparing it with the estimator based on the combined individual participant data or pooled data (i.e., the gold standard). **Results:** <br>Our simulation studies revealed that the relative estimation bias of ODAL1 compared with the pooled estimates was <3%, and the ratio of standard errors was <1.25 for all scenarios. ODAL2 achieved higher accuracy (with relative bias <0.1% and ratio of standard errors <1.05). In real data analysis, we investigated the associations of 100 medications with fetal loss during pregnancy. We found that ODAL1 provided estimates with relative bias <10% for 85% of medications, and ODAL2 provided estimates with relative bias <10% for 99% of medications. For communication cost, ODAL1 requires transferring p numbers from each site to the local site and ODAL2 requires transferring (p×p+p) numbers from each site to the local site, where p is the number of parameters in the regression model. **Conclusions:** <br>This study demonstrates that ODAL is privacy-preserving and communication-efficient with small bias and high statistical efficiency. |
| dc.identifier.citation | Duan R, Boland MR, Liu Z, Liu Y, Chang HH, Xu H, Chu H, Schmid CH, Forrest CB, Holmes JH, Schuemie MJ, Berlin JA, Moore JH, Chen Y. Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm. J Am Med Inform Assoc. 2020 Mar 1;27(3):376-385. <br>DOI: [10.1093/jamia/ocz199](https://doi.org/10.1093/jamia/ocz199) <br>PMID: 31816040; PMCID: PMC7025371. |
| dc.identifier.doi | 10.1093/jamia/ocz199 |
| dc.identifier.uri | https://hdl.handle.net/20.500.14642/1531 |
| dc.language.iso | en-US |
| dc.publisher | JAMIA |
| dc.relation.uri | https://pubmed.ncbi.nlm.nih.gov/31816040/ |
| dc.rights | © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com. |
| dc.subject.mesh | Algorithms |
| dc.subject.mesh | Computer Simulation |
| dc.subject.mesh | Confidentiality |
| dc.subject.mesh | Data Analysis |
| dc.subject.mesh | Datasets as Topic |
| dc.subject.mesh | Drug-Related Side Effects and Adverse Reactions |
| dc.subject.mesh | Electronic Health Records |
| dc.subject.mesh | Fetal Death |
| dc.subject.mesh | Logistic Models |
| dc.subject.mesh | Odds Ratio |
| dc.subject.mesh | Pregnancy |
| dc.title | Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm |
| dspace.entity.type | Publication |
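
The abstract states that each site sends only gradients of the likelihood to the local site: p numbers for ODAL1, and p×p+p numbers for ODAL2. The following is a minimal sketch of what such per-site summaries might look like for logistic regression, not the authors' implementation. The function names `site_summaries` and `numbers_transferred`, the scaling by the site sample size, and the choice to evaluate at a shared coefficient vector `beta` are all assumptions made for illustration.

```python
import math


def site_summaries(X, y, beta):
    """Aggregate summaries one site could share instead of patient-level rows.

    Returns the gradient (p numbers) and Hessian (p x p numbers) of the
    average logistic log-likelihood at `beta`. Hypothetical helper; the
    1/n scaling is an assumption, not taken from the abstract.
    """
    n, p = len(X), len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))  # linear predictor
        mu = 1.0 / (1.0 + math.exp(-eta))           # fitted probability
        for j in range(p):
            grad[j] += (yi - mu) * xi[j] / n
            for k in range(p):
                hess[j][k] -= mu * (1.0 - mu) * xi[j] * xi[k] / n
    return grad, hess


def numbers_transferred(p, order):
    """Per-site communication cost quoted in the abstract.

    order=1 (ODAL1): p numbers; order=2 (ODAL2): p*p + p numbers.
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    return p if order == 1 else p * p + p


if __name__ == "__main__":
    # Toy data: an intercept column plus one binary covariate.
    X = [[1.0, 0.0], [1.0, 1.0]]
    y = [0, 1]
    grad, hess = site_summaries(X, y, beta=[0.0, 0.0])
    print(grad, hess)
    print(numbers_transferred(10, 1), numbers_transferred(10, 2))
```

With p = 10 parameters, this cost formula gives 10 numbers per site for ODAL1 and 110 for ODAL2, and neither count depends on the number of patients at the site.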
