Duplicate Records: Single Site, Exploratory, Cross-Sectional Analysis
| dc.contributor | Patient-Centered Outcomes Research Institute |
| dc.contributor.author | Wieand, Kaleigh |
| dc.contributor.author | Razzaghi, Hanieh |
| dc.contributor.other | PEDSnet Data Coordinating Center |
| dc.date.accessioned | 2026-05-18T14:14:37Z |
| dc.date.created | 2026-03-27 |
| dc.description.abstract | This check provides raw data and visualizations to help a user evaluate whether duplicate records are present in a dataset of interest. It summarizes the proportion of duplicate rows and of patients with duplicate rows, as well as the median number of duplicate rows per patient. |
| dc.identifier.uri | https://hdl.handle.net/20.500.14642/1649 |
| dc.identifier.uri | https://doi.org/10.24373/pdsp-708 |
| dc.publisher | PEDSnet |
| dc.relation.uri | https://github.com/ssdqa/duplicaterecords |
| dc.rights | CC-BY Attribution 4.0 License |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0 |
| dc.subject | Event-Level Analysis |
| dc.subject | Single-Site Analysis |
| dc.subject | Exploratory Analysis |
| dc.subject | Cross-Sectional Analysis |
| dc.title | Duplicate Records: Single Site, Exploratory, Cross-Sectional Analysis |
| dspace.entity.type | DQCheck |
| local.code.package | # install.packages("devtools")<br>devtools::install_github('ssdqa/duplicaterecords') |
| local.description.raw | This check produces a raw data output containing 14 columns: <br> |Column |Data Type|Definition | |----------------|---------|--------------------------------------------------------------------------------------------| |`site` |character|the name of the site being targeted | |`duplicate_definition` |character|an alias to describe the definition of duplication being investigated | |`duplicate_columns` |character|the name(s) of the column(s) included or excluded to define duplication| |`total_rows` |numeric |the total number of rows in the domain | |`total_pt` |numeric |the total number of patients in the domain| |`duplicate_rows` |numeric |the number of duplicate rows | |`duplicate_pt` |numeric |the number of patients with at least one duplicate row | |`duplicate_row_prop`|numeric|the proportion of duplicate rows| |`duplicate_pt_prop`|numeric|the proportion of patients with at least one duplicate row| |`median_all_with0s`|numeric|the median number of duplicate rows per patient, for all patients, across all sites| |`median_all_without0s`|numeric|the median number of duplicate rows per patient, for only patients with evidence of duplication, across all sites| |`median_site_with0s`|numeric|the median number of duplicate rows per patient, for all patients, for a specific site| |`median_site_without0s`|numeric|the median number of duplicate rows per patient, for only patients with evidence of duplication, for a specific site| |`output_function`|character|a string indicating the type of visualization that should be generated by `dr_output`| {.dqcheck-table} |
| local.description.viz | This check outputs a bar graph displaying a user-specified numeric value, with one bar for each duplicate definition. It can display any of: `duplicate_row_prop`, `duplicate_pt_prop`, `median_all_with0s` / `median_site_with0s`, or `median_all_without0s` / `median_site_without0s`. |
| local.dqcheck.category | Conformance |
| local.dqcheck.clinicalprobe | Clinical Data Distributions |
| local.dqcheck.clinicalprobe | Expected Clinical Event Representation |
| local.dqcheck.probe | Missing Expected Data |
| local.dqcheck.requirement | cohort |
| local.dqcheck.requirement | dr_input_file |
| local.dqcheck.requirement | omop_or_pcornet |
| local.dqcheck.requirement | single_or_multi_site |
| local.dqcheck.requirement | anomaly_or_exploratory |
| local.dqcheck.requirement | time |
| local.dqcheck.requirement | patient_level_tbl |
| local.dqcheck.requirement | output_level |
| local.dqcheck.viz | Bar Graph |
| relation.isCodeOfDQCheck | 929c8dfc-2c8b-4e62-8e1d-0fa06c542832 |
| relation.isCodeOfDQCheck.latestForDiscovery | 929c8dfc-2c8b-4e62-8e1d-0fa06c542832 |
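
The summary columns described under local.description.raw can be illustrated with a small base R sketch. This is not the package's implementation: the toy `events` table, the `patient_id`, `code`, and `event_date` column names, and the rule that only repeats after the first occurrence count as duplicates are all assumptions made for illustration. Only the output column names are taken from the description above.

```r
# Hypothetical toy data: one row per event. These column names are
# illustrative and are not a schema required by the package.
events <- data.frame(
  patient_id = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4),
  code       = c("A", "A", "B", "A", "C", "A", "D", "D", "D", "E"),
  event_date = c("2024-01-01", "2024-01-01", "2024-01-02", "2024-02-01",
                 "2024-02-03", "2024-03-01", "2024-04-01", "2024-04-01",
                 "2024-04-01", "2024-04-02")
)

# Columns that define duplication in this sketch (an assumption).
dup_cols <- c("patient_id", "code", "event_date")

# Assumption: a row is a duplicate if it repeats an earlier row on dup_cols;
# the first occurrence of each combination is not counted.
is_dup <- duplicated(events[, dup_cols])

# Number of duplicate rows per patient, including patients with zero.
dup_per_pt <- as.numeric(tapply(is_dup, events$patient_id, sum))

# Assemble a one-row summary using the documented column names.
result <- data.frame(
  total_rows            = nrow(events),
  total_pt              = length(dup_per_pt),
  duplicate_rows        = sum(is_dup),
  duplicate_pt          = sum(dup_per_pt > 0),
  duplicate_row_prop    = sum(is_dup) / nrow(events),
  duplicate_pt_prop     = sum(dup_per_pt > 0) / length(dup_per_pt),
  median_site_with0s    = median(dup_per_pt),
  median_site_without0s = median(dup_per_pt[dup_per_pt > 0])
)

print(result)
```

Computing the median both with and without zero-count patients separates two questions: how much duplication a typical patient has, and how severe duplication is among the patients it affects.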
