Duplicate Records: Single Site, Exploratory, Cross-Sectional Analysis


dc.contributorPatient-Centered Outcomes Research Institute
dc.contributor.authorWieand, Kaleigh
dc.contributor.authorRazzaghi, Hanieh
dc.contributor.otherPEDSnet Data Coordinating Center
dc.date.accessioned2026-05-18T14:14:37Z
dc.date.created2026-03-27
dc.description.abstractThis check provides raw data and visualizations to help a user evaluate whether duplicate records are present in a dataset of interest. It summarizes the proportion of duplicate rows and the proportion of patients with duplicate rows, as well as the median number of duplicate rows per patient.
dc.identifier.urihttps://hdl.handle.net/20.500.14642/1649
dc.identifier.urihttps://doi.org/10.24373/pdsp-708
dc.publisherPEDSnet
dc.relation.urihttps://github.com/ssdqa/duplicaterecords
dc.rightsa CC-BY Attribution 4.0 License.
dc.rights.urihttp://creativecommons.org/licenses/by/4.0
dc.subjectEvent-Level Analysis
dc.subjectSingle-Site Analysis
dc.subjectExploratory Analysis
dc.subjectCross-Sectional Analysis
dc.titleDuplicate Records: Single Site, Exploratory, Cross-Sectional Analysis
dspace.entity.typeDQCheck
local.code.package# install.packages("devtools")
devtools::install_github('ssdqa/duplicaterecords')
local.description.rawThis check produces a raw data output containing 14 columns:

|Column |Data Type|Definition |
|----------------|---------|--------------------------------------------------------------------------------------------|
|`site` |character|the name of the site being targeted |
|`duplicate_definition` |character|an alias to describe the definition of duplication being investigated |
|`duplicate_columns` |character|the name(s) of the column(s) included or excluded to define duplication |
|`total_rows` |numeric |the total number of rows in the domain |
|`total_pt` |numeric |the total number of patients in the domain |
|`duplicate_rows` |numeric |the number of duplicate rows |
|`duplicate_pt` |numeric |the number of patients with at least one duplicate row |
|`duplicate_row_prop` |numeric |the proportion of duplicate rows |
|`duplicate_pt_prop` |numeric |the proportion of patients with at least one duplicate row |
|`median_all_with0s` |numeric |the median number of duplicate rows per patient, for all patients, across all sites |
|`median_all_without0s` |numeric |the median number of duplicate rows per patient, for only patients with evidence of duplication, across all sites |
|`median_site_with0s` |numeric |the median number of duplicate rows per patient, for all patients, for a specific site |
|`median_site_without0s` |numeric |the median number of duplicate rows per patient, for only patients with evidence of duplication, for a specific site |
|`output_function` |character|a string indicating the type of visualization that should be generated by `dr_output` |

{.dqcheck-table}
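
To make the column definitions above concrete, the following is a minimal sketch in plain Python of how the row-level and patient-level summaries could be derived for one site. It is an illustration only, not the `duplicaterecords` package's implementation. The function name `summarize_duplicates`, the `patient_id`, `visit_date`, and `code` columns, and the sample rows are all hypothetical. The sketch also assumes that a row counts as a duplicate whenever its combination of key columns appears more than once, so every row in a repeated group is counted; the package may define this differently.

```python
from collections import Counter
from statistics import median


def summarize_duplicates(rows, key_columns, patient_column="patient_id"):
    """Summarize duplication for one site (hypothetical helper, not the package API).

    Assumption: a row is a duplicate if its key-column values occur more than once.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    # Count how often each combination of key-column values occurs.
    keys = [tuple(row[col] for col in key_columns) for row in rows]
    key_counts = Counter(keys)
    is_duplicate = [key_counts[key] > 1 for key in keys]

    # Duplicate rows per patient, starting every patient at zero.
    dup_per_patient = {row[patient_column]: 0 for row in rows}
    for row, dup in zip(rows, is_duplicate):
        if dup:
            dup_per_patient[row[patient_column]] += 1

    nonzero = [n for n in dup_per_patient.values() if n > 0]
    duplicate_rows = sum(is_duplicate)

    return {
        "total_rows": len(rows),
        "total_pt": len(dup_per_patient),
        "duplicate_rows": duplicate_rows,
        "duplicate_pt": len(nonzero),
        "duplicate_row_prop": duplicate_rows / len(rows),
        "duplicate_pt_prop": len(nonzero) / len(dup_per_patient),
        "median_site_with0s": median(dup_per_patient.values()),
        # Sketch-only choice: report 0 when no patient has duplicates.
        "median_site_without0s": median(nonzero) if nonzero else 0,
    }


# Hypothetical sample data: patients A and C have repeated rows.
rows = [
    {"patient_id": "A", "visit_date": "2024-01-05", "code": "J45"},
    {"patient_id": "A", "visit_date": "2024-01-05", "code": "J45"},
    {"patient_id": "A", "visit_date": "2024-02-10", "code": "J45"},
    {"patient_id": "B", "visit_date": "2024-01-07", "code": "E11"},
    {"patient_id": "C", "visit_date": "2024-03-01", "code": "I10"},
    {"patient_id": "C", "visit_date": "2024-03-01", "code": "I10"},
    {"patient_id": "C", "visit_date": "2024-03-01", "code": "I10"},
    {"patient_id": "D", "visit_date": "2024-03-02", "code": "I10"},
]

summary = summarize_duplicates(rows, ["patient_id", "visit_date", "code"])
```

With this sample, 5 of 8 rows and 2 of 4 patients are flagged. Including patients with zero duplicates lowers the median (1.0) compared with restricting it to affected patients (2.5), which is the distinction the `with0s` and `without0s` columns draw.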
local.description.vizThis check outputs a bar graph of a user-specified numeric value, with one bar for each duplicate definition. It can display any of: `duplicate_row_prop`, `duplicate_pt_prop`, `median_all_with0s` / `median_site_with0s`, or `median_all_without0s` / `median_site_without0s`.
local.dqcheck.categoryConformance
local.dqcheck.clinicalprobeClinical Data Distributions
local.dqcheck.clinicalprobeExpected Clinical Event Representation
local.dqcheck.probeMissing Expected Data
local.dqcheck.requirementcohort
local.dqcheck.requirementdr_input_file
local.dqcheck.requirementomop_or_pcornet
local.dqcheck.requirementsingle_or_multi_site
local.dqcheck.requirementanomaly_or_exploratory
local.dqcheck.requirementtime
local.dqcheck.requirementpatient_level_tbl
local.dqcheck.requirementoutput_level
local.dqcheck.vizBar Graph
relation.isCodeOfDQCheck929c8dfc-2c8b-4e62-8e1d-0fa06c542832
relation.isCodeOfDQCheck.latestForDiscovery929c8dfc-2c8b-4e62-8e1d-0fa06c542832

Files

Original bundle

Name:
dr_ss_exp_cs.png
Size:
41.05 KB
Format:
Portable Network Graphics