Duplicate Records: Single Site, Anomaly Detection, Cross-Sectional Analysis


dc.contributor: Patient-Centered Outcomes Research Institute
dc.contributor.author: Wieand, Kaleigh
dc.contributor.author: Razzaghi, Hanieh
dc.contributor.other: PEDSnet Data Coordinating Center
dc.date.accessioned: 2026-05-18T14:14:38Z
dc.date.created: 2026-03-27
dc.description.abstract: This check provides raw data and visualizations to help a user evaluate whether duplicate records are present in a dataset of interest. It summarizes the proportion of duplicate rows and the proportion of patients with duplicate rows, as well as the median number of duplicate rows per patient.
dc.identifier.uri: https://hdl.handle.net/20.500.14642/1650
dc.identifier.uri: https://doi.org/10.24373/pdsp-709
dc.publisher: PEDSnet
dc.relation.uri: https://github.com/ssdqa/duplicaterecords
dc.rights: CC-BY Attribution 4.0 License
dc.rights.uri: http://creativecommons.org/licenses/by/4.0
dc.subject: Event-Level Analysis
dc.subject: Single-Site Analysis
dc.subject: Data Anomaly Method
dc.subject: Cross-Sectional Analysis
dc.title: Duplicate Records: Single Site, Anomaly Detection, Cross-Sectional Analysis
dspace.entity.type: DQCheck
local.code.package:
# install.packages("devtools")
devtools::install_github('ssdqa/duplicaterecords')
local.description.raw: This check produces a raw data output containing 14 columns:

| Column | Data Type | Definition |
|--------|-----------|------------|
| `site` | character | the name of the site being targeted |
| `duplicate_definition` | character | an alias to describe the definition of duplication being investigated |
| `duplicate_columns` | character | the name(s) of the column(s) included or excluded to define duplication |
| `n_w_fact` | numeric | the total number of patients with evidence of duplication |
| `sd_fact` | numeric | the standard deviation of the number of duplicate values per patient, only including patients who have evidence of duplication |
| `mean_fact` | numeric | the mean of the number of duplicate values per patient, only including patients who have evidence of duplication |
| `outlier_fact` | numeric | the number of patients, only including patients who have evidence of duplication, who fall a user-selected number of standard deviations away from the mean |
| `prop_outlier_fact` | numeric | the proportion of patients who fall a user-selected number of standard deviations away from the mean out of patients who have evidence of duplication |
| `n_tot` | numeric | the total number of patients |
| `sd_tot` | numeric | the standard deviation of the number of duplicate rows per patient for all patients |
| `mean_tot` | numeric | the mean of the number of duplicate rows per patient for all patients |
| `outlier_tot` | numeric | the number of patients, out of all patients, who fall a user-selected number of standard deviations away from the mean |
| `prop_outlier_tot` | numeric | the proportion of patients who fall a user-selected number of standard deviations away from the mean out of all patients |
| `output_function` | character | a string indicating the type of visualization that should be generated by dr_output |

{.dqcheck-table}
local.description.viz: This check outputs a bar graph displaying either the proportion or the number of patients whose number of duplicate rows falls a user-selected number of standard deviations away from the mean.
local.dqcheck.category: Conformance
local.dqcheck.clinicalprobe: Clinical Data Distributions
local.dqcheck.clinicalprobe: Expected Clinical Event Representation
local.dqcheck.probe: Missing Expected Data
local.dqcheck.probe: Anomalous Values from Internal Distributions
local.dqcheck.requirement: cohort
local.dqcheck.requirement: dr_input_file
local.dqcheck.requirement: omop_or_pcornet
local.dqcheck.requirement: single_or_multi_site
local.dqcheck.requirement: anomaly_or_exploratory
local.dqcheck.requirement: time
local.dqcheck.requirement: patient_level_tbl
local.dqcheck.requirement: output_level
local.dqcheck.viz: Bar Graph
relation.isCodeOfDQCheck: 929c8dfc-2c8b-4e62-8e1d-0fa06c542832
relation.isCodeOfDQCheck.latestForDiscovery: 929c8dfc-2c8b-4e62-8e1d-0fa06c542832
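
The numeric columns listed under local.description.raw can be illustrated with a small sketch. This is not the duplicaterecords package's code (the package is written in R); it is a standalone Python illustration. The function names `outlier_summary` and `duplicate_outliers`, the input shape (a mapping from patient ID to duplicate-row count), the use of the sample standard deviation, and the reading of "away from the mean" as a two-sided distance are all assumptions made for this example.

```python
from statistics import mean, stdev


def outlier_summary(counts, n_sd):
    """Summarise a list of per-patient duplicate-row counts (hypothetical helper)."""
    n = len(counts)
    mu = mean(counts) if n else float("nan")
    # Assumption: sample standard deviation (n - 1 denominator); needs >= 2 values.
    sd = stdev(counts) if n > 1 else 0.0
    # Assumption: an outlier lies more than n_sd standard deviations from the
    # mean in either direction.
    outliers = sum(1 for c in counts if abs(c - mu) > n_sd * sd)
    prop = outliers / n if n else float("nan")
    return {"n": n, "mean": mu, "sd": sd, "outlier": outliers, "prop_outlier": prop}


def duplicate_outliers(dup_rows_per_patient, n_sd=2):
    """Build the *_fact and *_tot summaries from {patient_id: duplicate_row_count}."""
    all_counts = list(dup_rows_per_patient.values())
    # "fact" statistics only include patients with evidence of duplication.
    fact_counts = [c for c in all_counts if c > 0]
    fact = outlier_summary(fact_counts, n_sd)
    tot = outlier_summary(all_counts, n_sd)
    return {
        "n_w_fact": fact["n"],
        "mean_fact": fact["mean"],
        "sd_fact": fact["sd"],
        "outlier_fact": fact["outlier"],
        "prop_outlier_fact": fact["prop_outlier"],
        "n_tot": tot["n"],
        "mean_tot": tot["mean"],
        "sd_tot": tot["sd"],
        "outlier_tot": tot["outlier"],
        "prop_outlier_tot": tot["prop_outlier"],
    }


# Toy data, invented for illustration only.
toy_counts = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1,
              "p6": 2, "p7": 1, "p8": 0, "p9": 0, "p10": 15}
summary = duplicate_outliers(toy_counts, n_sd=2)
print(summary)
```

With this toy input and a threshold of 2 standard deviations, the patient with 15 duplicate rows is flagged against the all-patient distribution (`outlier_tot` is 1) but not against the distribution restricted to patients with duplicates (`outlier_fact` is 0), which shows why the two sets of columns can tell different stories.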

Files

Original bundle

Name: dr_ss_anom_cs.png
Size: 72.52 KB
Format: Portable Network Graphics