Source and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis




Publisher

PEDSnet

Abstract

This check provides analyses at the level of a single site. It generates a high-level snapshot of potentially anomalous mappings between source values and CDM codes. The check can only be run when both the source code and the CDM code it is mapped to are provided.

Probe

Clinical Assessment

Access Package

# install.packages("devtools")
devtools::install_github('ssdqa/sourceconceptvocabularies')

Visualization Output

This plot shows the proportion of each top mapping pair for the top concept_id (CDM) codes. Non-anomalous pairs are shown as dots and anomalies as stars. The color gradient indicates the proportion of each pair: red represents a higher proportion and blue a lower one. Point size represents the median absolute deviation (MAD) value for the concept_id. Hovering over a dot or star displays its metadata. The user sets how many top concept_id values and top source_concept_id mappings are shown.
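To make the plotted quantities concrete, here is a minimal base-R sketch, not the package's implementation. It derives pair counts (ct), denominators, and the concept_prop / source_prop columns from a toy table of mapped records. It then computes a per-concept MAD using `stats::mad()`, which matches the raw output's definition of mad_val as the median absolute deviation; whether the package applies R's default 1.4826 scaling constant is an assumption. Finally, it keeps the top concept_id values and their top source mappings. The toy data and the limit names `top_n_concepts` and `top_n_sources` are hypothetical.

```r
# Minimal sketch (not the package's implementation): derive pair counts and
# proportions from toy mapped records, then pick the pairs a plot might show.

# Hypothetical toy data: one row per record, with its CDM concept_id and
# the source_concept_id it was mapped from.
facts <- data.frame(
  concept_id        = c(1, 1, 1, 1, 2, 2, 2, 3),
  source_concept_id = c(10, 10, 10, 11, 20, 20, 21, 10)
)

# ct: number of records for each concept_id / source_concept_id pair
pairs <- aggregate(ct ~ concept_id + source_concept_id,
                   data = transform(facts, ct = 1), FUN = sum)

# Denominators: total records per concept_id and per source_concept_id
pairs$denom_concept_ct <- ave(pairs$ct, pairs$concept_id, FUN = sum)
pairs$denom_source_ct  <- ave(pairs$ct, pairs$source_concept_id, FUN = sum)

# Share of each concept_id (or source_concept_id) made up by the pair
pairs$concept_prop <- pairs$ct / pairs$denom_concept_ct
pairs$source_prop  <- pairs$ct / pairs$denom_source_ct

# Per-concept median absolute deviation of concept_prop (stats::mad uses its
# default 1.4826 scaling constant here); a value like this could size points
mad_by_concept <- tapply(pairs$concept_prop, pairs$concept_id, mad)

# Hypothetical user limits on what gets plotted
top_n_concepts <- 2  # most frequent concept_ids to keep
top_n_sources  <- 1  # most frequent source mappings kept per concept_id

concept_totals <- unique(pairs[, c("concept_id", "denom_concept_ct")])
keep_concepts  <- head(concept_totals$concept_id[order(-concept_totals$denom_concept_ct)],
                       top_n_concepts)

# Rank mappings within each concept_id by descending ct, then filter
pairs <- pairs[order(pairs$concept_id, -pairs$ct), ]
pairs$rank_in_concept <- ave(pairs$ct, pairs$concept_id, FUN = seq_along)
plot_pairs <- pairs[pairs$concept_id %in% keep_concepts &
                      pairs$rank_in_concept <= top_n_sources, ]
print(plot_pairs)
```

The sketch uses base R only (`aggregate`, `ave`, `tapply`) so it runs without extra packages. A dplyr pipeline would be an equally reasonable way to express the same grouping.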

Raw Output

This check produces a raw data output containing twenty-two columns of data:

| Column | Data Type | Definition |
|---|---|---|
| site | character | the name of the site being targeted, or “combined” if multiple sites were provided |
| domain | character | the domain associated with the provided concept set |
| concept_id | numeric / character | the primary concept, native to the CDM and mapped from the source |
| source_concept_id | numeric / character | the source concept, from the source system and mapped to the CDM |
| ct | numeric | the number of times the concept_id / source_concept_id pair occurs in the data |
| denom_concept_ct | numeric | the number of times the concept_id appears in the data |
| denom_source_ct | numeric | the number of times the source_concept_id appears in the data |
| concept_prop | numeric | the proportion of concept_id appearances made up by the concept_id / source_concept_id pair |
| source_prop | numeric | the proportion of source_concept_id appearances made up by the concept_id / source_concept_id pair |
| mean_val | numeric | the mean proportion of the provided code type (cdm or source) across sites |
| median_val | numeric | the median proportion of the provided code type (cdm or source) across sites |
| sd_val | numeric | the standard deviation of the proportion of the provided code type (cdm or source) across sites |
| mad_val | numeric | the median absolute deviation of the proportion of the provided code type (cdm or source) across sites |
| cov_val | numeric | the coefficient of variation of the proportion of the provided code type (cdm or source) across sites |
| max_val | numeric | the maximum proportion of the provided code type (cdm or source) across sites |
| min_val | numeric | the minimum proportion of the provided code type (cdm or source) across sites |
| range_val | numeric | the range of the proportion of the provided code type (cdm or source) across sites |
| total_ct | numeric | the total number of group members |
| analysis_eligible | character | a string indicating whether the group is eligible for anomaly detection analysis |
| lower_tail | numeric | the lower bound used to identify low anomalies |
| upper_tail | numeric | the upper bound used to identify high anomalies |
| anomaly_yn | character | a string indicating whether the value is anomalous |
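The summary and anomaly columns (mean_val through anomaly_yn) can be illustrated with a minimal base-R sketch. It computes the per-group statistics from concept_prop values grouped by concept_id, marks small or constant groups as ineligible, and flags values outside lower and upper tails. The page does not state how the package sets lower_tail and upper_tail, so this sketch substitutes a common robust rule, median ± k × MAD. The grouping, the threshold `min_group_size`, the multiplier `k`, and the "yes"/"no" strings are all hypothetical choices made for illustration.

```r
# Minimal sketch of per-group summary statistics and outlier flagging.
# The median +/- k * MAD rule, min_group_size, and k are illustrative
# assumptions, not the package's documented method.

# Hypothetical input: concept_prop for each mapping pair, grouped by concept_id
pairs <- data.frame(
  concept_id   = c(rep(1, 8), 2, 2),
  concept_prop = c(0.10, 0.11, 0.12, 0.10, 0.09, 0.11, 0.10, 0.27, 0.6, 0.4)
)

summarise_group <- function(x) {
  c(mean_val   = mean(x),
    median_val = median(x),
    sd_val     = sd(x),
    mad_val    = mad(x),            # median absolute deviation (scaled by 1.4826)
    cov_val    = sd(x) / mean(x),   # coefficient of variation
    max_val    = max(x),
    min_val    = min(x),
    range_val  = max(x) - min(x),
    total_ct   = length(x))         # number of group members
}

grp_stats <- do.call(rbind, lapply(split(pairs$concept_prop, pairs$concept_id),
                                   summarise_group))
grp_stats <- data.frame(concept_id = as.numeric(rownames(grp_stats)), grp_stats,
                        row.names = NULL)

min_group_size <- 5  # hypothetical eligibility threshold
k <- 3               # hypothetical MAD multiplier

# Hypothetical rule: a group is eligible only if it is large enough and not constant
grp_stats$analysis_eligible <- ifelse(grp_stats$total_ct >= min_group_size &
                                        grp_stats$mad_val > 0, "yes", "no")
grp_stats$lower_tail <- grp_stats$median_val - k * grp_stats$mad_val
grp_stats$upper_tail <- grp_stats$median_val + k * grp_stats$mad_val

# Attach group statistics to each pair and flag values outside the tails
out <- merge(pairs, grp_stats, by = "concept_id")
out$anomaly_yn <- ifelse(out$analysis_eligible == "yes" &
                           (out$concept_prop < out$lower_tail |
                              out$concept_prop > out$upper_tail),
                         "yes", "no")
out[out$anomaly_yn == "yes", c("concept_id", "concept_prop", "lower_tail", "upper_tail")]
```

With this toy data, only the 0.27 share under concept_id 1 falls outside its tails. concept_id 2 has just two members, so it is marked ineligible and never flagged. Requiring a nonzero MAD avoids degenerate tails for groups whose values are all identical.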

Funder(s)

This research was made possible through the generous support of the Patient-Centered Outcomes Research Institute (PCORI). The statements presented in this work are solely the responsibility of the author(s) and do not necessarily represent the views of PCORI, its Board of Governors, or its Methodology Committee.

Related Code

Study-Specific Quality, Utility, and Breadth Assessment
Created: 2025-11
Affiliation: PEDSnet Data Coordinating Center
This suite of R packages lets users investigate multiple facets of data quality and customize analyses to their study-specific needs. Each module supports up to 8 different analyses in either the OMOP or PCORnet CDM, each taking a different view of the data while addressing the same data quality probe.

##### [View pkgdown summary here.](https://ssdqa.github.io/squba/)

Creative Commons license

Except where otherwise noted, this item is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) License.

Cite this Data Quality Check

PEDSnet Data Coordinating Center. (2024, June). Source and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis. [DQ Check]. PEDSpace Knowledge Bank. https://doi.org/10.24373/pdsp-450