Source and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis
Abstract
This check runs analyses at the level of a single site. It produces a high-level snapshot of potentially anomalous mappings between source values and CDM codes. The check can only be executed when both the source code and the CDM code it maps to are provided.
Data Requirements
Probe
Clinical Assessment
Access Package
# install.packages("devtools")
devtools::install_github('ssdqa/sourceconceptvocabularies')
Visualization Output
This plot shows the proportion of each top mapping pair for the top concept_id (CDM) codes. Non-anomalous pairs are drawn as dots and anomalies as stars. The color gradient encodes the proportion for each pair: red indicates a higher proportion and blue a lower one. Point size represents the median absolute deviation (MAD) value for the concept_id. Hovering over a dot or star displays metadata for that data point. The user sets how many top concept_id values and top source_concept_id mappings are shown.
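As a rough illustration of how a "top N" limit might narrow the pairs that get plotted, the base R sketch below keeps the most frequent concept_id values and, within each, the most frequent source_concept_id mappings. The data values and the names `n_top_concepts` and `n_top_mappings` are hypothetical and are not taken from the package; the package's own parameters and ranking logic may differ.

```r
# Hypothetical pair-level counts (made-up values); column names follow the raw output.
pairs <- data.frame(
  concept_id        = c(101, 101, 101, 202, 202, 303),
  source_concept_id = c("A", "B", "C", "B", "D", "E"),
  ct                = c(50, 30, 5, 40, 10, 2),
  stringsAsFactors  = FALSE
)

n_top_concepts <- 2  # hypothetical name: number of concept_ids to keep
n_top_mappings <- 2  # hypothetical name: number of source mappings to keep per concept_id

# Rank concept_ids by total occurrences and keep the most frequent ones
concept_totals  <- tapply(pairs$ct, pairs$concept_id, sum)
ranked_concepts <- names(concept_totals)[order(concept_totals, decreasing = TRUE)]
top_concepts    <- head(ranked_concepts, n_top_concepts)
top_pairs       <- pairs[as.character(pairs$concept_id) %in% top_concepts, ]

# Within each kept concept_id, rank mappings by count (1 = most frequent)
rank_in_concept <- ave(-top_pairs$ct, top_pairs$concept_id,
                       FUN = function(x) rank(x, ties.method = "first"))
top_pairs <- top_pairs[rank_in_concept <= n_top_mappings, ]

print(top_pairs)
```

With these toy values, concept_id 303 and the rarely used mapping 101/C fall outside the limits and are dropped before plotting.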
Raw Output
This check produces a raw data output containing twenty-two columns of data:
| Column | Data Type | Definition |
|---|---|---|
| site | character | the name of the site being targeted OR “combined” if multiple sites were provided |
| domain | character | the domain associated with the provided concept set |
| concept_id | numeric / character | the primary concept, native to the CDM and mapped from the source |
| source_concept_id | numeric / character | the source concept, from the source system and mapped to the CDM |
| ct | numeric | the number of times the concept_id / source_concept_id pair occurs in the data |
| denom_concept_ct | numeric | the number of times the concept_id appears in the data |
| denom_source_ct | numeric | the number of times the source_concept_id appears in the data |
| concept_prop | numeric | the proportion of concept_id appearances made up by the concept_id / source_concept_id pair |
| source_prop | numeric | the proportion of source_concept_id appearances made up by the concept_id / source_concept_id pair |
| mean_val | numeric | the mean proportion of the provided code type (cdm or source) across sites |
| median_val | numeric | the median proportion of the provided code type (cdm or source) across sites |
| sd_val | numeric | the standard deviation of the proportion of the provided code type (cdm or source) across sites |
| mad_val | numeric | the median absolute deviation of the proportion of the provided code type (cdm or source) across sites |
| cov_val | numeric | the coefficient of variation of the proportion of the provided code type (cdm or source) across sites |
| max_val | numeric | the maximum proportion of the provided code type (cdm or source) across sites |
| min_val | numeric | the minimum proportion of the provided code type (cdm or source) across sites |
| range_val | numeric | the range of the proportion of the provided code type (cdm or source) across sites |
| total_ct | numeric | the total number of group members |
| analysis_eligible | character | a string indicating whether the group is eligible for anomaly detection analysis |
| lower_tail | numeric | the lower bound used to identify low anomalies |
| upper_tail | numeric | the upper bound used to identify high anomalies |
| anomaly_yn | character | a string indicating whether the value is anomalous or not |
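
To make the count and proportion columns concrete, here is a minimal base R sketch that derives `ct`, `denom_concept_ct`, `denom_source_ct`, `concept_prop`, and `source_prop` from a toy set of mappings, following the column definitions above. The input data is made up, and this is not the package's implementation; it only illustrates the arithmetic the definitions imply.

```r
# Toy data: one row per occurrence of a concept_id / source_concept_id pair.
# All values are invented for illustration.
mappings <- data.frame(
  concept_id        = c(101, 101, 101, 101, 202, 202),
  source_concept_id = c("A", "A", "A", "B", "B", "C"),
  stringsAsFactors  = FALSE
)

# ct: number of times each pair occurs
pairs <- aggregate(ct ~ concept_id + source_concept_id,
                   data = transform(mappings, ct = 1L), FUN = sum)

# Denominators: total occurrences of each concept_id and each source_concept_id
pairs$denom_concept_ct <- ave(pairs$ct, pairs$concept_id, FUN = sum)
pairs$denom_source_ct  <- ave(pairs$ct, pairs$source_concept_id, FUN = sum)

# Proportions: the share of each denominator accounted for by the pair
pairs$concept_prop <- pairs$ct / pairs$denom_concept_ct
pairs$source_prop  <- pairs$ct / pairs$denom_source_ct

print(pairs)
```

In this toy data, the pair 101/A accounts for three of the four appearances of concept_id 101 (`concept_prop` of 0.75) and for every appearance of source code A (`source_prop` of 1).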

