Expected Variables Present: Multi-Site, Anomaly Detection, Cross-Sectional Analysis
Abstract
This check provides raw data and visualizations to help users evaluate whether expected concepts are present in a dataset of interest. It summarizes the proportion of patients with co-occurring variables and supports identifying anomalous data for comparison among sites.
Data Requirements
Probe
Clinical Assessment
Access Package
# install.packages("devtools")
devtools::install_github('ssdqa/conceptsetdistribution')
Visualization Output
This check outputs a dot plot representing anomalous proportions of patients (or rows) with a given variable per site. The dot size shows the median absolute deviation (MAD) for the concept_id, the dot color shows how often that concept_id is used proportionally, and a star replaces the dot when the concept_id is anomalous. On hover, a tooltip provides metadata for the mapped concept and the site, along with precise values for the proportion, mean proportion, median proportion, standard deviation, and MAD.
Raw Output
The raw data output of this check contains twenty-one columns:
| Column | Data Type | Definition |
|---|---|---|
| site | character | the name of the site being targeted |
| total_pt_ct | numeric | the total number of patients from the cohort in the domain table |
| total_row_ct | numeric | the total number of rows associated with patients from the cohort in the domain table |
| variable_pt_ct | numeric | the number of patients with evidence of the variable |
| variable_row_ct | numeric | the number of rows with evidence of the variable |
| prop_pt_variable | numeric | the proportion of patients with evidence of the variable |
| prop_row_variable | numeric | the proportion of rows with evidence of the variable |
| variable | character | the name of the variable |
| mean_val | numeric | the mean proportion of patients or rows (based on user selection) for each group across sites |
| median_val | numeric | the median proportion of patients or rows (based on user selection) for each group across sites |
| sd_val | numeric | the standard deviation of the proportion of patients or rows (based on user selection) for each group across sites |
| mad_val | numeric | the median absolute deviation of the proportion of patients or rows (based on user selection) for each group across sites |
| cov_val | numeric | the coefficient of variation of the proportion of patients or rows (based on user selection) for each group across sites |
| max_val | numeric | the maximum proportion of patients or rows (based on user selection) for each group across sites |
| min_val | numeric | the minimum proportion of patients or rows (based on user selection) for each group across sites |
| range_val | numeric | the range of the proportion of patients or rows (based on user selection) for each group across sites |
| total_ct | numeric | the total number of group members |
| analysis_eligible | character | a string indicating whether the group is eligible for anomaly detection analysis |
| lower_tail | numeric | the lower bound used to identify low anomalies |
| upper_tail | numeric | the upper bound used to identify high anomalies |
| anomaly_yn | character | a string indicating whether the value is anomalous or not |
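
To make the summary columns concrete, the following is a minimal Python sketch of how the cross-site statistics for a single variable could be derived from per-site counts. It is an illustration only, not the package's R implementation. The function name `summarize_proportions`, the site names, and the counts are all hypothetical. The sketch also assumes a sample standard deviation, an unscaled MAD, and that `total_ct` counts the sites in the group; the package may make different choices. How `lower_tail`, `upper_tail`, and `anomaly_yn` are derived is not described here, so the sketch leaves them out.

```python
from statistics import mean, median, stdev


def summarize_proportions(props):
    """Summarize one variable's per-site proportions across sites.

    `props` maps a site name to its proportion (e.g. prop_pt_variable).
    Hypothetical helper; column names follow the raw output table.
    """
    values = list(props.values())
    avg = mean(values)
    med = median(values)
    sd = stdev(values)  # sample standard deviation (assumption)
    # Unscaled median absolute deviation: median of |x - median(x)|.
    # Any scaling constant the package might apply is not reproduced here.
    mad = median([abs(v - med) for v in values])
    return {
        "mean_val": avg,
        "median_val": med,
        "sd_val": sd,
        "mad_val": mad,
        "cov_val": sd / avg if avg else float("nan"),  # coefficient of variation
        "max_val": max(values),
        "min_val": min(values),
        "range_val": max(values) - min(values),
        "total_ct": len(values),  # assumed to be the number of sites in the group
    }


# Made-up counts: (variable_pt_ct, total_pt_ct) per site for one variable.
site_counts = {
    "site_a": (40, 100),
    "site_b": (50, 100),
    "site_c": (45, 100),
    "site_d": (10, 100),
}

# prop_pt_variable = variable_pt_ct / total_pt_ct
props = {site: var_ct / total for site, (var_ct, total) in site_counts.items()}
stats = summarize_proportions(props)

for name, value in stats.items():
    print(f"{name}: {value}")
```

The same function applies to row-level proportions (`prop_row_variable`) by passing `variable_row_ct / total_row_ct` per site instead.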

