Expected Variables Present: Multi-Site, Anomaly Detection, Longitudinal Analysis
Abstract
This check provides raw data and visualizations to help users evaluate whether expected concepts are present in a dataset of interest. It summarizes the proportion of patients with co-occurring variables and supports the identification of anomalous data by comparing results across sites.
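To illustrate the kind of patient-level proportion this check summarizes, the sketch below computes a per-site, per-period proportion from a small hypothetical data frame. The column names `person_id` and `has_variable`, the one-row-per-patient-per-period layout, and all values are assumptions made for illustration. This is not the package's own implementation.

```r
# Hypothetical patient-level input (illustrative only): one row per patient
# per time period, with a logical flag for evidence of the variable.
pt_data <- data.frame(
  site         = c("site_a", "site_a", "site_a", "site_b", "site_b"),
  time_start   = as.Date(rep("2020-01-01", 5)),
  person_id    = c(1, 2, 3, 4, 5),
  has_variable = c(TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Mean of a logical flag = proportion of patients with evidence of the
# variable, computed separately for each site and time period.
prop_tbl <- aggregate(
  has_variable ~ site + time_start,
  data = pt_data,
  FUN = mean
)
names(prop_tbl)[names(prop_tbl) == "has_variable"] <- "prop_pt_variable"
print(prop_tbl)
```

With a row-level count, the same calculation applied to the rows of a clinical table would give a row proportion instead of a patient proportion.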
Data Requirements
Probe
Clinical Assessment
Access Package
```r
# Install devtools first if it is not already available:
# install.packages("devtools")
devtools::install_github('ssdqa/conceptsetdistribution')
```
Visualization Output
This check outputs three visualizations that display the Euclidean distance between two time series: the smoothed (Loess) proportion of a user-selected variable at a given site, and the average proportion across all sites. Two line graphs (one smoothed, one raw) show the proportion of the variable at each site over time. Sites are distinguished by color, and a thick red line represents the All Site Average. A circular bar graph displays each site's Euclidean distance from the all-site mean, with color representing the site's average Loess proportion over time.
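The sketch below shows the two quantities behind these plots. It assumes that the distance is taken across all time points of one site's series. The site's proportions are smoothed with base R's `loess()`. The Euclidean distance to the all-site mean series is then the square root of the summed squared differences. The monthly values are invented for illustration, and the package may use different Loess settings (span, degree).

```r
# Toy monthly proportions for one site and the all-site average
# (illustrative numbers only, not real data).
time_start <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
date_numeric <- as.numeric(time_start)
site_prop <- c(0.42, 0.45, 0.40, 0.47, 0.50, 0.48,
               0.52, 0.55, 0.53, 0.58, 0.60, 0.57)
mean_allsiteprop <- c(0.50, 0.51, 0.50, 0.52, 0.53, 0.52,
                      0.54, 0.55, 0.54, 0.56, 0.57, 0.56)

# Smooth the site's series with Loess regression (default span and degree).
fit <- loess(site_prop ~ date_numeric)
site_loess <- predict(fit)

# Euclidean distance between the smoothed site series and the
# all-site mean series, summed over all time points.
dist_eucl_mean <- sqrt(sum((site_loess - mean_allsiteprop)^2))
print(dist_eucl_mean)
```

The same value can be obtained with `stats::dist(rbind(site_loess, mean_allsiteprop))`, which computes Euclidean distance between rows by default.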
Raw Output
The raw output of this check contains nine columns:
| Column | Data Type | Definition |
|---|---|---|
| site | character | the name of the site being targeted |
| time_start | date | the start of the time period being examined |
| variable | character | the name of the variable |
| prop_pt_variable / prop_row_variable | numeric | the proportion of patients or rows (based on user selection) with evidence of the variable |
| mean_allsiteprop | numeric | the average patient/row proportion across sites |
| median | numeric | the median patient/row proportion across sites |
| date_numeric | numeric | the numeric equivalent of time_start |
| site_loess | numeric | the patient/row proportion with Loess regression applied |
| dist_eucl_mean | numeric | the Euclidean distance of site_loess from mean_allsiteprop |
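One way to derive the cross-site columns from per-site proportions is sketched below. It uses base R's `ave()`, which assigns each row the mean or median of its time period. The input values and the variable name are hypothetical. `date_numeric` is assumed to be days since 1970-01-01, which is what `as.numeric()` returns for a `Date`.

```r
# Hypothetical per-site proportions for two time periods (illustrative only).
site_props <- data.frame(
  site             = rep(c("site_a", "site_b", "site_c"), times = 2),
  time_start       = as.Date(rep(c("2020-01-01", "2020-02-01"), each = 3)),
  variable         = "example_variable",
  prop_pt_variable = c(0.40, 0.50, 0.60, 0.45, 0.55, 0.80)
)

# Cross-site mean and median of the proportion, repeated on every row
# belonging to the same time period.
site_props$mean_allsiteprop <- ave(site_props$prop_pt_variable,
                                   site_props$time_start, FUN = mean)
site_props$median <- ave(site_props$prop_pt_variable,
                         site_props$time_start, FUN = median)

# Numeric equivalent of time_start (days since 1970-01-01).
site_props$date_numeric <- as.numeric(site_props$time_start)
print(site_props)
```

In the second period, the mean (0.6) and median (0.55) diverge because one site is much higher than the others. Reporting both columns helps show whether a single site is pulling the average.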

