Patient-Facts: Single Site, Anomaly Detection, Longitudinal Analysis
Created
Last Modified
Domain
Category
Parameters
Publisher
Abstract
This check assesses how much clinical data is available for patients across time (in years, months, or weeks). It provides a high-level summary of anomalous or outlying clinical data for a single site. For each patient in a cohort, the number of clinical events per year of follow-up is computed and stratified by visit type.
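The computation described above can be sketched in base R. This is a minimal illustration on toy data, not the package's implementation: the tables `cohort` and `facts`, their column names, and the 365.25-day year are all assumptions made for the example.

```r
# Toy inputs; every table and column name here is hypothetical.
cohort <- data.frame(
  patient_id = c(1, 2, 3),
  start_date = as.Date(c("2018-01-01", "2019-01-01", "2020-01-01")),
  end_date   = as.Date(c("2020-01-01", "2020-01-01", "2020-07-01"))
)
facts <- data.frame(
  patient_id = c(1, 1, 1, 1, 2, 2, 3),
  visit_type = c("outpatient", "outpatient", "inpatient", "outpatient",
                 "outpatient", "inpatient", "outpatient")
)

# Follow-up per patient in years (assumes a 365.25-day year).
fu_days <- as.numeric(difftime(cohort$end_date, cohort$start_date, units = "days"))
cohort$fu_years <- fu_days / 365.25

# Count clinical events per patient and visit type.
facts$n_facts <- 1L
fact_counts <- aggregate(n_facts ~ patient_id + visit_type, data = facts, FUN = sum)

# Attach follow-up time and compute events per year of follow-up.
fact_rates <- merge(fact_counts, cohort[, c("patient_id", "fu_years")],
                    by = "patient_id")
fact_rates$facts_per_year <- fact_rates$n_facts / fact_rates$fu_years

# Summarize the per-patient rates within each visit type.
rate_by_visit <- aggregate(facts_per_year ~ visit_type, data = fact_rates,
                           FUN = median)
print(rate_by_visit)
```

Dividing by follow-up time rather than reporting raw counts keeps patients with short and long observation windows comparable.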
Data Requirements
Probe
Clinical Assessment
Access Package
# install.packages("devtools")
devtools::install_github('ssdqa/patientfacts')
Visualization Output
The output of this check depends on the time increment supplied by the user.
For yearly time increments, the check outputs a control chart that highlights anomalies in the proportion of patients with a given fact type in the provided variable. Anomalous proportions are represented by orange dots, and non-anomalous values by blue dots. The time increment (x-axis) is in years.
For smaller time increments (monthly or shorter), the check outputs two graphs that visualize anomalies while ignoring seasonality. The first is a time series line graph with anomalies indicated by red dots. The second is a four-facet time series line graph that shows the decomposition of the series to clarify how the anomalies were identified.
For each output, a tooltip shows each point's exact coordinates on hover.
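To make the yearly control chart idea concrete, here is a minimal base R sketch. The source does not say which kind of control chart the check uses, so this example assumes a p-chart with 3-sigma limits; the toy counts and the name `yearly` are hypothetical, and only `pts_w_fact` and `pts_w_visit` are borrowed from the raw output columns.

```r
# Toy yearly counts (hypothetical values).
yearly <- data.frame(
  year        = 2015:2022,
  pts_w_visit = c(1000, 1050, 980, 1020, 1010, 990, 1030, 1000),
  pts_w_fact  = c(400, 415, 395, 410, 405, 250, 412, 398)
)

# Proportion of patients with a fact in each year.
yearly$prop <- yearly$pts_w_fact / yearly$pts_w_visit

# Centerline: pooled proportion across all years.
p_bar <- sum(yearly$pts_w_fact) / sum(yearly$pts_w_visit)

# Assumed p-chart limits: centerline +/- 3 binomial standard errors,
# computed per year because the denominators differ.
sigma <- sqrt(p_bar * (1 - p_bar) / yearly$pts_w_visit)
yearly$lcl <- pmax(0, p_bar - 3 * sigma)
yearly$ucl <- pmin(1, p_bar + 3 * sigma)

# A year is flagged when its proportion falls outside the limits.
yearly$anomaly <- yearly$prop < yearly$lcl | yearly$prop > yearly$ucl
print(yearly[, c("year", "prop", "lcl", "ucl", "anomaly")])
```

In a plot of `prop` against `year`, the flagged rows would correspond to the highlighted dots and the rest to the ordinary ones.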
Raw Output
For annual time intervals, the raw output of this check contains eleven columns:
| Column | Data Type | Definition |
|---|---|---|
| visit_type | character | string indicating the visit type |
| site | character | the name of the site being targeted OR “combined” if multiple sites were provided |
| time_start | date | the start of the time period being examined |
| time_increment | character | the length of each time period |
| domain | character | string indicating the domain |
| pts_w_fact | numeric | the number of patients who have a fact within the time period |
| sum_fact_ct | numeric | the total number of facts per patient within the time period |
| median_fact_ct | numeric | the median number of facts per patient within the time period |
| pt_ct_denom | numeric | the total number of eligible patients from the cohort within the time period |
| pts_w_visit | numeric | the number of patients with a visit of the type of interest within the time period |
| prop_pts_fact | numeric | the proportion of patients with the domain of interest out of all patients with a visit of the visit type of interest during the time period |
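The column definitions suggest some simple consistency checks on the annual output. The sketch below is illustrative only: the data frame `raw_out` is invented toy data that uses the column names listed above. The relationships it tests are assumptions read from the definitions, not documented guarantees: that `prop_pts_fact` equals `pts_w_fact / pts_w_visit`, and that `pts_w_fact <= pts_w_visit <= pt_ct_denom`.

```r
# Toy stand-in for the annual raw output (values are hypothetical).
raw_out <- data.frame(
  visit_type     = c("outpatient", "outpatient", "inpatient"),
  site           = "site_a",
  time_start     = as.Date(c("2020-01-01", "2021-01-01", "2020-01-01")),
  time_increment = "year",
  domain         = "conditions",
  pts_w_fact     = c(400, 410, 60),
  sum_fact_ct    = c(1800, 1900, 240),
  median_fact_ct = c(4, 4, 3),
  pt_ct_denom    = c(1200, 1250, 1200),
  pts_w_visit    = c(1000, 1020, 150),
  prop_pts_fact  = c(0.40, 0.35, 0.40),
  stringsAsFactors = FALSE
)

# Assumption: the proportion is patients with a fact over patients with a visit.
expected_prop <- raw_out$pts_w_fact / raw_out$pts_w_visit
raw_out$prop_ok <- abs(raw_out$prop_pts_fact - expected_prop) < 1e-6

# Assumption: each count is bounded by the next broader population.
raw_out$counts_ok <- raw_out$pts_w_fact <= raw_out$pts_w_visit &
  raw_out$pts_w_visit <= raw_out$pt_ct_denom

# Rows failing either check deserve a closer look.
print(raw_out[!(raw_out$prop_ok & raw_out$counts_ok),
              c("visit_type", "time_start", "prop_pts_fact")])
```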
For monthly or weekly time intervals, the raw output of this check contains eleven columns:
| Column | Data Type | Definition |
|---|---|---|
| observed | numeric | the original proportion of patients |
| season | numeric | the seasonal component of the time series |
| trend | numeric | the trend component of the time series |
| remainder | numeric | the residual component after “season” and “trend” are removed from “observed” - target of anomaly detection |
| seasadj | numeric | the adjusted seasonal component |
| anomaly | character | a flag to indicate whether the proportion is an anomaly |
| anomaly_direction | numeric | the direction of the anomaly (upper or lower) |
| anomaly_score | numeric | the distance between the anomaly and the centerline |
| recomposed_l1 | numeric | the lower level bound of the processed time series used to identify lower outliers |
| recomposed_l2 | numeric | the upper level bound of the processed time series used to identify upper outliers |
| observed_clean | numeric | the original proportion after the season and trend components have been removed and anomalies have been detected |
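The relationship between `observed`, `season`, `trend`, and `remainder` can be illustrated with a small base R sketch. The source does not state which decomposition or anomaly rule the check applies, so this example substitutes `stats::stl` for the decomposition and a 3 x IQR rule on the remainder for the detection step; both choices, the simulated series, and the name `decomp` are assumptions made for illustration.

```r
set.seed(42)

# Simulated monthly proportions: slow trend + yearly seasonality + noise.
n_months  <- 48
month_idx <- seq_len(n_months)
observed  <- 0.40 + 0.0005 * month_idx +
  0.03 * sin(2 * pi * month_idx / 12) +
  rnorm(n_months, sd = 0.005)
observed[30] <- observed[30] + 0.10  # inject one artificial anomaly

# Decompose with STL (stand-in method; robust fitting limits the
# influence of the injected outlier on the trend and seasonal terms).
ts_obs <- ts(observed, frequency = 12, start = c(2019, 1))
fit    <- stl(ts_obs, s.window = "periodic", robust = TRUE)

decomp <- data.frame(
  observed  = observed,
  season    = as.numeric(fit$time.series[, "seasonal"]),
  trend     = as.numeric(fit$time.series[, "trend"]),
  remainder = as.numeric(fit$time.series[, "remainder"])
)

# Assumed detection rule: remainder outside quartiles +/- 3 * IQR.
q     <- quantile(decomp$remainder, c(0.25, 0.75))
iqr   <- unname(q[2] - q[1])
lower <- unname(q[1]) - 3 * iqr
upper <- unname(q[2]) + 3 * iqr
decomp$anomaly <- ifelse(decomp$remainder < lower | decomp$remainder > upper,
                         "Yes", "No")

print(decomp[decomp$anomaly == "Yes", ])
```

Because detection runs on the remainder rather than on the observed values, a regular seasonal peak is not flagged just for being high.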

