Patient Records Consistency: Single Site, Anomaly Detection, Longitudinal Analysis
Abstract
This check identifies anomalous data over time at the level of a single site. The Patient Record Consistency module, part of the larger SSDQA ecosystem, tests whether clinical data are represented consistently within a patient’s record. The goal is to ensure that the patient’s information is mutually confirming and complete, so that two events expected to co-occur (e.g., a leukemia diagnosis and chemotherapy) both appear in the same patient’s record.
Data Requirements
Probe
Clinical Assessment
Access Package
# install.packages("devtools")
devtools::install_github('ssdqa/patientrecordconsistency')
Visualization Output
This check’s visual output depends on the time increment input by the user.
For yearly time increments, this check outputs a control chart that highlights anomalies in the proportion of patients per event category. A P Prime chart is used to account for the large sample size, which means that the standard deviation used for the control limits is multiplied by a numerical constant. Blue dots along the line indicate non-anomalous values, while orange dots indicate anomalies. Only one event category can be displayed on the graph at a time; specify it via the event_filter parameter. Any of the four event categories may be chosen: a, b, both, or neither.
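To make the P Prime adjustment concrete, here is a minimal base R sketch of the standard Laney P′ calculation. It is not taken from the package source: the yearly counts are made up for illustration, the variable names are borrowed from the raw output columns, and the package's own implementation may differ in detail.

```r
# Hypothetical yearly counts for one event category (illustrative data only)
total_pts <- c(12000, 12500, 13100, 12800, 13400, 14000)
stat_ct   <- c(2400, 2525, 2610, 3200, 2700, 2790)

prop_event <- stat_ct / total_pts

# Centerline: overall proportion across all years
p_bar <- sum(stat_ct) / sum(total_pts)

# Within-year binomial standard deviation (shrinks as total_pts grows)
sigma_p <- sqrt(p_bar * (1 - p_bar) / total_pts)

# Standardize each year, then estimate between-year variation
# from the average moving range of the z-scores (1.128 is the d2 constant for n = 2)
z <- (prop_event - p_bar) / sigma_p
sigma_z <- mean(abs(diff(z))) / 1.128

# P' control limits: the binomial sigma is scaled by sigma_z
ucl <- p_bar + 3 * sigma_p * sigma_z
lcl <- pmax(p_bar - 3 * sigma_p * sigma_z, 0)

pprime <- data.frame(
  prop_event = prop_event,
  lcl = lcl,
  ucl = ucl,
  anomaly = prop_event > ucl | prop_event < lcl
)
print(pprime)
```

With very large denominators, the plain binomial limits become so narrow that nearly every point looks anomalous; scaling by `sigma_z` is what keeps the limits usable.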
For shorter time increments (monthly or shorter), seasonality can make it difficult to detect true anomalies in a time series. In this case, the check accounts for seasonality when computing anomalies and outputs two graphs:
- A time series line graph with each anomaly highlighted by a red dot.
- A 4-facet time series line graph that shows the decomposition of the series, making it clearer how the anomalies were identified.
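The decomposition idea can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: it uses `stats::stl()` for the decomposition and a simple interquartile-range rule on the remainder as a stand-in for whatever detection method the package applies. The simulated series, the injected spike, and the multiplier of 3 are all hypothetical.

```r
# Hypothetical monthly proportions with a yearly seasonal pattern (illustrative data only)
set.seed(42)
n_months <- 60
month_idx <- seq_len(n_months)
prop_event <- 0.20 + 0.0005 * month_idx +
  0.02 * sin(2 * pi * month_idx / 12) +
  rnorm(n_months, sd = 0.003)
prop_event[30] <- prop_event[30] + 0.08  # inject one artificial spike

series <- ts(prop_event, frequency = 12, start = c(2018, 1))

# Split the series into seasonal, trend, and remainder components
fit <- stl(series, s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])

# Flag remainders that fall outside IQR-based fences
# (the multiplier 3 is an arbitrary choice for this sketch)
q <- unname(quantile(remainder, c(0.25, 0.75)))
fence <- 3 * (q[2] - q[1])
anomaly <- remainder < q[1] - fence | remainder > q[2] + fence

# Indices of the flagged months
print(which(anomaly))

# Four stacked panels: data, seasonal, trend, remainder
plot(fit)
```

Because the seasonal swing is removed before the fences are computed, a regular yearly peak is not mistaken for an anomaly, while a spike that departs from the usual pattern stands out in the remainder.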
Raw Output
This check produces a raw data output containing 9 columns of data for analyses over annual intervals:
| Column | Data Type | Definition |
|---|---|---|
| site | character | the name of the site being targeted OR “combined” if multiple sites were provided |
| time_start | date | the start of the time period being examined |
| time_increment | character | the length of each time period |
| event_a_name | character | the name of event A |
| event_b_name | character | the name of event B |
| total_pts | numeric | the total number of eligible patients in the cohort during the time period |
| stat_type | character | string indicating the event combination of interest: A only, B only, both, or neither |
| stat_ct | numeric | the count of patients meeting the criteria for stat_type in the time period of interest |
| prop_event | numeric | the proportion of patients meeting the criteria for stat_type in the time period of interest |
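As a rough illustration of how stat_type, stat_ct, and prop_event relate to one another, the base R sketch below classifies a handful of made-up patients for a single time period. The input columns `has_event_a` and `has_event_b` and the exact label strings are assumptions for this example, not names taken from the package.

```r
# Hypothetical per-patient flags for one time period (illustrative data only)
patients <- data.frame(
  patient_id  = 1:8,
  has_event_a = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  has_event_b = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Assign each patient to exactly one of the four event combinations
patients$stat_type <- with(patients, ifelse(
  has_event_a & has_event_b, "both",
  ifelse(has_event_a, "A only", ifelse(has_event_b, "B only", "neither"))
))

total_pts <- nrow(patients)

# Count patients per combination, keeping categories with zero patients
counts <- table(factor(patients$stat_type,
                       levels = c("A only", "B only", "both", "neither")))

summary_tbl <- data.frame(
  stat_type  = names(counts),
  stat_ct    = as.numeric(counts),
  total_pts  = total_pts,
  prop_event = as.numeric(counts) / total_pts
)
print(summary_tbl)
```

Since every patient falls into exactly one combination, the four prop_event values for a given time period sum to 1.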
For analyses over monthly or weekly intervals, it produces 11 columns of data:
| Column | Data Type | Definition |
|---|---|---|
| observed | numeric | the original proportion of patients |
| season | numeric | the seasonal component of the time series |
| trend | numeric | the trend component of the time series |
| remainder | numeric | the residual component after “season” and “trend” are removed from “observed” - target of anomaly detection |
| seasadj | numeric | the adjusted seasonal component |
| anomaly | character | a flag to indicate whether the proportion is an anomaly |
| anomaly_direction | numeric | the direction of the anomaly (upper or lower) |
| anomaly_score | numeric | the distance between the anomaly and the centerline |
| recomposed_l1 | numeric | the lower level bound of the processed time series used to identify lower outliers |
| recomposed_l2 | numeric | the upper level bound of the processed time series used to identify upper outliers |
| observed_clean | numeric | the original proportion after the season and trend components have been removed and anomalies have been detected |

