Patient Records Consistency: Single Site, Anomaly Detection, Longitudinal Analysis

Publisher

PEDSnet

Abstract

This check identifies anomalous data over time at the level of a single site. The Patient Record Consistency module, part of the larger SSDQA ecosystem, tests the consistency of clinical data representation within a patient’s record. The goal is to ensure that the patient’s information is confirmatory and complete, such that two events that are expected to co-occur (e.g., a leukemia diagnosis and chemotherapy) both appear in the same patient’s record.


How to Access This Check

  1. You may access the module’s R package on GitHub.
    Or, install it from R (install_github is provided by the remotes and devtools packages):
remotes::install_github('ssdqa/patientrecordconsistency')
  2. Using the provided vignettes on GitHub or the help pages in R, follow the parameter input instructions for the “Single-Site”, “Anomaly Detection”, and “Longitudinal Analysis” requirements.

Check Output

Visualization Output

This check’s visual output depends on the time increment input by the user.

For yearly time increments, this check outputs a control chart that highlights anomalies in the proportion of patients per event category. A P Prime chart is used to account for the large sample size: the standard deviation used for the control limits is multiplied by a numerical constant. Blue dots along the line indicate non-anomalous values, while orange dots indicate anomalies. Only one event category should be specified via the event_filter parameter for display on the graph. Any of the four categories seen in the raw output may be chosen: a, b, both, or neither.
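The control-limit adjustment described above can be illustrated with a short base R sketch. It assumes the standard Laney P′ formulation, in which the binomial standard deviation is scaled by a factor estimated from the moving range of the z-scores; the module's own implementation is not shown in this record and may differ. The data values and the helper name `laney_p_prime` are hypothetical.

```r
# Hypothetical yearly counts: patients in one event category (stat_ct)
# out of all eligible patients (total_pts). Values are invented.
yearly <- data.frame(
  time_start = as.Date(paste0(2013:2022, "-01-01")),
  stat_ct    = c(3900, 4000, 4100, 4350, 4280, 4500, 4620, 6900, 4700, 4810),
  total_pts  = c(19000, 19500, 20000, 21000, 20500, 21500, 22000, 22500, 22800, 23000)
)

# Hypothetical helper implementing the textbook Laney P' limits.
laney_p_prime <- function(count, n) {
  p       <- count / n                          # observed proportion per period
  p_bar   <- sum(count) / sum(n)                # centerline
  sigma_p <- sqrt(p_bar * (1 - p_bar) / n)      # binomial sd per period
  z       <- (p - p_bar) / sigma_p              # standardized proportions
  sigma_z <- mean(abs(diff(z))) / 1.128         # moving-range estimate of sd(z)
  ucl     <- p_bar + 3 * sigma_p * sigma_z      # sd multiplied by the constant sigma_z
  lcl     <- pmax(p_bar - 3 * sigma_p * sigma_z, 0)
  data.frame(prop_event = p, center = p_bar, lcl = lcl, ucl = ucl,
             anomaly = p < lcl | p > ucl)
}

chart <- cbind(yearly["time_start"],
               laney_p_prime(yearly$stat_ct, yearly$total_pts))
print(chart[chart$anomaly, ])
```

With large denominators, ordinary P chart limits become very narrow and flag nearly every point; scaling by `sigma_z` widens them so that only periods that stand out from the year-to-year variation are marked.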

For smaller time increments (by month or smaller), seasonality can make it difficult to detect true anomalies in a time series. For these increments, the check detects anomalies while discounting seasonality and produces two graphs:

  1. A time series line graph with anomalies highlighted by red dots.
  2. A four-facet time series line graph that shows the decomposition of the time series, making it clearer how the anomalies were identified.
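The decomposition idea behind these two graphs can be sketched in base R. This example assumes an STL decomposition (`stats::stl`) followed by interquartile-range fences on the remainder; the fence factor of 3, the simulated series, and the injected anomaly are all illustrative assumptions, not the module's documented settings.

```r
set.seed(42)

# Simulated monthly proportions: linear trend + yearly seasonality + noise.
n_months  <- 72
month_idx <- seq_len(n_months)
observed  <- 0.20 + 0.0005 * month_idx +
  0.03 * sin(2 * pi * month_idx / 12) +
  rnorm(n_months, sd = 0.004)
observed[40] <- observed[40] + 0.08   # injected anomaly for illustration

series <- ts(observed, start = c(2017, 1), frequency = 12)

# Split the series into seasonal, trend, and remainder components.
fit       <- stl(series, s.window = "periodic", robust = TRUE)
season    <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# Flag remainders outside IQR-based fences (factor 3 is an assumption).
q       <- quantile(remainder, c(0.25, 0.75))
fence   <- 3 * (q[[2]] - q[[1]])
anomaly <- remainder < q[[1]] - fence | remainder > q[[2]] + fence

decomposed <- data.frame(observed = as.numeric(series), season, trend,
                         remainder, anomaly)
print(decomposed[decomposed$anomaly, ])
```

Plotting `observed`, `season`, `trend`, and `remainder` in separate panels gives a four-facet view comparable to the second graph, and the flagged rows correspond to the highlighted points in the first.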

Raw Output

This check produces a raw data output containing 9 columns of data for analyses over annual intervals:

Column          Data Type  Definition
site            character  the name of the site being targeted OR “combined” if multiple sites were provided
time_start      date       the start of the time period being examined
time_increment  character  the length of each time period
event_a_name    character  the name of event A
event_b_name    character  the name of event B
total_pts       numeric    the total number of eligible patients in the cohort during the time period
stat_type       character  string indicating the event combination of interest: A only, B only, both, or neither
stat_ct         numeric    the count of patients meeting the criteria for stat_type in the time period of interest
prop_event      numeric    the proportion of patients meeting the criteria for stat_type in the time period of interest
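To make the relationship between these columns concrete, the following base R sketch builds rows in this shape for a single yearly period from hypothetical patient-level flags. The site name, event names, flag columns, and the exact strings used for stat_type are assumptions for illustration; the package derives these values from the underlying clinical data.

```r
# Hypothetical patient-level flags for one yearly period.
patients <- data.frame(
  patient_id  = 1:10,
  has_event_a = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
  has_event_b = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Assign each patient to exactly one event combination.
stat_levels <- c("A only", "B only", "both", "neither")
patients$stat_type <- with(patients,
  ifelse(has_event_a & has_event_b, "both",
  ifelse(has_event_a, "A only",
  ifelse(has_event_b, "B only", "neither"))))

counts <- table(factor(patients$stat_type, levels = stat_levels))

# One row per stat_type, mirroring the 9 columns described above.
raw_output <- data.frame(
  site           = "site_x",                  # hypothetical site name
  time_start     = as.Date("2020-01-01"),
  time_increment = "year",
  event_a_name   = "leukemia diagnosis",
  event_b_name   = "chemotherapy",
  total_pts      = nrow(patients),
  stat_type      = names(counts),
  stat_ct        = as.numeric(counts),
  prop_event     = as.numeric(counts) / nrow(patients)
)
print(raw_output)
```

Because every patient falls into exactly one combination, the prop_event values within a single time period sum to 1.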

For analyses over monthly or weekly intervals, it produces 11 columns of data:

Column             Data Type  Definition
observed           numeric    the original proportion of patients
season             numeric    the seasonal component of the time series
trend              numeric    the trend component of the time series
remainder          numeric    the residual component after “season” and “trend” are removed from “observed”; the target of anomaly detection
seasadj            numeric    the adjusted seasonal component
anomaly            character  a flag to indicate whether the proportion is an anomaly
anomaly_direction  numeric    the direction of the anomaly (upper or lower)
anomaly_score      numeric    the distance between the anomaly and the centerline
recomposed_l1      numeric    the lower level bound of the processed time series used to identify lower outliers
recomposed_l2      numeric    the upper level bound of the processed time series used to identify upper outliers
observed_clean     numeric    the original proportion after the season and trend components have been removed and anomalies have been detected

Funder(s)

This research was made possible through the generous support of the Patient-Centered Outcomes Research Institute.

Creative Commons license

Except where otherwise noted, this item's license is described as a CC-BY Attribution 4.0 License.