Patient Records Consistency: Single Site, Anomaly Detection, Longitudinal Analysis

PEDSnet; Wieand, Kaleigh; Bailey, Charles; Razzaghi, Hanieh; Dickinson, Kimberley

Data Quality Check

Created

2024-12-17

Publisher

PEDSnet

Data Requirements

cohort, prc_event_file, omop_or_pcornet, multi_or_single_site, anomaly_or_exploratory, age_groups, patient_level_tbl, fu_breaks, p_value, time, time_span, time_period

Abstract

This check identifies anomalous data across time at the level of a single site. The Patient Record Consistency module, part of the larger SSDQA ecosystem, tests the consistency of clinical data representation within a patient’s record. The goal is to ensure that the patient’s information is confirmatory and complete, such that two events that are expected to co-exist both occur within the same patient (e.g., a leukemia diagnosis and chemotherapy).
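As a minimal illustration of the co-occurrence idea (not the package's implementation), the Python sketch below assigns each patient in a cohort to one of four event combinations. The function name, patient IDs, event sets, and label strings are all hypothetical.

```python
def classify_patients(cohort, event_a_pts, event_b_pts):
    """Label each cohort patient by which of the two events appear in their record."""
    labels = {}
    for pt in cohort:
        has_a = pt in event_a_pts
        has_b = pt in event_b_pts
        if has_a and has_b:
            labels[pt] = "both"
        elif has_a:
            labels[pt] = "a_only"
        elif has_b:
            labels[pt] = "b_only"
        else:
            labels[pt] = "neither"
    return labels


# Hypothetical data: event A is a leukemia diagnosis, event B is chemotherapy.
cohort = ["p1", "p2", "p3", "p4"]
leukemia_dx = {"p1", "p2"}
chemotherapy = {"p1", "p3"}
labels = classify_patients(cohort, leukemia_dx, chemotherapy)
```

A high share of patients in the "a_only" or "b_only" groups, relative to "both", is the kind of pattern the check is meant to surface.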

How to Access This Check

You may access the module’s R package on GitHub.
Or, install it from within R (install_github is provided by the remotes and devtools packages):

install_github('ssdqa/patientrecordconsistency')

Using the provided vignettes on GitHub or the help pages in R, follow the parameter input instructions for the “Single-Site”, “Anomaly Detection”, and “Longitudinal Analysis” requirements.

Check Output

Visualization Output

This check’s visual output depends on the time increment input by the user.

For yearly time increments, this check outputs a control chart that highlights anomalies in the proportion of patients per event category. A P Prime chart is used to account for the large sample size: the standard deviation used for the control limits is multiplied by a numerical constant. Blue dots along the line indicate non-anomalous values, while orange dots indicate anomalies. Only one event category should be specified via the event_filter parameter to be displayed on the graph. Any of the four categories reported in the raw output may be chosen: a, b, both, or neither.
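The following Python sketch shows one way such limits can be computed, assuming the chart follows Laney's P' formulation, in which the binomial standard deviation is scaled by a constant estimated from the moving range of the z-scores. The package may compute its limits differently; the function name and the yearly counts are hypothetical.

```python
import math


def p_prime_limits(counts, totals):
    """Return (proportion, lower, upper, is_anomaly) per period using a P' adjustment."""
    if len(counts) != len(totals) or len(counts) < 2:
        raise ValueError("need at least two periods with matching counts and totals")
    p_bar = sum(counts) / sum(totals)  # overall proportion (centerline)
    props = [c / n for c, n in zip(counts, totals)]
    # Within-period (binomial) standard deviation for each period.
    sigma_p = [math.sqrt(p_bar * (1 - p_bar) / n) for n in totals]
    z = [(p - p_bar) / s for p, s in zip(props, sigma_p)]
    # Between-period variation: average moving range of z divided by 1.128.
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    rows = []
    for p, s in zip(props, sigma_p):
        lower = max(0.0, p_bar - 3 * s * sigma_z)
        upper = min(1.0, p_bar + 3 * s * sigma_z)
        rows.append((p, lower, upper, p < lower or p > upper))
    return rows


# Hypothetical yearly counts of patients in one event category.
counts = [2000, 2050, 1980, 2020, 2010, 1990, 2040, 3000]
totals = [10000] * 8
chart = p_prime_limits(counts, totals)
flags = [row[3] for row in chart]
```

With these made-up numbers only the final year falls outside the limits, which corresponds to an orange dot on the chart.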

For smaller time increments (monthly or smaller), seasonality can make it difficult to detect true anomalies in a time series. This output computes anomalies while ignoring seasonality and outputs two graphs:

A time series line graph with anomalies highlighted as red dots.
A 4-facet time series line graph that shows the decomposition of the time series to clarify how the anomalies were identified.
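To make the decomposition idea concrete, here is a deliberately simplified Python sketch. It is not the package's method: it assumes an additive model, uses a flat median level in place of a fitted trend, takes per-position medians as the seasonal component, and flags remainders outside IQR-based limits. Field names are borrowed from the raw output table for readability only, and the series is hypothetical.

```python
import math
from statistics import median, quantiles


def decompose_and_flag(observed, period=12, iqr_factor=3.0):
    """Simplified additive decomposition: observed = trend + season + remainder."""
    trend = median(observed)  # assumption: a flat level stands in for a fitted trend
    detrended = [x - trend for x in observed]
    # Seasonal value per cycle position: median of the detrended values at that position.
    season_by_pos = [median(detrended[pos::period]) for pos in range(period)]
    seasons = [season_by_pos[i % period] for i in range(len(observed))]
    remainders = [x - trend - s for x, s in zip(observed, seasons)]
    # Flag remainders that fall outside IQR-based limits.
    q1, _, q3 = quantiles(remainders, n=4)
    lower = q1 - iqr_factor * (q3 - q1)
    upper = q3 + iqr_factor * (q3 - q1)
    return [
        {
            "observed": x,
            "season": s,
            "trend": trend,
            "remainder": r,
            "anomaly": "Yes" if (r < lower or r > upper) else "No",
            # Limits mapped back onto the scale of the observed series.
            "recomposed_l1": trend + s + lower,
            "recomposed_l2": trend + s + upper,
        }
        for x, s, r in zip(observed, seasons, remainders)
    ]


# Hypothetical monthly proportions: three yearly cycles, small noise, one spike.
series = [
    0.20 + 0.05 * math.sin(2 * math.pi * i / 12) + 0.002 * ((2 * i) % 5 - 2)
    for i in range(36)
]
series[20] += 0.10
rows = decompose_and_flag(series)
flagged = [i for i, row in enumerate(rows) if row["anomaly"] == "Yes"]
```

Because the seasonal swing is removed before the limits are applied, the regular yearly peaks are not flagged, while the injected spike at index 20 is.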

Raw Output

This check produces a raw data output containing 9 columns of data for analyses over annual intervals:

Column	Data Type	Definition
`site`	character	the name of the site being targeted OR “combined” if multiple sites were provided
`time_start`	date	the start of the time period being examined
`time_increment`	character	the length of each time period
`event_a_name`	character	the name of event A
`event_b_name`	character	the name of event B
`total_pts`	numeric	the total number of eligible patients in the cohort during the time period
`stat_type`	character	string indicating the event combination of interest: A only, B only, both, or neither
`stat_ct`	numeric	the count of patients meeting the criteria for stat_type in the time period of interest
`prop_event`	numeric	the proportion of patients meeting the criteria for stat_type in the time period of interest
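The Python sketch below illustrates how rows with these nine columns could be assembled for one time period from per-patient labels. It is an assumption-laden illustration rather than the package's code: the function name, label strings, site name, and input data are hypothetical.

```python
from collections import Counter

STAT_TYPES = ("a_only", "b_only", "both", "neither")  # hypothetical label strings


def summarise_period(site, time_start, labels, event_a_name, event_b_name,
                     time_increment="year"):
    """Build one output row per event combination for a single time period."""
    counts = Counter(labels.values())
    total = len(labels)
    return [
        {
            "site": site,
            "time_start": time_start,
            "time_increment": time_increment,
            "event_a_name": event_a_name,
            "event_b_name": event_b_name,
            "total_pts": total,
            "stat_type": stat,
            "stat_ct": counts[stat],
            "prop_event": counts[stat] / total if total else 0.0,
        }
        for stat in STAT_TYPES
    ]


# Hypothetical per-patient labels for one year at one site.
year_labels = {"p1": "both", "p2": "a_only", "p3": "b_only", "p4": "neither", "p5": "a_only"}
period_rows = summarise_period("site_x", "2020-01-01", year_labels,
                               "leukemia diagnosis", "chemotherapy")
```

Each period therefore contributes four rows, one per stat_type, and their prop_event values sum to 1.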

For analyses over monthly or weekly intervals, it produces 11 columns of data:

Column	Data Type	Definition
`observed`	numeric	the original proportion of patients
`season`	numeric	the seasonal component of the time series
`trend`	numeric	the trend component of the time series
`remainder`	numeric	the residual component after “season” and “trend” are removed from “observed” - target of anomaly detection
`seasadj`	numeric	the adjusted seasonal component
`anomaly`	character	a flag to indicate whether the proportion is an anomaly
`anomaly_direction`	numeric	the direction of the anomaly (upper or lower)
`anomaly_score`	numeric	the distance between the anomaly and the centerline
`recomposed_l1`	numeric	the lower level bound of the processed time series used to identify lower outliers
`recomposed_l2`	numeric	the upper level bound of the processed time series used to identify upper outliers
`observed_clean`	numeric	the original proportion after the season and trend components have been removed and anomalies have been detected

Affiliation(s)

Children's Hospital of Philadelphia

Funder(s)

This research was made possible through the generous support of the Patient-Centered Outcomes Research Institute.

Development Code

https://github.com/ssdqa/patientrecordconsistency

Creative Commons license

Except where otherwise noted, this item's license is described as a CC-BY Attribution 4.0 License.
