Concept-Set Distribution: Single Site, Anomaly Detection, Longitudinal Analysis
dc.contributor | Patient-Centered Outcomes Research Institute |
dc.contributor.author | PEDSnet |
dc.contributor.other | Children's Hospital of Philadelphia |
dc.date.accessioned | 2024-09-09T17:11:49Z |
dc.date.created | 2024-06-05 |
dc.description.abstract | This check is intended to aid a user in understanding the distribution of concepts that form or represent a particular variable in a dataset. It demonstrates how concept sets drive the prevalence or clinical composition of variables in a study. It is designed to identify anomalous data within a single site's data over time. Use of this check will show a researcher the time periods in which a given code was used abnormally frequently or infrequently in the dataset. |
dc.description.abstract | #### How to Access This Check
1. You may access the module's R package on [GitHub](https://github.com/ssdqa/conceptsetdistribution).<br> Or, install it in R:
```{r}
# install_github() is provided by the remotes package (also re-exported by devtools)
remotes::install_github('ssdqa/conceptsetdistribution')
```
2. Using the provided vignettes on GitHub or help in R, follow the parameter input instructions for the "Single Site", "Anomaly Detection", and "Longitudinal Analysis" requirements. |
dc.identifier.uri | https://pedsnet.org/metadata/handle/20.500.14642/765 |
dc.publisher | PEDSnet |
dc.relation.uri | https://github.com/ssdqa/conceptsetdistribution |
dc.rights | CC-BY Attribution 4.0 License |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0 |
dc.subject | Data Quality Check Categorizations::Data Quality Category::Consistency |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Data Source Comparison::Single Site Analysis |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Data Anomaly Method |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Temporal Evaluation::Longitudinal Analysis |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Analysis Level::Event-Level Analysis |
dc.subject | Data Quality Check Categorizations::Error Detection Approach::Data Quality Probe::Data Representation Errors |
dc.subject | Data Quality Check Categorizations::Error Detection Approach::Data Quality Probe::Information Density |
dc.subject | Data Quality Check Categorizations::Error Detection Approach::Data Quality Probe::Anomalous Values from Internal Distributions |
dc.subject | Data Quality Check Categorizations::Error Detection Approach::Data Quality Probe::Temporality Consistency Check |
dc.subject | Data Quality Check Categorizations::Error Detection Approach::Clinical Probe::Expected Clinical Event Representation |
dc.subject | Data Quality Check Categorizations::Error Detection Approach::Clinical Probe::Clinical Data Distributions |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Data Visualization::Control Chart |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Data Anomaly Method::Seasonal-Trend Decomposition Using LOESS |
dc.subject | Data Quality Check Categorizations::Dataset Evaluation Strategy::Data Anomaly Method::Time Series Anomalies |
dc.title | Concept-Set Distribution: Single Site, Anomaly Detection, Longitudinal Analysis |
dspace.entity.type | DQCheck |
local.description.raw | The raw data output of this check produces eight columns of data for annual increments of analysis: <br>

|Column |Data Type |Definition |
|-------------------------|-------------------|--------------------------------------------------------------------------------------------------------|
|`site` |character |the name of the site being targeted OR "combined" if multiple sites were provided |
|`time_start` |date |the start of the time period being examined |
|`time_increment` |character |the length of each time period |
|`variable` |character |the user-defined variable grouping assigned to the code |
|`ct_denom` |numeric |the number of rows in the domain table associated with the variable |
|`concept_id` / `concept_code`|numeric / character|the code of interest; for OMOP CDMs this will be `concept_id`, for PCORnet CDMs this will be `concept_code`|
|`ct_concept` |numeric |the number of occurrences of the code |
|`prop_concept` |numeric |the proportion of variable rows with the code of interest (`ct_concept` / `ct_denom`) |

<br> For analyses in monthly increments or smaller, the raw output produces eleven columns:

|Column |Data Type|Definition |
|-----------------|---------|----------------------------------------------------------------------------------------------------------------|
|`observed` |numeric |the original proportion of the concept |
|`season` |numeric |the seasonal component of the time series |
|`trend` |numeric |the trend component of the time series |
|`remainder` |numeric |the residual component after `season` and `trend` are removed from `observed`; the target of anomaly detection |
|`seasadj` |numeric |the adjusted seasonal component |
|`anomaly` |character|a flag to indicate whether the proportion is an anomaly |
|`anomaly_direction`|numeric |the direction of the anomaly (upper or lower) |
|`anomaly_score` |numeric |the distance between the anomaly and the centerline |
|`recomposed_l1` |numeric |the lower level bound of the processed time series used to identify lower outliers |
|`recomposed_l2` |numeric |the upper level bound of the processed time series used to identify upper outliers |
|`observed_clean` |numeric |the original proportion after the season and trend components have been removed and anomalies have been detected|

{.dqcheck-table} |
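To make the relationship between the decomposition columns concrete, the following is a minimal sketch in base R. It is not the package's implementation: the synthetic monthly series, the robust `stl()` fit, the 3 × IQR fence on the remainder, and the definition of `seasadj` as `observed` minus `season` are all assumptions chosen for illustration. It shows that `observed` splits into `season`, `trend`, and `remainder`, and how bounds on the remainder can be recomposed into lower and upper limits on the original scale.

```r
# Sketch only: base-R STL plus an IQR rule on the remainder.
# The data and the 3 * IQR multiplier are assumptions, not package settings.
set.seed(42)
n_months <- 48
month_idx <- seq_len(n_months)

# Hypothetical monthly prop_concept values: trend + yearly seasonality + noise
observed <- 0.20 + 0.001 * month_idx +
  0.03 * sin(2 * pi * month_idx / 12) +
  rnorm(n_months, sd = 0.005)
observed[30] <- observed[30] + 0.10  # inject one artificial spike

series <- ts(observed, frequency = 12, start = c(2020, 1))

# Robust fitting limits how much the spike leaks into season and trend
fit <- stl(series, s.window = "periodic", robust = TRUE)
parts <- fit$time.series  # columns: seasonal, trend, remainder

decomp <- data.frame(
  observed  = as.numeric(series),
  season    = as.numeric(parts[, "seasonal"]),
  trend     = as.numeric(parts[, "trend"]),
  remainder = as.numeric(parts[, "remainder"])
)

# Assumed definition: seasonally adjusted series = observed minus season
decomp$seasadj <- decomp$observed - decomp$season

# IQR fences on the remainder, the target of anomaly detection
q <- unname(quantile(decomp$remainder, c(0.25, 0.75)))
iqr_width <- q[2] - q[1]
lower <- q[1] - 3 * iqr_width
upper <- q[2] + 3 * iqr_width

decomp$anomaly <- ifelse(
  decomp$remainder < lower | decomp$remainder > upper, "Yes", "No"
)

# Recompose the fences onto the original scale of the series
decomp$recomposed_l1 <- decomp$season + decomp$trend + lower
decomp$recomposed_l2 <- decomp$season + decomp$trend + upper

print(decomp[decomp$anomaly == "Yes", ])
```

Flagging on the remainder rather than on `observed` is what lets a regular seasonal peak pass unflagged while an off-pattern value of similar size is caught.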
local.description.viz | This check output varies based on the time increment input by the user. For yearly time increments, a control chart highlights anomalies in the proportion of patients per `concept_id` for the provided variable over time. A P Prime chart is used to account for the high sample size, meaning that the standard deviation has been multiplied by a numerical constant. Blue dots along the line indicate non-anomalous values, while orange dots are anomalies. The chart is accompanied by a concept reference table which provides the total count of the concept in question. When using smaller time increments, such as months or weeks, seasonality can make it difficult to detect true anomalies in a time series. This output computes anomalies while ignoring seasonality and outputs two graphs: a time series line graph with anomalies highlighted with a red dot, and a four-faceted time series line graph showing the anomaly decomposition to clarify how the anomalies were identified. |
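The yearly control-chart logic can be sketched in base R as follows. This assumes the P Prime chart follows Laney's formulation, in which the usual binomial standard deviation is multiplied by a constant estimated from the moving ranges of the standardized proportions; the record does not state the package's exact constants. The counts below are made up, and the `lcl`, `ucl`, and logical `anomaly` columns are hypothetical names introduced only for this example.

```r
# Sketch only: Laney-style P' control limits from yearly counts.
# All numbers are hypothetical; column names beyond ct_concept, ct_denom,
# and prop_concept are assumptions for illustration.
yearly <- data.frame(
  time_start = as.Date(paste0(2011:2022, "-01-01")),
  ct_concept = c(5200, 5400, 5100, 5600, 5300, 5350,
                 5500, 5450, 5050, 9800, 5250, 5400),
  ct_denom   = c(100000, 104000, 99000, 107000, 101000, 103000,
                 105000, 102000, 98000, 106000, 100000, 104000)
)

# Proportion of variable rows carrying the code of interest
yearly$prop_concept <- yearly$ct_concept / yearly$ct_denom

# Centerline: pooled proportion across all periods
p_bar <- sum(yearly$ct_concept) / sum(yearly$ct_denom)

# Within-period (binomial) standard deviation, one value per period
sigma_p <- sqrt(p_bar * (1 - p_bar) / yearly$ct_denom)

# Standardize, then estimate between-period variation from the
# average moving range of the z-scores (1.128 is the d2 constant for n = 2)
z <- (yearly$prop_concept - p_bar) / sigma_p
sigma_z <- mean(abs(diff(z))) / 1.128

# P' limits: ordinary 3-sigma limits widened by the constant sigma_z
yearly$lcl <- pmax(0, p_bar - 3 * sigma_p * sigma_z)
yearly$ucl <- pmin(1, p_bar + 3 * sigma_p * sigma_z)
yearly$anomaly <- yearly$prop_concept < yearly$lcl |
  yearly$prop_concept > yearly$ucl

print(yearly[yearly$anomaly, c("time_start", "prop_concept", "lcl", "ucl")])
```

With denominators this large, unadjusted binomial limits would be so narrow that ordinary year-to-year variation would be flagged; widening by `sigma_z` keeps only the larger departures.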
local.dqcheck.requirement | cohort |
local.dqcheck.requirement | domain_tbl |
local.dqcheck.requirement | concept_set |
local.dqcheck.requirement | omop_or_pcornet |
local.dqcheck.requirement | multi_or_single_site |
local.dqcheck.requirement | anomaly_or_exploratory |
local.dqcheck.requirement | num_concept_combined |
local.dqcheck.requirement | num_concept_1 |
local.dqcheck.requirement | num_concept_2 |
local.dqcheck.requirement | p_value |
local.dqcheck.requirement | age_groups |
local.dqcheck.requirement | time |
local.dqcheck.requirement | time_span |
local.dqcheck.requirement | time_period |
local.subject.flat | Single Site Analysis |
local.subject.flat | Data Anomaly Method |
local.subject.flat | Longitudinal Analysis |
local.subject.flat | Event-Level Analysis |
local.subject.flat | Consistency |
local.subject.flat | Data Representation Errors |
local.subject.flat | Information Density |
local.subject.flat | Anomalous Values from Internal Distributions |
local.subject.flat | Temporality Consistency Check |
local.subject.flat | Expected Clinical Event Representation |
local.subject.flat | Clinical Data Distributions |
local.subject.flat | Control Chart |
local.subject.flat | Seasonal-Trend Decomposition Using LOESS |
local.subject.flat | Time Series Anomalies |