Expected Variables Present: Single Site, Anomaly Detection, Longitudinal Analysis
| dc.contributor | Patient-Centered Outcomes Research Institute |
| dc.contributor.author | PEDSnet Data Coordinating Center |
| dc.contributor.other | PEDSnet Data Coordinating Center |
| dc.date.accessioned | 2024-09-09T17:20:49Z |
| dc.date.created | 2024-06-05 |
| dc.description.abstract | This check provides raw data and visualizations to help a user evaluate whether expected concepts are present in a dataset of interest. It summarizes the proportion of patients with co-occurring variables and supports the identification of anomalous data for a single site across time (years). |
| dc.identifier.uri | https://hdl.handle.net/20.500.14642/776 |
| dc.identifier.uri | https://doi.org/10.24373/pdsp-467 |
| dc.publisher | PEDSnet |
| dc.relation.uri | https://github.com/ssdqa/expectedvariablespresent |
| dc.rights | CC-BY Attribution 4.0 License |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0 |
| dc.subject | Single Site Analysis |
| dc.subject | Data Anomaly Method |
| dc.subject | Longitudinal Analysis |
| dc.subject | Person-Level Analysis |
| dc.title | Expected Variables Present: Single Site, Anomaly Detection, Longitudinal Analysis |
| dspace.entity.type | DQCheck |
| local.code.package | # install.packages("devtools") <br> devtools::install_github('ssdqa/expectedvariablespresent') |
| local.description.raw | The raw data output of this check produces ten columns of data for analysis over annual time intervals: |

| Column | Data Type | Definition |
|---------------------|-----------|---------------------------------------------------------------------------------------|
| `site` | character | the name of the site being targeted OR "combined" if multiple sites were provided |
| `time_start` | date | the start of the time period being examined |
| `time_increment` | character | the length of each time period |
| `total_pt_ct` | numeric | the total number of patients from the cohort in the domain table |
| `total_row_ct` | numeric | the total number of rows associated with patients from the cohort in the domain table |
| `variable_pt_ct` | numeric | the number of patients with evidence of the variable |
| `variable_row_ct` | numeric | the number of rows with evidence of the variable |
| `prop_pt_variable` | numeric | the proportion of patients with evidence of the variable |
| `prop_row_variable` | numeric | the proportion of rows with evidence of the variable |
| `variable` | character | the name of the variable |

{.dqcheck-table}

The raw data output of this check produces eleven columns of data for analysis in monthly or weekly time intervals:

| Column | Data Type | Definition |
|---------------------|-----------|------------------------------------------------------------------------------------------------------------------|
| `observed` | numeric | the original proportion of patients/rows |
| `season` | numeric | the seasonal component of the time series |
| `trend` | numeric | the trend component of the time series |
| `remainder` | numeric | the residual component after "season" and "trend" are removed from "observed" - target of anomaly detection |
| `seasadj` | numeric | the adjusted seasonal component |
| `anomaly` | character | a flag to indicate whether the proportion is an anomaly |
| `anomaly_direction` | numeric | the direction of the anomaly (upper or lower) |
| `anomaly_score` | numeric | the distance between the anomaly and the centerline |
| `recomposed_l1` | numeric | the lower level bound of the processed time series used to identify lower outliers |
| `recomposed_l2` | numeric | the upper level bound of the processed time series used to identify upper outliers |
| `observed_clean` | numeric | the original proportion after the season and trend components have been removed and anomalies have been detected |

{.dqcheck-table}

| local.description.viz | The output of this check varies based on the time increment input by the user. For yearly time increments, the check outputs a control chart displaying the number of pair mappings across time. The user is limited to one `concept_id` or CDM code per graph. Anomalous visits are distinguished by an orange point, while non-anomalous visits are blue points. For smaller time increments (by month or smaller), the check outputs two graphs to visualize anomalies while ignoring seasonality. The first is a time series line graph with anomalies indicated by red dots. The second is a four-facet time series line graph that shows the decomposition of the time series to clarify how each anomaly was identified. For each output, a tooltip provides each point's exact coordinates upon hover. Both graphs represent data for one user-specified specialty at a time. |
| local.dqcheck.category | Consistency |
| local.dqcheck.clinicalprobe | Confirmatory Clinical Data |
| local.dqcheck.clinicalprobe | Clinical Follow-Up |
| local.dqcheck.clinicalprobe | Clinical Complexity |
| local.dqcheck.clinicalprobe | Clinical Consistency |
| local.dqcheck.measurement | Seasonal-Trend Decomposition Using LOESS |
| local.dqcheck.measurement | Time Series Anomalies |
| local.dqcheck.probe | Data Representation Errors |
| local.dqcheck.probe | Misclassification Detection |
| local.dqcheck.probe | Temporality Consistency Check |
| local.dqcheck.probe | Missing Required Data |
| local.dqcheck.requirement | cohort |
| local.dqcheck.requirement | omop_or_pcornet |
| local.dqcheck.requirement | evp_variable_file |
| local.dqcheck.requirement | multi_or_single_site |
| local.dqcheck.requirement | anomaly_or_exploratory |
| local.dqcheck.requirement | output_level |
| local.dqcheck.requirement | age_groups |
| local.dqcheck.requirement | p_value |
| local.dqcheck.requirement | time |
| local.dqcheck.requirement | time_span |
| local.dqcheck.requirement | time_period |
| local.dqcheck.type | Variable Testing |
| local.dqcheck.viz | Control Chart |
| relation.isCodeOfDQCheck | 929c8dfc-2c8b-4e62-8e1d-0fa06c542832 |
| relation.isCodeOfDQCheck.latestForDiscovery | 929c8dfc-2c8b-4e62-8e1d-0fa06c542832 |
| relation.isDQResultOfDQCheck | 304ff83c-5cb2-4e9a-8f50-d312f4d6e8c7 |
| relation.isDQResultOfDQCheck.latestForDiscovery | 304ff83c-5cb2-4e9a-8f50-d312f4d6e8c7 |
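
To make the annual raw output described in `local.description.raw` more concrete, the following is a minimal base R sketch of how the patient-level and row-level counts and proportions relate to one another. It is not the package's implementation: the `events` data frame, its column names (`patient_id`, `year`, `has_variable`), the `summarise_year` helper, and the `"site_a"` and `"example_variable"` labels are all hypothetical and exist only for illustration.

```r
# Hypothetical toy data: one row per record in a domain table.
# `has_variable` marks rows that show evidence of the variable of interest.
events <- data.frame(
  patient_id   = c(1, 1, 2, 3, 3, 4),
  year         = c(2020, 2020, 2020, 2021, 2021, 2021),
  has_variable = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Hypothetical helper: summarise one year's rows into the ten raw-output columns.
summarise_year <- function(df) {
  with_var        <- df[df$has_variable, , drop = FALSE]
  total_pt_ct     <- length(unique(df$patient_id))
  total_row_ct    <- nrow(df)
  variable_pt_ct  <- length(unique(with_var$patient_id))
  variable_row_ct <- nrow(with_var)

  data.frame(
    site              = "site_a",
    time_start        = as.Date(paste0(df$year[1], "-01-01")),
    time_increment    = "year",
    total_pt_ct       = total_pt_ct,
    total_row_ct      = total_row_ct,
    variable_pt_ct    = variable_pt_ct,
    variable_row_ct   = variable_row_ct,
    prop_pt_variable  = variable_pt_ct / total_pt_ct,
    prop_row_variable = variable_row_ct / total_row_ct,
    variable          = "example_variable"
  )
}

# One summary row per year, stacked into a single data frame.
result <- do.call(rbind, lapply(split(events, events$year), summarise_year))
print(result)
```

The two proportions can diverge: a patient with many rows for the variable raises `prop_row_variable` without changing `prop_pt_variable`, which is why the raw output reports both.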

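
For monthly or weekly increments, the record lists Seasonal-Trend Decomposition Using LOESS as the measurement and names `remainder` as the target of anomaly detection. The sketch below illustrates that idea with base R's `stats::stl()` on a synthetic monthly series. The series, the injected spike at month 30, the interquartile-range rule with a multiplier of 3, and the way the bounds are added back to the seasonal and trend components are all assumptions made for illustration; the package may use different tooling and thresholds.

```r
set.seed(42)

# Synthetic monthly proportions (hypothetical): trend + yearly seasonality + noise.
n_months    <- 48
month_index <- seq_len(n_months)
prop <- 0.5 + 0.002 * month_index +
  0.05 * sin(2 * pi * month_index / 12) +
  rnorm(n_months, sd = 0.005)
prop[30] <- prop[30] + 0.2  # injected anomaly, for illustration only

series <- ts(prop, start = c(2020, 1), frequency = 12)

# STL splits the series into seasonal, trend, and remainder components.
fit       <- stl(series, s.window = "periodic", robust = TRUE)
season    <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# Assumed rule: flag remainders far outside the interquartile range.
q     <- quantile(remainder, c(0.25, 0.75), names = FALSE)
iqr   <- q[2] - q[1]
lower <- q[1] - 3 * iqr
upper <- q[2] + 3 * iqr

out <- data.frame(
  observed      = as.numeric(series),
  season        = season,
  trend         = trend,
  remainder     = remainder,
  anomaly       = ifelse(remainder < lower | remainder > upper, "Yes", "No"),
  # Assumed recomposition: remainder bounds shifted back onto season + trend.
  recomposed_l1 = season + trend + lower,
  recomposed_l2 = season + trend + upper
)

# Show the flagged months.
print(out[out$anomaly == "Yes", ])
```

Flagging on the remainder rather than on the observed values is what lets a regular seasonal peak pass unflagged while an unexpected spike of similar size is caught.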