Source and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis
| dc.contributor | Patient-Centered Outcomes Research Institute |
| dc.contributor.author | PEDSnet Data Coordinating Center |
| dc.contributor.other | PEDSnet Data Coordinating Center |
| dc.date.accessioned | 2024-09-09T17:26:07Z |
| dc.date.created | 2024-06-05 |
| dc.description.abstract | This check provides analyses at the level of a single site. It generates a high-level snapshot of possibly anomalous mappings between source values and CDM codes. This check can only be executed if both the source code and the represented code are provided. |
| dc.identifier.uri | https://hdl.handle.net/20.500.14642/783 |
| dc.identifier.uri | https://doi.org/10.24373/pdsp-450 |
| dc.publisher | PEDSnet |
| dc.relation.uri | https://github.com/ssdqa/sourceconceptvocabularies/tree/main |
| dc.rights | Creative Commons Attribution 4.0 (CC BY 4.0) License |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0 |
| dc.subject | Single Site Analysis |
| dc.subject | Data Anomaly Method |
| dc.subject | Cross-Sectional Analysis |
| dc.subject | Event-Level Analysis |
| dc.title | Source and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis |
| dspace.entity.type | DQCheck |
| local.code.package | `# install.packages("devtools")` <br> `devtools::install_github('ssdqa/sourceconceptvocabularies')` |
| local.description.raw | This check produces a raw data output containing twenty-two columns of data: |

| Column | Data Type | Definition |
|--------|-----------|------------|
| `site` | character | the name of the site being targeted OR "combined" if multiple sites were provided |
| `domain` | character | the domain associated with the provided concept set |
| `concept_id` | numeric / character | the primary concept, native to the CDM and mapped from the source |
| `source_concept_id` | numeric / character | the source concept, from the source system and mapped to the CDM |
| `ct` | numeric | the number of times the `concept_id` / `source_concept_id` pair occurs in the data |
| `denom_concept_ct` | numeric | the number of times the `concept_id` appears in the data |
| `denom_source_ct` | numeric | the number of times the `source_concept_id` appears in the data |
| `concept_prop` | numeric | the proportion of `concept_id` appearances made up by the `concept_id` / `source_concept_id` pair |
| `source_prop` | numeric | the proportion of `source_concept_id` appearances made up by the `concept_id` / `source_concept_id` pair |
| `mean_val` | numeric | the mean proportion of the provided code type (cdm or source) across sites |
| `median_val` | numeric | the median proportion of the provided code type (cdm or source) across sites |
| `sd_val` | numeric | the standard deviation of the proportion of the provided code type (cdm or source) across sites |
| `mad_val` | numeric | the median absolute deviation of the proportion of the provided code type (cdm or source) across sites |
| `cov_val` | numeric | the coefficient of variation of the proportion of the provided code type (cdm or source) across sites |
| `max_val` | numeric | the maximum proportion of the provided code type (cdm or source) across sites |
| `min_val` | numeric | the minimum proportion of the provided code type (cdm or source) across sites |
| `range_val` | numeric | the range of the proportion of the provided code type (cdm or source) across sites |
| `total_ct` | numeric | the total number of group members |
| `analysis_eligible` | character | a string indicating whether the group is eligible for anomaly detection analysis |
| `lower_tail` | numeric | the lower bound used to identify low anomalies |
| `upper_tail` | numeric | the upper bound used to identify high anomalies |
| `anomaly_yn` | character | a string indicating whether the value is anomalous or not |

{.dqcheck-table}

| local.description.viz | This plot shows the proportion of each top mapping pair for the top `concept_id`/CDM codes. Non-anomalous pairs are drawn as dots and anomalies as stars. The color gradient encodes the proportion of each pair: red indicates a higher proportion and blue a lower one. The size of each data point represents the median absolute deviation (MAD) value for the `concept_id`. Hovering over a dot or star displays metadata about that data point. The user sets how many top `concept_id` codes and top `source_concept_id` mappings are displayed. |
| local.dqcheck.category | Information Representation |
| local.dqcheck.clinicalprobe | Expected Clinical Event Representation |
| local.dqcheck.clinicalprobe | Clinical Data Distributions |
| local.dqcheck.measurement | Hotspots Outlier Detection |
| local.dqcheck.probe | Data Representation Errors |
| local.dqcheck.probe | Misclassification Detection |
| local.dqcheck.probe | Anomalous Values from Internal Distributions |
| local.dqcheck.requirement | cohort |
| local.dqcheck.requirement | concept_set |
| local.dqcheck.requirement | omop_or_pcornet |
| local.dqcheck.requirement | domain_tbl |
| local.dqcheck.requirement | code_type |
| local.dqcheck.requirement | code_domain |
| local.dqcheck.requirement | multi_or_single_site |
| local.dqcheck.requirement | anomaly_or_exploratory |
| local.dqcheck.requirement | p_value |
| local.dqcheck.requirement | age_groups |
| local.dqcheck.requirement | time |
| local.dqcheck.requirement | time_span |
| local.dqcheck.requirement | time_period |
| local.dqcheck.type | Concept Set Testing |
| local.dqcheck.viz | Dot and Star Plot |
| relation.isCodeOfDQCheck | 929c8dfc-2c8b-4e62-8e1d-0fa06c542832 |
| relation.isCodeOfDQCheck.latestForDiscovery | 929c8dfc-2c8b-4e62-8e1d-0fa06c542832 |
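
The count and proportion columns listed under `local.description.raw` can be illustrated with a small sketch. It is written in plain Python for illustration only and is not the package's R implementation. The sample pairs are made up. Grouping the summary statistics by `concept_id`, using the sample standard deviation, and leaving the MAD unscaled are all assumptions, not documented behavior of the check.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical (concept_id, source_concept_id) pairs, one per event row.
pairs = [
    (101, "A"), (101, "A"), (101, "A"), (101, "B"),
    (202, "B"), (202, "C"), (202, "C"), (202, "C"),
]

pair_ct = Counter(pairs)                    # ct
concept_ct = Counter(c for c, _ in pairs)   # denom_concept_ct
source_ct = Counter(s for _, s in pairs)    # denom_source_ct

rows = []
for (concept_id, source_concept_id), ct in sorted(pair_ct.items()):
    rows.append({
        "concept_id": concept_id,
        "source_concept_id": source_concept_id,
        "ct": ct,
        "denom_concept_ct": concept_ct[concept_id],
        "denom_source_ct": source_ct[source_concept_id],
        # Share of this concept_id's rows that come from this pair.
        "concept_prop": ct / concept_ct[concept_id],
        # Share of this source_concept_id's rows that come from this pair.
        "source_prop": ct / source_ct[source_concept_id],
    })


def summarize(values):
    """Summary statistics matching the *_val column definitions.

    Assumptions: sample standard deviation, unscaled MAD.
    """
    avg = mean(values)
    med = median(values)
    sd = stdev(values) if len(values) > 1 else 0.0
    return {
        "mean_val": avg,
        "median_val": med,
        "sd_val": sd,
        "mad_val": median([abs(v - med) for v in values]),
        "cov_val": sd / avg if avg else float("nan"),
        "max_val": max(values),
        "min_val": min(values),
        "range_val": max(values) - min(values),
    }


# Assumed grouping: statistics over concept_prop within one concept_id.
stats = summarize([r["concept_prop"] for r in rows if r["concept_id"] == 101])

for r in rows:
    print(r)
print(stats)
```

In this toy data, source code `"B"` is split evenly between two CDM concepts, which is the kind of mapping pattern the check is meant to surface for review. The `lower_tail`, `upper_tail`, and `anomaly_yn` columns are not reproduced here because the record does not state how the bounds are derived.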
