Source and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis


dc.contributorPatient-Centered Outcomes Research Institute
dc.contributor.authorPEDSnet Data Coordinating Center
dc.contributor.otherPEDSnet Data Coordinating Center
dc.date.accessioned2024-09-09T17:26:07Z
dc.date.created2024-06-05
dc.description.abstractThis check provides analyses at the level of a single site. It generates a high-level snapshot of possibly anomalous mappings between source values and CDM codes. This check can only be executed if both the source code and the represented code are provided.
dc.identifier.urihttps://hdl.handle.net/20.500.14642/783
dc.identifier.urihttps://doi.org/10.24373/pdsp-450
dc.publisherPEDSnet
dc.relation.urihttps://github.com/ssdqa/sourceconceptvocabularies/tree/main
dc.rightsCC-BY Attribution 4.0 License
dc.rights.urihttp://creativecommons.org/licenses/by/4.0
dc.subjectSingle Site Analysis
dc.subjectData Anomaly Method
dc.subjectCross-Sectional Analysis
dc.subjectEvent-Level Analysis
dc.titleSource and Concept Vocabularies: Single Site, Anomaly Detection, Cross-Sectional Analysis
dspace.entity.typeDQCheck
local.code.package# install.packages("devtools")
devtools::install_github('ssdqa/sourceconceptvocabularies')
local.description.rawThis check produces a raw data output containing twenty-two columns of data:

| Column | Data Type | Definition |
|--------|-----------|------------|
| `site` | character | the name of the site being targeted OR "combined" if multiple sites were provided |
| `domain` | character | the domain associated with the provided concept set |
| `concept_id` | numeric / character | the primary concept, native to the CDM and mapped from the source |
| `source_concept_id` | numeric / character | the source concept, from the source system and mapped to the CDM |
| `ct` | numeric | the number of times the `concept_id` / `source_concept_id` pair occurs in the data |
| `denom_concept_ct` | numeric | the number of times the `concept_id` appears in the data |
| `denom_source_ct` | numeric | the number of times the `source_concept_id` appears in the data |
| `concept_prop` | numeric | the proportion of `concept_id` appearances made up by the `concept_id` / `source_concept_id` pair |
| `source_prop` | numeric | the proportion of `source_concept_id` appearances made up by the `concept_id` / `source_concept_id` pair |
| `mean_val` | numeric | the mean proportion of the provided code type (cdm or source) across sites |
| `median_val` | numeric | the median proportion of the provided code type (cdm or source) across sites |
| `sd_val` | numeric | the standard deviation of the proportion of the provided code type (cdm or source) across sites |
| `mad_val` | numeric | the median absolute deviation of the proportion of the provided code type (cdm or source) across sites |
| `cov_val` | numeric | the coefficient of variation of the proportion of the provided code type (cdm or source) across sites |
| `max_val` | numeric | the maximum proportion of the provided code type (cdm or source) across sites |
| `min_val` | numeric | the minimum proportion of the provided code type (cdm or source) across sites |
| `range_val` | numeric | the range of the proportion of the provided code type (cdm or source) across sites |
| `total_ct` | numeric | the total number of group members |
| `analysis_eligible` | character | a string indicating whether the group is eligible for anomaly detection analysis |
| `lower_tail` | numeric | the lower bound used to identify low anomalies |
| `upper_tail` | numeric | the upper bound used to identify high anomalies |
| `anomaly_yn` | character | a string indicating whether the value is anomalous or not |

{.dqcheck-table}
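To make the count and proportion columns concrete, the following is a minimal Python sketch of how `ct`, `denom_concept_ct`, `denom_source_ct`, `concept_prop`, and `source_prop` relate to one another, following only the definitions in the table. It is not the package's R implementation: the function name `mapping_pair_proportions` and the toy concept identifiers are hypothetical, and the input is assumed to be one `(concept_id, source_concept_id)` tuple per occurrence in the data.

```python
from collections import Counter


def mapping_pair_proportions(records):
    """Summarize concept/source mapping pairs.

    `records` is assumed to be an iterable of (concept_id, source_concept_id)
    tuples, one tuple per occurrence. This is an illustrative helper, not
    part of the sourceconceptvocabularies package.
    """
    records = list(records)
    pair_ct = Counter(records)                   # ct per mapping pair
    concept_ct = Counter(c for c, _ in records)  # denom_concept_ct
    source_ct = Counter(s for _, s in records)   # denom_source_ct

    rows = []
    for (concept_id, source_concept_id), ct in sorted(pair_ct.items()):
        rows.append({
            "concept_id": concept_id,
            "source_concept_id": source_concept_id,
            "ct": ct,
            "denom_concept_ct": concept_ct[concept_id],
            "denom_source_ct": source_ct[source_concept_id],
            # share of this concept_id's appearances made up by the pair
            "concept_prop": ct / concept_ct[concept_id],
            # share of this source_concept_id's appearances made up by the pair
            "source_prop": ct / source_ct[source_concept_id],
        })
    return rows


# Hypothetical toy data: identifiers are made up for illustration only.
toy_records = [(101, 9001)] * 3 + [(101, 9002)] + [(102, 9002)] * 4
rows = mapping_pair_proportions(toy_records)
for row in rows:
    print(row)
```

In this toy input, the pair (101, 9001) accounts for three of the four appearances of concept 101 and for every appearance of source concept 9001, so its `concept_prop` and `source_prop` differ even though both are derived from the same `ct`.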
local.description.vizThis plot shows the proportion of each top mapping pair for the top `concept_id`/CDM codes. Non-anomalous pairs are drawn as dots and anomalies as stars. The color gradient encodes the proportion of each pair, with red indicating a higher proportion and blue a lower one. Point size represents the median absolute deviation (MAD) value for the `concept_id`. Hovering over a dot or star displays metadata about that data point. The user sets how many top `concept_id` and top `source_concept_id` mappings are displayed.
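The point size in the plot is driven by a per-`concept_id` deviation value, which the raw output stores in `mad_val` and defines as a median absolute deviation. The Python sketch below shows one way such a value could be computed from the pair proportions of each `concept_id`. It is an illustration under stated assumptions, not the package's R code: the helper names and the toy proportions are hypothetical, and the unscaled form of the statistic is assumed (R's `mad()` applies a consistency constant by default, so values from the package may differ by that factor).

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled median absolute deviation: median(|x - median(x)|).

    Assumption: no consistency constant is applied.
    """
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    center = median(values)
    return median([abs(v - center) for v in values])


def mad_by_concept(props_by_concept):
    """Map each concept_id to the MAD of its mapping-pair proportions.

    `props_by_concept` is a hypothetical dict of
    concept_id -> list of proportion values for that concept's pairs.
    """
    return {
        concept_id: median_absolute_deviation(props)
        for concept_id, props in props_by_concept.items()
    }


# Hypothetical toy proportions for two concepts.
toy_props = {101: [0.75, 0.25], 102: [1.0]}
mad_values = mad_by_concept(toy_props)
print(mad_values)
```

A concept whose appearances all come from a single source mapping has a deviation of zero, while a concept split unevenly across several source codes gets a larger value and therefore a larger point.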
local.dqcheck.categoryInformation Representation
local.dqcheck.clinicalprobeExpected Clinical Event Representation
local.dqcheck.clinicalprobeClinical Data Distributions
local.dqcheck.measurementHotspots Outlier Detection
local.dqcheck.probeData Representation Errors
local.dqcheck.probeMisclassification Detection
local.dqcheck.probeAnomalous Values from Internal Distributions
local.dqcheck.requirementcohort
local.dqcheck.requirementconcept_set
local.dqcheck.requirementomop_or_pcornet
local.dqcheck.requirementdomain_tbl
local.dqcheck.requirementcode_type
local.dqcheck.requirementcode_domain
local.dqcheck.requirementmulti_or_single_site
local.dqcheck.requirementanomaly_or_exploratory
local.dqcheck.requirementp_value
local.dqcheck.requirementage_groups
local.dqcheck.requirementtime
local.dqcheck.requirementtime_span
local.dqcheck.requirementtime_period
local.dqcheck.typeConcept Set Testing
local.dqcheck.vizDot and Star Plot
relation.isCodeOfDQCheck929c8dfc-2c8b-4e62-8e1d-0fa06c542832
relation.isCodeOfDQCheck.latestForDiscovery929c8dfc-2c8b-4e62-8e1d-0fa06c542832

Files

Original bundle

Name: scv_ss_anom_cs.png
Size: 186.63 KB
Format: Portable Network Graphics