PEDSnet collects data to support learning about children's health. The first step in learning is to understand the quality and characteristics of the data.
PEDSnet Data Quality Assessment (DQA)
The PEDSnet data network contains information about a large population of children, collected across many sites and source systems. To support a variety of research and learning uses, it is important to take a careful approach to analyzing data quality. To meet this need, PEDSnet includes an extensive data quality assessment (DQA) process in its data management. This process has two goals. First, it works to maximize the quality of the available data in the traditional sense of the word "quality", that is, to find and correct errors in the collection or standardization of data. This process begins with approximately 850 tests run on data from each PEDSnet site in each quarterly data cycle. Test results are reviewed by data scientists at the Data Coordinating Center and discussed with the informatics teams at each site. Issues that arise from errors in data extraction or transmission, called "ETL issues" (for extract, transform, load), are corrected, and the data are reloaded. For instance, when we first extract a new data element, collecting it from all of our sites may reveal gaps in our ETL specifications that require us to revise the specifications and re-extract the data. Not surprisingly, as PEDSnet informatics teams have gained more experience working together on a common process, the number of ETL issues has decreased.
Perhaps more importantly, the DQA process also lets us describe the operating characteristics of the data along several dimensions. This is especially relevant because PEDSnet data are obtained in "real world" clinical settings, giving researchers a window into clinical care. Since we can't tailor the data to a specific research use beforehand, it's essential that we help researchers understand the strengths and limitations of the data for their needs. These might include some types of data being frequently missing because they're not well captured in routine clinical care (e.g., gestational age for older children), or a jump in the number of recorded procedures as a site expands its EHR to new departments. We therefore also create "data provenance issues", which are recorded alongside the data to guide future use.
To address these two goals, the DQA program aims to characterize four dimensions of data quality:
- Fidelity (also sometimes called reliability) – the degree to which PEDSnet data accurately reflect the source systems from which they are drawn (as distinct from variations in how well the data are captured at the point of care);
- Consistency (also sometimes called internal validity) – the degree to which a specific type of information is recorded in the same way in the different data sources contributing to PEDSnet data;
- Accuracy (also sometimes called external validity) – the degree to which PEDSnet data correctly reflect the clinical characteristics of patients; and
- Completeness (also sometimes called feasibility) – the degree to which a given type of information is actually collected and available in PEDSnet.
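As a rough illustration of how the completeness dimension can be quantified, the sketch below computes, for each site, the fraction of records in which a given field is populated. It is a minimal sketch under stated assumptions, not PEDSnet's actual DQA code: records are assumed to be plain Python dictionaries with a hypothetical `site` key, and `gestational_age` is used only as an illustrative field name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def completeness_by_site(
    records: Iterable[Mapping[str, Any]], field: str
) -> Dict[str, float]:
    """Return {site: fraction of that site's records with `field` populated}.

    Hypothetical schema: each record carries a "site" key; a missing key
    or a None value counts as "not populated".
    """
    totals: Dict[str, int] = defaultdict(int)
    populated: Dict[str, int] = defaultdict(int)
    for rec in records:
        site = rec["site"]
        totals[site] += 1
        if rec.get(field) is not None:
            populated[site] += 1
    # Sites with no populated records simply get 0.0 (defaultdict returns 0).
    return {site: populated[site] / totals[site] for site in totals}


# Usage with made-up records (not real PEDSnet data):
records = [
    {"site": "A", "gestational_age": 38},
    {"site": "A", "gestational_age": None},
    {"site": "A", "gestational_age": 40},
    {"site": "B"},
    {"site": "B", "gestational_age": 36},
]
print(completeness_by_site(records, "gestational_age"))
```

A low fraction is not necessarily an ETL error: some elements, such as gestational age for older children, are simply not well captured in routine care, so a result like this is often better read as provenance information than as a defect to be corrected.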
These principles are used to develop the specific tests in the DQA program. The lower-level tests assess individual data elements for consistency with the common data model (CDM) specification and with clinical plausibility, for trends over time, and for expected correlations between data elements. We are also currently developing data quality profiles for sentinel cohorts of patients (e.g., children admitted to the hospital, children diagnosed with asthma) to gain better insight into how well the data represent groups of children who have been the focus of many clinical studies.
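Two of the lower-level test types described above, clinical plausibility and trends over time, can be sketched in a few lines. This is an illustration under stated assumptions, not PEDSnet's test suite: the field name `weight_kg`, the bounds passed to the hypothetical `RangeRule`, the quarterly cycle labels, and the 50% jump threshold in the hypothetical `flag_jumps` helper are all made up for the example and are not PEDSnet's actual test definitions or thresholds.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Sequence


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical plausibility rule: values of `field` should lie in [low, high]."""
    field: str
    low: float
    high: float


def implausible_values(
    records: Sequence[Mapping[str, Any]], rule: RangeRule
) -> List[Mapping[str, Any]]:
    """Return records whose non-missing value for rule.field is out of range."""
    flagged = []
    for rec in records:
        value = rec.get(rule.field)
        # Missing values are a completeness question, not a plausibility one.
        if value is not None and not (rule.low <= value <= rule.high):
            flagged.append(rec)
    return flagged


def flag_jumps(counts_by_cycle: Dict[str, int], max_rel_change: float = 0.5) -> List[str]:
    """Flag data cycles whose count changed by more than max_rel_change
    relative to the previous cycle.

    Cycle labels are assumed to sort chronologically, e.g. "2016Q1".
    """
    cycles = sorted(counts_by_cycle)
    flagged = []
    for prev, cur in zip(cycles, cycles[1:]):
        before, after = counts_by_cycle[prev], counts_by_cycle[cur]
        if before == 0:
            if after > 0:  # any appearance after a zero count is treated as a jump
                flagged.append(cur)
        elif abs(after - before) / before > max_rel_change:
            flagged.append(cur)
    return flagged


# Usage with made-up values (not real PEDSnet data):
weights = [
    {"id": 1, "weight_kg": 3.2},
    {"id": 2, "weight_kg": -1.0},
    {"id": 3, "weight_kg": None},
    {"id": 4, "weight_kg": 450.0},
]
rule = RangeRule("weight_kg", low=0.2, high=300.0)
print([r["id"] for r in implausible_values(weights, rule)])  # [2, 4]

procedure_counts = {"2016Q1": 1000, "2016Q2": 1050, "2016Q3": 2400, "2016Q4": 2450}
print(flag_jumps(procedure_counts))  # ['2016Q3']
```

A flagged value or jump is a prompt for review rather than a verdict: it may point to an ETL issue to be corrected and reloaded, or to a real change at a site (such as an EHR rollout to new departments) that is better recorded as a data provenance issue.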
PCORnet Data Characterization
In addition to its own DQA process, PEDSnet, as a Clinical Data Research Network (CDRN) in PCORnet, participates in the data characterization process managed by the PCORnet Coordinating Center. This program, based on the Mini-Sentinel data quality review process, provides another window into the characteristics of PEDSnet data and produces a concrete set of measures for researchers using PEDSnet data via the PCORnet data model.
The PEDSnet Data Coordinating Center has been approved through this process to handle both feasibility queries preparatory to research studies and research queries that are part of PCORnet-sponsored studies.