Automated Feature Extraction from Transcranial Doppler Procedure Notes Using Natural Language Processing: A Multi-Center Study
Abstract
This project aims to demonstrate the feasibility of using automated methods to extract transcranial Doppler velocities and results from multiple hospitals in the PEDSnet network. The goal is to confirm the validity of data abstraction at a minimum of 5 hospitals.
Description
Efforts to reduce the burden of abnormal transcranial Doppler in patients with sickle cell disease (SCD) have been hampered by challenges in ascertaining transcranial Doppler results, which are typically stored as unstructured data in free-text radiology reports. Multi-center studies examining transcranial Doppler results have required labor-intensive manual review [5]. Natural language processing (NLP) offers a scalable and reproducible methodology that has shown high performance in extracting clinical features from clinical notes and pathology reports. The goal of this study is to develop an NLP-based tool and workflow to extract transcranial Doppler results from radiology reports across multiple PEDSnet institutions. We will then use the extracted data to describe the epidemiology and outcomes of SCD-related transcranial Doppler abnormalities in a modern cohort.
Study Design
This is a retrospective cohort study to determine the feasibility of using NLP techniques to extract transcranial Doppler velocities and classify transcranial Doppler outcomes. The overall design of this aim will be as follows:
- Leverage the PEDSnet data pipeline to obtain transcranial Doppler clinical notes from participating sites
- Populate OMOP/PEDSnet NOTE table for transcranial Doppler (free text, no extracted features)
- Abstract transcranial Doppler notes to obtain velocities and ascertain outcomes
- Manually validate a sample of transcranial Doppler reports at each site and, if needed, modify our transcranial Doppler natural language processing methods.
- Use extracted data to complete analyses of a related PEDSnet Project, Creation of a Computable Phenotype in Childhood-Onset Arterial Ischemic Stroke (CAIS).
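The manual validation step in the list above could be summarized per site with simple agreement metrics. The following is a minimal sketch, not the study's validation protocol: the function name `validation_metrics` and the choice of accuracy plus per-label precision and recall are assumptions made for illustration. Only the five outcome labels come from the study description.

```python
from collections import Counter

# Outcome categories named in the study description.
LABELS = ["normal", "abnormal", "conditional", "indeterminate", "not performed"]


def validation_metrics(manual, predicted):
    """Compare NLP-assigned outcomes with manual review labels.

    Hypothetical helper: returns overall accuracy and, for each label,
    precision and recall (None when the denominator is zero).
    """
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")
    if not manual:
        raise ValueError("no reports to validate")

    # Count each (manual label, predicted label) pair once.
    pairs = Counter(zip(manual, predicted))
    accuracy = sum(c for (m, p), c in pairs.items() if m == p) / len(manual)

    per_label = {}
    for label in LABELS:
        tp = pairs[(label, label)]
        fp = sum(c for (m, p), c in pairs.items() if p == label and m != label)
        fn = sum(c for (m, p), c in pairs.items() if m == label and p != label)
        per_label[label] = {
            "precision": tp / (tp + fp) if tp + fp else None,
            "recall": tp / (tp + fn) if tp + fn else None,
        }
    return accuracy, per_label
```

Running this separately for each site's validation sample would show whether the extraction methods need site-specific modification, since reporting styles may differ between institutions.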
Patients with a transcranial Doppler order will be identified in the PEDSnet database at participating institutions. Free-text data for these patients will be extracted and de-identified locally by each site using the TiDE program. The de-identified data will be incorporated into OMOP/PEDSnet CDM tables in the centralized PEDSnet database. The team will leverage NLP tools, such as machine learning-based feature extraction using a pre-trained model, to extract transcranial Doppler velocities and classify transcranial Doppler outcomes as normal, abnormal, conditional, indeterminate, or not performed.
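To make the extraction and classification task concrete, here is a minimal rule-based sketch. It is a regular-expression stand-in for illustration only, not the machine learning model the study describes. The patterns, function names, and velocity cutoffs are all assumptions: the 170 and 200 cm/s defaults follow the commonly cited screening convention for SCD and are not stated in this study description. Treating a report with no velocity as "indeterminate" is likewise a simplification.

```python
import re

# Hypothetical pattern: a 2-3 digit number followed by "cm/s" or "cm/sec".
VELOCITY_RE = re.compile(
    r"(?<![\d.])(\d{2,3}(?:\.\d+)?)\s*cm\s*/\s*s(?:ec)?\b", re.IGNORECASE
)
# Hypothetical phrases suggesting the study was never carried out.
NOT_PERFORMED_RE = re.compile(
    r"\b(not performed|unable to perform|study cancell?ed)\b", re.IGNORECASE
)


def extract_velocities(text):
    """Return every velocity (cm/s) mentioned in a report, in reading order."""
    return [float(v) for v in VELOCITY_RE.findall(text)]


def classify(text, conditional_cutoff=170.0, abnormal_cutoff=200.0):
    """Assign one of the five outcome categories from the highest velocity found.

    Cutoffs are assumed defaults, not values taken from the study.
    """
    if NOT_PERFORMED_RE.search(text):
        return "not performed"
    velocities = extract_velocities(text)
    if not velocities:
        # Simplification: no reported velocity is treated as indeterminate.
        return "indeterminate"
    peak = max(velocities)
    if peak >= abnormal_cutoff:
        return "abnormal"
    if peak >= conditional_cutoff:
        return "conditional"
    return "normal"


report = "Right MCA TAMMV 182 cm/s, left MCA 140 cm/sec."
print(extract_velocities(report))  # [182.0, 140.0]
print(classify(report))            # conditional
```

A rule-based baseline of this kind ignores which vessel each velocity belongs to and whether the value is a time-averaged mean, which is part of why a trained feature-extraction model may be preferable for reports with varied wording.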

