Automated Feature Extraction from Transcranial Doppler Procedure Notes Using Natural Language Processing: A Multi-Center Study


dc.contributorPatient-Centered Outcomes Research Institute
dc.contributorAgency for Healthcare Research and Quality (AHRQ)
dc.contributor.authorDain, Aleksandra (Sarah)
dc.contributor.authorBeslow, Lauren
dc.contributor.otherNemours Children's Health
dc.date.accessioned2026-06-08T16:45:34Z
dc.descriptionEfforts to reduce the burden of abnormal transcranial Doppler findings in patients with sickle cell disease (SCD) have been hampered by challenges in ascertaining transcranial Doppler results, which are typically stored as unstructured data elements consisting of free-text radiology reports. Multi-center studies examining transcranial Doppler results have required labor-intensive manual review [5]. Natural language processing (NLP) offers a scalable and reproducible methodology that has shown high performance in extracting clinical features from clinical notes and pathology reports. The goal of this study is to develop an NLP-based tool and workflow to extract transcranial Doppler results from radiology reports across multiple PEDSnet institutions. We will then use these reports to describe the epidemiology and outcomes of SCD-related transcranial Doppler abnormalities in a modern cohort.

#### Study Design

This is a retrospective cohort study to determine the feasibility of using NLP techniques to extract transcranial Doppler velocities and classify transcranial Doppler outcomes. The overall design of this aim will be as follows:

1. Leverage the PEDSnet data pipeline to obtain transcranial Doppler clinical notes from participating sites.
2. Populate the OMOP/PEDSnet NOTE table for transcranial Doppler (free text, no extracted features).
3. Abstract transcranial Doppler notes to obtain velocities and ascertain outcomes.
4. Manually validate a sample of transcranial Doppler reports at each site and, if needed, modify our transcranial Doppler natural language processing methods.
5. Use the extracted data to complete analyses of a related PEDSnet project, [Creation of a Computable Phenotype in Childhood-Onset Arterial Ischemic Stroke (CAIS)](https://hdl.handle.net/20.500.14642/824).

Patients with a transcranial Doppler order will be identified in the PEDSnet database for participating institutions. Free-text data from these patients will be extracted and de-identified locally by sites using the TiDE program. The de-identified data will be incorporated into OMOP/PEDSnet CDM tables in the centralized PEDSnet database. The team will leverage NLP tools, such as machine learning-based feature extraction using a pre-trained model, to extract transcranial Doppler velocities and classify transcranial Doppler outcomes as normal, abnormal, conditional, indeterminate, or not performed.
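
To illustrate the extraction and classification step described above, here is a minimal sketch. It is not the study's method: the study describes a pre-trained machine learning model, whereas this stand-in uses simple regular expressions. The function names, the report phrasing the patterns expect, and the velocity cut-offs (under 170 cm/s normal, 170 to 199 cm/s conditional, 200 cm/s or higher abnormal, following commonly cited STOP screening thresholds) are all assumptions that do not appear in this record.

```python
import re
from typing import List

# Hypothetical pattern: a 2-3 digit number followed by a cm/s unit.
VELOCITY_RE = re.compile(
    r"(?<![\d.])(\d{2,3}(?:\.\d+)?)\s*cm\s*/\s*s(?:ec(?:ond)?)?\b",
    re.IGNORECASE,
)
# Hypothetical phrases suggesting the exam did not take place.
NOT_PERFORMED_RE = re.compile(
    r"\b(?:not performed|unable to perform)\b", re.IGNORECASE
)


def extract_velocities(report: str) -> List[float]:
    """Return every velocity (cm/s) mentioned in a de-identified report."""
    return [float(v) for v in VELOCITY_RE.findall(report)]


def classify(report: str) -> str:
    """Map a report to one of the five outcome categories.

    The thresholds are an assumption for illustration, not taken
    from the study protocol.
    """
    if NOT_PERFORMED_RE.search(report):
        return "not performed"
    velocities = extract_velocities(report)
    if not velocities:
        return "indeterminate"  # no usable velocity found in the text
    peak = max(velocities)
    if peak >= 200:
        return "abnormal"
    if peak >= 170:
        return "conditional"
    return "normal"


if __name__ == "__main__":
    sample = "Right MCA TAMMV 185 cm/s, left MCA 150 cm/sec."
    print(extract_velocities(sample), classify(sample))
```

A rule-based pass like this could serve as a baseline against which manually validated reports and the model's output are compared. It would miss velocities written without units or in tabular layouts.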
dc.description.abstractThis project aims to demonstrate the feasibility of using automated methods to extract transcranial Doppler velocities and results from multiple hospitals in the PEDSnet network. The goal is to confirm the validity of data abstraction at a minimum of 5 hospitals.
dc.identifier.urihttps://hdl.handle.net/20.500.14642/1662
dc.publisherPEDSnet
dc.rightsCC-BY Attribution 4.0 license
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/
dc.subjectPCORI-Funded Research
dc.subjectGrant-Funded Research
dc.subjectCohort Study
dc.subjectRetrospective Study
dc.subject.meshUltrasonography, Doppler, Transcranial
dc.subject.meshNatural Language Processing
dc.subject.meshLarge Language Models
dc.titleAutomated Feature Extraction from Transcranial Doppler Procedure Notes Using Natural Language Processing: A Multi-Center Study
dspace.entity.typeStudy
local.contributor.grant1P30HS029755
local.contributor.grantRI-CHOP-01-PS10
local.contributor.sitesChildren’s Hospital of Philadelphia
local.contributor.sitesNemours Children's Health
local.contributor.sitesChildren's Hospital Colorado
local.contributor.sitesTexas Children's Hospital
local.contributor.sitesSeattle Children's Hospital
local.description.analyticsSelection criteria for the study sample are as follows:

**Inclusion Criteria:**

1. Diagnosis of SCD genotype SS (SCD), as per previously published phenotype:
   - Qualifying diagnosis code of SCD in the problem list, medical history, as a primary diagnosis at encounter, nonprimary diagnosis at encounter, or as a discharge diagnosis
   - 2 hematology/oncology outpatient visits at least 3 days apart OR 1 hospitalization in the electronic medical record
   - Visits for administrative purposes, imaging, and labs will be excluded

**Exclusion Criteria:**

- Number of sickle cell trait diagnoses > number of qualifying SCD diagnoses
- Evidence of stem cell transplant before cohort entrance date
- Evidence of autologous gene therapy before cohort entrance date
- Age >= 17 years on cohort entrance date (first hematology/oncology in-person encounter)
- Age < 2 years on cohort exit date (last available PEDSnet encounter)
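
The selection criteria above can be read as a single eligibility check. The sketch below is a hypothetical illustration only: the `Patient` fields and function names are invented for this example and do not correspond to PEDSnet or OMOP column names, and it assumes that administrative, imaging, and lab visits have already been filtered out of the visit list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Patient:
    # All field names are hypothetical, chosen for illustration.
    scd_dx_count: int                 # qualifying SCD diagnosis codes
    trait_dx_count: int               # sickle cell trait diagnosis codes
    hospitalizations: int
    age_at_entry: float               # years, at first heme/onc in-person encounter
    age_at_exit: float                # years, at last available encounter
    transplant_before_entry: bool = False
    gene_therapy_before_entry: bool = False
    # Day offsets of qualifying hematology/oncology outpatient visits.
    heme_onc_visit_days: List[int] = field(default_factory=list)


def meets_visit_rule(p: Patient) -> bool:
    """Two outpatient visits at least 3 days apart, OR one hospitalization."""
    days = sorted(p.heme_onc_visit_days)
    two_visits = len(days) >= 2 and days[-1] - days[0] >= 3
    return two_visits or p.hospitalizations >= 1


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion criteria, then each exclusion criterion."""
    if p.scd_dx_count < 1 or not meets_visit_rule(p):
        return False
    if p.trait_dx_count > p.scd_dx_count:
        return False
    if p.transplant_before_entry or p.gene_therapy_before_entry:
        return False
    if p.age_at_entry >= 17 or p.age_at_exit < 2:
        return False
    return True
```

Checking the earliest and latest visit is enough for the visit rule, because if any two visits are at least 3 days apart then so are the first and last.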
project.endDate2026
project.startDate2023
relation.isDocumentationOfStudy7156b9cb-99cf-430f-acc6-040fe7398373
relation.isDocumentationOfStudy.latestForDiscovery7156b9cb-99cf-430f-acc6-040fe7398373
relation.isOrgUnitOfStudya118440c-013c-4d44-b7ce-19a9b1441304
relation.isOrgUnitOfStudycdb3cef9-ebdd-4ca8-9000-14573ba301bf
relation.isOrgUnitOfStudyc8e42b1c-6ffb-4b73-b893-ae4b5fe07dfa
relation.isOrgUnitOfStudy751635c0-bac7-47e0-95ea-c78f3cb31390
relation.isOrgUnitOfStudyff3cd76b-cf4c-44c1-84a6-368cfd0e9123
relation.isOrgUnitOfStudy.latestForDiscoverya118440c-013c-4d44-b7ce-19a9b1441304
relation.isPublicationOfStudy88dd6903-1cb0-4296-9ace-62eba94d7ff2
relation.isPublicationOfStudye01509c3-6fee-4567-9155-ccfb92e5e1d0
relation.isPublicationOfStudyaf8ec59e-8375-45c5-89b2-36ee5937c485
relation.isPublicationOfStudy.latestForDiscovery88dd6903-1cb0-4296-9ace-62eba94d7ff2
relation.isStudyOfStudye31630db-0837-4cda-bd2d-1db206a5d751
relation.isStudyOfStudy.latestForDiscoverye31630db-0837-4cda-bd2d-1db206a5d751
