Leading initial
E
First name
Bryan
Last name
Crenshaw

Title

Perspective on the Development of a Large-Scale Clinical Data Repository for Pediatric Hearing Research.

Year of Publication

2020

Number of Pages

231-238

Date Published

2020 Mar/Apr

ISSN Number

1538-4667

Abstract

<p>The use of "big data" for pediatric hearing research requires new approaches to both data collection and research methods. The widespread deployment of electronic health record systems creates new opportunities and corresponding challenges in the secondary use of large volumes of audiological and medical data. Opportunities include cost-effective hypothesis generation, rapid cohort expansion for rare conditions, and observational studies based on sample sizes in the thousands to tens of thousands. Challenges include finding and forming appropriately skilled teams, access to data, data quality assessment, and engagement with a research community new to big data. The authors share their experience and perspective on the work required to build and validate a pediatric hearing research database that integrates clinical data for over 185,000 patients from the electronic health record systems of three major academic medical centers.</p>

DOI

10.1097/AUD.0000000000000779

Alternate Title

Ear Hear

PMID

31408044

Title

Temporal bone radiology report classification using open source machine learning and natural language processing libraries.

Year of Publication

2016

Number of Pages

65

Date Published

2016 Jun 06

ISSN Number

1472-6947

Abstract

<p><strong>BACKGROUND: </strong>Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels: inner, middle, outer, and mastoid, indicating the presence of an abnormality in the specified ear region.</p>

<p><strong>METHODS: </strong>Trained abstractors labeled radiology reports taken from AudGenDB to form a gold standard. These were split into training (80&nbsp;%) and test (20&nbsp;%) sets. We applied open source libraries to normalize and convert every report to an n-gram feature vector. We trained logistic regression, support vector machine (linear and Gaussian), decision tree, random forest, and naïve Bayes models for each ear region. The models were evaluated on the hold-out test set.</p>

<p><strong>RESULTS: </strong>Our gold-standard data set contained 726 reports. The best classifiers were linear support vector machine for inner and outer ear, logistic regression for middle ear, and decision tree for mastoid. Classifier test set accuracy was 90&nbsp;%, 90&nbsp;%, 93&nbsp;%, and 82&nbsp;% for the inner, middle, outer and mastoid regions, respectively. The logistic regression method was very consistent, achieving accuracy scores within 2.75&nbsp;% of the best classifier across regions and a receiver operating characteristic area under the curve of 0.92 or greater across all regions.</p>

<p><strong>CONCLUSIONS: </strong>Our results indicate that the applied methods achieve accuracy scores sufficient to support our objective of extracting discrete features from radiology reports to enhance cohort identification in AudGenDB. The models described here are available in several free, open source libraries that make them more accessible and simplify their utilization as demonstrated in this work. We additionally implemented the models as a web service that accepts radiology report text in an HTTP request and provides the predicted region labels. This service has been used to label the reports in AudGenDB and is freely available.</p>
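
The METHODS paragraph describes normalizing each report, converting it to an n-gram feature vector, and holding out 20&nbsp;% of the gold standard for testing. The abstract does not name the libraries the authors used, so the following is only a minimal, standard-library Python sketch of those steps, not the authors' code. All function names, the choice of unigrams plus bigrams, and the sample report text are assumptions made for illustration.

```python
import random
import re
from collections import Counter


def normalize(text):
    """Lowercase the report and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, max_n=2):
    """Count every word n-gram of length 1..max_n in one report."""
    tokens = normalize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def build_vocabulary(reports, max_n=2):
    """Map each n-gram seen in the training reports to a column index."""
    grams = sorted({g for r in reports for g in ngram_counts(r, max_n)})
    return {gram: idx for idx, gram in enumerate(grams)}


def vectorize(text, vocab, max_n=2):
    """Turn one report into a fixed-length count vector.

    N-grams that are not in the vocabulary are ignored.
    """
    vec = [0] * len(vocab)
    for gram, count in ngram_counts(text, max_n).items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and return (training list, test list)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical report snippets, invented for this example only.
reports = ["Normal inner ear.", "Mastoid opacification noted."]
vocab = build_vocabulary(reports)
features = [vectorize(r, vocab) for r in reports]
```

The resulting vectors would then be passed, one binary label per ear region, to whichever classifier is being compared (logistic regression, support vector machine, decision tree, random forest, or naïve Bayes), with the vocabulary built from the training set only so that the test set stays unseen.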

DOI

10.1186/s12911-016-0306-3

Alternate Title

BMC Med Inform Decis Mak

PMID

27267768
