Method and system of building hospital-scale chest X-ray database for entity extraction and weakly-supervised classification and localization of common thorax diseases
Inventors
Wang, Xiaosong • Peng, Yifan • Lu, Le • Lu, Zhiyong • Summers, Ronald M.
Assignees
US Department of Health and Human Services
Publication Number
US-11583239-B2
Publication Date
2023-02-21
Expiration Date
2038-03-26
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A new chest X-ray database, referred to as “ChestX-ray8”, is disclosed herein, which comprises over 100,000 frontal view X-ray images of over 32,000 unique patients with the text-mined eight disease image labels (where each image can have multi-labels), from the associated radiological reports using natural language processing. We demonstrate that these commonly occurring thoracic diseases can be detected and spatially-located via a unified weakly supervised multi-label image classification and disease localization framework, which is validated using our disclosed dataset.
Core Innovation
The invention relates to a method and system for building a hospital-scale chest X-ray database, called ChestX-ray8, comprising over 108,000 frontal-view X-ray images from more than 32,000 unique patients. The images are labeled with eight common thoracic diseases mined from their associated radiological reports using natural language processing (NLP) to extract multi-labels per image.
The system demonstrates that commonly occurring thoracic diseases can be detected and spatially located using a unified weakly supervised multi-label image classification and disease localization framework. This framework employs deep convolutional neural network (DCNN) architectures in which a transition layer, a global pooling layer, a prediction layer, and a multi-label classification loss layer are placed after the last convolutional layer, enabling high-precision computer-aided diagnosis.
The problem being solved is how to effectively use large-scale, loosely labeled hospital chest X-ray databases, whose imaging data are unlabeled or only weakly labeled, to train data-hungry deep learning models for high-precision medical image analysis. Existing datasets are smaller and lack large-scale, multi-label annotations extracted from radiology reports. Further challenges include multi-label classification with limited region-level annotations, negation and uncertainty in report text, and localization of small pathological regions in large images.
Claims Coverage
The patent includes two independent claims, covering a method and a system that use a unified weakly supervised multi-label classification and disease localization framework for thoracic disease detection and localization in chest X-ray images.
Unified weakly supervised multi-label classification and localization framework
A multi-label deep convolutional neural network (DCNN) classification model comprising a transition layer, a global pooling layer, a prediction layer, and a multi-label classification loss layer after the last convolutional layer, used to analyze chest X-ray images and determine presence and anatomical locations of thoracic diseases.
Anatomical location identification via bounding boxes
Identification of anatomical locations of one or more thoracic diseases using localized bounding boxes drawn relative to the chest X-ray images based on the classification and localization framework.
Natural language processing-mined disease labels and locations
Use of a database containing chest X-ray images with disease labels and anatomical locations text-mined from associated radiological reports via NLP to assist in determining disease presence and location.
Use of deep convolutional neural network architectures for weakly supervised disease localization
Employment of DCNN architectures tailored for weakly supervised localization of thoracic diseases by combining deep activations from the transition layer and weights from the prediction layer to generate disease spatial locations.
The claims collectively cover a method and system that leverage a unified weakly supervised multi-label DCNN framework incorporating text-mined labels and localization to detect and spatially locate multiple common thoracic diseases in chest X-ray images, enhancing automated clinical diagnosis.
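The localization idea in the claims (combining transition-layer activations with prediction-layer weights, then drawing a bounding box) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names `class_heatmap` and `bounding_box`, the toy 3x3 feature maps, and the fixed-threshold box rule are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def class_heatmap(activations: List[Grid], class_weights: List[float]) -> Grid:
    """Weighted sum of D feature maps using one disease class's D weights.

    `activations` stands in for the transition-layer output (D maps of
    equal size); `class_weights` stands in for that class's row of the
    prediction layer. Both names are hypothetical.
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for feature_map, weight in zip(activations, class_weights):
        for r in range(height):
            for c in range(width):
                heatmap[r][c] += weight * feature_map[r][c]
    return heatmap


def bounding_box(heatmap: Grid, threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Smallest box (row_min, col_min, row_max, col_max) covering all cells
    at or above `threshold`. The thresholding rule is an assumption."""
    hits = [(r, c) for r, row in enumerate(heatmap)
            for c, value in enumerate(row) if value >= threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy example: two 3x3 feature maps and one class's weights.
maps = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
weights = [2.0, -1.0]
heat = class_heatmap(maps, weights)
print(bounding_box(heat, threshold=1.0))  # (1, 1, 2, 2)
```

In practice the heatmap would be upsampled to the input image size before a box is drawn, so the coordinates here are in feature-map cells rather than pixels.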
Stated Advantages
Enables the construction of a large-scale chest X-ray database with accurate multi-label disease annotations derived from radiological reports using advanced NLP techniques.
Supports simultaneous detection and spatial localization of multiple thoracic diseases using a unified deep learning framework, facilitating automated high-precision CAD systems.
Addresses the challenges of weakly labeled data and of negation and uncertainty in radiology reports through novel syntactic-level negation and uncertainty detection.
Improves disease classification accuracy using a positive/negative balancing factor in multi-label loss functions and advanced pooling methods in DCNN architectures.
Enables fully automated, scalable medical image analysis with minimal need for dense manual annotation, advancing clinical applications and research.
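To make the positive/negative balancing factor concrete, the sketch below weights the positive and negative terms of a binary cross-entropy loss by inverse class frequency. The summary above does not give the exact formula, so the specific weights (`beta_pos`, `beta_neg`), the function name `balanced_cross_entropy`, and the clipping constant `eps` are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def balanced_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                           eps: float = 1e-7) -> float:
    """Cross-entropy over flattened (image, class) predictions in a batch,
    with positive and negative terms rescaled so that the rarer label
    value is not drowned out. Weighting scheme is an assumption."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    total = n_pos + n_neg
    # Inverse-frequency balancing factors; zero when a side is absent.
    beta_pos = total / n_pos if n_pos else 0.0
    beta_neg = total / n_neg if n_neg else 0.0

    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            loss += beta_pos * -math.log(p)
        else:
            loss += beta_neg * -math.log(1.0 - p)
    return loss


# One positive among four labels: the positive term is weighted 4x,
# each negative term 4/3x.
example_loss = balanced_cross_entropy([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
print(example_loss)
```

With most image labels being negative in a multi-label chest X-ray setting, an unweighted loss would be dominated by the negative terms; the factors above give both sides equal total weight.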
Documented Applications
Building a hospital-scale chest X-ray image database (ChestX-ray8) with multi-label disease annotations mined from radiological reports.
Weakly supervised multi-label classification and spatial localization of common thoracic diseases in chest X-rays, including Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, and Pneumothorax.
Automated generation of radiological reports based on detected diseases and their anatomical locations in X-ray images.
Development of fully automated, high-precision computer-aided diagnosis (CAD) systems for chest X-ray interpretation at scale.
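Because each image can carry several of the eight disease labels, a natural target representation for a multi-label classifier is an 8-element 0/1 vector. The sketch below shows one way to encode it; the function name `encode_labels` and the label ordering are assumptions, while the disease names are those listed above.

```python
from typing import Iterable, List

# The eight disease classes named in the document, in an assumed fixed order.
DISEASES = (
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
    "Mass", "Nodule", "Pneumonia", "Pneumothorax",
)


def encode_labels(findings: Iterable[str]) -> List[int]:
    """Return a multi-hot vector with a 1 for every disease in `findings`."""
    present = set(findings)
    unknown = present - set(DISEASES)
    if unknown:
        raise ValueError(f"unknown disease labels: {sorted(unknown)}")
    return [1 if disease in present else 0 for disease in DISEASES]


print(encode_labels(["Effusion", "Mass"]))  # [0, 0, 1, 0, 1, 0, 0, 0]
print(encode_labels([]))                    # [0, 0, 0, 0, 0, 0, 0, 0]
```

An all-zero vector corresponds to an image for which none of the eight diseases was mined from the report.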