Datasets | VinLab

VinDr-CXR

VinDr-CXR is a large-scale dataset of chest X-ray (CXR) images created via the VinDr Lab platform. It contains more than 18,000 CXR scans collected from two major hospitals in Vietnam. The images were labeled for the presence of 28 different radiographic findings and diagnoses in collaboration with a total of 17 experienced radiologists. VinDr-CXR is currently the largest dataset with radiologist-generated annotations. The dataset was used to organize a competition hosted on the Kaggle platform.

VinDr-SpineXR

Vingroup Big Data Institute (VinBigdata) has created and made freely available VinDr-SpineXR, a large-scale X-ray dataset for spinal lesions detection and classification. VinDr-SpineXR contains 10,469 images from 5,000 studies, manually annotated with 13 types of abnormalities; each scan was annotated by an expert radiologist. To the best of our knowledge, VinDr-SpineXR is currently the largest dataset that provides radiologists' bounding-box annotations for developing supervised-learning object detection algorithms.

VinDr-RibCXR

VinDr-RibCXR is a dataset for automatic segmentation and labeling of individual ribs from chest X-ray (CXR) scans. It contains 245 CXRs with corresponding ground-truth annotations provided by human experts. Each image was assigned to an expert, who manually segmented and annotated each of the 20 ribs, denoted L1→L10 (left ribs) and R1→R10 (right ribs). The rib masks (see Figure 1) were then stored in a JSON file that can later be used for training instance segmentation models.
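
The rib labeling scheme can be sketched in a few lines of Python. This is only an illustration, not the dataset's loader: the annotation layout (a mapping from rib label to a list of polygon points) and the names `RIB_LABELS` and `missing_ribs` are assumptions made for this example, and the actual JSON schema may differ.

```python
# Hypothetical sketch: the real VinDr-RibCXR JSON schema may differ.

# The 20 rib labels: L1..L10 (left ribs) followed by R1..R10 (right ribs).
RIB_LABELS = [f"{side}{i}" for side in ("L", "R") for i in range(1, 11)]


def missing_ribs(annotation):
    """Return the rib labels that have no mask in one image's annotation.

    `annotation` is assumed to map a rib label to a list of (x, y)
    polygon points; an absent or empty entry counts as missing.
    """
    return [label for label in RIB_LABELS if not annotation.get(label)]


# Toy annotation with only two ribs segmented (coordinates are made up).
example = {
    "L1": [(10, 20), (30, 25), (28, 40)],
    "R1": [(90, 20), (70, 25), (72, 40)],
}

print(len(RIB_LABELS))            # 20
print(missing_ribs(example)[:3])  # ['L2', 'L3', 'L4']
```

A check like this could be run over each image's annotation before training to confirm that all 20 ribs are present.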

VinDr-Mammo

We make VinDr-Mammo publicly available as a new imaging resource to promote advances in developing computer-aided detection and diagnosis (CADe/x) tools for breast cancer screening. The dataset consists of 5,000 four-view exams with breast-level assessment and finding annotations. Each of these exams was independently double read, with discordance (if any) being resolved by arbitration by a third radiologist. The VinDr-Mammo dataset is currently the largest public dataset of full-field digital mammography that contains BI-RADS assessment and abnormality annotations.
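
The double-reading procedure described above can be illustrated with a minimal Python sketch. The function name `resolve_assessment`, its parameters, and the use of plain integers for BI-RADS categories are assumptions made for this example; the sketch is not the tooling used to build the dataset.

```python
# Hypothetical sketch of double reading with third-reader arbitration.

def resolve_assessment(reader_a, reader_b, arbitrator=None):
    """Return the final assessment for one breast.

    If the two independent readers agree, their shared assessment stands.
    If they disagree, the third radiologist's assessment is used.
    """
    if reader_a == reader_b:
        return reader_a
    if arbitrator is None:
        raise ValueError("readers disagree and no arbitration result was given")
    return arbitrator


# Concordant reads need no arbitration.
print(resolve_assessment(2, 2))                # 2
# Discordant reads fall back to the third radiologist.
print(resolve_assessment(2, 4, arbitrator=4))  # 4
```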

VinDr-PCXR

In an effort to provide the research community with a large-scale pediatric chest X-ray (PCXR) dataset with high-quality annotations, we have built the VinDr-PCXR dataset, which consists of 9,125 CXR scans of patients younger than 10 years old. The images were annotated by a group of three radiologists with at least 10 years of experience for the presence of 36 critical findings and 15 diagnoses. To the best of our knowledge, the released VinDr-PCXR is currently the largest public pediatric CXR dataset with radiologist-generated annotations.

VinDr-BodyPartXR

We make VinDr-BodyPartXR publicly available as a new imaging resource to promote the development and evaluation of new machine learning models for body part X-ray classification. The dataset consists of 16,093 X-ray images that were collected and manually annotated. To the best of our knowledge, VinDr-BodyPartXR is currently the largest dataset that provides annotations for developing supervised-learning algorithms for body part X-ray classification.