Abstract
Breast cancer remains a significant global health concern, and machine learning algorithms and computer-aided detection systems have shown great promise in enhancing the accuracy and efficiency of mammography image analysis. However, there is a critical need for large benchmark datasets for training deep learning models for breast cancer detection. In this work, we developed Mammo-Bench, a large-scale benchmark dataset of mammography images, by collating data from seven well-curated resources: INbreast, Mini-DDSM, KAU-BCMD, CMMD, CDD-CESM, DMID, and the RSNA Screening Dataset. To ensure consistency across images from diverse sources while preserving clinically relevant features, all images underwent a preprocessing pipeline that includes breast segmentation, pectoral muscle removal, and intelligent cropping. The dataset consists of 71,844 high-quality mammographic images from 26,500 patients across 8 countries and is, to the best of our knowledge, one of the largest open-source mammography databases. To show the utility of Mammo-Bench, a ResNet101 architecture was used to classify the images into Normal, Benign, and Malignant classes. The performance of ResNet101 was evaluated on the proposed dataset and compared with results obtained on several member datasets and on an external dataset, VinDr-Mammo. We show that training on the larger benchmark dataset is more reliable than training on the smaller datasets. Accuracies of 78.8% (with data augmentation of the minority classes) and 77.8% (without data augmentation) were achieved on the proposed benchmark dataset, whereas accuracy on the other datasets varied from 25% to 69%. Most striking was the improved prediction of the minority classes with Mammo-Bench. These results establish baseline performance and demonstrate Mammo-Bench's utility as a comprehensive resource for developing and evaluating mammography analysis systems.
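Because the three classes are imbalanced, overall accuracy alone can hide poor prediction of the minority classes, which is why per-class behaviour is highlighted above. The following is a minimal sketch of that distinction, not the evaluation code used in this work: the helper names `per_class_recall` and `accuracy` are hypothetical, and the toy labels are invented purely for illustration.

```python
from collections import Counter

# Class names follow the three categories named in the abstract.
CLASSES = ("Normal", "Benign", "Malignant")


def per_class_recall(y_true, y_pred, classes=CLASSES):
    """Return {class: recall}, where recall = correct predictions / true instances."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (hits[c] / totals[c] if totals[c] else float("nan")) for c in classes}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (not from the paper): 8 Normal, 1 Benign, 1 Malignant.
y_true = ["Normal"] * 8 + ["Benign", "Malignant"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["Normal"] * 10

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print(f"accuracy = {overall:.2f}")
for name, value in recalls.items():
    print(f"recall[{name}] = {value:.2f}")
```

In this toy case the always-Normal classifier reaches 80% accuracy while recalling none of the Benign or Malignant samples, so an improvement in minority-class recall can matter more than a small change in overall accuracy.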