Comparative Analysis of Machine-Learning Model Performance in Image Analysis: The Impact of Dataset Diversity and Size

Eric D Pelletier; Sean D Jeffries; Kevin Song; Thomas M Hemmerling

doi:10.1213/ANE.0000000000007088

Anesth Analg. 2024 Aug 8. doi: 10.1213/ANE.0000000000007088. Online ahead of print.

Authors

Eric D Pelletier¹, Sean D Jeffries¹,², Kevin Song², Thomas M Hemmerling¹,²

Affiliations

¹ Department of Experimental Surgery, McGill University Health Center, Montreal, Quebec, Canada.
² Department of Anesthesia, McGill University, Montreal, Quebec, Canada.

PMID: 39116018
DOI: 10.1213/ANE.0000000000007088

Abstract

Background: This study analyzes machine-learning model performance in image analysis, with a specific focus on videolaryngoscopy procedures. The research aimed to explore how dataset diversity and size affect model performance, an issue vital to the advancement of clinical artificial intelligence tools.

Methods: A total of 377 videolaryngoscopy videos from YouTube were used to create 6 datasets, each differing in patient diversity and image count. Data augmentation techniques were also applied to enhance these datasets further. Two machine-learning models, YOLOv5-Small and YOLOv8-Small, were trained and evaluated on F1 score (the harmonic mean of precision and recall, which combines both into a single metric), precision, recall, mAP@50, and mAP@50-95.
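As an illustration of the evaluation metrics named in the Methods, the following minimal sketch computes precision, recall, and F1 from detection counts, along with the intersection-over-union (IoU) that underlies mAP@50 and mAP@50-95. It is not the authors' evaluation code. The function names, the `(x1, y1, x2, y2)` box format, and the example counts are assumptions made for illustration. The threshold list follows the common COCO-style convention in which mAP@50-95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) (assumed format)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# mAP@50 counts a detection as correct when IoU >= 0.50;
# mAP@50-95 averages AP over these ten IoU thresholds.
MAP_50_95_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    # Hypothetical counts, not results from the study.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is often reported alongside both.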

Results: The findings indicate a significant impact of dataset configuration on model performance, especially the balance between diversity and quantity. The Multi-25 × 10 dataset, featuring 25 images from 10 different patients, demonstrates superior performance, highlighting the value of a well-balanced dataset. The study also finds that the effects of data augmentation vary across different types of datasets.
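To make the "25 images from 10 different patients" configuration concrete, the sketch below draws a fixed number of frames from each of a fixed number of patients. It is a hypothetical illustration only, not the authors' dataset pipeline: the function name `sample_balanced_subset`, the `frames_by_patient` mapping, the file names, and the random seed are all assumptions, and the toy data merely stand in for extracted video frames.

```python
import random


def sample_balanced_subset(frames_by_patient, n_patients, images_per_patient, seed=0):
    """Pick n_patients at random, then images_per_patient frames from each.

    frames_by_patient: dict mapping a patient/video ID to a list of frame paths
    (a hypothetical structure assumed for this sketch).
    """
    rng = random.Random(seed)
    # Only patients with enough frames can contribute a full quota.
    eligible = sorted(
        pid for pid, frames in frames_by_patient.items()
        if len(frames) >= images_per_patient
    )
    if len(eligible) < n_patients:
        raise ValueError("not enough patients with the requested number of frames")
    chosen = rng.sample(eligible, n_patients)
    return {pid: rng.sample(frames_by_patient[pid], images_per_patient) for pid in chosen}


# Toy data: 12 hypothetical patients with 40 frame names each.
toy_frames = {
    f"patient_{i:02d}": [f"patient_{i:02d}_frame_{j:03d}.jpg" for j in range(40)]
    for i in range(12)
}

# A 25 x 10 configuration: 25 images from each of 10 patients, 250 images total.
subset = sample_balanced_subset(toy_frames, n_patients=10, images_per_patient=25)
total_images = sum(len(frames) for frames in subset.values())
```

Holding the total image count fixed while varying `n_patients` and `images_per_patient` is one way to separate the effect of diversity from the effect of quantity in a comparison of this kind.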

Conclusions: Overall, this study emphasizes the critical role of dataset structure in the performance of machine-learning models in medical image analysis. It underscores the necessity of striking an optimal balance between dataset size and diversity, thereby illuminating the complexities inherent in data-driven machine-learning development.