Does size matter? 3 key findings from a new study on developing AI for radiology research
When research teams develop deep learning models, they must decide what image resolutions to use. For instance, should they always aim to use the largest images possible? Or are there times when smaller images can get the job done?
The authors of a new study in Radiology: Artificial Intelligence explored this subject at length, assessing how convolutional neural networks (CNNs) were affected by “a wide spectrum of image resolutions and network training strategies.” Numerous CNNs were trained on the NIH ChestX-ray14 dataset, which includes more than 112,000 chest x-rays stored at a resolution of 1024 x 1024.
CNNs were trained using nine image resolutions—32 x 32, 64 x 64, 128 x 128, 224 x 224, 256 x 256, 320 x 320, 448 x 448, 512 x 512 and 600 x 600—and eight diagnostic labels: emphysema, cardiomegaly, hernia, atelectasis, edema, effusion, mass and nodule.
“We elected to only model eight out of the 14 labels in the ChestX-ray14 dataset owing to concerns about the clinical utility and relevance of models trained on the six excluded labels,” wrote authors Carl F. Sabottke, MD, and Bradley M. Spieler, MD, of LSU Health Sciences Center New Orleans.
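The paper's exact architectures, hyperparameters, and training code are not reproduced here, but a minimal sketch of this kind of setup—assuming PyTorch with an off-the-shelf torchvision ResNet backbone and illustrative values throughout—looks something like this:

```python
# Minimal sketch (not the authors' code): multi-label chest x-ray
# classification at a configurable input resolution.
import torch
import torch.nn as nn
from torchvision import models, transforms

LABELS = ["Emphysema", "Cardiomegaly", "Hernia", "Atelectasis",
          "Edema", "Effusion", "Mass", "Nodule"]
RESOLUTION = 256  # one of the nine resolutions compared in the study

# Downsample the 1024 x 1024 source images to the target resolution.
preprocess = transforms.Compose([
    transforms.Resize((RESOLUTION, RESOLUTION)),
    transforms.ToTensor(),
])

# A standard CNN backbone with a multi-label head: one logit per
# finding, trained with binary cross-entropy.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(LABELS))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, targets):
    # images: (batch, 3, RESOLUTION, RESOLUTION) tensor
    # targets: (batch, 8) float tensor of 0/1 labels
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because every finding gets its own output, a single model can flag several labels on the same radiograph, which matches how the ChestX-ray14 dataset is annotated.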
The authors then compared the performance of all of these models, looking for clues that may help future researchers develop the most effective CNNs possible. These are three key takeaways from their analysis:
1. Higher image resolutions are not always the answer
Researchers may assume that using the highest image resolution possible is always the right call, but that is not necessarily the case. The work of Sabottke and Spieler revealed that the area under the ROC curve (AUC) actually dropped in some instances when their CNNs were trained on larger images.
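For readers less familiar with the metric, AUC measures how well a model's scores separate positive from negative cases for a given finding, with 0.5 equivalent to chance and 1.0 perfect. A hypothetical per-label computation with scikit-learn—not the study's code, and with made-up numbers—looks like this:

```python
# Hypothetical example: per-label AUC, the metric used to compare
# models across image resolutions. Data here is made up.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])    # ground truth for 2 labels
y_score = np.array([[0.9, 0.2], [0.3, 0.7], [0.8, 0.6], [0.1, 0.4]])
print(roc_auc_score(y_true, y_score, average=None))    # one AUC per label
```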
2. CNNs trained to detect subtle findings do benefit from higher image resolutions
When developing a CNN to detect more subtle findings—pulmonary nodules, for example—higher image resolutions can lead to a higher AUC.
“For pulmonary nodule detection compared with thoracic mass detection, performance discrepancies between ranges of image resolutions are likely due to the size difference between these findings,” the authors wrote. “By definition, a pulmonary nodule is less than 3 cm, whereas a mass exceeds that size. In contrast, emphysema typically presents more diffusely on a radiograph than a nodule or mass, and thus, relatively poor performance at low resolutions likely relates to more generalized loss of information within the images.”
3. AI technology is still evolving at a rapid rate
Using larger images drives up memory and compute demands during CNN development, resulting in what the authors call “an inherent trade-off.” But what can’t be done today may be possible tomorrow, meaning researchers in the near future could have more flexibility when it comes to using larger image resolutions in their AI models.
“As hardware improvements and algorithmic advancements continue to occur, developing radiology deep learning applications at higher image resolutions becomes continuously more feasible,” the authors wrote. “One limitation of our present work was that, owing to graphics processing unit memory constraints, we fixed our batch size at eight for all models, as our hardware was not capable of training high-resolution models at larger batch sizes. However, as hardware advances make graphics processing units with larger amounts of random access memory increasingly available, there is an opportunity for obtaining better performance from high image resolution models with larger batch sizes.”
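One standard way to trade training time for memory in this situation—though the article does not say whether the authors explored it—is gradient accumulation: summing gradients over several small “micro-batches” before each weight update, so a memory-limited GPU can emulate a larger effective batch size. A hedged sketch, with a stand-in model and hypothetical sizes:

```python
# Illustrative sketch (not from the study): gradient accumulation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 512 * 512, 8))  # stand-in model
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

ACCUM_STEPS = 4  # e.g., 4 micro-batches of 8 images ~ effective batch of 32

def run_epoch(loader):
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        loss = criterion(model(images), targets) / ACCUM_STEPS
        loss.backward()               # gradients sum across micro-batches
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()          # one update per effective batch
            optimizer.zero_grad()
```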