Deep neural networks have been extensively applied in the medical domain for various tasks, including image classification, segmentation, and landmark detection. However, their application is often hindered by data scarcity, both in terms of available annotations and images. This study introduces a novel application of denoising diffusion probabilistic models (DDPMs) to the landmark detection task, specifically addressing the challenge of limited annotated data in X-ray imaging. Our key innovation lies in leveraging DDPMs for self-supervised pre-training in landmark detection, a previously unexplored approach in this domain. This method enables accurate landmark detection with minimal annotated training data (as few as 50 images), surpassing both ImageNet supervised pre-training and traditional self-supervised techniques across three popular X-ray benchmark datasets. To our knowledge, this work represents the first application of diffusion models for self-supervised learning in landmark detection, and it may offer a valuable pre-training approach for mitigating data scarcity in few-shot regimes.
This paper introduces a novel application of denoising diffusion probabilistic models (DDPMs) for anatomical landmark detection in X-ray images, specifically addressing the challenge of limited annotated data. The key innovation lies in leveraging DDPMs for self-supervised pre-training in landmark detection, a previously unexplored approach in this domain. The method enables accurate landmark detection with minimal annotated training data (as few as 1-50 images), significantly outperforming both ImageNet supervised pre-training and traditional self-supervised techniques across three popular X-ray benchmark datasets (Chest, Cephalometric, and Hand). A comprehensive evaluation against state-of-the-art alternatives, including YOLO, demonstrates the approach's effectiveness even when pre-trained on one in-domain dataset and fine-tuned on smaller, distinct datasets.
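At the core of DDPM pre-training is the forward noising process, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), with the network trained to predict the added noise; the pre-trained encoder is then fine-tuned on the few labeled images. A minimal NumPy sketch of the forward step (the linear beta schedule and array shapes are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    alpha_bar is the cumulative product of (1 - beta_s) for s <= t.
    Returns the noised sample and the noise the model must predict.
    """
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

# Linear beta schedule over T steps (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))  # stand-in for an X-ray patch
xt, eps = forward_diffuse(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

Because the process is variance-preserving, a unit-variance input keeps roughly unit variance at every timestep; during pre-training the denoising network learns anatomy-aware features from unlabeled X-rays alone.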
The paper evaluates the effectiveness of DDPM self-supervised pre-training for landmark detection by benchmarking it against supervised ImageNet pre-training, self-supervised state-of-the-art methods (MoCoV3, SimCLRV2, and DINO), and the YOLO framework across three X-ray datasets: Chest, Cephalometric, and Hand. The proposed approach consistently outperforms both ImageNet and alternative SSL methods across all datasets and training image quantities, with particularly impressive results in low-data regimes. For instance, with just one labeled sample in the Chest dataset, DDPM achieves a Mean Radial Error (MRE) of 14.99px compared to ImageNet's 143.67px, representing an 89.6% improvement. Similar significant performance gains are observed in the Cephalometric dataset (15.71mm vs 86.71mm MRE) and Hand dataset (28.75mm vs 79.32mm MRE). When compared to YOLO, a state-of-the-art universal anatomical landmark detection model that uses mixed dataset training, DDPM achieves competitive or superior results using just one labeled sample. These results demonstrate the method's effectiveness in few-shot learning scenarios, which are common in medical imaging where annotated data is scarce.
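The Mean Radial Error (MRE) reported above is the average Euclidean distance between predicted and ground-truth landmark coordinates, in pixels or millimetres depending on the dataset. A minimal sketch (the function name and sample arrays are illustrative, not from the paper):

```python
import numpy as np

def mean_radial_error(pred, gt):
    """Average Euclidean distance between predicted and ground-truth
    landmarks, in the units of the input coordinates (px or mm)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Radial error per landmark: sqrt(dx^2 + dy^2), then average.
    errors = np.linalg.norm(pred - gt, axis=-1)
    return float(errors.mean())

# Two landmarks, each prediction offset by (3, 4) -> error 5 for both.
pred = [[13.0, 24.0], [53.0, 64.0]]
gt = [[10.0, 20.0], [50.0, 60.0]]
print(mean_radial_error(pred, gt))  # → 5.0
```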
@article{DiVia2024,
  author  = {Di Via, R. and Odone, F. and Pastore, V. P.},
  title   = {Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images},
  year    = {2024},
  journal = {arXiv preprint arXiv:2407.18125},
  url     = {https://arxiv.org/abs/2407.18125},
}
Di Via, R., Odone, F., & Pastore, V. P. (2024). Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images. arXiv. https://arxiv.org/abs/2407.18125