Unsupervised Image Segmentation Using Self-Supervised Deep Neural Networks

Noor Aldeen A. Khalid

doi:10.59992/IJSR.2026.v5n2p2

Keywords:

Image Segmentation, Unsupervised Image Segmentation, Self-Supervised Deep Neural Networks

Abstract

Unsupervised image segmentation remains one of the most persistent challenges in computer vision, particularly in fields that lack annotated data, such as medical diagnostics and environmental monitoring. This paper introduces a segmentation model built on a modified U-Net backbone and enhanced with self-supervised deep learning, dual spatial alignment (local and global), and explainability mechanisms including Grad-CAM and SLIC superpixels. The proposed framework was evaluated on three benchmark datasets from diverse domains: HyperKvasir (gastrointestinal endoscopy), PASCAL VOC 2012 (natural scenes), and ISIC 2018 (skin lesion images). Experimental results demonstrated robust segmentation, with a Dice similarity coefficient (DSC) of 0.716 and a recall of 0.783, both of which outperform traditional unsupervised baselines. These findings were further validated through comparison with five recent methods, against which the model showed superior generalization and transparency. Additionally, the framework was deployed in a practical application for drought monitoring in Kirkuk, Iraq, where unsupervised segmentation of satellite imagery supports early warning systems. Overall, the results highlight the flexibility, interpretability, and domain adaptability of the proposed model, making it a promising tool for critical tasks in both medical and environmental domains.
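For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how DSC and recall are commonly computed from a predicted binary mask and a reference mask. It is an illustration only, not the paper's evaluation code: the function names, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made here.

```python
from typing import Sequence, Tuple

# A binary mask as rows of 0/1 values (hypothetical representation for this sketch).
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    same_rows = len(pred) == len(truth)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def dice_coefficient(pred: Mask, truth: Mask) -> float:
    """DSC = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def recall(pred: Mask, truth: Mask) -> float:
    """Recall = TP / (TP + FN)."""
    tp, _, fn = confusion_counts(pred, truth)
    # Assumed convention: nothing to find means nothing was missed.
    return 1.0 if tp + fn == 0 else tp / (tp + fn)


# Toy 2x3 example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(round(dice_coefficient(pred, truth), 3))  # 0.667
print(round(recall(pred, truth), 3))            # 0.667
```

Both metrics ignore true negatives, which is why they are often preferred over pixel accuracy when the region of interest, such as a polyp or a skin lesion, covers only a small part of the image.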

Author Biography

Noor Aldeen A. Khalid

Department of Medical Instruments Engineering Techniques, Bilad Alrafidain University College, 32001, Diyala, Iraq
