Redefining Vision Tasks: The Power of Transformers in Classification, Detection, and Segmentation
Conference proceeding   Peer reviewed

Sushma Hans, Pallavi Ranjan and Salih Ismail
Artificial Intelligence and Speech Technology (AIST 2024), pp.42-53
Communications in Computer and Information Science
6th International Conference on Artificial Intelligence and Speech Technology (AIST2024) (Delhi, India, 13/11/2024–14/11/2024)
2025

Abstract

Keywords: Classification, Detection, Review, Segmentation, Survey, Transformers
Transformer architectures have driven notable advances in deep learning for vision. Inspired by these accomplishments, a large body of recent research has applied Transformer-based frameworks to computer vision (CV). These models have proved effective in three fundamental vision tasks: image classification, object detection, and segmentation of different sensory data streams. Visual transformers have demonstrated strong performance across various benchmarks compared with state-of-the-art convolutional neural networks. In this survey, we review recently published works, organized by these three central CV tasks. We assess and compare these transformers using a range of metrics. Additionally, we discuss open issues and challenges, along with some unexplored directions for strengthening visual transformer architectures.
