Output list
Conference paper
A Large Scale Multi-View RGBD Visual Affordance Learning Dataset
Published 2023
2023 IEEE International Conference on Image Processing (ICIP), 08/10/2023–11/10/2023, Kuala Lumpur, Malaysia
The physical and textural attributes of objects have been widely studied for recognition, detection and segmentation tasks in computer vision. A number of datasets, such as the large-scale ImageNet, have been proposed for feature learning using data-hungry deep neural networks and for hand-crafted feature extraction. To interact intelligently with objects, robots and intelligent machines need the ability to infer beyond traditional physical/textural attributes and to understand/learn visual cues, called visual affordances, for affordance recognition, detection and segmentation. To date, there is no publicly available large dataset for visual affordance understanding and learning. In this paper, we introduce a large-scale multi-view RGBD visual affordance learning dataset, a benchmark of 47,210 RGBD images from 37 object categories, annotated with 15 visual affordance categories. To the best of our knowledge, this is the first and largest multi-view RGBD visual affordance learning dataset. We benchmark the proposed dataset for affordance segmentation and recognition tasks using popular Vision Transformer and Convolutional Neural Networks. Several state-of-the-art deep learning networks are evaluated on each of the affordance recognition and segmentation tasks. Our experimental results showcase the challenging nature of the dataset and point to clear prospects for new and robust affordance learning algorithms. The dataset is publicly available at https://sites.google.com/view/afaqshah/dataset.
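The abstract above benchmarks networks on affordance segmentation over 15 affordance categories. The snippet below is a minimal sketch of the kind of per-class evaluation such a benchmark implies: a mean intersection-over-union over predicted and ground-truth affordance label maps. It is not the authors' released evaluation code; the class encoding (15 affordance classes plus a background class) and the toy label maps are illustrative assumptions.

```python
# Minimal sketch (not the paper's released code): mean IoU over affordance
# label maps. Class encoding is an assumption: 15 affordances + background.
import numpy as np

NUM_CLASSES = 16  # assumed: 15 affordance categories + background


def mean_iou(pred, target, num_classes=NUM_CLASSES):
    """Mean intersection-over-union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent in both prediction and ground truth
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0


if __name__ == "__main__":
    # Toy label maps standing in for one predicted / ground-truth mask pair.
    rng = np.random.default_rng(0)
    gt = rng.integers(0, NUM_CLASSES, size=(480, 640))
    pred = gt.copy()
    pred[:100] = 0  # corrupt part of the prediction
    print(f"mean IoU: {mean_iou(pred, gt):.3f}")
```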
Conference paper
Hierarchical Transformer for Visual Affordance Understanding using a Large-scale Dataset
Date presented 2023
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), 01/10/2023–05/10/2023, Detroit, MI
Recognition, detection, and segmentation tasks in machine vision have traditionally focused on the physical and textural attributes of objects. However, to interact intelligently with novel objects, robots and intelligent machines must also understand visual cues, such as the visual affordances that objects offer. In this paper, we present a large-scale multi-view RGBD visual affordance learning dataset, a benchmark of 47,210 RGBD images from 37 object categories, annotated with 15 visual affordance categories, together with 35 cluttered/complex scenes. We deploy a Vision Transformer (ViT), called the Visual Affordance Transformer (VAT), for the affordance segmentation task. Owing to its hierarchical architecture, VAT can learn multiple affordances at various scales, making it suitable for objects of varying sizes. Our experimental results show the superior performance of VAT compared to state-of-the-art deep learning networks. In addition, the challenging nature of the proposed dataset highlights the potential for new and robust affordance learning algorithms. Our dataset is publicly available at https://sites.google.com/view/afaqshah/dataset.
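The abstract describes a hierarchical transformer that learns affordances at multiple scales. The sketch below illustrates that general idea only: a stack of patch-merging transformer stages whose per-stage features are upsampled and fused into dense affordance logits. It is not the authors' VAT implementation; the stage widths, depths, input channel count (RGB + depth), and fusion head are illustrative assumptions.

```python
# Minimal sketch of a hierarchical transformer for dense affordance
# segmentation (multi-scale fusion). NOT the VAT implementation; all
# hyperparameters below are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Stage(nn.Module):
    """Strided patch-merging conv followed by a small transformer encoder."""
    def __init__(self, in_ch, out_ch, depth=2, heads=4, stride=2):
        super().__init__()
        self.merge = nn.Conv2d(in_ch, out_ch, kernel_size=stride, stride=stride)
        layer = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=heads, dim_feedforward=out_ch * 4,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.merge(x)                      # B, C, H, W
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # B, H*W, C
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class HierarchicalSegmenter(nn.Module):
    """Four-stage encoder; per-stage logits are upsampled and summed."""
    def __init__(self, in_ch=4, num_classes=16, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = [in_ch] + list(widths)
        self.stages = nn.ModuleList(
            Stage(chans[i], chans[i + 1]) for i in range(4))
        self.heads = nn.ModuleList(
            nn.Conv2d(w, num_classes, kernel_size=1) for w in widths)

    def forward(self, x):
        size = x.shape[-2:]
        logits = 0
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)  # progressively coarser feature maps
            logits = logits + F.interpolate(
                head(x), size=size, mode="bilinear", align_corners=False)
        return logits


if __name__ == "__main__":
    # Assumed input: 4-channel RGBD frame; 15 affordances + background.
    model = HierarchicalSegmenter(in_ch=4, num_classes=16)
    rgbd = torch.randn(1, 4, 128, 128)
    print(model(rgbd).shape)  # torch.Size([1, 16, 128, 128])
```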