Output list
Conference paper
Reinforced Learning for Label-Efficient 3D Face Reconstruction
Published 2023
2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 6028–6034
2023 IEEE International Conference on Robotics and Automation (ICRA), 29/05/2023–02/06/2023, London, United Kingdom
3D face reconstruction plays a major role in many human-robot interaction systems, from automatic face authentication to human-computer interface-based entertainment. To improve robustness against occlusions and noise, 3D face reconstruction networks are often trained on a set of in-the-wild face images, preferably captured from different viewpoints of the subject. However, collecting the required large amounts of 3D-annotated face data is expensive and time-consuming. To address the high annotation cost, and because training on an informative subset matters, we propose an Active Learning (AL) framework that actively selects the most informative and representative samples to be labeled. To the best of our knowledge, this is the first work to tackle active learning for 3D face reconstruction, enabling a label-efficient training strategy. In particular, we propose a Reinforcement Active Learning approach in conjunction with a clustering-based pooling strategy to select informative viewpoints of the subjects. Experimental results on the 300W-LP and AFLW2000 datasets demonstrate that our proposed method 1) efficiently selects the most influential viewpoints for labeling and outperforms several baseline AL techniques, and 2) further improves the performance of a 3D face reconstruction network trained on the full dataset.
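A minimal sketch of the clustering-based pooling idea described above: cluster the unlabeled pool and pick the highest-scoring candidate from each cluster. The function names, the plain k-means, and the per-sample informativeness scores are illustrative assumptions; the paper's reinforcement-learned selection policy is not reproduced here.

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Minimal k-means used as an illustrative stand-in for the
    clustering-based pooling step (not the authors' implementation)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def select_informative_samples(embeddings, scores, n_clusters=5):
    """Pick the highest-scoring ("most informative") sample from each
    cluster of the unlabeled pool, so the batch stays representative."""
    labels = kmeans_labels(embeddings, n_clusters)
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx):
            selected.append(int(idx[np.argmax(scores[idx])]))
    return selected

# toy pool: 20 candidate viewpoints with 8-D embeddings and a
# (hypothetical) learned informativeness score per sample
rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 8))
scores = rng.random(20)
picked = select_informative_samples(emb, scores)
print(picked)
```

The cluster step keeps the labeled batch spread across the pool; the score step (here a random placeholder) would come from the learned policy.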
Conference paper
Date presented 2022
36th Conference on Neural Information Processing Systems (NeurIPS 2022), 22/11/2022–09/12/2022, New Orleans
In stereo vision, self-similar or bland regions can make it difficult to match patches between two images. Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern onto the scene so that each patch of an image pair can be identified without ambiguity. However, the projected pattern significantly alters the appearance of the image. If this pattern acts as a form of adversarial noise, it could negatively impact the performance of deep learning-based methods, which are now the de-facto standard for dense stereo vision. In this paper, we propose the Active-Passive SimStereo dataset and a corresponding benchmark to evaluate the performance gap between passive and active stereo images for stereo matching algorithms. Using the proposed benchmark and an additional ablation study, we show that the feature extraction and matching modules of a selection of twenty deep learning-based stereo matching methods generalize to active stereo without a problem. However, the disparity refinement modules of three of the twenty architectures (ACVNet, CascadeStereo, and StereoNet) are negatively affected by the active stereo patterns due to their reliance on the appearance of the input images.
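The patch-matching problem the abstract describes can be illustrated with a toy sum-of-squared-differences (SSD) matcher: on a bland region every candidate disparity looks alike, while a pseudo-random projected pattern makes the true disparity stand out. This is a hypothetical sketch, not any of the benchmarked networks.

```python
import numpy as np

def match_patch_ssd(left, right, row, col, patch=3, max_disp=8):
    """Return the disparity minimizing the sum of squared differences
    between a left-image patch and candidates along the same row
    of the right image (toy dense-matching step)."""
    h = patch // 2
    ref = left[row - h:row + h + 1, col - h:col + h + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1]
        cost = ((ref - cand) ** 2).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# a pseudo-random "projected pattern" gives every patch a unique
# signature, so the true disparity (here 2) is recovered unambiguously
rng = np.random.default_rng(3)
pattern = rng.random((9, 20))
left = pattern.copy()
right = np.roll(pattern, -2, axis=1)   # scene shifted by disparity 2
print(match_patch_ssd(left, right, row=4, col=10))  # 2
```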
Conference paper
Advances in geometrical analysis of Topologically-varying shapes
Published 2020
2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops)
2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops), 04/04/2020, Iowa City, IA, USA
Statistical shape analysis using geometrical approaches provides comprehensive tools – such as geodesic deformations, shape averages, and principal modes of variability – all in the original object space. While geometrical methods have in the past been limited to objects with fixed topologies (e.g. functions, closed curves, surfaces of genus zero, etc.), this paper summarizes recent progress where geometrical approaches are beginning to handle topologically different objects – trees, graphs, etc. – that exhibit arbitrary branching and connectivity patterns. The key idea is to “divide-and-conquer”, i.e. to break complex objects into simpler parts and register these parts across objects. Such matching and quantification require invariant metrics from Riemannian geometry and provide foundational tools for statistical shape analysis.
Conference paper
Game design principles influencing stroke survivor engagement for VR-Based upper limb rehabilitation
Published 2020
Proceedings of the 31st Australian Conference on Human-Computer-Interaction
31st Australian Conference on Human-Computer-Interaction (OzCHI) 2019, 02/12/2019–05/12/2019, Esplanade Hotel, Fremantle, Australia
Engagement with one's rehabilitation is crucial for stroke survivors. Serious games utilising desktop Virtual Reality (VR) could be used in rehabilitation to increase stroke survivors' engagement. This paper discusses the results of a user experience case study conducted with six stroke survivors to determine which game design principles are, or would be, important for engaging them with desktop VR serious games designed for upper limb rehabilitation. The results of our study showed that the game design principles that warrant further investigation are awareness, feedback, interactivity, flow, and challenge; also important to a great extent are attention, involvement, motivation, effort, clear instructions, usability, interest, psychological absorption, purpose, and a first-person view.
Conference paper
Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
Published 2019
2019 Digital Image Computing: Techniques and Applications (DICTA)
Digital Image Computing: Techniques and Applications (DICTA) 2019, 02/12/2019–04/12/2019, Hyatt Regency Perth, Australia
In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and Long Short-Term Memory (LSTM) as the language decoder. LSTM with attention mechanism has shown remarkable performance on sequential data including image captioning. LSTM can retain long-range dependency of sequential data. However, it is hard to parallelize the computations of LSTM because of its inherent sequential characteristics. In order to address this issue, recent works have shown benefits in using self-attention, which is highly parallelizable without requiring any temporal dependencies. However, existing techniques apply attention only in one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention both in forward and backward directions. It achieves high performance comparable to state-of-the-art methods.
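The forward/backward attention idea can be sketched with masked self-attention over a token sequence: one pass attends over earlier positions, the other over later positions, and the two context vectors are combined. This is an illustrative assumption about the mechanism (identity projections, concatenation as the combiner), not the Bi-SAN architecture itself.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(Q, K, V, mask):
    """Scaled dot-product attention; masked positions get -inf scores."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ V

def bidirectional_self_attention(X):
    """Hypothetical bi-directional sketch: each position attends over
    past positions (forward pass) and future positions (backward pass);
    the two contexts are concatenated."""
    T = X.shape[0]
    fwd_mask = np.tril(np.ones((T, T), dtype=bool))  # position t sees 0..t
    bwd_mask = np.triu(np.ones((T, T), dtype=bool))  # position t sees t..T-1
    fwd = masked_self_attention(X, X, X, fwd_mask)
    bwd = masked_self_attention(X, X, X, bwd_mask)
    return np.concatenate([fwd, bwd], axis=-1)

X = np.random.default_rng(1).normal(size=(6, 16))  # 6 tokens, 16-D features
out = bidirectional_self_attention(X)
print(out.shape)  # (6, 32)
```

Because neither pass depends on the previous time step's output, both directions parallelize over positions, which is the advantage over an LSTM noted in the abstract.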
Conference paper
Improving follicular lymphoma identification using the class of interest for transfer learning
Published 2019
2019 Digital Image Computing: Techniques and Applications (DICTA)
Digital Image Computing: Techniques and Applications (DICTA) 2019, 02/12/2019–04/12/2019, Hyatt Regency Perth, Australia
Follicular Lymphoma (FL) is a type of lymphoma that grows silently and is usually diagnosed in its later stages. To increase patients' survival rates, FL requires a fast diagnosis. While, traditionally, the diagnosis is performed by visual inspection of Whole Slide Images (WSIs), recent advances in deep learning techniques provide an opportunity to automate this process. The main challenge, however, is that WSIs often exhibit large variations across different operating environments, hereinafter referred to as sites. As such, deep learning models usually require retraining using labeled data from each new site. This is, however, not feasible since the labelling process requires pathologists to visually inspect and label each sample. In this paper, we propose a deep learning model that uses transfer learning with fine-tuning to improve the identification of Follicular Lymphoma on images from new sites that are different from those used during training. Our results show that the proposed approach improves the prediction accuracy by 12% to 52% compared to the initial prediction of the model for images from a new site in the target environment.
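A generic sketch of the transfer-learning-with-fine-tuning setup: the pretrained feature extractor is kept frozen and only a small classification head is retrained on the limited data from the new site. The logistic-regression head and the toy feature clusters below are illustrative assumptions, not the paper's network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune_head(features, labels, lr=0.1, epochs=200):
    """Retrain only a binary classification head on features from the
    new site; the (pretrained) feature extractor stays frozen."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(features @ w + b)
        grad_w = features.T @ (p - labels) / len(labels)
        grad_b = (p - labels).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy "new-site" data: two well-separated clusters standing in for the
# pretrained features of FL-positive and FL-negative slide patches
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.3, (30, 4)), rng.normal(1, 0.3, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
w, b = fine_tune_head(X, y)
acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(acc)
```

Because only `w` and `b` are updated, the adaptation needs far fewer labeled samples than retraining the whole model, which is the practical point of fine-tuning for new sites.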
Conference paper
Attention-Based image captioning using DenseNet features
Published 2019
Neural Information Processing, 1143
26th International Conference, ICONIP 2019, 12/12/2019–15/12/2019, Sydney, NSW
We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information of the whole scene to generate image captions. Such a mechanism often fails to capture information about salient objects and cannot generate semantically correct captions. We consider an attention mechanism that can focus on relevant parts of the image to generate a fine-grained description of that image. We use image features from DenseNet. We conduct our experiments on the MSCOCO dataset. Our proposed method achieved 53.6, 39.8, and 29.5 on the BLEU-2, 3, and 4 metrics, respectively, which is superior to the state-of-the-art methods.
Conference paper
Elastic 3D shape analysis using Square-Root normal field representation
Published 2017
IEEE 56th Annual Conference on Decision and Control (CDC) 2017, 12/12/2017–15/12/2017, Melbourne, VIC
Shape is an important physical property of natural and man-made 3D objects that characterizes their external appearances. Understanding differences between shapes, and modeling the variability within and across shape classes, hereinafter referred to as shape analysis, are problems fundamental to many applications, ranging from computer vision and computer graphics to biology and medicine. This paper provides an overview of some of the recent techniques for studying the shape of 3D objects that undergo non-rigid deformations including bending and stretching. We will mainly focus on a new representation called the square-root normal field (SRNF), discuss its properties, and show its application in the analysis of the shape of various types of objects, including human body shapes, anatomical organs such as carpal bones, and hand-drawn 2D sketches. We will show how the representation is used for (1) jointly computing correspondences and geodesics; (2) computing summary statistics such as means and modes of variations; and (3) exploring shape variability in a collection of 3D objects.
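For reference, the square-root normal field mentioned above is usually defined as follows in the elastic shape analysis literature (notation is ours, not quoted from the paper):

```latex
% Square-root normal field (SRNF) of a parameterized surface
% f : D \to \mathbb{R}^3 with (unnormalized) normal field
%   n(s) = \partial_u f(s) \times \partial_v f(s), \quad s \in D.
% The SRNF representation is
\[
  q(s) \;=\; \frac{n(s)}{\sqrt{\lvert n(s) \rvert}},
\]
% Under this representation, the standard \mathbb{L}^2 metric between
% SRNFs is invariant to simultaneous re-parameterization of the
% surfaces, which is what makes registration and geodesic computation
% tractable.
```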
Conference paper
SHREC'16 Track: 3D Sketch-Based 3D Shape Retrieval
Published 2016
Eurographics Workshop on 3D Object Retrieval (3DOR) 2016, 07/05/2016–08/05/2016, Lisbon, Portugal
Sketch-based 3D shape retrieval offers an intuitive way of representing queries and has vast applications. Therefore, it has received more and more attention in the research community of content-based 3D object retrieval. However, sketch-based 3D shape retrieval is a challenging research topic due to the semantic gap between the inaccurate representation of sketches and the accurate representation of 3D models. In order to enrich and advance the study of sketch-based 3D shape retrieval, we initiate research on 3D sketch-based 3D model retrieval and collect a 3D sketch dataset based on a purpose-built 3D sketching interface that enables users to draw 3D sketches in the air while standing in front of a Microsoft Kinect. The objective of this track is to evaluate the performance of different 3D sketch-based 3D model retrieval algorithms using the hand-drawn 3D sketch query dataset and a generic 3D model target dataset. The benchmark contains 300 sketches that are evenly divided into 30 classes, as well as 1,258 3D models that are classified into 90 classes. In this track, nine runs have been submitted by five groups, and their retrieval performance has been evaluated using seven commonly used retrieval performance metrics. We hope that this benchmark, the comparative evaluation results, and the corresponding evaluation code will further promote sketch-based 3D shape retrieval and its applications.
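Three of the commonly used retrieval metrics such tracks report can be computed from a ranked result list alone; a minimal sketch (the exact seven metrics used by the track are not restated in the abstract, so the selection below is an assumption):

```python
def retrieval_metrics(ranked_labels, query_label, class_size):
    """Nearest Neighbor (NN), First Tier (FT) and Second Tier (ST)
    for one query, given the class labels of the retrieved models
    ranked by similarity (the query itself excluded from the list)."""
    relevant = class_size - 1  # other members of the query's class
    nn = float(ranked_labels[0] == query_label)
    ft = sum(l == query_label for l in ranked_labels[:relevant]) / relevant
    st = sum(l == query_label for l in ranked_labels[:2 * relevant]) / relevant
    return nn, ft, st

# toy ranking for a query of class 'a', with 3 models per class
ranked = ['a', 'b', 'a', 'b', 'b', 'b']
nn, ft, st = retrieval_metrics(ranked, 'a', class_size=3)
print(nn, ft, st)  # 1.0 0.5 1.0
```

Averaging these per-query values over all 300 sketch queries would give the benchmark-level scores.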
Conference paper
Published 2016
2016 IEEE Virtual Reality (VR)
2016 IEEE Virtual Reality (VR) Conference, 19/03/2016–23/03/2016, Greenville, SC
Co-located haptic feedback in mixed and augmented reality environments can improve realism and user performance, but it also requires careful system design and calibration. In this poster, we determine the thresholds for perceiving co-location errors through two psychophysics experiments in a typical fine-motor manipulation task. In these experiments we simulate the two fundamental ways of implementing visuo-haptic augmented reality (VHAR) systems: first, attaching a real tool; second, augmenting a virtual tool. We determined the just-noticeable co-location errors for position and orientation in both experiments and found that users are significantly more sensitive to co-location errors with virtual tools. Our overall findings are useful for designing VHAR workspaces and calibration procedures.