Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning

M.Z. Hossain; F. Sohel; M.F. Shiratuddin; H. Laga; M. Bennamoun

doi:10.1109/DICTA47822.2019.8946003

Conference paper

2019 Digital Image Computing: Techniques and Applications (DICTA)

Digital Image Computing: Techniques and Applications (DICTA) 2019 (Hyatt Regency Perth, Australia, 02/12/2019–04/12/2019)

2019

DOI: https://doi.org/10.1109/DICTA47822.2019.8946003

Abstract

In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTMs with an attention mechanism have shown remarkable performance on sequential-data tasks, including image captioning. An LSTM can retain long-range dependencies in sequential data. However, its computations are hard to parallelize because each step depends on the previous one. To address this issue, recent works have shown the benefits of self-attention, which is highly parallelizable because it does not require any temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention in both the forward and backward directions. Bi-SAN achieves performance comparable to state-of-the-art methods.
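
The idea of computing attention in both directions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not give the paper's formulation, so everything here is an assumption. That covers the function names, the mask definitions (forward means each word attends to itself and earlier words, backward means itself and later words), the absence of learned query/key/value projections, and the choice to concatenate the two contexts.

```python
import numpy as np


def masked_self_attention(x, mask):
    """Scaled dot-product self-attention over x of shape (T, d).

    mask[i, j] is True where position i may attend to position j.
    Assumption: queries, keys and values are all x (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)              # block disallowed positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


def bidirectional_self_attention(x):
    """Hypothetical bi-directional variant: two masked passes, concatenated."""
    T = x.shape[0]
    forward_mask = np.tril(np.ones((T, T), dtype=bool))   # attend to j <= i
    backward_mask = np.triu(np.ones((T, T), dtype=bool))  # attend to j >= i
    fwd, fwd_w = masked_self_attention(x, forward_mask)
    bwd, bwd_w = masked_self_attention(x, backward_mask)
    # Assumption: the two contexts are combined by concatenation.
    return np.concatenate([fwd, bwd], axis=-1), fwd_w, bwd_w


# Usage: 5 word embeddings of dimension 8 (random stand-ins for real embeddings).
rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))
context, fwd_w, bwd_w = bidirectional_self_attention(words)
print(context.shape)  # (5, 16)
```

Both passes are plain matrix products over the whole sequence, so neither needs the step-by-step recurrence of an LSTM. That is the parallelization benefit the abstract attributes to self-attention.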

Details

Title: Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
Authors/Creators: M.Z. Hossain (Author/Creator)
F. Sohel (Author/Creator) - Murdoch University
M.F. Shiratuddin (Author/Creator) - Murdoch University
H. Laga (Author/Creator) - Murdoch University
M. Bennamoun (Author/Creator) - The University of Western Australia
Publication Details: 2019 Digital Image Computing: Techniques and Applications (DICTA)
Conference: Digital Image Computing: Techniques and Applications (DICTA) 2019 (Hyatt Regency Perth, Australia, 02/12/2019–04/12/2019)
Identifiers: 991005544417007891
Murdoch Affiliation: College of Science, Health, Engineering and Education
Language: English
Resource Type: Conference paper
