Journal article
Deep Boltzmann machines for i-Vector based audio-visual person identification
Lecture Notes in Computer Science, Vol.9431, pp.631-641
2015
Abstract
We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines (DBMs), DBMspeech and DBMface, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset.
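The cosine distance baseline named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the dictionary of enrolled subjects, and the use of a single reference i-vector per subject are all assumptions made for illustration, and toy 3-dimensional vectors stand in for the 400-dimensional i-vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_ivector, enrolled):
    """Return the label of the enrolled subject with the highest cosine score.

    `enrolled` maps a subject label to one reference i-vector
    (a hypothetical layout, assumed here for illustration only).
    """
    return max(
        enrolled,
        key=lambda label: cosine_similarity(test_ivector, enrolled[label]),
    )


# Toy vectors standing in for real i-vectors.
enrolled = {
    "subject_a": [1.0, 0.0, 0.0],
    "subject_b": [0.0, 1.0, 1.0],
}
best = identify([0.1, 0.9, 1.1], enrolled)
```

Scoring by angle rather than Euclidean distance makes the decision independent of vector length, which is one common reason this measure is used as a simple baseline for i-vectors.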
Details
- Title: Deep Boltzmann machines for i-Vector based audio-visual person identification
- Authors/Creators: M. Alam; M. Bennamoun; R. Togneri; F. Sohel
- Publication Details: Lecture Notes in Computer Science, Vol.9431, pp.631-641
- Publisher: Springer Verlag
- Number of pages: 11
- Identifiers: 991005541534007891
- Copyright: 2016 Springer International Publishing Switzerland
- Murdoch Affiliation: School of Engineering and Information Technology
- Language: English
- Resource Type: Journal article
- Additional Information: Book Title: Image and Video Technology: 7th Pacific Rim Symposium on Image and Video Technology (PSIVT) 2015, Auckland, New Zealand, 23-27 November 2015, Revised Selected Papers