Abstract
In this paper, we introduce a novel multimodal biometric recognition system based on generalized sparse representations. In recently proposed systems with heterogeneous features (such as audio and video), the joint sparse optimization problem was addressed by bringing the features into the same dynamic range, for example by normalizing each feature vector to unit ℓ2 norm. This is, however, not optimal, and such normalization may degrade the unimodal performance of the affected modality. We propose to solve the original joint sparse optimization problem by introducing scaling factors for the different modalities, so that the modalities interact efficiently at the feature level. The sequence-dependent scaling factors are calculated automatically so that the mismatch between the sparse representations of the different modalities is accounted for. For an audiovisual recognition system, our experiments on the challenging MOBIO database show that the proposed method outperforms the original joint sparsity-based system (96.8% vs. 94.3% recognition rate).