Lightweight Voice Biometrics Authentication for Telecom
Conference proceeding

Thesavin Kanagar, Kok Wai Wong and Anupiya Nugaliyadde
2025 9th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI)
9th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI) 2025 (Colombo, Sri Lanka, 19/11/2025–21/11/2025)
2025

Abstract

Keywords: Accuracy; Adaptation models; Biological system modeling; Computational modeling; Error analysis; GMM-UBM; Lightweight models; Log-Mel spectrogram; Low latency communication; Mel frequency cepstral coefficient; MFCC; Real-time systems; Siamese networks; Speaker verification; Spectrogram; Telecom authentication; Telecommunications; Voice biometrics
This paper presents a comparative analysis of lightweight voice biometric authentication methods designed for real-time deployment in telecommunication environments. The study evaluates two distinct approaches: a traditional pipeline that combines Mel-Frequency Cepstral Coefficients (MFCC) with a Gaussian Mixture Model-Universal Background Model (GMM-UBM), and a Double-Branch Siamese Neural Network (DB-SNN) trained on log-Mel spectrograms. Both models were assessed on the VoxCeleb1 dataset, resampled to 8 kHz to reflect typical telecom audio conditions, and tested on utterance durations ranging from 4 to 7 seconds. Experimental results show that the GMM-UBM model was the more efficient of the two, with an average inference time of 10 ms and a compact model size of 8 KB, and that it performed stably on short utterances. Conversely, the DB-SNN achieved higher accuracy (78.53%) and a lower Equal Error Rate (EER) of 21.47% on longer inputs; however, it required substantially more computational resources, including an 8 MB model size and inference times of up to 26 ms. The findings reveal a clear trade-off between speed and accuracy in constrained environments. While GMM-UBM remains preferable for latency-critical telecom systems, the Siamese approach offers stronger verification performance when resources permit. The paper concludes by recommending future work on optimizing deep learning models through refined loss functions, adaptive architectures, and enhanced noise robustness for real-world telecom applications.
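
The Equal Error Rate quoted in the abstract is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The snippet below is a minimal illustrative sketch of how that metric can be computed from verification scores. It is not the authors' evaluation code; the function name `equal_error_rate` and the toy scores are assumptions made only for this example.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) for similarity scores where higher means 'same speaker'.

    Hypothetical helper for illustration: it sweeps every observed score as a
    candidate threshold and keeps the one where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer, best_threshold = np.inf, None, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostor trials wrongly accepted
        frr = np.mean(genuine < t)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_threshold = gap, (far + frr) / 2, t
    return eer, best_threshold

# Toy scores (assumed, not taken from the paper)
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With equal numbers of genuine and impostor trials, accuracy at this threshold equals 1 minus the EER. The reported 78.53% accuracy and 21.47% EER sum to 100%, which would be consistent with accuracy measured at the EER threshold, although the abstract does not state this.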
