Sound event detection using multiple optimized kernels

X. Xia; R. Togneri; F. Sohel; Y. Zhao; D. Huang

doi:10.1109/TASLP.2020.2998298

Journal article

Peer reviewed

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol.28, pp.1745-1754

2020

DOI: https://doi.org/10.1109/TASLP.2020.2998298

Abstract

Sound event detection (SED) has been widely applied in real-world applications. Approaches based on convolutional recurrent neural networks (CRNNs) have achieved state-of-the-art performance. However, the convolution is typically performed with a fixed-size kernel, which degrades detection accuracy, especially when the acoustic features of different event classes vary widely. To address this, this article proposes an SED technique that uses a CRNN framework with multiple convolutional kernels of different sizes. The top-performing kernels are selected from a kernel pool based on unsupervised clustering errors and the accuracies of temporarily trained models. The selected kernels are then applied in multiple convolution layers to handle the acoustic feature variations. Experimental results on different subsets of AudioSet, namely DCASE Challenge 2017 Task 4 and DCASE Challenge 2018 Task 4, demonstrate the performance of the proposed approach compared with state-of-the-art systems.
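
The kernel selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the clustering errors and model accuracies are combined, so everything below is an assumption made for illustration: the `KernelCandidate` type, the min-max normalization, the equal weighting, and the numeric values in the example pool are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class KernelCandidate:
    """One entry in a hypothetical kernel pool."""
    size: Tuple[int, int]     # (time, frequency) extent of the kernel
    clustering_error: float   # unsupervised clustering error, lower is better
    accuracy: float           # accuracy of a temporarily trained model, higher is better


def _min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_top_kernels(pool: List[KernelCandidate], k: int,
                       weight: float = 0.5) -> List[KernelCandidate]:
    """Return the k best candidates, best first.

    Assumed scoring rule (not from the article):
        score = weight * normalized_accuracy
                + (1 - weight) * (1 - normalized_clustering_error)
    """
    if not pool:
        return []
    errors = _min_max([c.clustering_error for c in pool])
    accuracies = _min_max([c.accuracy for c in pool])
    scored = [
        (weight * acc + (1.0 - weight) * (1.0 - err), cand)
        for cand, err, acc in zip(pool, errors, accuracies)
    ]
    # Sort by score only; ties keep their original pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:k]]


# Illustrative pool with made-up values, not results from the article.
example_pool = [
    KernelCandidate((3, 3), clustering_error=0.40, accuracy=0.52),
    KernelCandidate((5, 5), clustering_error=0.25, accuracy=0.55),
    KernelCandidate((7, 7), clustering_error=0.30, accuracy=0.50),
    KernelCandidate((3, 7), clustering_error=0.20, accuracy=0.54),
]
selected_sizes = [c.size for c in select_top_kernels(example_pool, k=2)]
```

In a full system, each selected kernel size would configure one convolution branch of the CRNN. The ranking rule here is only one plausible way to combine the two criteria that the abstract names.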

Details

Title: Sound event detection using multiple optimized kernels
Authors/Creators: X. Xia (Author/Creator) - The University of Western Australia
R. Togneri (Author/Creator) - The University of Western Australia
F. Sohel (Author/Creator) - Murdoch University
Y. Zhao (Author/Creator) - The University of Western Australia
D. Huang (Author/Creator) - The University of Western Australia
Publication Details: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol.28, pp.1745-1754
Publisher: IEEE
Identifiers: 991005545117607891
Murdoch Affiliation: Information Technology, Mathematics and Statistics
Language: English
Resource Type: Journal article

Metrics

6 Times Cited - Web of Science

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Collaboration types: Domestic collaboration
Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.174 Digital Signal Processing; 4.174.152 Speech Recognition
Web of Science research areas: Acoustics; Engineering, Electrical & Electronic
ESI research areas: Engineering