A Comparative Study in Machine Learning and Audio Features for Kitchen Sounds Recognition

Alain Manzo-Martínez, Fernando Gaxiola, Graciela Ramírez-Alonso, Fernando Martínez-Reyes


For decades, work on audio recognition has focused on speech and music; however, in recent years there has been growing interest in the classification and recognition of acoustic events. This poses the challenge of determining the identity of sounds and their sources, and of analysing the context of the scenario in which they occur. This paper evaluates how robustly audio features retain the characteristic information of an acoustic event in the presence of background noise, in the task of identifying acoustic events within a mixture of sounds produced in a kitchen environment. We built a new database of kitchen sounds, since the reviewed literature offers no similar benchmark that allows this issue to be evaluated at a signal-to-noise ratio of 3 decibels. In our study, we compared two audio feature methods: Multiband Spectral Entropy Signature (MSES) and Mel Frequency Cepstral Coefficients (MFCC). To evaluate the performance of both MSES and MFCC, we used different classifiers: Similarity Distance, k-Nearest Neighbors, Support Vector Machines, and Artificial Neural Networks (ANN). The results showed that MSES combined with an ANN outperforms every other combination of MSES or MFCC with a classifier.
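The evaluation condition mentioned above (a mixture at a 3 dB signal-to-noise ratio) can be reproduced by scaling a background-noise recording so that its power sits 3 dB below the target event's power before summing the two. The sketch below illustrates this standard mixing procedure; it is not the authors' code, and the helper name `mix_at_snr` and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `signal`, scaling the noise so the mixture
    has the requested signal-to-noise ratio in decibels."""
    # Tile or truncate the noise so it covers the whole signal.
    if len(noise) < len(signal):
        reps = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(signal)]

    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_signal / (scale^2 * p_noise) = 10^(snr_db / 10) for scale.
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise
```

For example, mixing a recorded kitchen event with ambient noise via `mix_at_snr(event, ambience, 3.0)` yields the 3 dB condition under which the features are compared.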


Keywords: entropy, neural networks, mixture of sounds, MFCC