Arabic Dialect Identification based on Probabilistic-Phonetic Modeling

Naim Terbeh, Mohsen Maraoui, Mounir Zrigui


The identification of Arabic dialects is considered to be the first pre-processing component for any natural language processing problem. This task is useful for automatic translation, information retrieval, opinion mining and sentiment analysis. In this purpose, we propose a statistical approach based on the phonetic modeling to identify the correspondent Arabic dialect for each input acoustic signal. The main idea consists first, and for each dialect, in calculating a referenced phonetic model. Second, for every input audio signal, we calculate an appropriate phonetic model. Third, we compare this latter to all referenced Arabic dialect models. Finally, we associate the input acoustic signal to the dialect where the referenced phonetic model minimizes the cosine similarity. The obtained results are satisfactory. Indeed, based on 117 audio sequences, we attain a classification rate of 93%. Supporting the achieved results and the coverage of most of Arabic dialects, this study can be a reference for future work addressing dialectical speech processing applications.


Arabic dialects, probabilistic-phonetic model, dialect identification, cosine similarity

Full Text: PDF