A New Arabic Word Embeddings model for Word Sense Induction

Djaidri Asma, Aliane Hassina, Azzoune Hamid

Abstract


We describe in this paper a new Arabic word embedding model for word sense induction. Word embedding models are attracting great interest from the NLP research community, and Word2vec is undoubtedly the most influential among them. These models map all the words of the vocabulary to a vector space, providing a semantic description of the words of a corpus as numerical vectors. Nevertheless, a well-known limitation of these models is that they cannot handle polysemy. We present a new, simple Arabic word embedding model, which we evaluate on the unsupervised task of word sense induction. The model is developed with Gensim tools for both Skip-gram and CBOW. The model then allows the building of a cosine-similarity indexer using the Annoy indexer, which is faster than the Gensim similarity function. An ego-network, which captures the structure of an individual node's relationships, is used to build a graph of related words from the local neighbors of a target word. The different senses of a word are generated by clustering this graph. We have worked with two different news corpora, OSAC and Aracorpus. We have applied the different existing AraVec models and our own models to word sense induction and obtained promising results. Our model shows good word sense discrimination performance on a sample of ambiguous Arabic words.

