Revisiting Arabic Morphology: A Machine-Learning-Based Approach to Stemming and Root Character Permutations with a Publicly Available Open-Source Implementation

Iskander Akhmetov, Basem Ibrahim Malawi Al-Raba’a, Alexander Gelbukh, Rustam Mussabayev, Alexander Krassovitskiy

Abstract

We present a novel method for learning morphological rules from a corpus for a major language with very rich and complicated morphology: Arabic. We conducted experiments on Arabic stemming and root character permutations, focusing on their properties and semantic relations. First, we built stemmer and lemmatizer models, which achieved accuracy scores of only 0.2713 on the test set and 0.6347 on the train set. Second, we explored the semantic relationships among the permutation variants of a word root's characters. We found that recombining the characters in a root may give rise to antonymy, synonymy, or other semantic relations.
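The following minimal Python sketch makes the notion of root character permutation variants concrete by enumerating every distinct reordering of a root's letters. It is an illustration under stated assumptions, not the authors' implementation. The function name `root_permutations` and the example roots are hypothetical. The sketch only generates candidate strings. Deciding which variants are attested words, and how they relate semantically, would require a lexicon and is not shown.

```python
from itertools import permutations


def root_permutations(root: str) -> list[str]:
    """Return all distinct reorderings of the characters in `root` (hypothetical helper)."""
    # permutations() yields tuples of characters; join them back into strings.
    # The set removes duplicates that arise when a root repeats a letter.
    return sorted({"".join(p) for p in permutations(root)})


if __name__ == "__main__":
    # Illustrative triliteral root k-t-b; any root string would work here.
    variants = root_permutations("كتب")
    print(len(variants))  # 6
    for v in variants:
        print(v)
```

A root with three distinct letters yields 3! = 6 variants. A root with a repeated letter, such as مدد, yields only three, which is why the sketch collects results in a set.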

Keywords

Stemming, lemmatization, Arabic, root, permutation
