Revisiting Arabic Morphology: A Machine-Learning-Based Approach to Stemming and Root Character Permutations with a Publicly Available Open-Source Implementation
Abstract
We present a novel method for learning
morphological rules from a corpus for a major language
with very rich and complicated morphology. Namely,
we conducted experiments on Arabic stemming and
root character permutations, focusing on their properties
and semantic relations. First, we built stemmer
and lemmatizer models, which reached accuracy scores of
only 0.2713/0.6347 on the test/train sets. Second, we
explored the semantic relationship between word
root character permutation variants. We found
that recombining the characters in a root may give rise
to antonymy, synonymy, or other semantic relations.
Keywords
Stemming, lemmatization, Arabic, root, permutation