Revisiting Arabic Morphology: A Machine-Learning-Based Approach to Stemming and Root Character Permutations with a Publicly Available Open-Source Implementation

doi:10.13053/cys-30-1-6322

Iskander Akhmetov, Basem Ibrahim Malawi Al-Raba’a, Alexander Gelbukh, Rustam Mussabayev, Alexander Krassovitskiy

Abstract

We present a novel method for learning
morphological rules from a corpus for a major language
with very rich and complicated morphology. Namely,
we conducted experiments on Arabic stemming and
root character permutations, focusing on their properties
and semantic relations. First, we built stemmer
and lemmatizer models, which achieved accuracy scores of only
0.2713/0.6347 on the test/train sets. Second, we
explored the semantic relationship between word
root character permutation variants. We found
that recombining the characters in a root may give rise
to antonymy, synonymy, or other semantic relations.
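
As an illustration of what "root character permutation variants" means, the following minimal Python sketch enumerates the distinct reorderings of a root's characters. The function name `root_permutations` and the example root كتب are assumptions chosen for illustration; they are not taken from the authors' implementation, and the sketch does not check whether a generated variant is an attested Arabic root.

```python
from itertools import permutations


def root_permutations(root: str) -> list[str]:
    """Return the distinct reorderings of a root's characters,
    excluding the original root itself (hypothetical helper)."""
    seen = {root}
    variants = []
    for perm in permutations(root):
        candidate = "".join(perm)
        # Skip the original ordering and duplicates from repeated letters.
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants


# Example: a triliteral root with three distinct letters has 3! - 1 = 5 variants.
variants = root_permutations("كتب")
print(len(variants))
```

Each variant would then have to be looked up in a lexicon or corpus to decide whether it is a real root and how its meaning relates to the original.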

Keywords

Stemming, lemmatization, Arabic, root, permutation
