Comparison of Feature Selection Techniques for Multi-label Text Classification against a New Semantic-based Method

Wael Alkhatib, Steffen Schnitzer, Wei Ding, Peter Jiang, Yassin Alkhalili, Christoph Rensing

Abstract


Multi-label text classification remains an under-explored research area, which has prompted a substantial amount of research on adapting feature selection techniques to handle multi-label data directly. A wide range of statistical techniques have been proposed for weighting and selecting features in order to reduce the high dimensionality of the feature space. Those techniques lose the semantic regularities of concepts as features and ignore the dependencies and ordering between adjacent words. In this work, we undertake a comparative study across a set of statistical and semantic-based techniques for feature selection. Moreover, we propose a novel approach that incorporates text semantics into feature selection using typed dependencies. Our extensive experiments on the EUR-lex dataset showed that incorporating text semantics in feature selection can significantly improve the performance of multi-label classifiers. It also drastically decreases computation costs by reducing the feature space. The experiments confirmed that our method, applied to a combination of typed dependencies, outperformed the state-of-the-art feature selection techniques in terms of F1-measure.

