SIMTEX: An Approach for Detecting and Measuring Textual Similarity based on Discourse and Semantics

Iria da Cunha, Jorge Vivaldi, Juan Manuel Torres-Moreno, Gerardo Sierra


Automatic systems for detecting and measuring textual similarity are currently being developed for application to different tasks in the field of Natural Language Processing (NLP). These systems generally use surface linguistic features or statistical information; few researchers use deep linguistic information. In this work, we present an algorithm for detecting and measuring textual similarity that takes into account information offered by the discourse relations of Rhetorical Structure Theory (RST) and the lexical-semantic relations included in EuroWordNet. We apply the algorithm, called SIMTEX, to texts written in Spanish, but the methodology is potentially language-independent.


Textual similarity, discourse, semantics, paraphrase.
