Generation of Feature Vectors for Identifying Medical Entities in Spanish

Gabriela A. García-Robledo, Alma Delia Cuevas-Rasgado, Maricela Bravo, José A. Reyes-Ortiz

Abstract


Natural Language Processing (NLP) encompasses a range of high-impact techniques that enable computers to interact with humans in a more natural manner. One such technique is entity extraction, which allows computers to identify relevant information within a text. This paper presents a methodology for recognizing medical entities in texts written in Spanish. The methodology combines syntactic, semantic, and contextual features at the word level. The principal objective of this feature-based approach is the identification of drug, anatomy, and disease entities. A training evaluation was conducted on two types of machine learning algorithms, yielding an accuracy of 98% on an external set. Additionally, accuracy was checked separately for each medical entity class.
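As a rough illustration of what a word-level feature vector combining these kinds of cues might look like, the following Python sketch builds one feature dictionary per token. It is not the authors' implementation: the function names, the feature names, and the toy lexicon are all hypothetical, and simple surface cues stand in for the syntactic features, which the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical toy lexicon standing in for a semantic resource;
# the abstract does not name the resources actually used.
TOY_LEXICON = {
    "paracetamol": "DRUG",
    "hígado": "ANATOMY",
    "hepatitis": "DISEASE",
}


def word_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    lower = word.lower()
    return {
        # Surface cues, used here as an assumed stand-in for syntactic features
        "lower": lower,
        "suffix3": lower[-3:],
        "is_capitalized": word[:1].isupper(),
        # Semantic feature: class found in the toy lexicon, if any
        "lexicon_class": TOY_LEXICON.get(lower, "NONE"),
        # Contextual features: neighbouring words, with boundary markers
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [word_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    example = ["El", "paracetamol", "afecta", "el", "hígado"]
    for token, feats in zip(example, sentence_features(example)):
        print(token, feats)
```

Feature dictionaries of this shape could then be converted to numeric vectors and passed to a classifier; which algorithms and encodings the paper uses is not stated in the abstract.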

Keywords


Information Extraction; Named Entity Recognition; Natural Language Processing
