Generating Ontology from a Set of Texts Belonging to a Certain Field of Knowledge
Abstract
The automatic generation of ontologies from textual data is a crucial tool for organizing domain-specific knowledge, particularly in fields like natural language processing (NLP). This research explores methods for extracting, classifying, and structuring terms from scientific texts to create coherent ontologies. We evaluate Term Frequency-Inverse Document Frequency (TF-IDF) and TextRank for term extraction, and Named Entity Recognition (NER) and Part-of-Speech (POS) tagging for term classification. Hierarchical relationships between terms are established using clustering methods such as agglomerative clustering and visualized through dendrograms. The generated ontology is validated using cosine similarity, co-occurrence matrices, and topic modeling to assess its domain relevance and coherence. By comparing these methods, this study highlights their strengths and limitations and offers insights into how automated techniques can enhance ontology creation in specialized domains, supporting better knowledge organization, retrieval, and machine understanding of unstructured data.
Keywords
Ontology; Natural Language Processing; TF-IDF; TextRank; POS tagging; NER