Generating Ontology from a Set of Texts Belonging to a Certain Field of Knowledge

doi:10.13053/cys-29-4-5815

Iskander Akhmetov, Shakarim Aubakirov, Timur Saparov, Rustam Mussabayev, Alexander Krassovitsky, Alexander Gelbukh

Abstract

The automatic generation of ontologies from textual data is a crucial tool for organizing domain-specific knowledge, particularly in fields like natural language processing (NLP). This research explores methods for extracting, classifying, and structuring terms from scientific texts to create coherent ontologies. We evaluated Term Frequency-Inverse Document Frequency (TF-IDF) and TextRank for term extraction, and Named Entity Recognition (NER) and Part-of-Speech (POS) tagging for term classification. Hierarchical relationships between terms are established using clustering methods such as Agglomerative Clustering and visualized through dendrograms. The generated ontology is validated using cosine similarity, co-occurrence matrices, and topic modeling to check domain relevance and coherence. By comparing these methods, this study highlights their strengths and limitations and offers insights into how automated techniques can enhance ontology creation in specialized domains, facilitating better knowledge organization, retrieval, and machine understanding of unstructured data.
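
To make two of the techniques named in the abstract concrete, the following is a minimal sketch of TF-IDF term weighting and cosine similarity between documents. It is an illustration only and not the authors' implementation: the function names (`tfidf_vectors`, `top_terms`, `cosine_similarity`), the toy documents, the whitespace tokenization, and the unsmoothed `log(N / df)` weighting are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Relative term frequency times unsmoothed inverse document frequency.
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def top_terms(vector, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for illustration.
docs = [
    "ontology learning extracts terms from text",
    "term extraction ranks terms from text",
    "clustering groups related terms",
]
vectors = tfidf_vectors(docs)

for doc, vec in zip(docs, vectors):
    print(doc, "->", top_terms(vec))
print("similarity(0, 1):", cosine_similarity(vectors[0], vectors[1]))
print("similarity(0, 2):", cosine_similarity(vectors[0], vectors[2]))
```

In this toy corpus the word "terms" occurs in every document, so its weight is zero and it contributes nothing to similarity, which is the behaviour that makes TF-IDF useful for separating domain terms from common vocabulary. A real pipeline would likely use a library vectorizer and feed the resulting vectors into a clustering step.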

Keywords

Ontology; Natural Language Processing; TF-IDF; TextRank; POS tagging; NER
