Using Text Embeddings and Graph Neural Networks for Personal Facts Classification
Abstract
In this study, we propose a framework for classifying personal facts in dialogue systems that combines lightweight text embeddings with Graph Neural Networks (GNNs). Because labeled datasets for this task are lacking, we annotated personal facts from the Multi-Session Chat (MSC) dataset using a large language model and verified the annotations manually. We categorize personal facts into five classes: Characteristics, Experiences, Routines or Habits, Goals or Plans, and Relationships. Our hypothesis is that semantically similar facts tend to share labels, so exploiting this similarity should improve classification accuracy. To test this, we construct a graph in which nodes represent facts and edges reflect semantic similarity. Experimental results demonstrate that integrating GNNs with lightweight encoders consistently yields higher F1-scores than using the encoders alone and rivals significantly larger models, indicating the framework's efficacy in resource-limited environments. An ablation study further examines the roles of edge weighting and feature extraction in classification performance. This work not only advances personal fact classification but also lays the groundwork for improving the personalization of conversational agents.
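The graph idea can be illustrated with a minimal sketch. It is not the implementation used in this study: the abstract does not specify the similarity measure, the edge-selection rule, or the GNN architecture, so the cosine similarity, the threshold of 0.8, the toy two-dimensional embeddings, and the function names `build_similarity_graph` and `propagate` are all assumptions made for illustration. The sketch links facts whose embeddings are similar and then performs one round of weighted neighbor averaging, the basic operation that a GNN layer builds on.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_graph(embeddings, threshold=0.8):
    """Return weighted undirected edges (i, j, weight) for pairs whose
    cosine similarity meets the (assumed) threshold."""
    edges = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            weight = cosine(embeddings[i], embeddings[j])
            if weight >= threshold:
                edges.append((i, j, weight))
    return edges


def propagate(features, edges):
    """One round of weighted mean aggregation: each node averages its own
    features (self-loop weight 1.0) with those of its neighbors."""
    dim = len(features[0])
    sums = [list(f) for f in features]   # start from the self-loop term
    totals = [1.0] * len(features)
    for i, j, weight in edges:
        for d in range(dim):
            sums[i][d] += weight * features[j][d]
            sums[j][d] += weight * features[i][d]
        totals[i] += weight
        totals[j] += weight
    return [[s / t for s in row] for row, t in zip(sums, totals)]


# Hypothetical facts with hand-made toy embeddings (not real encoder output).
facts = [
    "I have two dogs.",
    "I own a cat.",
    "I want to run a marathon next year.",
]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

edges = build_similarity_graph(embeddings, threshold=0.8)
smoothed = propagate(embeddings, edges)
```

In this toy setting only the first two facts are connected, so their features move toward each other while the third fact, which has no neighbors, keeps its original features. A trained GNN would add learnable weights and nonlinearities on top of this aggregation step.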
Keywords
Personal facts classification, GNN, encoder models, text embeddings