Minimalist Machine Learning with Metaheuristic Optimization for Explainable Spam Filtering

Jorge Alberto Pacheco-Senard, Mailyn Moreno-Espino, Cornelio Yáñez-Márquez, Yenny Villuendas-Rey, Oscar Camacho-Nieto

Abstract


Spam detection remains a challenging task in text classification due to the dynamic nature of unsolicited messages and the lack of transparency in conventional machine learning models. This paper proposes a family of lightweight and interpretable classifiers based on the Minimalist Machine Learning (MML) paradigm integrated with metaheuristic optimization techniques. Three variants (MML + Random Search, MML + Hill Climb, and MML + Simulated Annealing) were implemented and evaluated on the SMS Spam Corpus v.0.1 using a hybrid lexical–semantic representation that combines BM25 weights and Word2Vec embeddings. Each model selects the most discriminative lexical–semantic features from the feature matrix, optimizing class separability through an objective function based on the Intra-Class Correlation Coefficient (ICC). Experimental results under Leave-One-Out Cross-Validation (LOOCV) show that the MML + Simulated Annealing variant achieved the best overall performance among the three variants (Balanced Accuracy = 0.9327, F1-score = 0.9014, MCC = 0.8700), with results statistically comparable to a linear SVM baseline according to the paired Wilcoxon test. These findings indicate that metaheuristic-enhanced MML models can achieve competitive performance while maintaining full interpretability. Given the increasing demand for transparent and responsible AI in communication systems, this study contributes to the development of interpretable and lightweight spam filtering mechanisms. Future work will extend these models to sentiment analysis, AI-generated text detection, and hybrid transformer–MML architectures that combine transparency with deep semantic understanding.
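To illustrate the kind of metaheuristic feature selection summarized above, the following is a minimal sketch of simulated annealing over binary feature masks. It is not the authors' implementation: the objective `separability` is a simple stand-in (squared difference of class means over pooled variance, minus a per-feature penalty) rather than the ICC-based function used in the paper, and the function names, the `penalty` value, the cooling schedule, and the toy data are all assumptions made for illustration.

```python
import math
import random


def separability(X, y, mask, penalty=0.5):
    """Stand-in objective (NOT the paper's ICC-based function).

    For each selected feature, add the squared difference of the class
    means divided by the pooled variance, then subtract a hypothetical
    per-feature penalty so that uninformative features are discouraged.
    """
    score = 0.0
    for j, keep in enumerate(mask):
        if not keep:
            continue
        spam = [row[j] for row, label in zip(X, y) if label == 1]
        ham = [row[j] for row, label in zip(X, y) if label == 0]
        mean_spam = sum(spam) / len(spam)
        mean_ham = sum(ham) / len(ham)
        pooled_var = (
            sum((v - mean_spam) ** 2 for v in spam)
            + sum((v - mean_ham) ** 2 for v in ham)
        ) / (len(spam) + len(ham))
        score += (mean_spam - mean_ham) ** 2 / (pooled_var + 1e-9)
    return score - penalty * sum(mask)


def anneal(X, y, n_iter=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated annealing over feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    current = separability(X, y, mask)
    best_mask, best = mask[:], current
    temperature = t0
    for _ in range(n_iter):
        # Neighbor: flip the inclusion of one randomly chosen feature.
        candidate = mask[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]
        value = separability(X, y, candidate)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature decreases.
        if value >= current or rng.random() < math.exp((value - current) / temperature):
            mask, current = candidate, value
            if current > best:
                best_mask, best = mask[:], current
        temperature *= cooling
    return best_mask, best


# Toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [
    [1.0, 0.2, 0.3],
    [0.9, 0.5, 0.6],
    [1.1, 0.8, 0.9],
    [0.0, 0.2, 0.3],
    [0.1, 0.5, 0.6],
    [-0.1, 0.8, 0.9],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

best_mask, best_score = anneal(X, y)
print(best_mask, round(best_score, 2))
```

Replacing the acceptance rule with "accept only improvements" gives a hill-climbing variant, and drawing a fresh random mask at every iteration gives random search, which is one way to see how the three strategies named in the abstract relate to each other.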

Keywords


Spam detection, minimalist machine learning, metaheuristic optimization, explainable AI, text classification
