Comparing Sparse and Dense Information Retrieval Methods on a Wikipedia-Derived NLP Dataset

A. Kairbek, Z. Abildasheva, I. Akhmetov, S. Aubakirov, A. Toleu, A. Krassovitsky, R. Mussabayev, A. Gelbukh

Abstract


This paper presents a comparative evaluation of sparse and dense Information Retrieval (IR) methods on a domain-focused dataset of 500 NLP-related Wikipedia articles. Four approaches, TF–IDF, BM25, MiniLM, and Dense Passage Retrieval (DPR), were assessed using Precision@K, nDCG@K, and Hit Rate@K against ground-truth relevance judgments derived from the Wikipedia API and from Google search results. The results show that BM25 consistently delivers the highest precision and ranking stability across both sources, while TF–IDF remains competitive at larger cut-offs, often surpassing DPR in recall. Dense methods, especially MiniLM, improve recall at higher ranks by capturing semantic relationships, though they lag behind sparse methods in top-rank precision. The divergence between the Wikipedia- and Google-based evaluations highlights the importance of multi-perspective benchmarking. The findings confirm the complementary strengths of the sparse and dense paradigms and suggest that hybrid pipelines, combining BM25 retrieval with dense re-ranking, provide the most effective balance between efficiency, precision, and semantic coverage for domain-specific retrieval.
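
To make the evaluation metrics concrete, the sketch below shows one common way to compute Precision@K, Hit Rate@K, and nDCG@K for a single query under binary relevance. It is an illustrative sketch only: the document identifiers and relevance set are hypothetical, and this is not the paper's actual evaluation code.

```python
from math import log2

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / k

def hit_rate_at_k(ranked_ids, relevant_ids, k):
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return float(any(doc in relevant_ids for doc in ranked_ids[:k]))

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    # Position i is 0-based, so the discount for rank (i + 1) is 1 / log2(i + 2).
    dcg = sum(1.0 / log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking produced by one retriever for a single query.
ranked = ["doc_12", "doc_4", "doc_31", "doc_7", "doc_19"]
relevant = {"doc_4", "doc_19", "doc_50"}

print(precision_at_k(ranked, relevant, 5))        # 0.4
print(hit_rate_at_k(ranked, relevant, 5))         # 1.0
print(round(ndcg_at_k(ranked, relevant, 5), 3))   # 0.478
```

In a comparative setup such as the one described above, these per-query scores would be averaged over the query set separately for each retriever and each ground-truth source.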

Keywords: Information retrieval