Cross-Domain Failures of Fake News Detection

doi:10.13053/cys-23-3-3281

Maria Janicka, Maria Pszona, Aleksander Wawer

Abstract

Fake news recognition has become a prominent research topic in natural language processing. Researchers have reported significant successes when applying machine learning methods based on various stylometric and lexical features, with accuracy reaching 90%. This article addresses the question: are fake news detection models universally applicable, or are they limited to the domain they were trained on? We used four different, freely available English-language fake news corpora and trained models in both in-domain and cross-domain settings. We also explored and compared the features important in each domain. We found that performance in the cross-domain setting degrades by 20% and that the sets of features important for detecting fake texts differ between domains. Our conclusions support the hypothesis that the high accuracy of machine learning models applied to fake news detection may be related to over-fitting, and that models need to be trained and evaluated on mixed types of texts.

Keywords

Cross domain, failures, news detection
