Spotting Fake Reviews using Positive-Unlabeled Learning

Huayi Li, Bing Liu, Arjun Mukherjee, Jidong Shao


Fake review detection has been studied by researchers for several years. However, so far all reported studies are based on English reviews. This paper reports a study of detecting fake reviews in Chinese. Our review dataset is from the Chinese review hosting site Dianping, which has built a fake review detection system. Dianping is confident that its algorithm has very high precision, but its recall is unknown. This means that all reviews the system flags are almost certainly fake, but the remaining reviews may not all be genuine. This paper first reports a supervised learning study of two classes, fake and unknown. However, since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabeled set. This calls for the model of learning from positive and unlabeled examples (or PU-learning). Experimental results show that PU-learning not only outperforms supervised learning significantly, but also detects a large number of potentially fake reviews hidden in the unlabeled set that Dianping fails to detect.
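To make the positive-unlabeled setting concrete, the following is a minimal toy sketch of one common two-step PU strategy: first pick "reliable negatives" from the unlabeled set, then classify the rest. It is an illustration only and is not claimed to be the method used in this paper. The 2-D feature vectors, the function name pu_two_step, and the neg_fraction parameter are all hypothetical.

```python
# Toy two-step PU learning sketch (illustrative assumption, not the paper's method).
# Each review is reduced to a hypothetical 2-D feature vector.

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pu_two_step(positives, unlabeled, neg_fraction=0.5):
    """Return the unlabeled points predicted to be hidden positives."""
    pos_c = centroid(positives)
    # Step 1: treat the unlabeled points farthest from the positive
    # centroid as "reliable negatives".
    ranked = sorted(unlabeled, key=lambda p: sq_dist(p, pos_c), reverse=True)
    k = max(1, int(len(ranked) * neg_fraction))
    neg_c = centroid(ranked[:k])
    # Step 2: nearest-centroid classification of every unlabeled point,
    # using the known positives and the reliable negatives as the two classes.
    return [p for p in unlabeled if sq_dist(p, pos_c) < sq_dist(p, neg_c)]

# Known fake reviews (positives) and the "unknown" set, which here
# hides two points that look like the positives.
positives = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0)]
unlabeled = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0),
             (0.15, 0.1), (0.85, 0.9), (0.95, 0.85)]

hidden = pu_two_step(positives, unlabeled)
print(hidden)
```

A plain supervised classifier trained on fake versus unknown would be told that the two hidden points are negatives. The sketch avoids that by never using the whole unlabeled set as the negative class.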


Fake reviews, Positive-Unlabeled learning, PU-learning.
