Identification of Opinion Spammers using Reviewer Reputation and Clustering Analysis

  • Minjuan Zhong
  • Liang Tan, School of Information Management, Jiangxi University of Finance and Economics, Nanchang 33013, China
  • Xilong Qu, School of Information Technology, Hunan University of Finance and Economics, Changsha 410205, China

Abstract

Online reviews have become an important resource for consumers making purchasing decisions. Unfortunately, malicious sellers try to game the system by hiring a person or a team (called spammers) to fabricate fake reviews that improve their reputation. Existing methods mainly treat the problem as general binary classification or rely on heuristic rules. However, supervised learning methods rely heavily on a large number of examples of deceptive and truthful opinions labeled by domain experts, and most of the features used in heuristic strategies ignore the group organization of spammers. In this paper, an effective method for identifying opinion spammers is proposed. First, suspected spammers are detected by unsupervised learning based on reviewer reputation. We believe that a reviewer's reputation is directly related to the quality of that reviewer's reviews: in general, a review written by a user with lower reputation is of lower quality and is more likely to be fake. The model therefore assigns a reputation score to each reviewer, using content-based factors and the reviewer's activeness. Then, over the set of suspected spammers, a k-center clustering algorithm is applied to further identify the spammers, based on the observation that their reviews are released in bursts. Experimental results on an Amazon dataset are encouraging and indicate that the approach achieves high accuracy and recall.
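
As a rough illustration of the clustering step mentioned in the abstract, the Python sketch below groups review release times with a greedy farthest-first k-center heuristic. It is only an assumption about what k-center clustering over release times could look like, not the authors' implementation: the function name `k_center`, the sample timestamps, and the choice of k are all hypothetical.

```python
from typing import List, Tuple


def k_center(points: List[float], k: int) -> Tuple[List[float], List[int]]:
    """Greedy farthest-first k-center on 1-D values (e.g. review times in days).

    Illustrative stand-in only; the paper's exact algorithm is not given here.
    """
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    centers = [points[0]]  # arbitrary first center
    # Distance from each point to its nearest chosen center so far.
    dist = [abs(p - centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Next center: the point farthest from every existing center.
        far = max(range(len(points)), key=dist.__getitem__)
        if dist[far] == 0:
            break  # every remaining point coincides with a center
        centers.append(points[far])
        dist = [min(d, abs(p - points[far])) for d, p in zip(dist, points)]
    # Assign each point to the index of its nearest center.
    labels = [
        min(range(len(centers)), key=lambda c: abs(p - centers[c]))
        for p in points
    ]
    return centers, labels


# Hypothetical release times (days since an arbitrary start date)
# for reviews written by suspected spammers.
times = [1.0, 1.2, 1.1, 30.0, 30.5, 90.0]
centers, labels = k_center(times, k=3)
```

In this toy input, reviews posted within a day of each other end up in the same cluster, which is the kind of burst the abstract refers to; how cluster size or tightness would translate into a spammer decision is left open here.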

Published
2020-02-02
How to Cite
ZHONG, Minjuan; TAN, Liang; QU, Xilong. Identification of Opinion Spammers using Reviewer Reputation and Clustering Analysis. INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, [S.l.], v. 14, n. 6, p. 759-772, feb. 2020. ISSN 1841-9844. Available at: <http://univagora.ro/jour/index.php/ijccc/article/view/3704>. Date accessed: 06 aug. 2020. doi: https://doi.org/10.15837/ijccc.2019.6.3704.

Keywords

opinion spammer, fake review, reviewer reputation, clustering analysis