Identification of Opinion Spammers using Reviewer Reputation and Clustering Analysis

Minjuan Zhong, Liang Tan, Xilong Qu

Abstract


Online reviews have become an increasingly important resource for consumers making purchasing decisions. Unfortunately, malicious sellers try to game the system by hiring a person or team (called spammers) to fabricate fake reviews that improve their reputation. Existing methods mainly treat the problem as a general binary classification task or rely on heuristic rules. However, supervised learning methods depend heavily on a large number of examples of deceptive and truthful opinions labeled by domain experts, and most of the features used in heuristic strategies ignore the group organization of spammers. In this paper, an effective method of identifying opinion spammers is proposed. First, suspected spammers are detected by unsupervised learning based on reviewer reputation. We believe that a reviewer's reputation is directly related to the quality of that reviewer's reviews: in general, a review written by a user with lower reputation is of lower quality and is more likely to be fake. The model therefore assigns a reputation score to each reviewer, drawing on content-based factors and reviewer activeness. Second, a k-center clustering algorithm is run over all suspected spammers to further spot the spammers, based on the observation that their reviews are released in bursts. Experimental results on an Amazon dataset are encouraging and indicate that the approach achieves high accuracy and recall.
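The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reputation scores are taken as given inputs (the abstract does not state the scoring formula), the cut-off `threshold` and the `min_size` filter are hypothetical parameters, and the clustering step uses the standard greedy farthest-first heuristic for k-center on one-dimensional review times, which may differ from the variant used in the paper.

```python
from typing import Dict, List, Sequence


def suspected_reviewers(reputation: Dict[str, float], threshold: float) -> List[str]:
    """Stage 1 (assumed rule): flag reviewers whose reputation is below a cut-off."""
    return sorted(r for r, score in reputation.items() if score < threshold)


def k_center_labels(points: Sequence[float], k: int) -> List[int]:
    """Greedy farthest-first k-center on 1-D points; returns a cluster index per point."""
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    centers = [0]  # the first point is an arbitrary initial center
    dist = [abs(p - points[0]) for p in points]  # distance to nearest center so far
    while len(centers) < min(k, len(points)):
        farthest = max(range(len(points)), key=dist.__getitem__)
        if dist[farthest] == 0:  # every point already coincides with a center
            break
        centers.append(farthest)
        dist = [min(d, abs(p - points[farthest])) for d, p in zip(dist, points)]
    # Assign each point to its nearest center.
    return [
        min(range(len(centers)), key=lambda c: abs(p - points[centers[c]]))
        for p in points
    ]


def burst_groups(
    review_time: Dict[str, float],
    suspects: Sequence[str],
    k: int,
    min_size: int = 2,
) -> List[List[str]]:
    """Stage 2: cluster suspects by review time and keep clusters that look like bursts."""
    labels = k_center_labels([review_time[r] for r in suspects], k)
    groups: Dict[int, List[str]] = {}
    for reviewer, label in zip(suspects, labels):
        groups.setdefault(label, []).append(reviewer)
    return [members for members in groups.values() if len(members) >= min_size]


# Toy data (invented for illustration): reputation in [0, 1], review time in days.
reputation = {"a": 0.2, "b": 0.1, "c": 0.9, "d": 0.3, "e": 0.15}
review_time = {"a": 10.0, "b": 11.0, "c": 50.0, "d": 40.0, "e": 10.5}

suspects = suspected_reviewers(reputation, threshold=0.5)
groups = burst_groups(review_time, suspects, k=2)
```

In this toy run, reviewers a, b and e have low reputation and posted within about a day of each other, so they end up in one cluster, while the isolated low-reputation reviewer d is dropped by the `min_size` filter.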

Keywords


opinion spammer, fake review, reviewer reputation, clustering analysis

DOI: https://doi.org/10.15837/ijccc.2019.6.3704



Copyright (c) 2019 Zhong Minjuan

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
