Efficient Opinion Summarization on Comments with Online-LDA


  • Jun Ma
  • Senlin Luo
  • Jianguo Yao
  • Shuxin Cheng
  • Xi Chen


Opinion summarization, Latent Dirichlet Allocation (LDA), online-LDA, imbalanced data, big data


Customer reviews and comments on web pages are an important source of information in our daily life. For example, we prefer to choose a hotel with positive comments from previous customers. Because the huge volume of such information exhibits the characteristics of big data, it places a heavy burden on the assimilation of customer-contributed opinions. To overcome this problem, we study an efficient opinion summarization approach for a massive set of user reviews and comments associated with an online resource, summarizing the opinions into two categories: positive and negative. In this paper, we propose a framework that includes: (1) overcoming the big data problem of online comments using the efficient online-LDA approach; (2) selecting meaningful topics from the imbalanced data; (3) summarizing the opinions in comments with high precision and recall. This framework differs from much of the previous work in that the topics are pre-defined and selected for better opinion summarization. To evaluate the proposed framework, we perform experiments on a dataset of hotel reviews, chosen for the variety of topics it contains. The results show that our framework achieves a significant performance improvement on opinion summarization.
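The abstract measures opinion summarization by precision and recall over two categories, positive and negative. The sketch below shows how those two figures could be computed per category. It is an illustration only: the function name `precision_recall`, the labels `"pos"` and `"neg"`, and the toy data are assumptions made here, not the paper's evaluation code or results.

```python
from collections import Counter


def precision_recall(gold, predicted, target):
    """Return (precision, recall) for one class label.

    gold and predicted are equal-length sequences of labels;
    target is the label being scored (hypothetical: "pos" or "neg").
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == target and g == target:
            counts["tp"] += 1   # predicted target, and it was the target
        elif p == target:
            counts["fp"] += 1   # predicted target, but it was not
        elif g == target:
            counts["fn"] += 1   # missed a true target

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    # Return 0.0 instead of dividing by zero when a class never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up, imbalanced toy labels: six positive comments, two negative.
gold      = ["pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg"]
predicted = ["pos", "pos", "pos", "pos", "pos", "neg", "pos", "neg"]

for label in ("pos", "neg"):
    p, r = precision_recall(gold, predicted, label)
    print(f"{label}: precision={p:.3f} recall={r:.3f}")
```

Scoring each category separately matters on imbalanced data: in this toy example the majority class scores higher than the minority class, which a single overall accuracy figure would hide.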




