Latent Semantic Analysis using a Dennis Coefficient for English Sentiment Classification in a Parallel System

Phu Vo Ngoc

Abstract


We have surveyed many significant approaches to sentiment classification over many years, because the field makes crucial contributions that can be applied in everyday life, such as in political activities, commodity production, and commercial activities. We propose a novel model that uses Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big data sentiment classification in English. Many LSA vectors (LSAVs) are successfully reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set based on the 5,000,000 documents of our training data set in English. The model uses many sentiment lexicons of our basis English sentiment dictionary (bESD). We have tested the proposed model in both a sequential environment and a distributed network system. The results of the sequential system are not as good as those of the parallel environment. We achieved 88.76% accuracy on the testing data set, which is better than the accuracies of many previous models of semantic analysis. We have also compared the novel model with previous models, and the experimental results of our proposed model are better than those of the previous models. The results of the novel model can be widely used in many different fields, in commercial applications and in surveys of sentiment classification.
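
As an illustration only, the sketch below shows one common definition of the Dennis coefficient for two binary vectors, (ad - bc) / sqrt(n(a + b)(a + c)), where a counts positions set in both vectors, b and c count positions set in only one of them, d counts positions set in neither, and n = a + b + c + d. The abstract does not state the formula or how the LSA vectors are built, so the formula, the binary encoding of documents, the function names `dennis_coefficient` and `classify`, and the averaging rule are all assumptions rather than the method of the paper.

```python
from math import sqrt


def dennis_coefficient(x, y):
    """Dennis similarity between two equal-length binary (0/1) vectors.

    Assumed definition: (a*d - b*c) / sqrt(n * (a + b) * (a + c)).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # set in both
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # set only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # set only in y
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # set in neither
    n = a + b + c + d
    denom = sqrt(n * (a + b) * (a + c))
    if denom == 0:
        # Undefined when either vector is all zeros (or empty);
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (a * d - b * c) / denom


def classify(doc, positive_docs, negative_docs):
    """Hypothetical rule: label a document by its mean Dennis similarity
    to non-empty lists of positive and negative training vectors."""
    pos = sum(dennis_coefficient(doc, t) for t in positive_docs) / len(positive_docs)
    neg = sum(dennis_coefficient(doc, t) for t in negative_docs) / len(negative_docs)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy vectors invented for this example; they are not data from the paper.
    positive_docs = [[1, 1, 0, 0], [1, 1, 1, 0]]
    negative_docs = [[0, 0, 1, 1], [0, 1, 1, 1]]
    print(classify([1, 1, 0, 0], positive_docs, negative_docs))
```

In a parallel setting, each test document could be scored independently against the training vectors, which is what makes this kind of comparison a natural fit for a map step followed by a reduce step that aggregates the scores; how the paper actually partitions the work is not described in the abstract.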

Keywords


English sentiment classification; parallel system; Cloudera; Hadoop Map and Hadoop Reduce; Dennis Measure; Latent Semantic Analysis






DOI: https://doi.org/10.15837/ijccc.2018.3.3044



Copyright (c) 2018 Phu Vo Ngoc

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
