Latent Semantic Analysis using a Dennis Coefficient for English Sentiment Classification in a Parallel System

Abstract

Sentiment classification has been studied for many years because it has important applications in everyday life, such as political activities, commodity production, and commercial activities, and we have surveyed many significant approaches to it. We propose a novel model that uses Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big data sentiment classification in English. Many LSA vectors (LSAVs) are reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set against the 5,000,000 documents of our training data set, both in English. The model uses the sentiment lexicons of our basis English sentiment dictionary (bESD). We tested the proposed model in both a sequential environment and a distributed network system; the results in the sequential system are not as good as those in the parallel environment. The model achieves 88.76% accuracy on the testing data set, which is better than the accuracies of many previous semantic analysis models. We also compared the novel model with previous models, and its experimental results are better than theirs. The results of the novel model can be used widely in many fields, including commercial applications and surveys of sentiment classification.
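To make the role of the DNC more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the Dennis measure as commonly defined for two binary vectors, (ad - bc) / sqrt(n(a + b)(a + c)), where a, b, c, d count the 1/1, 1/0, 0/1, and 0/0 positions and n = a + b + c + d. The binary vectors, the averaging rule in `classify`, and both function names are hypothetical and chosen only for illustration.

```python
import math


def dennis_coefficient(x, y):
    """Dennis similarity between two equal-length binary vectors.

    Assumed definition: (a*d - b*c) / sqrt(n * (a + b) * (a + c)).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)          # 1 in both
    b = sum(1 for p, q in zip(x, y) if p and not q)      # 1 only in x
    c = sum(1 for p, q in zip(x, y) if not p and q)      # 1 only in y
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # 0 in both
    n = a + b + c + d
    denom = math.sqrt(n * (a + b) * (a + c))
    if denom == 0:
        return 0.0  # assumption: treat the undefined case as "no similarity"
    return (a * d - b * c) / denom


def classify(doc_vec, positive_vecs, negative_vecs):
    """Hypothetical rule: compare mean similarity to each training group."""
    pos = sum(dennis_coefficient(doc_vec, v) for v in positive_vecs) / len(positive_vecs)
    neg = sum(dennis_coefficient(doc_vec, v) for v in negative_vecs) / len(negative_vecs)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

In this sketch a test document is assigned the polarity of the training group it resembles more on average; in a parallel setting, the per-document similarity computations are independent of each other and could be distributed across workers.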


Published

2018-05-27

How to Cite

VO NGOC, Phu. Latent Semantic Analysis using a Dennis Coefficient for English Sentiment Classification in a Parallel System. INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, [S.l.], v. 13, n. 3, p. 408-428, May 2018. ISSN 1841-9844. Available at: <http://univagora.ro/jour/index.php/ijccc/article/view/3044>. Date accessed: 05 July 2020. doi: https://doi.org/10.15837/ijccc.2018.3.3044.

Keywords

English sentiment classification; parallel system; Cloudera; Hadoop Map and Hadoop Reduce; Dennis Measure; Latent Semantic Analysis