Research on Key Technology of Web Hierarchical Topic Detection and Evolution Based on Behaviour Tracking Analysis

Mo Chen

Abstract


Against the background of today's big data era, Web hierarchical topic detection and evolution over semi-structured and unstructured data has attracted wide attention from researchers. Taking network big data as the research object, this paper proposes an approach to Web hierarchical topic detection and evolution based on behaviour tracking analysis. It describes the main implementation methods: instance analysis of usage patterns, instance analysis of seeds, analysis of the sets of similar instances supporting topics, analysis of the sets of similar instances supporting events, and event evolution analysis. It then presents the corresponding algorithm for Web hierarchical topic detection and evolution based on behaviour tracking analysis. The experimental analysis is organized as follows. First, it evaluates the quality of topic detection by examining how the accuracy rate varies with the number of instances concerned and the seed threshold, and with the number of instances concerned and the probability threshold. Second, it evaluates the quality of topic evolution by examining how the accuracy rate varies with parameter adjustment, and with the number of instances concerned and the similarity threshold. Finally, it compares the time consumed in solving the main research problem under different methods, together with the qualitative results of topic detection and evolution on different data sets. The experimental results show that the approach is feasible, verifiable and superior to the compared methods, and that it plays a major role in reconfiguring Web hierarchical topic corpora and in providing an intelligent big data warehouse for network information evolution applications.
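The abstract names the analysis steps but does not give the detection algorithm itself. Purely as a point of reference, the following is a minimal sketch of a generic single-pass threshold clustering baseline, a technique commonly used in topic detection and tracking, in which a similarity threshold decides whether a new instance joins an existing set of similar instances or opens a new one. It is not the author's method: the bag-of-words representation, cosine similarity, summed-count centroids and the hypothetical names `cosine` and `single_pass_cluster` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(docs, sim_threshold: float):
    """Hypothetical baseline (not the paper's algorithm): assign each document
    to the most similar existing cluster, or open a new cluster when no
    cluster centroid reaches sim_threshold."""
    clusters = []   # each cluster is a list of document indices
    centroids = []  # summed term counts per cluster (cosine ignores scale)
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= sim_threshold:
            clusters[best].append(i)
            centroids[best].update(vec)  # fold the new instance into the centroid
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


# Toy usage with made-up documents
docs = [
    "flood warning river city",
    "river flood city rescue",
    "stock market falls",
    "market stock rally",
]
print(single_pass_cluster(docs, 0.5))  # [[0, 1], [2, 3]]
print(single_pass_cluster(docs, 0.8))  # [[0], [1], [2], [3]]
```

In this toy run, raising the threshold from 0.5 to 0.8 splits the two topic sets into singletons, which illustrates in a simplified way why accuracy in threshold-based schemes is sensitive to the chosen similarity threshold.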

Keywords


Web hierarchical topic, topic detection, event evolution, behaviour tracking analysis






DOI: https://doi.org/10.15837/ijccc.2019.3.3534



Copyright (c) 2019 Mo Chen

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

CC-BY-NC License for Website User

Articles published in IJCCC are protected by copyright under the IJCCC user license.

Provided that they meet the conditions below, users may access, download, copy and translate IJCCC articles for non-commercial purposes, but may not redistribute, display or adapt them. Users must:

  • Cite the article using an appropriate bibliographic citation: author(s), article title, journal, volume, issue, page numbers, year of publication, DOI, and the link to the definitive published version on IJCCC website;
  • Maintain the integrity of the IJCCC article;
  • Retain the copyright notices and links to these terms and conditions so it is clear to other users what can and what cannot be done with the article;
  • Ensure that, for any content in the IJCCC article that is identified as belonging to a third party, any re-use complies with the copyright policies of that third party;
  • Ensure that any translation prominently displays the statement: "This is an unofficial translation of an article that appeared in IJCCC. Agora University has not endorsed this translation."

This is a non-commercial license under which the use of published articles for commercial purposes is forbidden.

Commercial purposes include: 

  • Copying or downloading IJCCC articles, or linking to such postings, for further redistribution, sale or licensing, for a fee;
  • Copying, downloading or posting by a site or service that incorporates advertising with such content;
  • The inclusion or incorporation of article content in other works or services (other than normal quotations with an appropriate citation) that is then available for sale or licensing, for a fee;
  • Use of IJCCC articles or article content (other than normal quotations with appropriate citation) by for-profit organizations for promotional purposes, whether for a fee or otherwise;
  • Use for the purposes of monetary reward by means of sale, resale, license, loan, transfer or other form of commercial exploitation.

    The licensor cannot revoke these freedoms as long as you follow the license terms.

[End of CC-BY-NC License for Website User]


INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C), ISSN 1841-9836.

IJCCC was founded in 2006 at Agora University by Ioan DZITAC (Editor-in-Chief), Florin Gheorghe FILIP (Editor-in-Chief) and Misu-Jan MANOLESCU (Managing Editor).

Ethics: This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE).


IJCCC is covered/indexed/abstracted in Science Citation Index Expanded (since vol. 1(S), 2006); JCR2016: IF=1.374.

IJCCC is indexed in Scopus from 2008 (CiteScore 2017 = 1.04; SNIP2017 = 0.616; SJR2017 = 0.326).


IJCCC was nominated by Elsevier for Journal Excellence Award - "Scopus Awards Romania 2015" (SNIP2014 = 1.029).

IJCCC is in the Top 3 of 157 Romanian journals indexed by Scopus (in all fields) and No. 1 in the Computer Science field by Elsevier/Scopus.

 

 Impact Factor in JCR2017 (Clarivate Analytics/SCI Expanded/ISI Web of Science): IF=1.29 (Q3). Scopus: CiteScore2017=1.04 (Q2); Editors-in-Chief: Ioan DZITAC & Florin Gheorghe FILIP.