Research on Key Technology of Web Hierarchical Topic Detection and Evolution Based on Behaviour Tracking Analysis


Mo Chen, Beijing Union University


Web hierarchical topic, topic detection, event evolution, behaviour tracking analysis


In the era of big data, Web hierarchical topic detection and evolution over semi-structured and unstructured data has attracted wide attention from researchers. Taking network big data as the research object, this paper proposes an approach to Web hierarchical topic detection and evolution based on behaviour tracking analysis. It describes the main implementation methods, namely instance analysis of the usage mode, instance analysis of the seed, analysis of the sets of similar instances supporting topics, analysis of the sets of similar instances supporting events, and evolution analysis of events, and then presents the corresponding algorithm. The experimental analysis has three parts. First, it evaluates the quality of topic detection: how the accuracy rate varies with the number of instances concerned and the seed threshold, and with the number of instances concerned and the probability threshold. Second, it evaluates the quality of topic evolution: how the accuracy rate varies with parameter adjustment, and with the number of instances concerned and the similarity threshold. Finally, it compares the time consumed by different methods in solving the main research problem, and the qualitative results of topic detection and evolution on different data sets. The experimental results indicate that the approach is feasible, verifiable and superior to the methods compared, and that it can play a major role in reconfiguring a Web hierarchical topic corpus and in providing an intelligent big data warehouse for applications of network information evolution.




