Model of Network Topic Detection Based on Web Usage Behaviour Mode Analysis and Mining Technology
Abstract
Accurately detecting network topics has attracted wide research attention with the arrival of the big data era, which is characterized by semi-structured and unstructured text. Taking Web news as the object of research, this paper proposes a model of network topic detection based on web usage behaviour mode analysis and mining technology. The author describes the main functions and methods of the model: the analysis module for Web news instance clicking modes, the analysis module for Web news instance retrieval modes, the analysis module for Web news seed instances, and the analysis module for similar Web news instances supporting topics. Building on these functions and methods, the author describes the main algorithms of the model: the mining algorithm for Web news seed instances and the mining algorithm for similar Web news instances supporting topics. These algorithms are applied in the processing module of the model and focus on detecting network topics efficiently from a large volume of web usage behaviour towards Web news instances, with the aim of exploring a research method for network topic detection. The experimental analysis has three steps. First, the author analyses the precision of topic detection under different methods. Second, the author analyses how the number of Web news instances concerned and the seed threshold affect the quality of Web news topic detection. Third, the author analyses how the number of Web news instances concerned and the probability threshold affect the quality of the mined Web news instances supporting topics. The experimental results show the feasibility, validity and superiority of the model design. The model also plays an important role in constructing topic-focused Web news corpora, which provide a real-time data source for topic evolution tracking.
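The abstract names two mining steps but does not give their formulas, so the following is only a minimal sketch of how a seed threshold and a probability threshold could act as filters. Everything in it is an assumption made for illustration: the function names, the use of a combined click and retrieval share as the seed score, and the use of cosine similarity over term counts as a stand-in for the probability that an instance supports a seed's topic. It is not the paper's algorithm.

```python
from math import sqrt


def select_seeds(click_counts, retrieval_counts, seed_threshold):
    """Pick seed instances by usage share (hypothetical scoring rule).

    An instance becomes a seed when its share of all recorded clicks and
    retrievals is at least seed_threshold.
    """
    total = sum(click_counts.values()) + sum(retrieval_counts.values())
    if total == 0:
        return []
    ids = set(click_counts) | set(retrieval_counts)
    return sorted(
        i for i in ids
        if (click_counts.get(i, 0) + retrieval_counts.get(i, 0)) / total
        >= seed_threshold
    )


def cosine(a, b):
    """Cosine similarity between two sparse term-count dictionaries."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def supporting_instances(seed_id, term_vectors, probability_threshold):
    """Return instances similar enough to the seed (assumed similarity measure).

    Cosine similarity stands in for the support probability here; the
    source does not specify the measure.
    """
    seed_vec = term_vectors[seed_id]
    return sorted(
        i for i, vec in term_vectors.items()
        if i != seed_id and cosine(seed_vec, vec) >= probability_threshold
    )
```

In this sketch, raising `seed_threshold` yields fewer seed instances and raising `probability_threshold` yields fewer supporting instances per seed, which are the two parameters the experiments vary.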
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.