An Abnormal Network Traffic Detection Algorithm Based on Big Data Analysis

  • Haipeng Yao Beijing University of Posts and Telecommunications
  • Yiqing Liu State Key Laboratory of Networking and Switching Technology Beijing University of Posts and Telecommunications No 10, Xitucheng Road, Haidian District, Beijing, PRC
  • Chao Fang 1. Beijing Advanced Innovation Center for Future Internet Technology Beijing University of Technology 100 Ping Le Yuan, Chaoyang District, Beijing, PRC 2. College of Electronic Information and Control Engineering Beijing University of Technology 100 Ping Le Yuan, Chaoyang District, Beijing, PRC


Anomaly detection in network traffic is an important way to analyze and detect malicious behavior in a network. Detecting anomalous flows effectively at big-data scale is an active research area that has attracted growing attention. In this paper, we propose a new model based on big data analysis that avoids the influence of changes in the network traffic distribution, increases detection accuracy, and reduces the false negative rate. Simulation results show that the proposed model outperforms the k-means, decision tree, and random forest algorithms, achieving detection rates of 95.4% on normal data, 98.6% on DoS attacks, 93.9% on Probe attacks, 56.1% on U2R attacks, and 77.2% on R2L attacks.
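The abstract reports one detection rate per traffic class but does not define the metric. As a minimal sketch, assuming "detection rate" means per-class recall (the fraction of a class's samples that the classifier labels correctly), the figures could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def per_class_detection_rate(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}.

    Assumption: "detection rate" is read here as per-class recall.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Number of samples that truly belong to each class.
    totals = Counter(y_true)
    # Number of samples per class whose prediction matches the true label.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels invented for illustration only (not the paper's data).
y_true = ["normal", "normal", "DoS", "DoS", "Probe", "U2R", "U2R", "R2L"]
y_pred = ["normal", "DoS", "DoS", "DoS", "Probe", "normal", "U2R", "normal"]

rates = per_class_detection_rate(y_true, y_pred)
for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")
```

Under this reading, a low rate on a rare class such as U2R or R2L means that many of those attacks are labeled as something else, typically as normal traffic, which is the false negative case the abstract aims to reduce.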


How to Cite
YAO, Haipeng; LIU, Yiqing; FANG, Chao. An Abnormal Network Traffic Detection Algorithm Based on Big Data Analysis. INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, [S.l.], v. 11, n. 4, p. 567-579, July 2016. ISSN 1841-9844.


Keywords: Anomaly Traffic Detection, Big Data, K-means, Decision Tree, Random Forest