A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data
Traffic safety is an important part of sustainable roadway development. Freeway traffic crashes typically cause serious casualties and property losses and pose a serious threat to public safety. Identifying the potential correlations between risk factors and revealing their coupling mechanisms are effective ways to explore and identify the causes of freeway crashes. However, existing association rule mining algorithms still have limitations in both efficiency and accuracy. With this in mind, this research used freeway traffic crash data obtained from WDOT (Washington Department of Transportation) to construct a multi-dimensional, multilevel system for traffic crash analysis. The FP-Growth (Frequent Pattern Growth) algorithm was parallelized on the Hadoop platform with load balancing taken into account, to achieve efficient and accurate association rule mining on massive amounts of traffic crash data. The causes of freeway traffic crashes were then identified from the resulting coupling mechanisms among the crash precursors. The results show that the parallel FP-Growth algorithm with load balancing constraints runs faster than both the conventional FP-Growth algorithm and the parallel FP-Growth algorithm when processing big data. The improved algorithm makes full use of Hadoop cluster resources and is better suited to mining large traffic crash data sets, while retaining the original advantages of conventional association rule mining algorithms. In addition, the association rule mining model proposed in this research, improved with multi-dimensional interaction, can capture the occurrence mechanism of freeway traffic crashes with serious consequences (which probably have lower support) accurately and efficiently.
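The load-balancing idea summarized above can be illustrated with a minimal Python sketch. The abstract does not state how the load of each frequent item is estimated or how items are assigned to workers, so everything here is an assumption made for illustration: the function names `frequent_items` and `balanced_groups` are hypothetical, the load of an item is approximated by its position in the frequency-sorted item list (the F-list), and items are assigned greedily to the currently least-loaded group. It is not the authors' implementation and does not use Hadoop.

```python
from collections import Counter
import heapq


def frequent_items(transactions, min_support_count):
    """Return the F-list: (item, count) pairs with count >= min_support_count,
    sorted by descending count (ties broken alphabetically)."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    flist = [(item, c) for item, c in counts.items() if c >= min_support_count]
    flist.sort(key=lambda pair: (-pair[1], pair[0]))
    return flist


def balanced_groups(flist, num_groups):
    """Assign frequent items to groups so that estimated loads are roughly equal.

    Assumption: the load of an item is estimated by its 1-based rank in the
    F-list, because its conditional pattern base can only contain items
    ranked before it. The paper may use a different estimate.
    """
    loads = [(rank + 1, item) for rank, (item, _) in enumerate(flist)]
    loads.sort(reverse=True)  # place the heaviest items first

    # Min-heap of (total estimated load, group id).
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = {g: [] for g in range(num_groups)}

    for load, item in loads:
        total, g = heapq.heappop(heap)  # least-loaded group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups
```

In a MapReduce setting, each group would then correspond to one reducer that builds and mines the conditional FP-trees for its items; balancing the estimated loads is what keeps any single reducer from dominating the running time.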
Copyright (c) 2022 Yang Yang, Na Tian, Yunpeng Wang, Zhenzhou Yuan
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.