A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data


  • Yang Yang Beihang University
  • Na Tian Big data and transportation Digital Planning Research Center, China Design Group Co., Ltd, Nanjing
  • Yunpeng Wang Beihang University
  • Zhenzhou Yuan Beijing Jiaotong University




Traffic safety is an important part of sustainable roadway development. Freeway traffic crashes typically cause serious casualties and property losses and pose a serious threat to public safety. Identifying the potential correlations among risk factors and revealing their coupling mechanisms is an effective way to explore and identify the causes of freeway crashes. However, existing association rule mining algorithms still have limitations in both efficiency and accuracy. To address this, this research used freeway traffic crash data obtained from WDOT (Washington Department of Transportation) to construct a multi-dimensional, multilevel system for traffic crash analysis. The FP-Growth (Frequent Pattern Growth) algorithm was parallelized on the Hadoop platform with load balancing taken into account, to achieve efficient and accurate association rule mining on massive amounts of traffic crash data; then, based on the resulting coupling mechanisms among crash precursors, the causes of freeway traffic crashes were identified. The results show that the parallel FP-Growth algorithm with load balancing constraints runs faster than both the conventional FP-Growth algorithm and the parallel FP-Growth algorithm when processing big data. The improved algorithm makes full use of Hadoop cluster resources and is better suited to mining large traffic crash data sets, while retaining the original advantages of the conventional association rule mining algorithm. In addition, the association rule mining model with improved multi-dimensional interaction proposed in this research can accurately and efficiently capture the occurrence mechanism of freeway traffic crashes with serious consequences, which probably have lower support.
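The abstract does not spell out how the load balancing constraint is implemented, so the following is only a minimal sketch of one common idea for parallel FP-Growth: split the frequent items into groups whose estimated mining cost is roughly equal, so that each worker receives a similar amount of work. The function names, the `min_count` threshold, the toy crash records, and the load estimate (item count multiplied by its rank in the frequency-ordered list) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
import heapq


def frequent_items(transactions, min_count):
    """Return (item, count) pairs with count >= min_count, most frequent first."""
    counts = Counter(item for t in transactions for item in set(t))
    flist = [(item, c) for item, c in counts.items() if c >= min_count]
    # Sort by descending count; break ties by item name for determinism.
    flist.sort(key=lambda pair: (-pair[1], pair[0]))
    return flist


def estimate_loads(flist):
    """Hypothetical cost model (assumption): count * (rank + 1).

    Less frequent items sit deeper in the FP-tree, so their conditional
    pattern bases tend to carry longer prefix paths.
    """
    return {item: count * (rank + 1) for rank, (item, count) in enumerate(flist)}


def balanced_groups(flist, num_groups):
    """Greedy assignment: heaviest item first, always to the lightest group."""
    loads = estimate_loads(flist)
    heap = [(0, gid) for gid in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    groups = {gid: [] for gid in range(num_groups)}
    for item in sorted(loads, key=lambda i: (-loads[i], i)):
        load, gid = heapq.heappop(heap)
        groups[gid].append(item)
        heapq.heappush(heap, (load + loads[item], gid))
    group_loads = {gid: sum(loads[i] for i in items) for gid, items in groups.items()}
    return groups, group_loads


# Toy crash records (made up for illustration only).
transactions = [
    ["rain", "night", "rear_end"],
    ["rain", "rear_end"],
    ["night", "speeding"],
    ["rain", "night", "speeding", "rear_end"],
    ["rain"],
]
flist = frequent_items(transactions, min_count=2)
groups, group_loads = balanced_groups(flist, num_groups=2)
print(groups)
print(group_loads)
```

In a MapReduce setting, each group would then be mined by a separate reducer. The greedy heaviest-first rule is a standard scheduling heuristic chosen here for brevity; the paper's own balancing strategy may differ.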


Yang Y.; Tian N.; Wang Y.; Yuan Z. (2022). A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data, 01 March 2022, PREPRINT (Version 1) available at Research Square. https://doi.org/10.21203/rs.3.rs-1311180/v1.
