Revealing New Technologies in Ocean Engineering Research using Machine Learning


  • Xin Li, Jilin University
  • Yanchun Liang
  • Biqian Chen
  • Baorun He
  • Yu Jiang


Ocean Engineering, Latent Dirichlet Allocation, Machine Learning


Like aerospace engineering, ocean engineering has attracted considerable attention recently. In this paper we employ machine learning and natural language processing methods to reveal new technologies and research hotspots in the ocean engineering field. Our dataset covers 14 high-impact journals and comprises the abstracts of almost 30,000 papers published from 2010 to 2019. We employed two topic models, Latent Dirichlet Allocation (LDA) and PhraseLDA. Used independently, the LDA model may lack interpretability, and the PhraseLDA result may lose information in the final topics. We therefore combined the two models and identified the research hotspots for each year using affinity propagation clustering and word-cloud-based visualization. The results reveal that several topics such as "wind power" and "ship structure", areas such as the European and Arctic seas, and some common research methods are increasing in popularity. The workflow consists of data collection, topic modelling, clustering, and visualization, and can help researchers understand the trends and important topics in ocean engineering as well as in other fields.
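
As a rough illustration of the clustering step, the sketch below runs a plain affinity propagation on a handful of topic-word vectors. It is not the authors' implementation: the toy vectors, the function name affinity_propagation, the negative squared Euclidean similarity, the median preference, and the damping and iteration settings are all assumptions chosen for the example.

```python
from statistics import median


def affinity_propagation(points, damping=0.5, iterations=200):
    """Cluster vectors by message passing; returns the exemplar index per point."""
    n = len(points)
    # Similarity (assumed): negative squared Euclidean distance.
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Preference (assumed): median of the off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref

    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: how much support k has gathered from other points.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [-1] * n  # no exemplar emerged within the iteration budget
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k  # an exemplar always represents itself
    return labels


# Hypothetical topic-word distributions over a four-word vocabulary.
topic_word = [
    [0.70, 0.30, 0.00, 0.00],
    [0.60, 0.40, 0.00, 0.00],
    [0.65, 0.35, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.50],
    [0.00, 0.00, 0.40, 0.60],
    [0.00, 0.00, 0.45, 0.55],
]
labels = affinity_propagation(topic_word)
print(labels)
```

Affinity propagation does not need the number of clusters in advance, which suits a setting where the number of hotspots per year is unknown. In practice the vectors would come from the fitted topic models rather than being typed by hand.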

