A Latent-Dirichlet-Allocation Based Extension for Domain Ontology of Enterprise’s Technological Innovation

Qianqian Zhang, Shifeng Liu, Daqing Gong, Qun Tu


This paper proposes a method for automatically building a domain ontology of enterprise's technological innovation from a plain-text corpus, based on Latent Dirichlet Allocation (LDA). The method consists of four modules: 1) introducing the seed ontology for the domain of enterprise's technological innovation, 2) using Natural Language Processing (NLP) techniques to preprocess the collected textual data, 3) mining domain-specific terms from document collections based on LDA, and 4) obtaining the relationships between the terms through defined rules. Experiments were carried out to demonstrate the effectiveness of the method, and the results indicate that many terms in the domain of enterprise's technological innovation, and the semantic relations between them, were discovered. The method is a cyclic, iterative process: the resulting ontology can be fed back in as the initial seed ontology. Continuous knowledge acquisition in the domain of enterprise's technological innovation thereby updates and refines the initial seed ontology.
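The term-mining module (module 3) can be illustrated with a minimal sketch. It assumes a collapsed Gibbs sampler for LDA and a simple selection rule (take the top-ranked words of each topic that are not yet in the seed ontology); neither is confirmed by the abstract as the authors' implementation. The toy corpus, the seed terms, and the names `lda_gibbs` and `candidate_terms` are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic-word distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_words for _ in range(n_topics)]
    n_k = [0] * n_topics
    # Assign a random initial topic to every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][v] + beta)
                    / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed topic-word distributions, one dict per topic.
    return [
        {w: (n_kw[k][word_id[w]] + beta) / (n_k[k] + n_words * beta)
         for w in vocab}
        for k in range(n_topics)
    ]


def candidate_terms(phi, seed_terms, top_n=3):
    """Top words per topic that the seed ontology does not contain yet.

    Assumption: this ranking rule is illustrative, not the paper's rule.
    """
    found = set()
    for topic in phi:
        ranked = sorted(topic, key=topic.get, reverse=True)
        found.update(ranked[:top_n])
    return sorted(found - set(seed_terms))


# Hypothetical, already preprocessed (tokenized) documents.
DOCS = [
    ["patent", "research", "development", "patent", "laboratory"],
    ["research", "laboratory", "patent", "development"],
    ["market", "sales", "product", "market", "customer"],
    ["product", "customer", "sales", "market"],
]
SEED_TERMS = {"research", "market"}  # hypothetical seed ontology terms

phi = lda_gibbs(DOCS, n_topics=2)
new_terms = candidate_terms(phi, SEED_TERMS)
print(new_terms)
```

In the iterative setting described above, the terms returned by `candidate_terms` would be added to the seed set before the next pass over a newly collected corpus.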


Latent Dirichlet Allocation (LDA), ontology extension, enterprise’s technological innovation, semantic web, text mining

DOI: https://doi.org/10.15837/ijccc.2019.1.3366

Copyright (c) 2019 Qianqian Zhang, Shifeng Liu, Daqing Gong, Qun Tu

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
