A Latent-Dirichlet-Allocation Based Extension for Domain Ontology of Enterprise’s Technological Innovation

  • Qianqian Zhang, Beijing Jiaotong University
  • Shifeng Liu, Beijing Jiaotong University
  • Daqing Gong, Beijing Jiaotong University
  • Qun Tu, Beijing Jiaotong University


This paper proposes a method for automatically building a domain ontology of enterprise's technological innovation from a plain-text corpus, based on Latent Dirichlet Allocation (LDA). The method consists of four modules: 1) introducing a seed ontology for the domain of enterprise's technological innovation, 2) preprocessing the collected textual data with Natural Language Processing (NLP) techniques, 3) mining domain-specific terms from the document collection using LDA, and 4) deriving the relationships between the terms through a set of defined rules. Experiments were carried out to demonstrate the effectiveness of the method, and the results indicate that it discovers many terms in the domain of enterprise's technological innovation, together with the semantic relations between them. The method is a continuous, iterative process: the resulting ontology can be fed back in as the seed ontology for the next round, so that ongoing knowledge acquisition in the domain progressively updates and refines the seed ontology.


Latent Dirichlet Allocation (LDA), ontology extension, enterprise’s technological innovation, semantic web, text mining