Human-inspired Identification of High-level Concepts using OWA and Linguistic Quantifiers
Abstract
An intelligent agent-based system can be used to identify high-level concepts that match sets of keywords provided by users. This paper introduces a new, human-inspired approach to identifying concepts in documents. The proposed method takes keywords and builds concept structures from them. These concept structures are represented as hierarchies of concepts (HofCs). An ontology is used to enrich the HofCs with terms and other concepts (sub-concepts) based on concept definitions, as well as with related concepts. The approach also uses levels of importance for the terms that define the concepts. These levels of importance are continuously updated from a flow of documents using an Adaptive Assignment of Term Importance (AATI) schema. The levels of activation of concepts identified in a document that match those in the HofC are estimated using ordered weighted averaging (OWA) operators with linguistic quantifiers. A simple case study illustrates the approach.
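The abstract names OWA operators with linguistic quantifiers as the step that turns term matches into concept activation levels. The Python sketch below shows one standard way to build such an operator, assuming Yager's quantifier-guided weights w_i = Q(i/n) - Q((i-1)/n) and a piecewise-linear "most" quantifier with parameters 0.3 and 0.8 (a common illustrative choice, not necessarily the one used in this paper). The `ConceptNode` class and `activation` function are hypothetical. They only illustrate how activation could be propagated bottom-up through a HofC, and they ignore the AATI term-importance levels, whose update rule is not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Sequence

Quantifier = Callable[[float], float]


def rim_quantifier(a: float, b: float) -> Quantifier:
    """Piecewise-linear regular increasing monotone (RIM) quantifier Q(r), 0 <= a <= b <= 1."""
    def q(r: float) -> float:
        if r <= a:
            return 0.0
        if r >= b:
            return 1.0
        return (r - a) / (b - a)
    return q


# Assumed parameters for the linguistic quantifier "most".
MOST = rim_quantifier(0.3, 0.8)


def owa_weights(n: int, q: Quantifier) -> list[float]:
    """Quantifier-guided OWA weights: w_i = Q(i/n) - Q((i-1)/n) for i = 1..n."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(values: Sequence[float], q: Quantifier) -> float:
    """OWA aggregation: sort arguments in descending order, then take the weighted sum."""
    if not values:
        raise ValueError("OWA needs at least one argument")
    ordered = sorted(values, reverse=True)  # b_1 >= b_2 >= ... >= b_n
    weights = owa_weights(len(ordered), q)
    return sum(w * b for w, b in zip(weights, ordered))


@dataclass
class ConceptNode:
    """Hypothetical node of a hierarchy of concepts (HofC)."""
    name: str
    score: float = 0.0  # leaves only: hypothetical degree to which the term occurs in the document
    children: list[ConceptNode] = field(default_factory=list)


def activation(node: ConceptNode, q: Quantifier = MOST) -> float:
    """Hypothetical bottom-up activation: a leaf returns its score, an inner node aggregates its children with OWA."""
    if not node.children:
        return node.score
    return owa([activation(child, q) for child in node.children], q)


if __name__ == "__main__":
    print(owa([0.9, 0.7, 0.2, 0.0], MOST))  # weights 0, 0.4, 0.5, 0.1 -> about 0.38
```

Because the weights telescope to Q(1) - Q(0) = 1, the result always lies between the smallest and largest input. A quantifier such as `rim_quantifier(0.0, 0.0)` ("at least one") puts all the weight on the largest value and reduces OWA to the maximum, while "most" requires several terms of a concept to be present before the concept is strongly activated.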
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed free of charge under the terms of the Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.