Human-inspired Identification of High-level Concepts using OWA and Linguistic Quantifiers

Authors

  • Marek Z. Reformat, thinkS2: thinking Software and Systems laboratory, Electrical and Computer Engineering, University of Alberta, Canada
  • Ronald R. Yager, (1) Machine Intelligence Institute, Iona College, New Rochelle, NY, USA; (2) Visiting Distinguished Scientist, King Saud University, Riyadh, Saudi Arabia
  • Zhan Li, thinkS2: thinking Software and Systems laboratory, Electrical and Computer Engineering, University of Alberta, Canada
  • Naif Alajlan, Advanced Lab for Intelligent Systems Research, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Keywords:

concept identification, text documents, ontology, hierarchy of concepts, ordered weighted averaging operator, importance of concepts

Abstract

Intelligent agent-based systems can be used to identify high-level concepts that match sets of keywords provided by users. This paper introduces a new human-inspired approach to concept identification in documents. The proposed method takes keywords and builds concept structures from them. These concept structures are represented as hierarchies of concepts (HofCs). An ontology is used to enrich the HofCs with terms and other concepts (sub-concepts) drawn from concept definitions, as well as with related concepts. Additionally, the approach uses levels of importance of the terms that define the concepts. These levels of importance are continuously updated from a flow of documents using an Adaptive Assignment of Term Importance (AATI) schema. The levels of activation of concepts identified in a document that match those in the HofC are estimated using ordered weighted averaging (OWA) operators with linguistic quantifiers. A simple case study illustrates the approach.
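
As an illustration of the aggregation step mentioned in the abstract, the following minimal Python sketch shows how an OWA operator can be driven by a linguistic quantifier. It uses the standard construction in which the weights are derived from a regular increasing monotone quantifier Q as w_i = Q(i/n) - Q((i-1)/n). The quantifier shapes (`most`, `some`, `mean_q`), the function names, and the sample match degrees are assumptions made for this example; they are not taken from the paper and may differ from the quantifiers the authors use.

```python
from typing import Callable, List, Sequence


def quantifier_weights(n: int, q: Callable[[float], float]) -> List[float]:
    """OWA weights from a quantifier Q: w_i = Q(i/n) - Q((i-1)/n)."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(values: Sequence[float], q: Callable[[float], float]) -> float:
    """Ordered weighted average: weights apply to values sorted descending."""
    ordered = sorted(values, reverse=True)
    weights = quantifier_weights(len(ordered), q)
    return sum(w * b for w, b in zip(weights, ordered))


# Hypothetical quantifiers (assumed shapes, chosen only for illustration).
def most(r: float) -> float:
    return r ** 2      # stricter: emphasises the smaller match degrees


def some(r: float) -> float:
    return r ** 0.5    # more lenient: emphasises the larger match degrees


def mean_q(r: float) -> float:
    return r           # identity quantifier gives the plain arithmetic mean


# Hypothetical degrees to which four terms of one concept match a document.
term_matches = [1.0, 0.5, 0.0, 0.0]

activation_most = owa(term_matches, most)
activation_some = owa(term_matches, some)
activation_mean = owa(term_matches, mean_q)
```

With these assumed inputs, the stricter quantifier yields a lower activation than the plain mean, and the more lenient one yields a higher activation, which is the behaviour a linguistic quantifier is meant to control.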

Published

2011-09-10
