Human-inspired Identification of High-level Concepts using OWA and Linguistic Quantifiers

Marek Z. Reformat, Ronald R. Yager, Zhan Li, Naif Alajlan

Abstract


An intelligent agent-based system can be used to identify high-level concepts that match sets of keywords provided by users. A new human-inspired approach to identifying concepts in documents is introduced here. The proposed method takes keywords and builds concept structures from them. These structures are represented as hierarchies of concepts (HofC). An ontology is used to enrich HofCs with terms and other concepts (sub-concepts) based on concept definitions, as well as with related concepts. Additionally, the approach assigns levels of importance to the terms that define the concepts. These levels of importance are continuously updated from a flow of documents using an Adaptive Assignment of Term Importance (AATI) schema. The levels of activation of concepts identified in a document that match those in the HofC are estimated using ordered weighted averaging (OWA) operators with linguistic quantifiers. A simple case study illustrates the approach.
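To make the aggregation step concrete, the sketch below shows an OWA operator whose weights are derived from a linguistic quantifier. It assumes Yager's standard construction from a regular increasing monotone quantifier Q, namely w_i = Q(i/n) - Q((i-1)/n), applied to the arguments sorted in descending order. The piecewise-linear "most" quantifier with thresholds 0.3 and 0.8, the power family Q(r) = r**alpha, the function names, and the activation values are all illustrative assumptions rather than the paper's parameters. The AATI schema and the HofC traversal are not modeled.

```python
from typing import Callable, Sequence

# A linguistic quantifier maps a proportion r in [0, 1] to a degree in [0, 1].
Quantifier = Callable[[float], float]


def owa_weights(n: int, q: Quantifier) -> list[float]:
    """OWA weights from a RIM quantifier: w_i = Q(i/n) - Q((i-1)/n), i = 1..n."""
    if n <= 0:
        raise ValueError("need at least one argument")
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(values: Sequence[float], q: Quantifier) -> float:
    """Ordered weighted average: weights attach to positions after a descending sort."""
    ordered = sorted(values, reverse=True)
    weights = owa_weights(len(ordered), q)
    return sum(w * b for w, b in zip(weights, ordered))


def power_quantifier(alpha: float) -> Quantifier:
    """Hypothetical helper for Q(r) = r**alpha.

    alpha = 1 gives the arithmetic mean; a larger alpha shifts weight toward
    the smaller arguments (more 'and-like', approaching min).
    """
    return lambda r: r ** alpha


def most(r: float, a: float = 0.3, b: float = 0.8) -> float:
    """Piecewise-linear 'most' quantifier; thresholds a and b are assumptions."""
    if r <= a:
        return 0.0
    if r >= b:
        return 1.0
    return (r - a) / (b - a)


if __name__ == "__main__":
    # Hypothetical activation levels of the terms/sub-concepts of one concept.
    activations = [0.9, 0.7, 0.4, 0.0]
    print(owa(activations, power_quantifier(1.0)))  # plain mean
    print(owa_weights(len(activations), most))      # weights induced by "most"
    print(owa(activations, most))                   # "most" aggregation
```

With the mean quantifier, the four activations aggregate to 0.5. With "most", the weights become approximately [0, 0.4, 0.5, 0.1], which gives about 0.48. The single strongest activation receives no weight, and most of the weight falls on the middle-ranked values. This shows how the choice of linguistic quantifier, rather than a fixed average, shapes a concept's activation level.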

Keywords


concept identification, text documents, ontology, hierarchy of concepts, ordered weighted averaging operator, importance of concepts






DOI: https://doi.org/10.15837/ijccc.2011.3.2132



Copyright (c) 2017 Marek Z. Reformat, Ronald R. Yager, Zhan Li, Naif Alajlan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
