Outlier Detection with Nonlinear Projection Pursuit

Mihaela Breaban, Henri Luchian

Abstract


This work proposes and investigates a new method for identifying outliers in multivariate numerical data, rooted in projection pursuit. Projection pursuit is a method that searches for meaningful linear combinations of attributes. The novelty of our approach lies in introducing nonlinear combinations, which can model more complex interactions among attributes. The search space grows exponentially with the polynomial degree; this growth is tackled with a genetic algorithm that performs monomial selection. Synthetic test cases highlight the benefits of the new approach over classical linear projection pursuit.
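To make the idea of monomial selection concrete, the following is a minimal Python sketch, not the authors' implementation. It enumerates the monomials up to a given degree, evaluates only those switched on by a binary mask (standing in for a chromosome that a genetic algorithm might evolve), and scores points along one fixed nonlinear projection. The function names, the hand-picked mask and weights, and the median-deviation score are all assumptions made for illustration; the abstract does not specify the projection index or the encoding used in the paper.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np


def monomial_exponents(n_attrs, degree):
    """Exponent tuples of all monomials with total degree 1..degree."""
    exps = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(n_attrs), d):
            e = [0] * n_attrs
            for i in combo:
                e[i] += 1
            exps.append(tuple(e))
    return exps


def expand(X, exps, mask):
    """Evaluate only the monomials selected by a binary mask (hypothetical chromosome)."""
    chosen = [e for e, keep in zip(exps, mask) if keep]
    return np.column_stack([np.prod(X ** np.array(e), axis=1) for e in chosen])


# The number of candidate monomials is C(n + d, d) - 1, and every subset of
# them is a candidate selection, so exhaustive search quickly becomes infeasible.
n_monomials = len(monomial_exponents(5, 3))
assert n_monomials == comb(5 + 3, 3) - 1

# Toy data (an assumption, not from the paper): points on the unit circle
# plus one point at the origin, which no linear projection isolates.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
X = np.vstack([np.column_stack([np.cos(angles), np.sin(angles)]), [[0.0, 0.0]]])

exps = monomial_exponents(2, 2)   # x1, x2, x1^2, x1*x2, x2^2
mask = [0, 0, 1, 0, 1]            # hand-picked here: keep x1^2 and x2^2
weights = np.array([1.0, 1.0])    # projection direction in monomial space

z = expand(X, exps, mask) @ weights        # nonlinear projection x1^2 + x2^2
scores = np.abs(z - np.median(z))          # simple deviation score (assumption)
outlier_index = int(np.argmax(scores))
```

In this toy case the selected monomials map every circle point to roughly 1 and the central point to 0, so the central point receives the largest score. In a full method the mask and weights would be searched for rather than fixed by hand.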


Keywords


outlier detection, nonlinear projections, genetic algorithms






DOI: https://doi.org/10.15837/ijccc.2013.1.165



Copyright (c) 2017 Mihaela Breaban, Henri Luchian

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
