Outlier Detection with Nonlinear Projection Pursuit

Authors

  • Mihaela Breaban "Alexandru Ioan Cuza" University of Iasi
  • Henri Luchian "Alexandru Ioan Cuza" University of Iasi

Keywords:

outlier detection, nonlinear projections, genetic algorithms

Abstract

This work proposes and investigates a new method for identifying outliers in multivariate numerical data, rooted in projection pursuit. Projection pursuit is a method that searches for meaningful linear combinations of attributes. The novelty of our approach lies in introducing nonlinear combinations, which can model more complex interactions among attributes. Because the search space grows exponentially with the polynomial degree, a genetic algorithm is used to perform monomial selection. Synthetic test cases highlight the benefits of the new approach over classical linear projection pursuit.
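
The idea of a nonlinear projection built from selected monomials can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the robust z-score used as an outlyingness index, and the hand-picked monomial mask are all assumptions made for illustration. In the method described above, a genetic algorithm would search for the mask; here it is fixed by hand to show what a single candidate solution looks like.

```python
from itertools import combinations_with_replacement
from math import comb, prod
from statistics import median

def monomial_exponents(n_attrs, max_degree):
    """Index tuples, one per monomial of degree 1..max_degree (hypothetical helper)."""
    terms = []
    for degree in range(1, max_degree + 1):
        terms.extend(combinations_with_replacement(range(n_attrs), degree))
    return terms

def project(row, terms, mask, coeffs):
    """Nonlinear projection: weighted sum of the monomials selected by mask."""
    return sum(c * prod(row[i] for i in term)
               for term, keep, c in zip(terms, mask, coeffs) if keep)

def outlyingness(values):
    """Robust z-score |v - median| / MAD; an assumed index, not the paper's."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # guard against MAD == 0
    return [abs(v - med) / mad for v in values]

# Two attributes, degree 2: x0, x1, x0*x0, x0*x1, x1*x1
terms = monomial_exponents(2, 2)
assert len(terms) == comb(2 + 2, 2) - 1  # grows combinatorially with the degree

# Inliers lie on the parabola x1 = x0^2; the last point is off the curve,
# yet each of its coordinates is within the range of the inliers.
data = [(x, x * x) for x in (-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2)] + [(0.0, 3.0)]

# One candidate solution: keep x1 and x0*x0, with weights +1 and -1.
mask = [False, True, True, False, False]
coeffs = [0.0, 1.0, -1.0, 0.0, 0.0]

scores = outlyingness([project(row, terms, mask, coeffs) for row in data])
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

In this toy data set the projection x1 - x0^2 maps every inlier to zero, so the off-curve point is the only one with a nonzero score. No linear combination of x0 and x1 maps all the inliers to a single value, which is the kind of situation the nonlinear extension targets.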

Published

2012-11-13
