Outlier Detection with Nonlinear Projection Pursuit
Keywords:
outlier detection, nonlinear projections, genetic algorithms
Abstract
This work proposes and investigates a new method for identifying outliers in multivariate numerical data, rooted in projection pursuit. Projection pursuit is a method that searches for meaningful linear combinations of attributes. The novelty of our approach lies in introducing nonlinear combinations, which can model more complex interactions among attributes. The search space grows exponentially with the polynomial degree; this growth is tackled with a genetic algorithm that performs monomial selection. Synthetic test cases highlight the benefits of the new approach over classical linear projection pursuit.
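To make the idea of monomial selection concrete, the following minimal Python sketch enumerates the candidate monomials for a given number of attributes and polynomial degree, and evaluates a projection that uses only the monomials switched on by a binary mask. The abstract does not specify the chromosome encoding or the projection index, so the mask representation and the names `monomials`, `expand`, and `project` are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def monomials(n_attrs, max_degree):
    """List all monomials of degree 1..max_degree.

    Each monomial is a tuple of attribute indices, e.g. (0, 1) stands
    for x0 * x1 and (2, 2) for x2 squared.
    """
    terms = []
    for degree in range(1, max_degree + 1):
        terms.extend(combinations_with_replacement(range(n_attrs), degree))
    return terms


def expand(row, terms, mask):
    """Evaluate, for one record, only the monomials selected by the mask.

    The mask is a hypothetical binary chromosome: one bit per monomial.
    """
    return [prod(row[i] for i in term) for term, keep in zip(terms, mask) if keep]


def project(row, terms, mask, weights):
    """Nonlinear projection: weighted sum of the selected monomials."""
    return sum(w * f for w, f in zip(weights, expand(row, terms, mask)))


# Candidate monomials for 3 attributes up to degree 2:
# 3 linear terms plus 6 quadratic terms.
terms = monomials(3, 2)

# Number of candidates for 10 attributes up to degree 3; a selection
# algorithm would have to search among 2 ** n_candidates subsets.
n_candidates = len(monomials(10, 3))

# Assumed example chromosome: keep x0 and the interaction x0 * x1.
mask = [1, 0, 0, 0, 1, 0, 0, 0, 0]
row = [2.0, 3.0, 5.0]
value = project(row, terms, mask, weights=[0.5, 1.0])
```

The count of candidate monomials equals `comb(n_attrs + max_degree, max_degree) - 1`, so the number of possible subsets grows very quickly with the degree, which is why an exhaustive search over masks is impractical and a heuristic such as a genetic algorithm is a natural fit.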
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free of charge under the terms of the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.