Comparison and Weighted Summation Type of Fuzzy Cluster Validity Indices

Authors

  • Kaile Zhou, Hefei University of Technology
  • Shuai Ding, Hefei University of Technology
  • Chao Fu, School of Management, Hefei University of Technology, Hefei 230009, China
  • Shanlin Yang, Hefei University of Technology

Keywords:

fuzzy clustering, fuzzy c-means (FCM), cluster validity indices (CVIs), WSCVI

Abstract

Finding the optimal cluster number and validating the partition results
of a data set are difficult tasks since clustering is an unsupervised learning process.
A cluster validity index (CVI) is a criterion function for evaluating clustering
results and determining the optimal number of clusters. In this paper, we present an
extensive comparison of ten well-known CVIs for fuzzy clustering. We then extend
traditional single CVIs with a weighting method and propose a weighted
summation type of CVI (WSCVI). Experiments on nine synthetic data sets and four
real-world UCI data sets demonstrate that no single CVI outperforms the others on
all data sets. Nevertheless, the proposed WSCVI is more effective when its weights
are set properly.
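
The idea of a weighted summation of several CVIs can be illustrated with a small sketch. The abstract does not give the exact WSCVI formula, so everything below is an assumption made for illustration: the function name `weighted_sum_cvi`, the min-max rescaling of each index across the candidate cluster numbers, the flipping of minimized indices so that larger always means better, and the toy index values (which are invented and are not results from the paper).

```python
import numpy as np

def weighted_sum_cvi(values, weights, maximize):
    """Combine several CVIs into one score per candidate cluster number.

    values:   shape (k, m); row i holds CVI i evaluated at m candidate
              cluster numbers.
    weights:  k non-negative weights (rescaled here to sum to 1).
    maximize: k booleans; True if a larger value of CVI i indicates a
              better partition.
    Returns m combined scores, where larger is better (assumed convention).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # Assumption: min-max rescale each index to [0, 1] so that indices
    # with different ranges become comparable before summation.
    lo = values.min(axis=1, keepdims=True)
    span = values.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0  # a constant index expresses no preference
    scaled = (values - lo) / span

    # Flip minimized indices so that larger is better for every row.
    flip = ~np.asarray(maximize, dtype=bool)
    scaled[flip] = 1.0 - scaled[flip]

    return weights @ scaled

# Toy values for two hypothetical indices at c = 2, 3, 4, 5.
candidates = [2, 3, 4, 5]
toy_values = [
    [0.80, 0.90, 0.70, 0.60],  # an index that should be maximized
    [0.50, 0.20, 0.40, 0.60],  # an index that should be minimized
]
scores = weighted_sum_cvi(toy_values, weights=[0.5, 0.5],
                          maximize=[True, False])
best_c = candidates[int(np.argmax(scores))]
```

With these toy values both indices favor c = 3, so the combined score selects it as well. When individual indices disagree, the choice of weights decides which index dominates, which is why the weight setting matters.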

Published

2014-04-04
