Comparison and Weighted Summation Type of Fuzzy Cluster Validity Indices
Keywords:
fuzzy clustering, fuzzy c-means (FCM), cluster validity indices (CVIs), WSCVI
Abstract
Finding the optimal number of clusters and validating the partition results
of a data set are difficult tasks because clustering is an unsupervised learning process.
A cluster validity index (CVI) is a criterion function for evaluating clustering
results and determining the optimal number of clusters. In this paper, we present an
extensive comparison of ten well-known CVIs for fuzzy clustering. We then extend
traditional single CVIs with a weighting scheme and propose a weighted
summation type of CVI (WSCVI). Experiments on nine synthetic data sets and four
real-world UCI data sets demonstrate that no single CVI outperforms the others on
all data sets. Nevertheless, the proposed WSCVI is more effective when its weights
are set properly.
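The idea of a weighted summation of several CVIs can be illustrated with a minimal sketch. The abstract does not give the exact WSCVI formula, so everything below is an assumption made for illustration: the function name `weighted_sum_cvi`, the min-max normalization of each index across candidate cluster numbers, the sign flip for indices where smaller is better, and the example scores themselves, which are invented and are not results from the paper.

```python
import numpy as np


def weighted_sum_cvi(scores, weights, maximize):
    """Combine several CVIs into one score per candidate cluster number.

    scores   : array of shape (n_indices, n_candidates); row i holds
               index i evaluated at each candidate cluster number.
    weights  : one non-negative weight per index (rescaled to sum to 1).
    maximize : one boolean per index; True if larger values are better.

    Hypothetical helper, not the paper's definition of WSCVI.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    maximize = np.asarray(maximize, dtype=bool)
    if not (scores.shape[0] == weights.shape[0] == maximize.shape[0]):
        raise ValueError("need one weight and one direction flag per index")

    # Orient every index so that larger always means better.
    oriented = scores * np.where(maximize, 1.0, -1.0)[:, None]

    # Min-max normalize each index to [0, 1] across the candidates
    # (an assumed choice that makes indices with different scales comparable).
    low = oriented.min(axis=1, keepdims=True)
    span = oriented.max(axis=1, keepdims=True) - low
    span[span == 0] = 1.0  # a constant index contributes 0 everywhere
    normalized = (oriented - low) / span

    # Weighted summation over the indices.
    return (weights / weights.sum()) @ normalized


# Invented scores for two indices over candidate cluster numbers 2..5.
candidates = [2, 3, 4, 5]
scores = [
    [0.70, 0.85, 0.80, 0.60],  # an index to be maximized
    [0.40, 0.20, 0.25, 0.50],  # an index to be minimized
]
combined = weighted_sum_cvi(scores, weights=[0.5, 0.5], maximize=[True, False])
best = candidates[int(np.argmax(combined))]
print(best, combined)
```

With this construction the candidate with the highest combined score is taken as the estimated number of clusters. Changing the weights shifts how much each single index influences that choice, which is the degree of freedom the abstract refers to.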
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.