Enhanced Dark Block Extraction Method Performed Automatically to Determine the Number of Clusters in Unlabeled Data Sets

Authors

  • Puniethaa Prabhu, Department of Master of Computer Application, K.S. Rangasamy College of Technology, Tamil Nadu, India.
  • K. Duraiswamy, Department of Master of Computer Application, K.S. Rangasamy College of Technology, Tamil Nadu, India.

Keywords:

Enhanced DBE, Automatic clustering, Cluster tendency, Visual assessment, Reordered dissimilarity image.

Abstract

One of the major issues in data cluster analysis is deciding the number of clusters, or groups, in a set of unlabeled data. In addition, the resulting cluster structure should be assessed to verify how accurately objects are grouped. This paper proposes a new method, Enhanced Dark Block Extraction (E-DBE), which automatically identifies the number of object groups in unlabeled datasets. The proposed algorithm builds on an existing algorithm for visual assessment of the cluster tendency of a dataset and applies several common signal and image processing techniques. The method consists of the following steps: 1. Generating an Enhanced Visual Assessment of Tendency (E-VAT) image from a dissimilarity matrix; this image is the input to the E-DBE algorithm. 2. Segmenting the E-VAT image to obtain a binary image, and then applying filtering techniques. 3. Performing a distance transformation on the filtered binary image and projecting its pixels onto the main diagonal axis of the image to form a projection signal. 4. Smoothing the projection signal, computing its first-order derivative, and then detecting the major peaks and valleys in the resulting signal to obtain the number of clusters. E-DBE is a parameter-free algorithm for cluster analysis. Experimental results for the method are presented on several UCI, synthetic, and real-world datasets.
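The four steps listed in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' E-VAT/E-DBE implementation, and several choices in it are assumptions: the reordering is the basic Prim-like VAT ordering rather than E-VAT, Otsu's method is assumed for the segmentation step, the filtering step is omitted, a city-block distance transform stands in for the paper's distance transformation, and a moving-average filter stands in for its smoothing. The smoothing `window` and the 10% `rel_height` cutoff for "major" peaks are illustrative parameters, so unlike E-DBE this sketch is not parameter-free. All function names are hypothetical.

```python
import numpy as np


def vat_order(D):
    """VAT-style ordering: a Prim-like traversal of the dissimilarity graph."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity, as VAT does.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = [k for k in range(n) if k != start]
    while remaining:
        # Append the unvisited object nearest to any already-ordered object.
        sub = D[np.ix_(order, remaining)]
        col = int(np.argmin(sub)) % len(remaining)
        order.append(remaining.pop(col))
    return np.array(order)


def otsu_threshold(img, bins=256):
    """Otsu's method (assumed here): gray level maximizing between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)              # pixel count at or below each bin
    w1 = w0[-1] - w0                  # pixel count above each bin
    m0 = np.cumsum(hist * centers)    # cumulative intensity mass
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    # Return the upper edge of the best bin so that whole bin counts as "dark".
    return edges[int(np.argmax(between)) + 1]


def distance_transform_l1(B):
    """Exact city-block distance from each foreground pixel to the background."""
    P = np.pad(B, 1, constant_values=False)  # image border counts as background
    h, w = P.shape
    d = np.where(P, h + w, 0)
    for r in range(1, h):                    # forward pass: up/left neighbours
        for c in range(1, w):
            if d[r, c]:
                d[r, c] = min(d[r, c], d[r - 1, c] + 1, d[r, c - 1] + 1)
    for r in range(h - 2, -1, -1):           # backward pass: down/right neighbours
        for c in range(w - 2, -1, -1):
            if d[r, c]:
                d[r, c] = min(d[r, c], d[r + 1, c] + 1, d[r, c + 1] + 1)
    return d[1:-1, 1:-1]


def diagonal_projection(img):
    """Sum pixels along each anti-diagonal r + c = k, i.e. project onto the main diagonal."""
    rr, cc = np.indices(img.shape)
    return np.bincount((rr + cc).ravel(), weights=img.ravel().astype(float))


def count_major_peaks(signal, window=5, rel_height=0.1):
    """Smooth, differentiate, and count rising-to-falling sign changes."""
    s = np.convolve(signal, np.ones(window) / window, mode="same")  # moving average
    d = np.diff(s)                                  # first-order derivative
    steps = np.flatnonzero(np.abs(d) > 1e-12)       # skip flat segments
    sign = np.sign(d[steps])
    turns = np.flatnonzero((sign[:-1] > 0) & (sign[1:] < 0))
    heights = s[steps[turns] + 1]                   # value at the top of each peak
    return int(np.sum(heights >= rel_height * s.max()))


def estimate_cluster_count(X):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D = D / D.max()                        # dissimilarities scaled to [0, 1]
    order = vat_order(D)
    img = D[np.ix_(order, order)]          # reordered dissimilarity image
    binary = img <= otsu_threshold(img)    # dark (low-dissimilarity) blocks -> True
    # The paper's filtering step is omitted in this sketch.
    dt = distance_transform_l1(binary)
    return count_major_peaks(diagonal_projection(dt))


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((15, 2)) for c in centers])
print(estimate_cluster_count(X))
```

For well-separated groups such as the three Gaussian blobs above, each group becomes one square dark block on the diagonal of the reordered image. After the distance transform, each block projects to a single hump whose maximum lies at the block centre, while the background gap between adjacent blocks projects to a valley. Counting the rising-to-falling sign changes of the smoothed derivative therefore counts the blocks.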

Published

2013-02-18
