Enhanced Dark Block Extraction Method Performed Automatically to Determine the Number of Clusters in Unlabeled Data Sets

Puniethaa Prabhu, K. Duraiswamy

Abstract


A major issue in cluster analysis is deciding the number of clusters, or groups, in a set of unlabeled data. In addition, the resulting clusters should be analyzed to assess how accurately the objects have been clustered. This paper proposes a new method called Enhanced Dark Block Extraction (E-DBE), which automatically identifies the number of object groups in unlabeled datasets. The proposed algorithm builds on an existing algorithm for visual assessment of the cluster tendency of a dataset and uses several common signal and image processing techniques. The method consists of the following steps: 1. Generating an Enhanced Visual Assessment of Tendency (E-VAT) image from a dissimilarity matrix, which is the input to the E-DBE algorithm. 2. Performing image segmentation on the E-VAT image to obtain a binary image, and then applying filtering techniques. 3. Applying a distance transform to the filtered binary image and projecting the pixels onto the main diagonal axis of the image to form a projection signal. 4. Smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to obtain the number of clusters. E-DBE is a parameter-free algorithm for cluster analysis. Experiments with the method are presented on several UCI, synthetic, and real-world datasets.
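To make the four steps above concrete, the following is a minimal pure-Python sketch of the overall pipeline, not the authors' implementation. Several pieces are assumptions made for illustration: the reordering is the plain VAT ordering rather than E-VAT (whose details the abstract does not give), segmentation uses Otsu's between-class variance criterion, the filtering step is omitted, the distance transform is a two-pass city-block transform, and smoothing is a simple moving average. All function names, and the `window` and `min_height_ratio` parameters, are hypothetical.

```python
def vat_order(D):
    """Plain VAT-style ordering (Prim-like traversal); a stand-in for E-VAT."""
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))  # endpoint of largest dissimilarity
    order = [start]
    rest = [i for i in range(n) if i != start]
    while rest:
        # Append the unvisited object closest to any already visited object.
        nxt = min(rest, key=lambda j: min(D[i][j] for i in order))
        order.append(nxt)
        rest.remove(nxt)
    return order


def otsu_threshold(values):
    """Threshold maximizing between-class variance (Otsu's criterion)."""
    vals = sorted(values)
    n, total = len(vals), sum(vals)
    best_t, best_var, running = vals[0], -1.0, 0.0
    for k, v in enumerate(vals[:-1]):
        running += v
        if vals[k + 1] == v:
            continue  # only split between distinct values
        w0 = (k + 1) / n
        m0 = running / (k + 1)
        m1 = (total - running) / (n - k - 1)
        var = w0 * (1 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = v, var
    return best_t


def distance_transform(B):
    """City-block distance from each dark pixel (1) to the nearest bright pixel (0)."""
    n = len(B)
    cap = 2 * n  # finite stand-in used when no bright pixel is reachable
    dist = [[cap if B[i][j] else 0 for j in range(n)] for i in range(n)]
    for i in range(n):  # forward pass
        for j in range(n):
            if i > 0:
                dist[i][j] = min(dist[i][j], dist[i - 1][j] + 1)
            if j > 0:
                dist[i][j] = min(dist[i][j], dist[i][j - 1] + 1)
    for i in reversed(range(n)):  # backward pass
        for j in reversed(range(n)):
            if i < n - 1:
                dist[i][j] = min(dist[i][j], dist[i + 1][j] + 1)
            if j < n - 1:
                dist[i][j] = min(dist[i][j], dist[i][j + 1] + 1)
    return dist


def estimate_cluster_count(D, window=3, min_height_ratio=0.25):
    """Count major peaks of the diagonal projection signal (illustrative only)."""
    n = len(D)
    order = vat_order(D)
    R = [[D[a][b] for b in order] for a in order]            # step 1: reordered image
    t = otsu_threshold([v for row in R for v in row])
    B = [[1 if v <= t else 0 for v in row] for row in R]     # step 2: dark pixels = 1
    dist = distance_transform(B)                             # step 3: distance transform
    signal = [0.0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            signal[i + j] += dist[i][j]                      # project onto main diagonal
    half = window // 2
    smooth = []
    for k in range(len(signal)):                             # step 4: moving average
        seg = signal[max(0, k - half): k + half + 1]
        smooth.append(sum(seg) / len(seg))
    diff = [b - a for a, b in zip(smooth, smooth[1:])]       # first-order difference
    floor = min_height_ratio * max(smooth)
    peaks = [k + 1 for k in range(len(diff) - 1)
             if diff[k] > 0 and diff[k + 1] <= 0 and smooth[k + 1] >= floor]
    return len(peaks)


if __name__ == "__main__":
    pts = [0, 40, 21, 1, 41, 22, 2, 42, 20]  # three well-separated 1-D groups, shuffled
    D = [[abs(a - b) for b in pts] for a in pts]
    print(estimate_cluster_count(D))
```

Each dark block on the diagonal of the reordered image contributes one hump to the projection signal, so counting its major peaks approximates the number of clusters. Unlike the parameter-free method described in the abstract, this sketch exposes a smoothing window and a peak-height ratio purely to keep the example short.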

Keywords


Enhanced DBE, Automatic clustering, Cluster tendency, Visual assessment, Reordered dissimilarity image.






DOI: https://doi.org/10.15837/ijccc.2013.2.308



Copyright (c) 2017 Puniethaa Prabhu, K. Duraiswamy

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C), ISSN 1841-9836.
