Heterogeneous Data Clustering Considering Multiple User-provided Constraints

Authors

Keywords:

clustering, heterogeneous networks, relational data, multi-typed objects, user constraints

Abstract

Clustering on heterogeneous networks, which consist of multi-typed objects and links, has proved to be a useful technique in many scenarios. Although numerous clustering methods have achieved remarkable success, current clustering methods for heterogeneous networks tend to consider only the internal information of the dataset. To utilize background domain knowledge, we propose a general framework for clustering heterogeneous data that considers multiple user-provided constraints. Specifically, we identify three types of manual constraints on objects that can be used to guide the clustering process. We then propose the User-HeteClus algorithm to solve the key issues in the case of star-structured heterogeneous data; it incorporates the user constraints into the similarity measurement between central objects. Experiments on a real-world dataset show the effectiveness of the proposed algorithm.
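The abstract does not spell out the three constraint types or the similarity measure, so the following is only a generic illustration of folding user constraints into a similarity matrix over central objects, not the User-HeteClus algorithm itself. It assumes pairwise must-link and cannot-link constraints, cosine similarity, and a plain average over attribute types; the function names and the toy paper-term and paper-author matrices are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(relation):
    """Row-wise cosine similarity of a (central objects x attribute objects) matrix."""
    norms = np.linalg.norm(relation, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for unlinked objects
    unit = relation / norms
    return unit @ unit.T


def constrained_similarity(relations, must_link=(), cannot_link=()):
    """Average per-attribute-type similarities, then apply pairwise constraints.

    Assumption (not taken from the paper): a must-link pair is forced to
    similarity 1 and a cannot-link pair to similarity 0.
    """
    sim = np.mean([cosine_similarity_matrix(r) for r in relations], axis=0)
    for i, j in must_link:
        sim[i, j] = sim[j, i] = 1.0
    for i, j in cannot_link:
        sim[i, j] = sim[j, i] = 0.0
    return sim


# Toy star-structured data: 4 central objects (papers), each linked to
# 3 terms and 2 authors. Values are hypothetical link weights.
paper_term = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1],
                       [0, 1, 1]], dtype=float)
paper_author = np.array([[1, 0],
                         [1, 0],
                         [0, 1],
                         [0, 1]], dtype=float)

sim = constrained_similarity(
    [paper_term, paper_author],
    must_link=[(0, 3)],    # user asserts papers 0 and 3 belong together
    cannot_link=[(0, 1)],  # user asserts papers 0 and 1 must be separated
)
```

The resulting matrix could then be passed to any similarity-based clustering routine; overriding entries directly is the simplest design choice, and softer schemes (for example, adding a penalty or bonus term) are equally plausible.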

Author Biography

Yue Huang, Beijing Language and Culture University

School of Information Science

Published

2019-04-14
