Heterogeneous Data Clustering Considering Multiple User-provided Constraints
Keywords:
clustering, heterogeneous networks, relational data, multi-typed objects, user constraints
Abstract
Clustering on heterogeneous networks, which consist of multi-typed objects and links, has proved to be a useful technique in many scenarios. Although numerous clustering methods have achieved remarkable success, current clustering methods for heterogeneous networks tend to consider only the internal information of the dataset. To utilize background domain knowledge, we propose a general framework for clustering heterogeneous data that considers multiple user-provided constraints. Specifically, we identify three types of manual constraints on objects that can be used to guide the clustering process. We then propose the User-HeteClus algorithm to solve the key issues in the case of star-structured heterogeneous data; it incorporates the user constraints into the similarity measurement between central objects. Experiments on a real-world dataset show the effectiveness of the proposed algorithm.
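To make the idea of folding user constraints into the similarity between central objects concrete, the following is a minimal sketch and not the User-HeteClus algorithm itself. It assumes a star structure in which each central object is described by one link-count vector per attribute type, and it assumes two pairwise constraint kinds (must-link and cannot-link) purely for illustration; the abstract does not specify its three constraint types. The function names `cosine` and `constrained_similarity`, the averaging rule, and the toy data are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def constrained_similarity(relations, must_link=(), cannot_link=()):
    """Build an n x n similarity matrix over central objects.

    relations: dict mapping an attribute-type name to a list of link-count
               vectors, one vector per central object (assumed layout).
    must_link / cannot_link: iterables of (i, j) index pairs supplied by the
               user (assumed constraint kinds, not taken from the paper).
    """
    n = len(next(iter(relations.values())))
    sim = [[0.0] * n for _ in range(n)]
    # Unconstrained part: average the per-type cosine similarities.
    for i in range(n):
        for j in range(n):
            per_type = [cosine(rows[i], rows[j]) for rows in relations.values()]
            sim[i][j] = sum(per_type) / len(per_type)
    # Constraint part: user knowledge overrides the data-driven value.
    for i, j in must_link:
        sim[i][j] = sim[j][i] = 1.0
    for i, j in cannot_link:
        sim[i][j] = sim[j][i] = 0.0
    return sim


# Toy example: three papers (central objects) linked to authors and terms.
relations = {
    "authors": [[1, 0], [1, 0], [0, 1]],
    "terms": [[1, 1, 0], [0, 1, 1], [0, 1, 1]],
}
sim = constrained_similarity(relations, must_link=[(0, 1)], cannot_link=[(1, 2)])
```

The resulting matrix could then be passed to any similarity-based clustering method. Hard overrides are only one possible design; a weighted penalty or bonus would be an equally reasonable choice under the same assumptions.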
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.