Optimal Data File Allocation for All-to-All Comparison in Distributed System: A Case Study on Genetic Sequence Comparison
Abstract: To address the unbalanced load of data files in large-scale all-to-all data comparison in a distributed system environment, this paper fully considers the differences between the files themselves. It aims to exploit the advantages of distributed systems to improve file allocation for all-to-all comparison between the data files in a large dataset. For this purpose, the paper formally describes the all-to-all comparison problem and constructs a data allocation model via mixed integer linear programming (MILP). A data allocation algorithm was then developed in Matlab using the intlinprog function, which relies on the branch-and-bound method. Finally, the model and algorithm were verified through several experiments. The results show that the proposed file allocation strategy achieves basic load balance across the nodes of the distributed system without exceeding the storage capacity of any node, and completely localizes the data files. The research findings can be applied to fields such as bioinformatics, biometrics and data mining.
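To make the allocation problem concrete, the following is a minimal Python sketch. It is not the paper's MILP model or its Matlab intlinprog implementation: it replaces the solver with exhaustive search, which is only practical for a handful of files. It assumes that every pair of files must be compared on a node that stores both files, that the cost of one comparison is the sum of the two file sizes, and that the goal is to minimize the largest per-node load without exceeding any node's storage capacity. The function name `allocate`, the cost formula, and the sample numbers are all hypothetical.

```python
from itertools import combinations, product


def allocate(sizes, capacities):
    """Exhaustively search pair-to-node assignments for a tiny instance.

    sizes[i] is the size of file i; capacities[k] is the storage of node k.
    Returns (max_load, assignment, stored) for the best feasible assignment,
    or None if no assignment respects the storage capacities.
    """
    n_nodes = len(capacities)
    pairs = list(combinations(range(len(sizes)), 2))  # all-to-all comparisons
    best = None
    for choice in product(range(n_nodes), repeat=len(pairs)):
        stored = [set() for _ in range(n_nodes)]  # files each node must hold
        load = [0] * n_nodes                      # comparison work per node
        for (i, j), k in zip(pairs, choice):
            stored[k].update((i, j))        # both files are local to node k
            load[k] += sizes[i] + sizes[j]  # assumed cost of one comparison
        # Discard assignments that overflow any node's storage.
        if any(sum(sizes[f] for f in stored[k]) > capacities[k]
               for k in range(n_nodes)):
            continue
        # Keep the assignment with the smallest maximum load.
        if best is None or max(load) < best[0]:
            best = (max(load), dict(zip(pairs, choice)), stored)
    return best


if __name__ == "__main__":
    result = allocate(sizes=[3, 1, 2, 2], capacities=[8, 8])
    if result is not None:
        max_load, assignment, stored = result
        print("max load:", max_load)
        print("files per node:", stored)
```

In this sketch each comparison runs where both of its files are stored, so no file needs to be moved at comparison time. A MILP solver would express the same constraints with binary variables instead of enumerating every assignment.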
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.