Hadoop Optimization for Massive Image Processing: Case Study Face Detection
Keywords: Hadoop, MapReduce, Cloud Computing, Face Detection
Face detection applications are widely used for searching, tagging, and classifying people inside very large image databases. This type of application requires processing a large number of relatively small images. The Hadoop Distributed File System (HDFS), on the other hand, was originally designed for storing and processing large files. A huge number of small images slows HDFS down by increasing the total initialization time of jobs, the scheduling overhead of tasks, and the memory usage of the file system manager (Namenode). This paper presents two approaches to improve the small image file processing performance of HDFS: (1) converting the images into a single large file by merging, and (2) combining many images for a single task without merging. We also introduce novel Hadoop file formats and record generation methods (for reading image content) in order to develop these techniques.
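The two approaches named in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' file format or record reader, which the abstract does not specify. The length-prefixed container layout and the names `merge_images`, `read_records`, and `combine_into_splits` are hypothetical, chosen only to show the idea: approach (1) packs many small images into one large file that can be read back record by record, while approach (2) leaves the files untouched and groups them so that one task handles many images.

```python
import io
import struct
from typing import Dict, Iterator, List, Tuple


def merge_images(images: Dict[str, bytes]) -> bytes:
    """Approach (1), sketched: pack many small images into one container.

    Hypothetical layout: each record is a 4-byte name length, a 4-byte
    content length (both big-endian), the UTF-8 name, then the raw bytes.
    """
    out = io.BytesIO()
    for name, data in images.items():
        name_bytes = name.encode("utf-8")
        out.write(struct.pack(">II", len(name_bytes), len(data)))
        out.write(name_bytes)
        out.write(data)
    return out.getvalue()


def read_records(container: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) records back out of the merged container."""
    buf = io.BytesIO(container)
    while True:
        header = buf.read(8)
        if len(header) < 8:  # end of container
            return
        name_len, data_len = struct.unpack(">II", header)
        name = buf.read(name_len).decode("utf-8")
        yield name, buf.read(data_len)


def combine_into_splits(sizes: Dict[str, int], max_split_bytes: int) -> List[List[str]]:
    """Approach (2), sketched: group files into per-task splits, no merging.

    Files are added to the current split until the next one would push it
    past max_split_bytes; an oversized file still gets a split of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes.items():
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    # Toy stand-ins for image files; the byte contents are made up.
    images = {"a.jpg": b"\xff\xd8aaa", "b.jpg": b"\xff\xd8bb", "c.jpg": b"\xff\xd8c"}
    container = merge_images(images)
    for name, data in read_records(container):
        print(name, len(data))
    sizes = {name: len(data) for name, data in images.items()}
    print(combine_into_splits(sizes, max_split_bytes=9))
```

In both cases the number of units the scheduler has to handle drops from one per image to one per container or per group, which is the overhead the abstract attributes to small files. Merging additionally reduces the number of file entries the Namenode must track, whereas grouping keeps the original files in place.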
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.