Hadoop Optimization for Massive Image Processing: Case Study Face Detection


  • İlginç Demir Advanced Technologies Research Institute, The Scientific and Technological Research Council of Turkey
  • Ahmet Sayar Kocaeli University, Computer Engineering Department


Hadoop, MapReduce, Cloud Computing, Face Detection


Face detection applications are widely used for searching, tagging, and classifying people in very large image databases. Such applications must process a large number of relatively small images. The Hadoop Distributed File System (HDFS), however, was originally designed for storing and processing large files. A huge number of small images slows HDFS down by increasing the total initialization time of jobs, the scheduling overhead of tasks, and the memory usage of the file system manager (Namenode). This paper presents two approaches to improving the small-image-file processing performance of HDFS: (1) converting the images into a single large file by merging and (2) combining many images for a single task without merging. We also introduce novel Hadoop file formats and record generation methods (for reading image content) in order to develop these techniques.
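The two approaches named in the abstract can be illustrated with a minimal sketch in plain Python. This is not Hadoop code and not the authors' file formats: the function names (merge_images, read_image, combine_into_splits) and the max_split_bytes parameter are hypothetical, chosen only to show the idea. The first function packs many small images into one large blob with an offset index; the last one leaves the files separate but groups them so that a single task would process many images.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration only; not Hadoop's API.

def merge_images(images: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Approach (1): concatenate images into one blob plus an (offset, length) index."""
    blob = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    for name, data in images.items():
        index[name] = (len(blob), len(data))  # where this image starts, and its size
        blob.extend(data)
    return bytes(blob), index

def read_image(blob: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Recover one image's bytes from the merged blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

def combine_into_splits(sizes: Dict[str, int], max_split_bytes: int) -> List[List[str]]:
    """Approach (2): group unmerged files so that one task handles many images."""
    splits: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes.items():
        # Start a new group once adding this file would exceed the size budget.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits
```

In both cases the number of units the scheduler has to handle drops from one per image to one per blob or group, which is the overhead the abstract attributes to small files.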

Author Biography

Ahmet Sayar, Kocaeli University, Computer Engineering Department

Ahmet Sayar received his BEng degree in Management Engineering from Istanbul Technical University (Istanbul, Turkey), his MS degree in Computer Science from Syracuse University (Syracuse, NY, USA), and his PhD degree in Computer Science from Indiana University (Bloomington, IN, USA). During his PhD studies he worked at Los Alamos National Laboratory (New Mexico, USA) and the Community Grids Laboratory (Indiana, USA) as a graduate research assistant. He is currently an Assistant Professor and Assistant Chair of the Computer Engineering Department at Kocaeli University in Turkey. His current research interests are Distributed Systems, High Performance Designs and Evaluations, Remote Sensing, Web GIS, Spatial Data Infrastructure, Databases, and Data Structures.






