Performing MapReduce on Data Centers with Hierarchical Structures

Zeliu Ding, Deke Guo, Xueshan Luo, Xi Chen

Abstract


Data centers are distributed information systems built for massive data storage and processing. The structure of a data center determines how its servers, links, and switches are interconnected. Several hierarchical structures have been proposed to improve the topological performance of data centers. By using recursively defined topologies, these structures support general applications and services with high scalability and reliability. However, they overlook the requirements of specific applications that run on data centers, such as MapReduce, a well-known distributed data processing application. The communication and control mechanisms used to perform MapReduce on the traditional structure cannot be employed on the hierarchical structures. In this paper, we propose a methodology for performing MapReduce on data centers with hierarchical structures. Our methodology is based on the distributed hash table (DHT), an efficient approach to data retrieval in distributed systems. We exploit the advantages of the DHT, including decentralization, fault tolerance, and scalability, to address the main problems that hierarchical data centers face in supporting MapReduce. A comprehensive evaluation demonstrates the feasibility and excellent performance of our methodology.
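As a rough illustration of why a DHT suits this setting, the sketch below uses consistent hashing to assign intermediate MapReduce keys to servers without a central lookup table. It is not the methodology of the paper: the `HashRing` class, the `ring_position` helper, the server names, and the choice of SHA-1 on a 2**32 ring are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a point on a 2**32 ring (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Hypothetical consistent-hashing ring; each key goes to its successor server."""

    def __init__(self, servers):
        # Sorted (position, server) pairs form the ring.
        self._points = sorted((ring_position(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        positions = [p for p, _ in self._points]
        # First server at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(positions, ring_position(key)) % len(self._points)
        return self._points[i][1]

    def remove(self, server: str) -> None:
        # Simulate a server failure: only keys owned by this server are reassigned.
        self._points = [(p, s) for p, s in self._points if s != server]


ring = HashRing(["server-0", "server-1", "server-2", "server-3"])
keys = [f"word-{i}" for i in range(200)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("server-2")
after = {k: ring.lookup(k) for k in keys}

# Keys that did not live on the failed server keep their original owner.
unchanged = all(after[k] == before[k] for k in keys if before[k] != "server-2")
print("keys on surviving servers unchanged:", unchanged)
```

Any server can compute a key's owner locally, and a failure moves only the failed server's keys. This is a simplified view of the decentralization and fault tolerance that the abstract attributes to the DHT.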


Keywords


MapReduce; data center; distributed hash table (DHT)






DOI: https://doi.org/10.15837/ijccc.2012.3.1385



Copyright (c) 2017 Zeliu Ding, Deke Guo, Xueshan Luo, Xi Chen

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
