December 2016 archive

An Optimization Algorithm for Heterogeneous Hadoop Clusters Based on Dynamic Load Balancing

Abstract: Hadoop is a popular cloud computing platform, and its core component, MapReduce, performs parallel computation efficiently in homogeneous environments. In practice, however, clusters are often heterogeneous, and in that case the load is prone to becoming unbalanced. To address this problem, this paper proposes a model for heterogeneous Hadoop clusters based on dynamic load balancing. The model builds on MapReduce and uses its monitoring module to track node information in real time. A maximum node hit rate priority algorithm (MNHRPA) is designed and implemented; it achieves load balancing by dynamically adjusting data allocation according to each node's computing power and load. Experimental results show that, compared with Hadoop's default algorithm, MNHRPA effectively reduces task completion time and balances the cluster load.

Keywords: Hadoop; heterogeneous cluster; data allocation; load balancing.
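
The abstract does not spell out how MNHRPA works, so the following is only a rough sketch of the general idea it describes: assigning data in proportion to each node's computing power and current load. It is not the paper's algorithm. The `Node` fields, the weighting formula `capacity * (1 - load)`, and the function name `allocate_blocks` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float  # relative computing power (hypothetical unit)
    load: float      # current utilization, assumed to lie in [0, 1]


def allocate_blocks(nodes, total_blocks):
    """Split total_blocks across nodes in proportion to spare capacity.

    Assumed weighting: capacity * (1 - load). This is an illustrative
    heuristic, not the MNHRPA method from the paper.
    """
    weights = [n.capacity * (1.0 - n.load) for n in nodes]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no node has spare capacity")

    # Ideal fractional share for each node, then round down.
    raw = [w / total * total_blocks for w in weights]
    shares = [int(r) for r in raw]

    # Hand out the blocks lost to rounding, largest remainder first.
    leftover = total_blocks - sum(shares)
    order = sorted(range(len(nodes)),
                   key=lambda i: raw[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1

    return {n.name: s for n, s in zip(nodes, shares)}


if __name__ == "__main__":
    cluster = [
        Node("fast", capacity=4.0, load=0.5),
        Node("slow", capacity=1.0, load=0.0),
        Node("mid", capacity=2.0, load=0.5),
    ]
    print(allocate_blocks(cluster, 10))
```

In a dynamic setting, a function like this would be re-run whenever the monitoring data changes, so that a node that becomes busy receives fewer new blocks on the next round.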
