Performance Modeling and Optimization of Map-Reduce Programs
Abstract:
MapReduce is a developer-friendly framework that encapsulates the underlying complexities of distributed computing, and it is increasingly used across enterprises for advanced data analytics, business intelligence, and data mining. However, two questions concern Hadoop users: how to improve the performance of MapReduce workloads, and how to estimate the time needed to run a MapReduce job. In this paper, we present performance optimization techniques based on workload characterization. Once the cluster is tuned for its best performance, we further propose a modeling method that helps Hadoop users estimate the execution time of MapReduce jobs. We evaluate the accuracy of our techniques with representative benchmarks.