Performance Modeling and Optimization of MapReduce

Abstract: The MapReduce framework is widely used by large companies to parallelize batch jobs. MapReduce splits a job among the mappers in the map phase; the intermediate tasks are then synchronized at the reducers before being processed in the next stage. It exploits a high degree of multitasking to complete jobs as quickly as possible. However […]