Abstract: We reveal loopholes in Speculative Execution (SE) implementations under a unique fault model: node-level network throughput degradation. This problem appears in many data-parallel frameworks such as Hadoop MapReduce and Spark. To address it, we present PBSE, a robust, path-based speculative execution approach that employs three key ingredients: path progress, path diversity, and path-straggler detection and speculation. We show how PBSE is superior to other approaches such as cloning and aggressive speculation under the aforementioned fault model. PBSE is a general solution, applicable to many data-parallel frameworks such as Hadoop/HDFS+QFS, Spark and Flume.
Abstract: Hadoop in the datacenter is a popular analytics platform for enterprises. Cloud vendors host Hadoop clusters in their datacenters to provide high-performance analytical computing to their customers. Because many concurrent users submit jobs to the same clusters, scheduling must be effective enough to complete each job on time while using resources efficiently in both cost and time. Workflows are repeatable patterns of dependent jobs, and they are executed in the Hadoop datacenter by allocating VMs. In our earlier papers, we proposed a mechanism to pack and execute customer jobs as workflows on the Hadoop platform that minimizes VM cost while completing the workflow's Hadoop MapReduce jobs within their deadlines. In this paper, we propose a Q-learning-based scheduling method to optimize cloud resource usage in workflows. Q-learning is a model-free reinforcement learning technique used to find an optimal action-selection policy for a given Markov decision process. The parameters considered for optimization are the number of VMs consumed, the bandwidth at the datacenter, and electric power consumption.
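For readers unfamiliar with Q-learning, the standard tabular update rule can be sketched as follows. This is a generic illustration, not the scheduler described in the abstract: the state and action encoding, the `cost_reward` function, its weights, and the values of `alpha` and `gamma` are all assumptions made for the example.

```python
from collections import defaultdict


def make_q_table(n_actions):
    """Q-table mapping each state to a list of action values, initially zero."""
    return defaultdict(lambda: [0.0] * n_actions)


def cost_reward(vms, bandwidth, power, weights=(1.0, 1.0, 1.0)):
    """Hypothetical reward: the negative weighted sum of the three costs
    named in the abstract (VMs, bandwidth, power). The weights are assumed."""
    w_vm, w_bw, w_pw = weights
    return -(w_vm * vms + w_bw * bandwidth + w_pw * power)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Index of the highest-valued action in a state (ties go to the lowest index)."""
    values = q[state]
    return values.index(max(values))
```

In a scheduling setting, a state might encode the pending workflow stage and current VM allocation, and an action might be a choice of VM count for the next stage. Those encodings would have to come from the paper itself.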