Performance Modeling, Evaluation, and Optimization of MapReduce