Category Archives: Research

Chinese Academy of Sciences

From April 2016 to June 2016, I was a visiting researcher at the Chinese Academy of Sciences (CAS) in Beijing, China. Thanks to everyone who accompanied me during my visit.

1st row (standing): Hoho, Yu Huang, Hou Huang, Xikun, FeiFei, Zhang Pan, Shengluan
2nd row (seated): SM Zhang, Lu Ruqian, me, Ju Wang.

frash 🙂

Stochastic Modeling and Optimization of Stragglers

Abstract: The MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of servers increases, the map phase can take much longer than expected. This paper analytically shows that the stochastic behavior of the servers has a negative effect on the completion time of a MapReduce job, and that continuously increasing the number of servers without accurate scheduling can degrade the overall performance. We analytically model the map phase in terms of hardware, system, and application parameters to capture the effects of stragglers on the performance. Mean sojourn time (MST), the time needed to sync the completed tasks at a reducer, is introduced as a performance metric and mathematically formulated. Following that, we stochastically investigate the optimal task scheduling, which leads to an equilibrium property in a datacenter with different types of servers. Our experimental results show the performance of different types of schedulers targeting MapReduce applications. We also show that, in the case of mixed deterministic and stochastic schedulers, there is an optimal scheduler that can always achieve the lowest MST.
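Why more servers can hurt: a reducer can only sync once the slowest map task is done, so the sync time behaves like the maximum of the task times. The sketch below is a simplified single-batch illustration, not the paper's MST formulation (which models queueing). It assumes each of n servers runs one task whose time is i.i.d. exponential with rate mu (a hypothetical parameter), for which the expected maximum is H_n / mu, and checks that closed form with a small Monte Carlo run. Function names are hypothetical.

```python
import random


def harmonic(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_sync_time(n: int, mu: float) -> float:
    """Closed form for E[max of n i.i.d. Exp(mu) task times], i.e. H_n / mu."""
    return harmonic(n) / mu


def simulated_sync_time(n: int, mu: float, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate: each trial waits for the slowest of n parallel tasks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(mu) for _ in range(n))
    return total / trials
```

Holding the per-task rate fixed at mu = 1, the expected sync time grows like ln n: about 2.93 time units for 10 servers and about 5.19 for 100, so a single slow straggler increasingly dominates the map phase.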

Authors: Farshid Farhat and Diman Zad Tootaghaj from Penn State, and Yuxiong He from Microsoft Research (MSR)

The work was done during my visit to MSR in Summer 2015.

Stochastic modeling and optimization of stragglers in mapreduce framework

@phdthesis{farhat2015stochastic,
  title={Stochastic modeling and optimization of stragglers in mapreduce framework},
  author={Farhat, Farshid},
  year={2015},
  school={The Pennsylvania State University}
}

 

Stochastic modeling and optimization of stragglers

@article{farhat2016stochastic,
  title={Stochastic modeling and optimization of stragglers},
  author={Farhat, Farshid and Tootaghaj, Diman and He, Yuxiong and Sivasubramaniam, Anand and Kandemir, Mahmut and Das, Chita},
  journal={IEEE Transactions on Cloud Computing},
  year={2016},
  publisher={IEEE}
}

Thumbnail Generation by Smart Cropping

Given an input image and a target aspect ratio or oval shape (or no target, in which case the shape is chosen automatically), AutoThumbGen generates a thumbnail. The app recognizes the most prominent part of the input image and captures it at a proper thumbnail size. The source code is written in C, and it has also been embedded in Android via JNI and called from PHP via exec.
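AutoThumbGen's internals are not described in this post, so the following is only a generic sketch of saliency-based cropping, not the app's C code. It assumes a precomputed saliency map (a hypothetical 2D grid of non-negative weights, one per pixel) and a rectangular target aspect ratio; oval shapes are not covered. It takes the largest window with that aspect ratio that fits the image and slides it to the position with the highest total saliency, using a summed-area table so each window sum costs O(1).

```python
from typing import List, Tuple


def crop_size(img_w: int, img_h: int, ar_w: int, ar_h: int) -> Tuple[int, int]:
    """Largest (integer-multiple) window with aspect ratio ar_w:ar_h that fits the image."""
    scale = min(img_w // ar_w, img_h // ar_h)
    if scale == 0:
        raise ValueError("image too small for this aspect ratio")
    return ar_w * scale, ar_h * scale


def integral_image(sal: List[List[float]]) -> List[List[float]]:
    """Summed-area table: ii[y][x] = sum of sal over rows < y and columns < x."""
    h, w = len(sal), len(sal[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += sal[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def best_crop(sal: List[List[float]], crop_w: int, crop_h: int) -> Tuple[int, int]:
    """Top-left (x, y) of the crop_w x crop_h window with the highest total saliency."""
    ii = integral_image(sal)
    h, w = len(sal), len(sal[0])
    best, best_xy = float("-inf"), (0, 0)
    for y in range(h - crop_h + 1):
        for x in range(w - crop_w + 1):
            s = ii[y + crop_h][x + crop_w] - ii[y][x + crop_w] - ii[y + crop_h][x] + ii[y][x]
            if s > best:
                best, best_xy = s, (x, y)
    return best_xy
```

For example, `crop_size(1920, 1080, 4, 3)` gives a 1440×1080 window, which `best_crop` then slides horizontally toward the most salient region.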


Contributors: Jia Li, Farshid Farhat, James Wang.

Node Architecture and Cloud Workload Characteristics Analysis

Abstract
The combined impact of node architecture and workload characteristics on off-chip network traffic, together with a performance/cost analysis, has not been investigated before in the context of emerging cloud applications. Motivated by this observation, this paper performs a thorough characterization of twelve cloud workloads using a full-system datacenter simulation infrastructure. We first study the inherent network characteristics of emerging cloud applications, including message inter-arrival times, packet sizes, inter-node communication overhead, self-similarity, and traffic volume. Then, we study the effect of hardware architectural metrics on network traffic. Our experimental analysis reveals that (1) the message arrival times and packet-size distributions vary across different cloud applications, (2) the inter-arrival times exhibit a large amount of self-similarity as the number of nodes increases, (3) the node architecture can play a significant role in shaping the overall network traffic, and finally, (4) the applications we study can be broadly divided into those that perform better in a scale-out or a scale-up configuration at the node level, and into two categories, namely, those that have long-duration, low-burst flows and those that have short-duration, high-burst flows. Using the results of (3) and (4), the paper discusses the performance/cost trade-offs for scale-out and scale-up approaches and proposes an analytical model that can be used to predict the communication and computation demand for different configurations. It is shown that the difference in performance per dollar between two node architectures (with the same total number of cores system-wide) can be as high as 154 percent, which shows the need to characterize cloud applications accurately before wasting precious cloud resources by allocating the wrong architecture.
The results of this study can be used for system modeling, capacity planning and managing heterogeneous resources for large-scale system designs.
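Self-similarity of inter-arrival times is commonly quantified with the Hurst exponent H: H ≈ 0.5 for independent arrivals, and H > 0.5 indicates long-range dependence. The paper's own estimator is not stated here, so the sketch below uses the standard aggregated-variance method as an assumed example: average the series over non-overlapping blocks of size m; for a self-similar process Var(block means) scales like m^(2H − 2), so H = 1 + slope / 2 from a log-log fit. The Poisson generator is a hypothetical baseline.

```python
import math
import random
import statistics
from typing import List, Sequence


def hurst_aggregated_variance(x: Sequence[float], block_sizes: Sequence[int]) -> float:
    """Estimate H from the slope of log Var(block means) versus log m."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue  # a variance needs at least two block means
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        log_m.append(math.log(m))
        log_var.append(math.log(statistics.variance(means)))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    mx, my = statistics.fmean(log_m), statistics.fmean(log_var)
    # Ordinary least-squares slope of log_var on log_m
    slope = sum((a - mx) * (b - my) for a, b in zip(log_m, log_var)) / sum(
        (a - mx) ** 2 for a in log_m
    )
    return 1.0 + slope / 2.0


def poisson_interarrivals(n: int, rate: float, seed: int = 0) -> List[float]:
    """Baseline stream with i.i.d. exponential gaps; it is not self-similar (H about 0.5)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]
```

Running it on measured inter-arrival times and getting H well above 0.5, rising as nodes are added, would match finding (2); on the Poisson baseline it returns a value close to 0.5.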

Full Text > Combined Impact of Node Architecture and Cloud Workload Characteristics

More info > Diman Zad Tootaghaj’s Publications

Optimal Scheduling in Parallel Programming Frameworks

FORK-JOIN QUEUE MODELING AND OPTIMAL SCHEDULING IN PARALLEL PROGRAMMING FRAMEWORKS

ABSTRACT

The MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of servers increases, the map phase can take much longer than expected. This thesis analytically shows that the stochastic behavior of the servers has a negative effect on the completion time of a MapReduce job, and that continuously increasing the number of servers without accurate scheduling can degrade the overall performance. We analytically model the map phase in terms of hardware, system, and application parameters to capture the effects of stragglers on the performance. Mean sojourn time (MST), the time needed to sync the completed tasks at a reducer, is introduced as a performance metric and mathematically formulated. Following that, we stochastically investigate the optimal task scheduling, which leads to an equilibrium property in a datacenter with different types of servers. Our experimental results show the performance of different types of schedulers targeting MapReduce applications. We also show that, in the case of mixed deterministic and stochastic schedulers, there is an optimal scheduler that can always achieve the lowest MST.
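The thesis's stochastic scheduler is not reproduced here. As a deterministic simplification, assume N identical tasks and heterogeneous servers with fixed service rates (tasks per unit time, hypothetical values). The fork-join completes when the slowest server finishes, so the sketch gives each task to the server on which it would finish earliest; this balances per-server finish times roughly in proportion to the rates, a deterministic analogue of an equilibrium split across server types.

```python
import heapq
from typing import List, Sequence


def greedy_split(num_tasks: int, rates: Sequence[float]) -> List[int]:
    """Assign identical unit tasks to servers with deterministic service rates.

    Each task goes to the server on which it would finish earliest, so the
    per-server finish times (count / rate) end up as even as possible.
    """
    counts = [0] * len(rates)
    # Heap entries: (finish time if this server takes one more task, server index)
    heap = [(1.0 / r, i) for i, r in enumerate(rates)]
    heapq.heapify(heap)
    for _ in range(num_tasks):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, ((counts[i] + 1) / rates[i], i))
    return counts


def join_time(counts: Sequence[int], rates: Sequence[float]) -> float:
    """Fork-join completion time: the join waits for the slowest server."""
    return max(c / r for c, r in zip(counts, rates))
```

With rates 1, 2, and 1 and eight tasks, the greedy split is 2, 4, 2 and the join finishes at time 2, whereas a rate-oblivious round-robin split of 3, 3, 2 finishes at time 3.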

 

KEYWORDS

Stochastic processes, Computational model, Delayed Tailed Distribution, Optimal scheduling, Cloud computing, Synchronization, Queuing Theory, MapReduce, Stochastic Modeling, Performance Evaluation, Fork-Join Queue.

Optimal Placement in Network On-Chip

Abstract:
Parallel programming is growing fast and demanding applications need more resources, so there is a huge demand for on-chip multiprocessors. After registers, the L1 caches beside the cores offer the fastest access, but the size of private caches cannot grow because of design, cost, and technology limits. Therefore, split I-caches and D-caches are used together with a shared last-level cache (LLC). For a unified shared LLC, a bus interface is not scalable, and a distributed shared LLC (DSLLC) seems to be a better choice. Most papers assume a DSLLC beside each core in the on-chip network, i.e., that DSLLCs are placed at all cores; however, we will show that this design ignores the effect of traffic congestion in the on-chip network. Our work focuses on the optimal placement of cores, DSLLCs, and even memory controllers to minimize the expected latency, based on traffic load, in a mesh on-chip network with a fixed number of cores and a fixed total cache capacity. We use analytical modeling to derive the intended cost function and then optimize the mean delay of on-chip network communication. This work is to be verified using traffic patterns run on the CSIM simulator.
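A minimal sketch of the placement side, assuming XY routing so that the hop count between two mesh nodes is their Manhattan distance, and a hypothetical per-core traffic map (requests per unit time). It brute-forces the single mesh node that minimizes the traffic-weighted mean hop count to one shared resource, such as a memory controller. It captures zero-load distance only; the congestion effects the abstract emphasizes would need an additional queueing term per link.

```python
from itertools import product
from typing import Dict, Tuple

Node = Tuple[int, int]


def mean_hops(pos: Node, traffic: Dict[Node, float]) -> float:
    """Traffic-weighted mean XY-routing hop count from every core to `pos`."""
    total = sum(traffic.values())
    return sum(w * (abs(x - pos[0]) + abs(y - pos[1])) for (x, y), w in traffic.items()) / total


def best_placement(width: int, height: int, traffic: Dict[Node, float]) -> Node:
    """Brute-force the mesh node with the lowest mean hop count (ties go to the smallest (x, y))."""
    return min(product(range(width), range(height)), key=lambda p: (mean_hops(p, traffic), p))
```

On a 4×4 mesh with uniform traffic, any of the four central nodes gives a mean of 2 hops versus 3 at a corner; if one core generates most of the traffic, the best spot moves next to that core, which is why placement should follow the traffic load.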

Full text @ OPCCMCNOC

Towards Stochastically Optimizing Data Computing Flows

Abstract:
With rapid growth in the amount of unstructured data produced by memory-intensive applications, large-scale data analytics has recently attracted increasing interest. Processing, managing, and analyzing this huge amount of data poses several challenges in the cloud and data center computing domains. In particular, conventional frameworks for distributed data analytics assume that the data-processing nodes are homogeneous and behave non-stochastically. The paper examines the fundamental factors limiting the scaling of big data computation. It shows that as the number of serial and parallel computing servers increases, the tail (mean and variance) of the job execution time increases. We will first propose a model to predict the response time of highly distributed processing tasks and then propose a new practical computational algorithm to optimize the response time.
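The claim about serial and parallel servers can be checked on a toy model, assuming every stage time is an independent Exp(mu) random variable (an illustrative assumption; the paper's model is not reproduced). A serial chain of k stages is a sum, so its mean k/mu and variance k/mu^2 grow linearly. A parallel stage that waits for all n servers is a maximum, with mean H_n/mu and variance (1/mu^2) times the sum of 1/i^2 for i = 1..n, both increasing in n. Function names are hypothetical.

```python
from typing import Tuple


def series_moments(k: int, mu: float) -> Tuple[float, float]:
    """Mean and variance of k serial Exp(mu) stages (a sum of independent stages)."""
    return k / mu, k / mu ** 2


def parallel_moments(n: int, mu: float) -> Tuple[float, float]:
    """Mean and variance of the slowest of n parallel Exp(mu) servers (a maximum)."""
    mean = sum(1.0 / i for i in range(1, n + 1)) / mu
    var = sum(1.0 / i ** 2 for i in range(1, n + 1)) / mu ** 2
    return mean, var


def flow_moments(k: int, n: int, mu: float) -> Tuple[float, float]:
    """k independent fork-join stages in series, each waiting for n parallel servers."""
    mean, var = parallel_moments(n, mu)
    return k * mean, k * var
```

Adding parallel servers raises the mean only logarithmically, but the variance also grows (toward π²/6 per stage when mu = 1), and chaining stages in series accumulates both.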

 

Big Data Computing: Modeling and Optimization

Abstract:
The MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of servers increases, the map phase can take much longer than expected. This thesis analytically shows that the stochastic behavior of the servers has a negative effect on the completion time of a MapReduce job, and that continuously increasing the number of servers without accurate scheduling can degrade the overall performance. We analytically model the map phase in terms of hardware, system, and application parameters to capture the effects of stragglers on the performance. Mean sojourn time (MST), the time needed to sync the completed tasks at a reducer, is introduced as a performance metric and mathematically formulated. Following that, we stochastically investigate the optimal task scheduling, which leads to an equilibrium property in a datacenter with different types of servers. Our experimental results show the performance of different types of schedulers targeting MapReduce applications. We also show that, in the case of mixed deterministic and stochastic schedulers, there is an optimal scheduler that can always achieve the lowest MST.

• Farshid Farhat, Diman Zad Tootaghaj, Anand Sivasubramaniam, Mahmut Kandemir, and Chita R. Das are with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA. Email: {fuf111,dxz149,anand,kandemir,das}@cse.psu.edu.

• Yuxiong He is with the Cloud Computing Futures group, Microsoft Research, Redmond, WA 98052 USA. Email: yuxhe@microsoft.com.

• The work was done during my visit to MSR in Redmond, WA, in June 2016.

Blind detection of low-rate embedding

Abstract:

Steganalysis of least significant bit (LSB) embedded images in the spatial domain has been investigated extensively over the past decade, and most well-known LSB steganography methods have been shown to be detectable. However, according to the latest findings in the area, two major issues, very low-rate (VLR) embedding and content-adaptive steganography, have remained hard to resolve. VLR embedding is a generic problem for any steganalyser, while adaptive embedding depends specifically on the hiding algorithm employed. The latter challenge has recently been brought back to the area of LSB steganalysis by highly undetectable stego (HUGO) image steganography, which offers a content-adaptive embedding scheme for grey-scale images. The authors' new image steganalysis method analyses the relative norm of the image Clouds manipulated in an LSB embedding system. The method is a self-dependent image analysis and is capable of operating on low-resolution images. The proposed algorithm is applied to the image in the spatial domain through three main steps: image Clouding, relative auto-decorrelation feature extraction, and quadratic rate estimation. The authors then introduce and use new statistical features, Clouds-Min-Sum and Local-Entropies-Sum, which improve both the detection accuracy and the embedding rate estimation. They analytically verify the functionality of the scheme. Their simulation results show that the proposed approach outperforms some well-known, powerful LSB steganalysis schemes in terms of true and false detection rates and mean squared error.
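To make "very low-rate" concrete, here is plain LSB replacement in the spatial domain, the kind of embedding such steganalysers target. This is not the authors' Clouds-based detector and not HUGO's adaptive scheme. It assumes a flat list of 8-bit grayscale pixel values and a shared integer seed (a hypothetical key) that selects pixel positions. The embedding rate is message bits per pixel; because a random message bit already matches the pixel's LSB about half the time, only about half of the selected pixels actually change, which is part of why VLR embedding is hard to detect.

```python
import random
from typing import List, Sequence


def _positions(num_pixels: int, num_bits: int, seed: int) -> List[int]:
    """Pseudo-random pixel positions shared by embedder and extractor (the seed acts as a key)."""
    return random.Random(seed).sample(range(num_pixels), num_bits)


def lsb_embed(pixels: Sequence[int], bits: Sequence[int], seed: int = 0) -> List[int]:
    """Replace the LSB of randomly selected 8-bit pixels with the message bits."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover image")
    stego = list(pixels)
    for pos, bit in zip(_positions(len(pixels), len(bits), seed), bits):
        stego[pos] = (stego[pos] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def lsb_extract(stego: Sequence[int], num_bits: int, seed: int = 0) -> List[int]:
    """Read the message back from the same key-selected positions."""
    return [stego[pos] & 1 for pos in _positions(len(stego), num_bits, seed)]


def embedding_rate(num_bits: int, num_pixels: int) -> float:
    """Payload in bits per pixel (bpp)."""
    return num_bits / num_pixels


def changed_fraction(cover: Sequence[int], stego: Sequence[int]) -> float:
    """Fraction of pixels whose value actually changed."""
    return sum(c != s for c, s in zip(cover, stego)) / len(cover)
```

At 100 bits in a 10,000-pixel image the rate is 0.01 bpp, and at most that fraction of pixels (typically about half of it) differs from the cover, each by at most one grey level.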

 

Modeling and Optimization of Straggling Mappers

ABSTRACT
The MapReduce framework is widely used to parallelize batch jobs since it exploits a high degree of multi-tasking to process them. However, it has been observed that when the number of mappers increases, the map phase can take much longer than expected. This paper analytically shows that the stochastic behavior of mapper nodes has a negative effect on the completion time of a MapReduce job, and that continuously increasing the number of mappers without accurate scheduling can degrade the overall performance. We analytically capture the effects of stragglers (delayed mappers) on the performance. Based on an observed delayed exponential distribution (DED) of the response time of mappers, we then model the map phase in terms of hardware, system, and application parameters. Mean sojourn time (MST), the time needed to sync the completed map tasks at one reducer, is mathematically formulated. Following that, we optimize MST by finding the task inter-arrival time at each mapper node. The optimal mapping problem leads to an equilibrium property investigated for different types of inter-arrival and service time distributions in a heterogeneous datacenter (i.e., a datacenter with different types of nodes). Our experimental results show the performance and important parameters of different types of schedulers targeting MapReduce applications. We also show that, in the case of mixed deterministic and stochastic schedulers, there is an optimal scheduler that can always achieve the lowest MST.
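The delayed exponential distribution (DED) can be read as a fixed delay d plus an exponential tail with rate lam. Assuming that reading, and i.i.d. mapper response times in a single batch with no queueing, the sketch below fits d and lam by maximum likelihood (d is the sample minimum, lam is 1 divided by the sample mean minus d) and plugs them into the expected time until the last of n mappers finishes, d + H_n / lam. This is not the paper's MST formula, which also accounts for task inter-arrival times; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fit_ded(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood fit of a shifted (delayed) exponential: returns (d, lam)."""
    d = min(samples)
    mean_excess = sum(samples) / len(samples) - d
    if mean_excess <= 0:
        raise ValueError("samples must not all be equal")
    return d, 1.0 / mean_excess


def sample_ded(n: int, d: float, lam: float, seed: int = 0) -> List[float]:
    """Draw n response times from DED(d, lam) = d + Exp(lam)."""
    rng = random.Random(seed)
    return [d + rng.expovariate(lam) for _ in range(n)]


def expected_last_finish(n: int, d: float, lam: float) -> float:
    """E[max of n i.i.d. DED(d, lam) response times] = d + H_n / lam."""
    return d + sum(1.0 / k for k in range(1, n + 1)) / lam
```

For example, with d = 2 and lam = 0.5, one mapper is expected to finish at time 4 and the slower of two at time 5; the delay d is paid once, while the exponential tail is what grows with the number of mappers.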

[Tech Report] [Master Thesis] [IEEE Trans]

Latest version > MapReduce_Performance_Optimization