Autonomic Fault Tolerant Scheduling for Multiple Workflows in Cloud Environment

Bala, Anju

Files

3360.pdf (2.29 MB)

Date

2015-06-08T08:32:49Z

Authors

Bala, Anju

Supervisors

Chana, Inderveer

Abstract

Cloud Computing is an increasingly popular paradigm that combines the characteristics of existing paradigms with strong support for virtualization and additional features such as on-demand resource provisioning, reduced cost and computing flexibility. Most scientific communities employ workflow technologies to cope with the complexity and heterogeneity of large-scale scientific applications. Because scientific workflows need a suitable paradigm for deployment and execution, and Cloud services offer high availability, the Cloud is a current benchmark for executing scientific workflows. It provides flexible access to services such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) without reference to the infrastructure on which these applications are hosted. For scientific workflows to execute successfully on Clouds, the Cloud platform must be able to manage faults through autonomic fault tolerant approaches while scheduling workflow tasks on Cloud resources. Cloud providers also require efficient scheduling algorithms to schedule these workflows alongside autonomic fault tolerant approaches. Although Cloud Computing technology has evolved, key challenges such as autonomic fault tolerance and workflow scheduling remain to be addressed. To address these challenges for fault tolerant workflow scheduling, a comprehensive study of workflow scheduling algorithms and the required set of Quality of Service (QoS) parameters is carried out. In addition, Cloud platforms and workflow engines are extensively explored, and metrics relevant to Cloud services are identified. Furthermore, a thorough study of failure prediction approaches, fault tolerant techniques and fault tolerant scheduling has been performed.
The literature survey shows that the key challenge to be addressed in scheduling workflow applications on the Cloud is fault tolerant scheduling performed in an autonomic way. Hence, autonomic fault tolerant scheduling for multiple workflows is the main focus of this research work. The proposed solution has been implemented in three stages. First, an intelligent failure prediction model has been proposed to predict task failures for scientific applications. Data from multiple scientific workflows has been collected, analysed and pre-processed to classify each task as failed or not failed. Machine learning approaches, namely Naive Bayes, Artificial Neural Networks (ANN), Logistic Regression (LR) and Random Forest, have been executed and compared for predicting task failure. Performance metrics comparing predicted and actual failures have been evaluated to assess the accuracy of these approaches. The comparative analysis shows that Naive Bayes achieves the highest accuracy and the lowest error among the implemented approaches. The proposed failure prediction model therefore uses Naive Bayes to predict task failures before they actually occur. Second, a VM migration policy based on the proposed task failure model has been designed to implement an autonomic fault tolerant technique. The policy evaluates the correlation of resource utilization parameters such as RAM, CPU, bandwidth and disk I/O with the allocated virtual machines (VMs), and also computes the inter-correlation between VMs. The VM with the maximum correlation and the minimum inter-correlation factor is then migrated to another host so that task failures are handled automatically. The proposed technique is further used for scheduling multiple workflow applications.
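The VM selection step of the migration policy can be sketched as follows. This is a minimal illustration, not the thesis implementation, and it rests on several assumptions: each VM is described by equal-length utilization samples for the four named resources, the host's utilization is taken as the sum over its VMs, the "correlation" factor is the mean Pearson correlation between a VM and the host load, the "inter-correlation" factor is the mean correlation with the other VMs on the same host, and the two criteria are combined as a simple difference. The function and metric names are hypothetical.

```python
from statistics import mean

# Hypothetical metric keys standing in for CPU, RAM, bandwidth and disk I/O.
METRICS = ("cpu", "ram", "bw", "disk_io")


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def host_series(vms, metric):
    """Host utilization for one metric, assumed to be the sum over its VMs."""
    return [sum(step) for step in zip(*(vm[metric] for vm in vms.values()))]


def select_vm_to_migrate(vms):
    """Pick the VM with high correlation to host load and low inter-correlation.

    vms maps a VM id to {metric: [utilization samples]} for one host.
    """
    scores = {}
    for vm_id, usage in vms.items():
        # Correlation factor: how closely this VM tracks the host's load.
        corr = mean(pearson(usage[m], host_series(vms, m)) for m in METRICS)
        # Inter-correlation factor: average correlation with the other VMs.
        others = [v for k, v in vms.items() if k != vm_id]
        inter = (
            mean(mean(pearson(usage[m], o[m]) for m in METRICS) for o in others)
            if others
            else 0.0
        )
        # Assumed combination of "maximum correlation, minimum inter-correlation".
        scores[vm_id] = corr - inter
    return max(scores, key=scores.get)
```

With one VM whose load swings strongly (and therefore drives the host's load) next to two small VMs that move together, the sketch selects the dominant VM, since it correlates with the host and not with its neighbours. A weighted sum or a lexicographic rule would be equally plausible readings of the policy.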
Finally, an autonomic fault tolerant workflow scheduling algorithm has been proposed. It first orders multiple scientific workflows using an Earliest Deadline First (EDF) based approach, and the proposed hybrid heuristic then schedules the tasks of the highest-priority workflow. The hybrid heuristic combines the features of the Max-Child, Min-Min and FCFS heuristics. To utilize resources optimally, it first schedules and executes the task with the maximum number of child nodes; if two or more tasks have the same number of child nodes, the Min-Min heuristic is used, and if those tasks also have the same execution time, FCFS is applied. Furthermore, the proposed algorithm uses the autonomic VM migration approach to migrate VMs to another working host before the actual task failure occurs. Various scheduling factors have been defined to evaluate the performance of the proposed scheduling algorithm, and multiple scientific applications have been modelled as workflows to be scheduled on Cloud resources using the proposed approach. The performance of the proposed intelligent failure prediction model has been evaluated on Pegasus and Amazon EC2. Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto Cloud resources, and these workflows are then deployed on Amazon EC2. The experimental results show that Naive Bayes achieves the highest average accuracy across all the workflows (93%) compared with ANN, LR and Random Forest, whereas the existing model has an average accuracy of 85%. Thus, the proposed model with Naive Bayes is more effective at predicting task failures, as well as resource failures, by considering resource utilization for multiple scientific applications.
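The ordering rules of the scheduler can be illustrated with a small list-scheduling sketch. It is a simplified reading of the description, not the thesis code: it assumes identical VMs, a single estimated execution time per task (so the Min-Min tie-break reduces to "shortest task first"), FCFS meaning the order in which tasks were submitted, and an acyclic task graph. VM migration is not modelled, and the names `Workflow`, `hybrid_order` and `schedule` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    name: str
    deadline: float
    tasks: dict  # task id -> estimated execution time (submission order = FCFS)
    parents: dict = field(default_factory=dict)  # task id -> list of parent ids


def order_workflows_edf(workflows):
    """Earliest Deadline First: the workflow with the nearest deadline goes first."""
    return sorted(workflows, key=lambda wf: wf.deadline)


def hybrid_order(wf):
    """Priority order of ready tasks: Max-Child, then shortest time, then FCFS."""
    children = {t: 0 for t in wf.tasks}
    for ps in wf.parents.values():
        for p in ps:
            children[p] += 1
    fcfs = {t: i for i, t in enumerate(wf.tasks)}
    done, order = set(), []
    while len(order) < len(wf.tasks):
        # A task is ready once all of its parents have been placed.
        ready = [t for t in wf.tasks if t not in done
                 and all(p in done for p in wf.parents.get(t, []))]
        best = min(ready, key=lambda t: (-children[t], wf.tasks[t], fcfs[t]))
        order.append(best)
        done.add(best)
    return order


def schedule(wf, num_vms):
    """Place tasks in hybrid order on the VM giving the earliest start time."""
    vm_free = [0.0] * num_vms
    finish = {}
    for t in hybrid_order(wf):
        ready_at = max((finish[p] for p in wf.parents.get(t, [])), default=0.0)
        vm = min(range(num_vms), key=lambda v: max(vm_free[v], ready_at))
        start = max(vm_free[vm], ready_at)
        finish[t] = start + wf.tasks[t]
        vm_free[vm] = finish[t]
    return finish, max(finish.values())
```

In use, workflows would be taken in `order_workflows_edf` order and each passed to `schedule`. For a workflow where `t1` (3 time units) feeds `t3` and `t4`, and `t2` (2 units) feeds `t4`, the heuristic places `t1` first because it has the most children, and on two VMs this example finishes with a makespan of 5.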
The autonomic fault tolerant technique has then been validated through reductions in mean execution time, the standard deviation of mean time, the number of VM migrations and SLA Violations (SLAV). The proposed autonomic fault tolerant workflow scheduling technique has also been verified on multiple scientific workflows such as Montage, Cybershake, SIPHT, Epigenome and Inspiral, and it reduces the average makespan for these workflows. These results have been compared against existing heuristics, namely Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Min-Min, Max-Min and Minimum Completion Time (MCT), as well as the proposed hybrid heuristic. For the Epigenome workflow, PSO performs better than Max-Min and MCT, while GA improves on Max-Min, Min-Min, MCT and PSO. For the Cybershake workflow, PSO performs better than Max-Min, Min-Min, MCT and GA. The proposed hybrid heuristic outperforms all the existing heuristics by reducing the average makespan for large-scale scientific workflows of 1000 jobs, such as Cybershake and Epigenome.

Description

PhD Thesis

Keywords

Cloud Computing, Workflow scheduling, Autonomic fault tolerance, Computer Science

URI

http://hdl.handle.net/10266/3360

Collections

Doctoral Theses@CSED
