Autonomic Fault Tolerant Scheduling for Multiple Workflows in Cloud Environment
Abstract
Cloud Computing is an increasingly popular paradigm that inherits the
characteristics of existing paradigms through strong support for virtualization,
along with additional features such as on-demand resource provisioning, reduced
cost and computing flexibility. Many scientific communities employ workflow
technologies to cope with the complexity and heterogeneity of large-scale
scientific applications. Because scientific workflows need a suitable paradigm
for deployment and execution, together with highly available services, the Cloud
has become the current benchmark for effectively facilitating their execution
through accessible service models such as Infrastructure as a Service (IaaS),
Platform as a Service (PaaS) and Software as a Service (SaaS), without reference
to the infrastructure on which these applications are hosted. For scientific
workflows to execute successfully on Clouds, the Cloud platform should be able to
manage faults through autonomic fault tolerant approaches while scheduling
workflow tasks on Cloud resources. Cloud providers also require efficient
scheduling algorithms to schedule these workflows in combination with autonomic
fault tolerant approaches. Although Cloud Computing technology has evolved, key
challenges such as autonomic fault tolerance and workflow scheduling still need
to be addressed.
To address these challenges in fault tolerant workflow scheduling, a comprehensive
study of workflow scheduling algorithms, along with the required set of Quality of
Service (QoS) parameters, is carried out. In addition, Cloud platforms and workflow
engines are extensively explored, and metrics relevant to Cloud services are
identified. Furthermore, a thorough study of failure prediction approaches, fault
tolerant techniques and fault tolerant scheduling has been performed. The
literature survey shows that the key open challenge in scheduling workflow
applications on the Cloud is performing fault tolerant scheduling in an autonomic
way. Hence, autonomic fault tolerant scheduling for multiple workflows is the main
focus of this research work.
The proposed solution has been implemented in three stages. First, an intelligent
failure prediction model has been proposed to predict task failures for scientific
applications. Data from multiple scientific workflows has been collected, analysed
and pre-processed so that each task can be classified as failed or not failed.
Several machine learning approaches, namely Naive Bayes, Artificial Neural Networks
(ANN), Logistic Regression (LR) and Random Forest, have been applied and compared
for predicting task failure. Performance metrics have been evaluated on predicted
versus actual failures to compare the accuracy of these approaches. The comparative
analysis shows that Naive Bayes is the most effective, with the maximum accuracy and
minimum error among the implemented approaches. The proposed failure prediction
model therefore uses Naive Bayes to predict task failures intelligently before the
actual failures occur.
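To make the classification step concrete, here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier that labels tasks as failed (1) or not failed (0) and reports accuracy against actual labels. The features (per-task CPU and memory utilization), the Gaussian likelihood and the toy training values are hypothetical assumptions for illustration; they are not the thesis's dataset, feature set or exact model.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for task-failure labels (1 = failed, 0 = not failed)."""

    def fit(self, X, y):
        # Group feature rows by class label.
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors = {}  # label -> P(label)
        self.stats = {}   # label -> [(mean, variance) per feature]
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            stats = []
            for column in zip(*rows):
                mu = sum(column) / len(column)
                var = sum((v - mu) ** 2 for v in column) / len(column)
                stats.append((mu, var + 1e-9))  # epsilon guards against zero variance
            self.stats[label] = stats
        return self

    def _log_posterior(self, row, label):
        # log P(label) + sum of per-feature Gaussian log-likelihoods (naive independence).
        total = math.log(self.priors[label])
        for x, (mu, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.priors, key=lambda c: self._log_posterior(row, c)) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of tasks whose predicted label matches the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: [CPU utilization, memory utilization] per task.
X_train = [[0.20, 0.30], [0.25, 0.35], [0.30, 0.20],
           [0.85, 0.90], [0.90, 0.80], [0.95, 0.92]]
y_train = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[0.22, 0.28], [0.88, 0.86]])
print(predictions, accuracy([0, 1], predictions))  # [0, 1] 1.0
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied; a library implementation such as scikit-learn's `GaussianNB` would normally be used in practice.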
Second, a VM migration policy has been designed on top of the proposed task failure
prediction model to implement an autonomic fault tolerant technique. The proposed
policy evaluates the correlation of resource utilization parameters, such as RAM,
CPU, bandwidth and disk I/O, with the allocated virtual machines (VMs), and also
computes the inter-correlation between VMs. The VM with the maximum correlation and
minimum inter-correlation factor is then migrated to another host so that task
failures are handled automatically. The proposed technique is further used for
scheduling multiple workflow applications.
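The abstract does not give the exact correlation formulas, so the following is only one hedged interpretation of the VM selection step. It assumes (hypothetically) that each VM has a utilization time series for CPU, RAM, bandwidth and disk I/O; a VM's "correlation" is its mean Pearson correlation with the host's aggregate utilization; its "inter-correlation" is its mean Pearson correlation with the other co-located VMs; and the two are combined as `correlation - inter_correlation`. The names `select_vm_to_migrate` and `series` are illustrative, not from the thesis.

```python
import math
from statistics import mean

METRICS = ("cpu", "ram", "bw", "disk_io")


def pearson(a, b):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def select_vm_to_migrate(vms):
    """vms: vm_id -> {metric: [utilization samples]} for all VMs on one host."""
    # Host utilization per metric = sum of the VMs' samples at each time step.
    host = {m: [sum(step) for step in zip(*(u[m] for u in vms.values()))] for m in METRICS}
    scores = {}
    for vm_id, usage in vms.items():
        # Correlation of this VM with the host, averaged over the four metrics.
        corr = mean(pearson(usage[m], host[m]) for m in METRICS)
        # Inter-correlation with the other co-located VMs, averaged over metrics.
        others = [u for k, u in vms.items() if k != vm_id]
        inter = mean(pearson(usage[m], o[m]) for o in others for m in METRICS) if others else 0.0
        scores[vm_id] = corr - inter  # high correlation, low inter-correlation wins
    return max(scores, key=scores.get), scores


def series(values):
    # Hypothetical helper: reuse one utilization pattern for every metric.
    return {m: list(values) for m in METRICS}


vms = {"vm_a": series([10, 50, 10, 50]),
       "vm_b": series([5, 5, 6, 6]),
       "vm_c": series([6, 6, 5, 5])}
chosen, scores = select_vm_to_migrate(vms)
print(chosen, scores)  # vm_a {'vm_a': 1.0, 'vm_b': 0.5, 'vm_c': 0.5}
```

In this toy case `vm_a` drives the host's load swings while being uncorrelated with its neighbours, so it is the candidate whose migration would most relieve the host. Choosing the destination host is outside this sketch.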
Finally, an autonomic fault tolerant workflow scheduling algorithm has been
proposed. It first schedules multiple scientific workflows using an Earliest
Deadline First (EDF) based approach, and then the proposed hybrid heuristic
schedules the tasks of the highest-priority workflow. The hybrid heuristic combines
the features of the Max-Child, Min-Min and FCFS heuristics. To utilize resources
optimally, it first schedules and executes the task with the maximum number of child
nodes; if two or more tasks have the same number of child nodes, the Min-Min
heuristic is used, and if those tasks also have the same execution time, FCFS is
applied. Furthermore, the proposed algorithm uses the autonomic VM migration
approach to migrate VMs to another working host before the actual task failure
occurs. Various scheduling factors have been defined to evaluate the performance of
the proposed scheduling algorithm, and multiple scientific applications have been
represented as workflows to be scheduled on Cloud resources using the proposed
approach.
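The ordering rules described above can be sketched as follows: workflows are taken in EDF order, and within a workflow the ready task is chosen by most children first, then by smallest Min-Min completion time, then by arrival order (FCFS). This is a simplified, hypothetical model, not the thesis's implementation: runtime is assumed to be `length / speed`, all workflows share one pool of VMs, the Min-Min tie-break uses each task's earliest completion time over all VMs, and the failure-prediction and VM-migration steps are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    length: float   # hypothetical work units; runtime on a VM = length / VM speed
    arrival: int    # arrival order within the workflow, used for FCFS tie-breaking
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    deadline: float
    tasks: dict     # task name -> Task


def schedule(workflows, vm_speeds):
    vm_free = [0.0] * len(vm_speeds)  # time at which each VM becomes idle
    finish = {}                       # (workflow, task) -> finish time
    plan = []
    # EDF: workflows with earlier deadlines get priority.
    for wf in sorted(workflows, key=lambda w: w.deadline):
        done = set()

        def best_vm(t):
            # Earliest completion time of t over all VMs, after its parents finish.
            ready_at = max((finish[(wf.name, p)] for p in t.parents), default=0.0)
            return min((max(vm_free[i], ready_at) + t.length / s, i)
                       for i, s in enumerate(vm_speeds))

        while len(done) < len(wf.tasks):
            ready = [t for t in wf.tasks.values()
                     if t.name not in done and all(p in done for p in t.parents)]
            # Max-Child first; ties -> Min-Min (smallest completion time); ties -> FCFS.
            task = min(ready, key=lambda t: (-len(t.children), best_vm(t)[0], t.arrival))
            end, vm = best_vm(task)
            vm_free[vm] = end
            finish[(wf.name, task.name)] = end
            done.add(task.name)
            plan.append((wf.name, task.name, vm, end))
    makespan = max(finish.values(), default=0.0)
    return plan, makespan


early = Workflow("wf_early", deadline=50, tasks={
    "a": Task("a", 4, 0, children=["b", "c"]),
    "b": Task("b", 2, 1, parents=["a"]),
    "c": Task("c", 6, 2, parents=["a"]),
})
late = Workflow("wf_late", deadline=100, tasks={"x": Task("x", 2, 0)})
plan, makespan = schedule([late, early], vm_speeds=[1.0, 2.0])
for step in plan:
    print(step)
print("makespan:", makespan)  # makespan: 6.0
```

Here `wf_early` is scheduled first despite being passed second, and the Min-Min tie-break places `b` before `c` because both have no children. Makespan is taken as the latest finish time over all scheduled tasks.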
The performance of the proposed intelligent failure prediction model has been
evaluated on Pegasus and Amazon EC2. Pegasus bridges the scientific domain and the
execution environment by automatically mapping high-level workflow descriptions onto
Cloud resources, and these workflows are then deployed on Amazon EC2. The
experimental results show that Naive Bayes achieves the highest average accuracy
across all workflows (93%), compared with ANN, LR and Random Forest, whereas the
existing model has an average accuracy of 85%. The proposed model with Naive Bayes
is therefore more effective at predicting both task failures and resource failures,
by taking resource utilization into account, for multiple scientific applications.
The autonomic fault tolerant technique has then been validated by its reductions in
mean execution time, the standard deviation of mean time, the number of VM
migrations and SLAV (Service Level Agreement Violations).
The proposed autonomic fault tolerant workflow scheduling technique has also been
verified on multiple scientific workflows such as Montage, Cybershake, SIPHT,
Epigenome and Inspiral, and it has reduced the average makespan for these
workflows. These results have been validated against existing heuristics, namely
Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Min-Min, Max-Min and MCT,
as well as the proposed hybrid heuristic. For the Epigenome workflow, PSO performs
better than Max-Min and MCT, while GA improves on Max-Min, Min-Min, MCT and PSO.
Similarly, for the Cybershake workflow, PSO performs better than Max-Min, Min-Min,
MCT and GA. The proposed hybrid heuristic outperforms all the existing heuristics by
reducing the average makespan for large-scale scientific workflows of 1000 jobs,
such as Cybershake and Epigenome.
Description
PhD Thesis
