Reliable Execution and Deployment of Workflows in Cloud Computing
Abstract
Cloud computing is a paradigm that provides on-demand resources such as software, hardware, platforms, and infrastructure. Owing to its cost-effectiveness, on-demand provisioning, ease of sharing, scalability, and reliability, cloud computing has grown in popularity with the research community for deploying scientific applications such as workflows. The underlying system for workflows is the Workflow Management System, which provides the end user with the required data and the appropriate application program for their tasks. Workflow management is a fast-evolving technology that is increasingly being exploited by businesses in a variety of industries. Its primary characteristic is the automation of processes involving combinations of human and machine-based activities, particularly those involving interaction with IT applications and tools. The main purpose of a workflow management system (WfMS) is to support the definition, execution, registration, and control of business processes.
Workflow scheduling is one of the key issues in workflow management: it maps and manages the execution of inter-dependent tasks on distributed resources. It allocates suitable resources to workflow tasks so that execution completes while satisfying the objective functions imposed by users. Proper scheduling can have a significant impact on the performance of the system. Fault tolerance is another major concern in guaranteeing the availability and reliability of critical services as well as application execution. To minimize the impact of failures on the system and on application execution, failures should be anticipated and handled proactively.
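The mapping of inter-dependent tasks onto distributed resources described above can be illustrated with a minimal greedy list scheduler. This is only a sketch under simplifying assumptions: the task names, the fixed execution times, and the earliest-start placement policy are illustrative, not the scheduling algorithm used in the thesis.

```python
from collections import deque

def schedule(tasks, deps, n_resources):
    """Greedy list scheduling of a workflow DAG.

    tasks: dict mapping task name -> execution time
    deps:  list of (parent, child) dependency pairs
    Returns a dict mapping task -> (resource index, start time).
    """
    # Build adjacency lists and in-degrees for topological ordering.
    children = {t: [] for t in tasks}
    indeg = {t: 0 for t in tasks}
    for parent, child in deps:
        children[parent].append(child)
        indeg[child] += 1

    ready = deque(t for t in tasks if indeg[t] == 0)
    resource_free = [0.0] * n_resources   # time at which each resource becomes idle
    finish = {}                           # task -> finish time
    placement = {}

    while ready:
        task = ready.popleft()
        # A task may start only after all of its parents have finished.
        earliest = max((finish[p] for p, c in deps if c == task), default=0.0)
        # Place the task on the resource that can start it soonest.
        r = min(range(n_resources), key=lambda i: max(resource_free[i], earliest))
        start = max(resource_free[r], earliest)
        finish[task] = start + tasks[task]
        resource_free[r] = finish[task]
        placement[task] = (r, start)
        # Release children whose dependencies are now all satisfied.
        for c in children[task]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return placement
```

For a diamond-shaped workflow (A feeds B and C, which both feed D) on two resources, B and C run in parallel after A, and D starts once both finish.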
There is therefore a need for reliable execution of workflows in the cloud environment. Since a workflow involves large volumes of data and many compute nodes to process that data, execution time must be minimized. A tool is thus required to design and specify task parallelism and task-dependency ordering, coordinate execution of the tasks over large pools of computing resources, and facilitate workflow reuse over multiple datasets. The thesis discusses various tools for generating workflows and compares them on the basis of operating system, database, architecture, and environment. Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources; it automatically locates the input data and the computational resources necessary for workflow execution.
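Such a high-level description is given to Pegasus as an abstract workflow in DAX form (a directed acyclic graph expressed in XML). The fragment below is a hypothetical two-step sketch; the file names, transformation names, and schema version are illustrative, and attribute details vary across DAX versions:

```xml
<!-- Hypothetical two-step abstract workflow in Pegasus DAX form -->
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="3.4" name="example">
  <job id="ID1" namespace="demo" name="preprocess" version="1.0">
    <argument>-i <file name="input.txt"/> -o <file name="clean.txt"/></argument>
    <uses name="input.txt" link="input"/>
    <uses name="clean.txt" link="output"/>
  </job>
  <job id="ID2" namespace="demo" name="analyze" version="1.0">
    <argument>-i <file name="clean.txt"/> -o <file name="result.txt"/></argument>
    <uses name="clean.txt" link="input"/>
    <uses name="result.txt" link="output"/>
  </job>
  <!-- ID2 depends on ID1; Pegasus maps this ordering onto concrete resources -->
  <child ref="ID2">
    <parent ref="ID1"/>
  </child>
</adag>
```

The description is abstract in that it names logical files and transformations only; Pegasus resolves them to physical replicas and executable locations at planning time.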
The Condor scheduler manages individual workflow tasks and supervises their execution on local and remote resources. Accordingly, the workflow application is designed and implemented with Pegasus, and the designed workflow is analyzed, monitored, and scheduled.
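Condor's DAGMan component expresses this task supervision declaratively: each node points to a Condor submit file, and PARENT/CHILD lines encode the dependency ordering. A minimal sketch follows, with hypothetical submit-file names; the RETRY directive illustrates how failed tasks can be re-run for fault tolerance:

```
# example.dag -- hypothetical DAGMan input file
JOB A preprocess.sub
JOB B analyze.sub
PARENT A CHILD B
RETRY B 2
```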
The designed workflow can then be deployed on Nimbus. A workflow-type application is also designed using Oozie, and the generated workflow is executed in a Hadoop environment.
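An Oozie workflow is defined in a workflow.xml file that chains Hadoop actions with explicit transitions for success and failure. The fragment below is a hedged sketch of a single MapReduce action; the application name, input/output paths, and parameter values are hypothetical:

```xml
<!-- Hypothetical workflow.xml with a single MapReduce action -->
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="mr-step"/>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/user/demo/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/demo/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The ok/error transitions make the control flow explicit, which is how Oozie handles action failure within a workflow.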
Hadoop is an open-source, Java-based software platform developed by the Apache Software Foundation; it makes it easy to write and run applications on clusters that process huge amounts of data. Finally, the workflows designed in Oozie and Pegasus are compared on the basis of mapper, environment, scheduling, and reliability, together with the compatible cloud environments Nimbus and Hadoop.
