A Proactive Fault Tolerant Approach for Grid Environment
Abstract
Grid technology has emerged as a new approach to large-scale distributed computing with
a high-performance orientation. Grid Computing tries to bring, under one definitional
umbrella, all the work being done in the high-performance, cluster, peer-to-peer, and
Internet computing arenas. Grid Computing solutions are constructed using a variety
of technologies and open standards. Grid Computing, in turn, provides highly scalable,
highly secure, and extremely high-performance mechanisms for discovering and
negotiating access to remote computing resources in a seamless manner. This makes it
possible to share computing resources, on an unprecedented scale, among a
very large number of geographically distributed groups. The increasing complexity of
Grid services and systems demands correspondingly greater human effort for system
configuration and performance management. These management tasks, which are
mainly performed manually today, become time-consuming, error-prone, and even
unmanageable for human administrators.
Fault tolerance is a major area of concern in the self-management of the Grid.
The terms self-management and self-healing are borrowed from Autonomic
Computing. Autonomic Computing helps to address complexity by using technology
to manage technology.
In this work, a system for the coordinated, autonomic management (self-healing) of
multiple clusters in a Grid is discussed. The clusters are integrated into the
Grid, making the Grid fault tolerant. Existing fault tolerance mechanisms in the
Grid use reactive approaches. The focus of the thesis is to design a prototype for
proactive fault tolerance in a Grid environment.
Description
ME(SE)
