Supporting IT Service Fault Recovery with an Automated Planning Method
Description
13 years ago
Despite advances in software and hardware technologies, faults are
still inevitable in a highly dependent, human-engineered and
human-administered IT environment. Given the critical role of IT
services today, it is imperative that faults, once they have
occurred, be dealt with efficiently and effectively to avoid or
reduce actual losses. Nevertheless, the complexity of current IT
services, e.g., with regard to their scale, heterogeneity and
highly dynamic infrastructures, makes recovery a challenging task
for operators. Such complexity will eventually outgrow the human
capability to manage it. The difficulty is compounded by the fact
that there are few well-devised methods available to support fault
recovery. To tackle this issue, this thesis aims to provide a
computer-aided approach that assists operators with fault recovery
planning and, consequently, increases the efficiency of recovery
activities.

We propose a generic framework based on automated planning theory
to generate plans for the recovery of IT services. At the heart of
the framework is a planning component. Assisted by the other
participants in the framework, the planning component aggregates
the relevant information and computes recovery steps accordingly.
The main idea behind the planning component is to support the
planning operations with automated planning techniques, a research
field of artificial intelligence. Using a general planning model,
we show theoretically that the service fault recovery problem can
indeed be solved by automated planning techniques. The relationship
between a planning problem and a fault recovery problem is shown by
means of a reduction between these problems. After an extensive
investigation, we choose a planning paradigm based on Hierarchical
Task Networks (HTN) as the guideline for the design of our main
planning algorithm, called H2MAP. To support the operation of the
planner, a set of components revolving around the planning
component is provided. These components are responsible for tasks
such as translation between different knowledge formats, persistent
storage of planning knowledge and communication with external
systems. To ensure extensibility in our design, we apply different
design patterns to the components. We sketch and discuss the
technical aspects of the implementations of the core components.
Finally, as a proof of concept, the framework is instantiated in
two distinct application scenarios.
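
To make the HTN idea concrete, the following is a minimal sketch of
total-order HTN decomposition applied to a toy recovery task. It is
not H2MAP, whose details are not given here. The task names
(recover, failover, restart), the state keys (web, standby) and the
function plan are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]   # e.g. {"web": "down", "standby": "ready"}
Task = Tuple[str, ...]   # (task_name, arg1, ...)

# --- Primitive operators (hypothetical): return a new state, or None
# --- if the precondition does not hold.
def restart(state: State, svc: str) -> Optional[State]:
    if state.get(svc) != "down":
        return None
    new_state = dict(state)
    new_state[svc] = "up"
    return new_state

def failover(state: State, svc: str) -> Optional[State]:
    if state.get(svc) != "down" or state.get("standby") != "ready":
        return None
    new_state = dict(state)
    new_state[svc] = "up"
    new_state["standby"] = "active"
    return new_state

OPERATORS: Dict[str, Callable[..., Optional[State]]] = {
    "restart": restart,
    "failover": failover,
}

# --- Methods (hypothetical): decompose a compound task into subtasks,
# --- or return None if the method is not applicable in this state.
def recover_by_failover(state: State, svc: str) -> Optional[List[Task]]:
    if state.get("standby") == "ready":
        return [("failover", svc)]
    return None

def recover_by_restart(state: State, svc: str) -> Optional[List[Task]]:
    return [("restart", svc)]

# Methods are tried in order; later ones act as fallbacks.
METHODS: Dict[str, List[Callable[..., Optional[List[Task]]]]] = {
    "recover": [recover_by_failover, recover_by_restart],
}

def plan(state: State, tasks: List[Task]) -> Optional[List[Task]]:
    """Depth-first, total-order decomposition with backtracking."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in OPERATORS:
        # Primitive task: apply it and plan the remainder.
        new_state = OPERATORS[name](state, *args)
        if new_state is None:
            return None
        tail = plan(new_state, rest)
        return None if tail is None else [head] + tail
    # Compound task: try each applicable method until one succeeds.
    for method in METHODS.get(name, []):
        subtasks = method(state, *args)
        if subtasks is None:
            continue
        result = plan(state, subtasks + rest)
        if result is not None:
            return result
    return None
```

With a ready standby, plan({"web": "down", "standby": "ready"},
[("recover", "web")]) decomposes the compound task into a failover
step. Without a standby it falls back to a restart. If the service
is already up, no operator applies and the planner reports failure
by returning None.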