The high-tech industry has spent decades creating computer systems of ever-mounting complexity to solve a wide variety of business problems. Ironically, complexity itself has become part of the problem. As networks and distributed systems grow and change, they are increasingly hampered by deployment failures, hardware and software faults, and human error. Such failures in turn demand further human intervention to maintain the performance and capacity of IT components, driving up overall IT costs even as the cost of the technology components themselves continues to decline. As a result, many IT professionals seek ways to improve the return on investment in their IT infrastructure by reducing the total cost of ownership of their environments while improving the quality of service for users.
Self managing computing helps address these complexity issues by using technology to manage technology. The idea is not new; many of the major players in the industry have developed and delivered products based on this concept. Self managing computing is also known as autonomic computing.
The term autonomic is derived from human biology. The autonomic nervous system monitors your heartbeat, checks your blood sugar level and keeps your body temperature close to 98.6°F, without any conscious effort on your part. In much the same way, self managing computing components anticipate computer system needs and resolve problems with minimal human intervention.
Self managing computing systems have the ability to manage themselves and dynamically adapt to change in accordance with business policies and objectives. Self-managing systems can perform management activities based on situations they observe or sense in the IT environment. Rather than IT professionals initiating management activities, the system observes something about itself and acts accordingly. This allows the IT professional to focus on high-value tasks while the technology manages the more mundane operations. Self managing computing can result in a significant improvement in system management efficiency, when the disparate technologies that manage the environment work together to deliver performance results system wide.
However, complete autonomic systems do not yet exist, and self managing computing is not a proprietary solution. It is a radical change in the way businesses, academia and even government design, develop, manage and maintain computer systems, and it calls for a whole new area of study and a whole new way of conducting business.
Self managing computing is the self-management of e-business infrastructure, balancing what is managed by the IT professional and what is managed by the system. It is the evolution of e-business.
What is self managing computing?
Self managing computing is about freeing IT professionals to focus on high-value tasks by making technology work smarter; this means letting computing systems and infrastructure take care of managing themselves. Ultimately, it is writing business policies and goals and letting the infrastructure configure, heal and optimize itself according to those policies while protecting itself from malicious activities.
In an autonomic environment the IT infrastructure and its components are self-managing. Systems with self-managing components reduce the cost of owning and operating computer systems, because the system observes something about itself and acts accordingly rather than waiting for an IT professional to initiate management activities. IT infrastructure components take on the following characteristics: self-configuring, self-healing, self-optimizing and self-protecting.
Systems adapt automatically to dynamically changing environments. When hardware and software systems have the ability to define themselves “on the fly,” they are self-configuring. This aspect of self-managing means that new features, software and servers can be added to the enterprise infrastructure dynamically, with no disruption of services. Self-configuring includes not only the ability of each individual system to configure itself on the fly, but also the ability of systems within the enterprise to configure themselves into the e-business infrastructure of the enterprise. The goal of self managing computing is to provide self-configuration capabilities for the entire IT infrastructure, not just individual servers, software and storage devices.
Systems discover, diagnose and react to disruptions. For a system to be self-healing, it must be able to recover from a failed component by first detecting and isolating that component, taking it offline, fixing or replacing it, and reintroducing the fixed or replacement component into service without any apparent application disruption. Systems will need to predict problems and take actions that prevent failures from having an impact on applications. The self-healing objective must be to minimize all outages in order to keep enterprise applications up and available at all times. Developers of system components need to focus on maximizing the reliability and availability of each hardware and software product, with continuous availability as the design goal.
Systems monitor and tune resources automatically. Self-optimization requires hardware and software systems to efficiently maximize resource utilization to meet end-user needs without human intervention. Features must be introduced to allow the enterprise to optimize resource usage across the collection of systems within their infrastructure, while also maintaining their flexibility to meet the ever-changing needs of the enterprise.
Systems anticipate, detect, identify, and protect themselves from attacks from anywhere. Self-protecting systems must have the ability to define and manage user access to all computing resources within the enterprise, to protect against unauthorized resource access, to detect intrusions and report and prevent these activities as they occur, and to provide backup and recovery capabilities that are as secure as the original resource management systems. Systems will need to build on top of a number of core security technologies already available today. Capabilities must be provided to more easily understand and handle user identities in various contexts, removing the burden from administrators.
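The four self-* characteristics above can be illustrated with a small sketch. All class, method and field names here are hypothetical, chosen only to show the pattern of a component that configures, heals, optimizes and protects itself:

```python
from dataclasses import dataclass, field

@dataclass
class SelfManagingComponent:
    """Illustrative component exhibiting the four self-* characteristics."""
    name: str
    config: dict = field(default_factory=dict)
    healthy: bool = True
    events: list = field(default_factory=list)

    def self_configure(self, new_settings: dict) -> None:
        # Self-configuring: absorb new settings on the fly, no restart.
        self.config.update(new_settings)
        self.events.append(f"configured: {sorted(new_settings)}")

    def self_heal(self) -> None:
        # Self-healing: detect a failed state, isolate it, and recover.
        if not self.healthy:
            self.events.append("fault isolated, component reintroduced")
            self.healthy = True

    def self_optimize(self, utilization: float) -> None:
        # Self-optimizing: tune a resource knob when utilization is high.
        if utilization > 0.8:
            self.config["workers"] = self.config.get("workers", 1) + 1
            self.events.append("scaled up")

    def self_protect(self, request_source: str, allow_list: set) -> bool:
        # Self-protecting: reject access from unknown sources.
        allowed = request_source in allow_list
        if not allowed:
            self.events.append(f"blocked: {request_source}")
        return allowed
```

In a real system each of these methods would be driven by monitoring data and policy rather than by direct calls, but the division of responsibilities is the same.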
Characteristics – The Eight Elements
· To be autonomic, a system needs to “know itself”— and consist of components that also possess a system identity.
· An autonomic system must configure and reconfigure itself under varying and unpredictable conditions.
· An autonomic system never settles for the status quo—it always looks for ways to optimize its workings.
· An autonomic system must perform something akin to healing—it must be able to recover from routine and extraordinary events that might cause some parts to malfunction.
· A virtual world is no less dangerous than the physical one, so a self managing computing system must be an expert in self-protection.
· A self managing computing system knows its environment and the context surrounding its activity, and acts accordingly.
· An autonomic system cannot exist in a hermetic environment (and must adhere to open standards).
· Perhaps most critical for the user, a self managing computing system will anticipate the optimized resources needed to meet a user’s information needs while keeping its complexity hidden.
Path to Self managing computing
Delivering system wide autonomic environments is an evolutionary process enabled by technology, but it is ultimately implemented by each enterprise through the adoption of these technologies and supporting processes. The path to self managing computing can be thought of in five levels. These levels, defined below, start at basic and continue through managed, predictive, adaptive and finally autonomic.
1. Basic level
A starting point for the IT environment. Each infrastructure element is managed independently by IT professionals who set it up, monitor it and eventually replace it.
2. Managed level
Systems management technologies can be used to collect information from disparate systems onto fewer consoles, reducing the time it takes for the administrator to collect and synthesize information as the IT environment becomes more complex.
3. Predictive level
New technologies are introduced to provide correlation among several infrastructure elements. These elements can begin to recognize patterns, predict the optimal configuration and provide advice on what course of action the administrator should take.
4. Adaptive level
As these technologies improve and as people become more comfortable with the advice and predictive power of these systems, we can progress to the adaptive level, where the systems themselves can automatically take the right actions based on the information that is available to them and the knowledge of what is happening in the system.
5. Autonomic level
The IT infrastructure operation is governed by business policies and objectives. Users interact with the autonomic technology to monitor the business processes, alter the objectives, or both.
Research into creating autonomic systems won't be easy, but future computer systems will have to incorporate increased levels of automation if we expect them to manage the ballooning amount of data, the ever-expanding network and the increasing might of processing power.
To create autonomic systems researchers must address key challenges with varying levels of complexity. Here is a partial list of the challenges we face.
- System identity: Before a system can transact with other systems it must know the extent of its own boundaries. How will we design our systems to define and redefine themselves in dynamic environments?
- Interface design: With a multitude of platforms running, system administrators face a briar patch of knobs. How will we build consistent interfaces and points of control while allowing for a heterogeneous environment?
- Translating business policy into IT policy: The end result needs to be transparent to the user. How will we create human interfaces that remove complexity and allow users to interact naturally with IT systems?
- Systemic approach: Creating autonomic components is not enough. How can we unite a constellation of autonomic components into a federated system?
- Standards: The age of proprietary solutions is over. How can we design and support open standards that will work?
- Adaptive algorithms: New methods will be needed to equip our systems to deal with changing environments and transactions. How will we create adaptive algorithms to take previous system experience and use that information to improve the rules?
- Improving network-monitoring functions to protect security, detect potential threats and achieve a level of decision-making that allows for the redirection of key activities or data.
- Smarter microprocessors that can detect errors and anticipate failures.
Self managing computing Architecture Concepts
A standard set of functions and interactions govern the management of the IT system and its resources, including client, server, database manager or Web application server. This is represented by a control loop (shown in the diagram below) that acts as a manager of the resource through monitoring, analysis and taking action based on a set of policies.
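A minimal version of such a control loop can be sketched in a few lines. The policy, metric and action below are all illustrative assumptions, not part of any specific product:

```python
# A minimal monitor-analyze-act control loop for one managed resource.
# The policy threshold, metric and corrective action are hypothetical.
POLICY = {"max_cpu": 0.75}  # a business goal distilled to an IT threshold

def monitor(resource: dict) -> float:
    """Collect a metric from the managed resource."""
    return resource["cpu"]

def analyze(cpu: float, policy: dict) -> bool:
    """Decide whether the observed state violates policy."""
    return cpu > policy["max_cpu"]

def act(resource: dict) -> None:
    """Take corrective action: add capacity to shed load."""
    resource["replicas"] += 1
    resource["cpu"] /= 2  # assume load spreads evenly across replicas

def control_loop(resource: dict, iterations: int = 5) -> None:
    for _ in range(iterations):
        if analyze(monitor(resource), POLICY):
            act(resource)

server = {"cpu": 0.95, "replicas": 1}
control_loop(server)  # one corrective action brings cpu under policy
```

The essential point is that the loop, not an administrator, closes the gap between observed state and policy.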
These control loops, or managers, can communicate with each other in a peer-to-peer context and with higher-level managers. For example, a database system needs to work with the server, storage subsystem, storage management software, the Web server and other system elements to achieve a self-managing IT environment. The pyramid shown below represents the hierarchy in which self managing computing technologies will operate.
The bottom layer of the pyramid consists of the resource elements of an enterprise: networks, servers, storage devices, applications, middleware and personal computers. Self managing computing begins in the resource element layer, by enhancing individual components to configure, optimize, heal and protect themselves.
Moving up the pyramid, resource elements are grouped into composite resources, which begin to communicate with each other to create self-managing systems. This can be represented by a pool of servers that work together to dynamically adjust workload and configuration to meet certain performance and availability thresholds.
At the highest layer of the pyramid composite resources are tied to business solutions, such as a customer care system or an electronic auction system. True autonomic activity occurs at this level.
In an autonomic environment, components work together, communicating with each other and with high-level management tools. They regulate themselves and, sometimes, each other. They can proactively manage the system, while hiding the inherent complexity of these activities from end users and IT professionals.
Another aspect of the self managing computing architecture is shown in the diagram below. This portion of the architecture details the functions that can be provided for the control loops. The architecture organizes the control loops into two major elements—a managed element and an autonomic manager. A managed element is what the autonomic manager is controlling. An autonomic manager is a component that implements a particular control loop.
The managed element is a controlled system component. It can be a single resource (a server, database server or router) or a collection of resources (a pool of servers, a cluster or a business application). The managed element is controlled through its sensors and effectors:
· The sensors provide mechanisms to collect information about the state and state transition of an element. To implement the sensors, you can either use a set of “get” operations to retrieve information about the current state, or a set of management events (unsolicited, asynchronous messages or notifications) that flow when the state of the element changes in a significant way.
· The effectors are mechanisms that change the state (configuration) of an element. In other words, the effectors are a collection of “set” commands or application programming interfaces (APIs) that change the configuration of the managed resource in some important way.
Together, the sensors and effectors form the manageability interface that is available to an autonomic manager. As shown by the black lines connecting the sensor and effector sides of the diagram above, the architecture encourages the idea that sensors and effectors are linked. For example, a configuration change made through the effectors should be reflected as a configuration-change notification through the sensor interface.
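This sensor/effector linkage can be sketched as follows. The class and its state keys are hypothetical; the point is that a "set" through the effector side flows back out as a notification on the sensor side:

```python
# Sketch of a manageability interface: "get" sensors, "set" effectors,
# and the linkage where an effector change emits a sensor notification.
class ManagedElement:
    def __init__(self):
        self._state = {"log_level": "info"}
        self._subscribers = []

    # Sensor: retrieve current state on demand (a "get" operation).
    def get_state(self, key: str):
        return self._state[key]

    # Sensor: register for unsolicited, asynchronous change notifications.
    def subscribe(self, callback):
        self._subscribers.append(callback)

    # Effector: change configuration (a "set" operation). Note the link:
    # every effector-driven change flows back out through the sensor side.
    def set_state(self, key: str, value):
        old = self._state.get(key)
        self._state[key] = value
        for cb in self._subscribers:
            cb({"key": key, "old": old, "new": value})

events = []
element = ManagedElement()
element.subscribe(events.append)
element.set_state("log_level", "debug")  # effector change -> sensor event
```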
The autonomic manager is a component that implements the control loop. The architecture dissects the loop into four parts that share knowledge:
· The monitor part provides the mechanisms that collect, aggregate, filter, manage and report details (metrics and topologies) collected from an element.
· The analyze part provides the mechanisms to correlate and model complex situations. These mechanisms allow the autonomic manager to learn about the IT environment and help predict future situations.
· The plan part provides the mechanisms to structure the action needed to achieve goals and objectives. The planning mechanism uses policy information to guide its work.
· The execute part provides the mechanisms that control the execution of a plan with considerations for on-the-fly updates.
The four parts work together to provide the control loop functionality. The diagram shows a structural arrangement of the parts—not a control flow. The bold line that connects the four parts should be thought of as a common messaging bus rather than a strict control flow. In other words, there can be situations where the plan part may ask the monitor part to collect more or less information. There could also be situations where the monitor part may trigger the plan part to create a new plan. The four parts collaborate using asynchronous communication techniques, like a messaging bus.
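The message-bus arrangement of the four parts can be sketched with a simple queue. The message kinds and handler names below are illustrative only; the point is that the parts exchange messages rather than calling each other in a fixed sequence:

```python
import queue

# Four MAPE parts collaborating over a shared bus, with shared knowledge.
# All message kinds, thresholds and actions here are hypothetical.
bus = queue.Queue()
knowledge = {"metric": 0.9, "threshold": 0.8, "plan": None, "executed": []}

def monitor():
    # Monitor: report an observed metric as a symptom on the bus.
    bus.put(("symptom", knowledge["metric"]))

def analyze(value):
    # Analyze: correlate the symptom against shared knowledge.
    if value > knowledge["threshold"]:
        bus.put(("change_request", "reduce_load"))

def plan(request):
    # Plan: turn a change request into an ordered set of actions.
    knowledge["plan"] = [request, "verify"]
    bus.put(("change_plan", knowledge["plan"]))

def execute(steps):
    # Execute: carry out the plan, recording what was done.
    knowledge["executed"].extend(steps)

handlers = {"symptom": analyze, "change_request": plan, "change_plan": execute}

monitor()
while not bus.empty():
    kind, payload = bus.get()
    handlers[kind](payload)
```

Because every part only reads from and writes to the bus, any part can trigger any other, which matches the non-strict control flow described above.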
The sensors and effectors provided by the autonomic manager facilitate collaborative interaction with other autonomic managers. In addition, autonomic managers can communicate with each other in both peer-to-peer and hierarchical arrangements. The numerous autonomic managers in a complex IT system must work together to deliver self managing computing to achieve common goals.
Autonomic manager knowledge
Data used by the autonomic manager’s four components are stored as shared knowledge. The shared knowledge includes things like topology information, system logs, performance metrics and policies.
The knowledge used by a particular autonomic manager can be created by the monitor part, based on information collected through sensors, or passed into the autonomic manager through its effectors. An example of the former occurs when the monitor part creates knowledge based on recent activities by logging the notifications it receives from a managed element into a system log. An example of the latter is policy. A policy consists of a set of behavioral constraints or preferences that influence the decisions made by an autonomic manager. Specifically, the plan part of an autonomic manager is responsible for interpreting and translating policy details, while the analyze part is responsible for determining whether the autonomic manager can abide by the policy, now and in the future.
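The division of labor between policy, the analyze part and the plan part can be sketched as follows. The policy fields, threshold and action names are hypothetical:

```python
# Sketch: policy as shared knowledge. The analyze part checks whether the
# manager can abide by the policy; the plan part translates policy details
# into concrete actions. All names below are illustrative.
policy = {
    "constraint": {"max_response_ms": 200},
    "preference": {"prefer": "scale_out_over_scale_up"},
}

def analyze_compliance(observed_ms: float, pol: dict) -> bool:
    """Analyze: can the manager abide by the policy right now?"""
    return observed_ms <= pol["constraint"]["max_response_ms"]

def plan_from_policy(observed_ms: float, pol: dict) -> list:
    """Plan: interpret policy constraints and preferences into actions."""
    if analyze_compliance(observed_ms, pol):
        return []  # policy satisfied, nothing to do
    if pol["preference"]["prefer"] == "scale_out_over_scale_up":
        return ["add_replica", "re-measure"]
    return ["add_cpu", "re-measure"]
```

Note that the constraint decides *whether* to act while the preference decides *how*, which is why the architecture assigns them to different parts of the loop.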
Implementing self managing computing
Shifting the burden of managing systems to self-managing technologies does not happen overnight and cannot be solely accomplished by acquiring new products. Skills within the organization need to adapt, and processes need to change to create new benchmarks of success.
As companies progress through the five levels of self managing computing, the processes, tools and benchmarks become increasingly sophisticated, and the skills requirement becomes more closely aligned with the business.
The basic level represents the starting point for many IT organizations. If IT organizations are formally measured, they are typically evaluated on the time required to finish major tasks and fix major problems. The IT organization is viewed as a cost center, with variable labor costs preferred over an investment in centrally coordinated systems management tools and processes.
In the managed level IT organizations are measured on the availability of their managed resources, their time to close trouble tickets in their problem management system and their time to complete formally tracked work requests. To improve on these measurements, IT organizations document their processes and continually improve them through manual feedback loops and adoption of best practices. IT organizations gain efficiency through consolidation of management tools to a set of strategic platforms and through a hierarchical problem management triage organization.
In the predictive level IT organizations are measured on the availability and performance of their business systems and their return on investment. To improve, IT organizations measure, manage and analyze transaction performance. The critical nature of the IT organization’s role in business success is understood. Predictive tools are used to project future IT performance, and many tools make recommendations to improve future performance.
In the adaptive level IT resources are automatically provisioned and tuned to optimize transaction performance. Business policies, business priorities and service-level agreements guide the autonomic infrastructure behavior. IT organizations are measured on comprehensive business system response times (transaction performance), the degree of efficiency of the IT infrastructure and their ability to adapt to shifting workloads.
In the autonomic level IT organizations are measured on their ability to make the business successful. To improve business measurements they understand the financial metrics associated with e-business activities and supporting IT activities. Advanced modeling techniques are used to optimize e-business performance and quickly deploy newly optimized e-business solutions.