Problems are an unavoidable part of service delivery. This is especially true in IT service management, where almost every other day brings a new software release, a new security patch, a change to the systems infrastructure, or some other important change request that needs to be communicated to the customer. In such a dynamic service environment, service interruptions are bound to occur, and they need to be diagnosed and rectified. However small these problems may appear, they can leave a customer unhappy and, in turn, cause severe damage to the business, both financially and in terms of brand image.
Problem Management Process
Problem Definition:
Understanding what a problem is requires first understanding the concept of an Incident in the ITIL context. ITIL defines an ‘Incident’ as an unplanned interruption to an IT service or a reduction in the quality of an IT service. The failure of a ‘configuration item’ that has not yet impacted the IT service also counts as an Incident.
A problem, in turn, is the cause of one or more Incidents. Any recurring service interruption points to an underlying problem, and if that problem is identified and fixed, similar service interruptions can be prevented from recurring.
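To make the one-to-many relationship concrete, here is a minimal Python sketch of how a problem record might reference the Incidents it causes. The class and field names (`Incident`, `Problem`, `incident_ids`, `link`) and the sample IDs are illustrative assumptions, not part of ITIL or of any particular service management tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    # An unplanned interruption to, or degradation of, an IT service.
    incident_id: str
    service: str
    description: str


@dataclass
class Problem:
    # The underlying cause of one or more incidents; the root cause may not
    # be known yet when the problem is first recorded.
    problem_id: str
    summary: str
    root_cause: Optional[str] = None
    incident_ids: List[str] = field(default_factory=list)

    def link(self, incident: Incident) -> None:
        # Record the incident as a symptom of this problem, ignoring duplicates.
        if incident.incident_id not in self.incident_ids:
            self.incident_ids.append(incident.incident_id)


problem = Problem("PRB-001", "Mobile app hangs on launch")
problem.link(Incident("INC-101", "mobile-app", "Phone hangs when the app opens"))
problem.link(Incident("INC-102", "mobile-app", "App freezes at startup"))
print(problem.incident_ids)  # ['INC-101', 'INC-102']
```

Fixing the single `Problem` is what stops both Incidents, and any future ones linked to it, from recurring.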
Objectives of Problem Management
Problem Management has three key objectives:
- Prevent Problems (and resulting Incidents) from happening
- Eliminate repetitive Incidents
- Reduce the impact of Incidents that cannot be prevented
Types of Problem Management
- Proactive Problem Management: preventing Incidents from recurring by identifying and resolving the problems behind them.
- Reactive Problem Management: fixing recurring Incidents and minimizing their impact on service performance.
This is a typical problem management process flow:
A problem is detected by performing a Root Cause Analysis (RCA) on recurring Incidents. Once detected, the problem is logged, then categorized and prioritized based on its severity and forecasted impact. Next, it is investigated and diagnosed. If a full resolution is found, the fix is applied, the problem is closed, and the log is updated.
If a full resolution has not been found, however, the impact needs to be minimized in the meantime. To do this, you devise a temporary fix, which ITIL calls a ‘workaround’. Once a workaround is found, you raise a ‘known error record’. A known error is a problem that has a documented root cause and a workaround but no full resolution yet.
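The flow above can be sketched as a simple state machine. This is a minimal illustration under assumed conventions: the state names, the `advance` helper, and the 1-to-3 severity and impact scales used by `priority` are hypothetical, and adding the two ratings is just one common way of combining them, not something ITIL mandates. The sketch covers only the path where a full resolution is found.

```python
from enum import Enum
from typing import List


class ProblemState(Enum):
    LOGGED = "logged"
    CATEGORIZED = "categorized"
    PRIORITIZED = "prioritized"
    INVESTIGATING = "investigating"  # diagnosis and investigation
    RESOLVED = "resolved"            # full resolution found and applied
    CLOSED = "closed"                # problem closed and log updated


# Each non-terminal state has exactly one successor on this path.
NEXT_STATE = {
    ProblemState.LOGGED: ProblemState.CATEGORIZED,
    ProblemState.CATEGORIZED: ProblemState.PRIORITIZED,
    ProblemState.PRIORITIZED: ProblemState.INVESTIGATING,
    ProblemState.INVESTIGATING: ProblemState.RESOLVED,
    ProblemState.RESOLVED: ProblemState.CLOSED,
}


def advance(state: ProblemState, log: List[str]) -> ProblemState:
    """Move a problem to its next state and record the transition in `log`."""
    if state not in NEXT_STATE:
        raise ValueError(f"'{state.value}' is a terminal state")
    new_state = NEXT_STATE[state]
    log.append(f"{state.value} -> {new_state.value}")
    return new_state


def priority(severity: int, impact: int) -> int:
    """Combine two hypothetical 1-3 ratings (1 = worst) into a 1-5 priority.

    1 is the most urgent priority and 5 the least.
    """
    if not (1 <= severity <= 3 and 1 <= impact <= 3):
        raise ValueError("severity and impact must each be between 1 and 3")
    return severity + impact - 1


history: List[str] = []
state = ProblemState.LOGGED
while state is not ProblemState.CLOSED:
    state = advance(state, history)
print(priority(1, 2), history[-1])  # 2 resolved -> closed
```

Keeping the transitions in one table makes it easy to log every step, which matches the requirement that the problem log is updated when the problem is closed.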
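A known error record could be modeled along these lines. The record layout, the `raise_known_error` helper, and the sample values are hypothetical; the only rule encoded is the one described above: a known error has a root cause and a workaround, and it stays open until a full resolution is recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    # A problem whose root cause is documented and which has a workaround,
    # but no permanent fix yet.
    problem_id: str
    root_cause: str
    workaround: str
    resolution: Optional[str] = None

    @property
    def is_open(self) -> bool:
        # The record stays open until a full resolution is recorded.
        return self.resolution is None


def raise_known_error(problem_id: str, root_cause: str, workaround: str) -> KnownError:
    """Create a known error record; both a root cause and a workaround are required."""
    if not root_cause.strip() or not workaround.strip():
        raise ValueError("a known error needs both a root cause and a workaround")
    return KnownError(problem_id, root_cause, workaround)


record = raise_known_error(
    "PRB-001",
    "Hypothetical: startup code exhausts memory on older devices",
    "Close other apps before launching",
)
print(record.is_open)  # True
record.resolution = "Fixed in the next release"
print(record.is_open)  # False
```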
For example, suppose you have recently released an updated version of your mobile application, and a customer calls the support team to complain that his phone hangs when he opens your app. A workaround would be to ask the customer to close the other apps on his phone and open your app again on a better internet connection. However, if several other customers call with the same complaint, it becomes clear that this is only a temporary fix. You then need to investigate the root cause of the problem, which could be a bug or a compatibility issue in your app.
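The point at which a workaround stops being good enough, namely when several customers report the same symptom, can be approximated with a simple count. This sketch assumes incoming reports have already been reduced to `(customer_id, symptom)` pairs with normalized symptom labels; the function name `recurring_symptoms` and the default threshold of three distinct customers are arbitrary illustrative choices.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recurring_symptoms(reports: Iterable[Tuple[str, str]], threshold: int = 3) -> List[str]:
    """Return symptoms reported by at least `threshold` distinct customers.

    `reports` holds (customer_id, symptom) pairs whose symptom labels are
    assumed to be normalized already (e.g. "app hangs on launch").
    """
    # Count each customer only once per symptom, so repeat calls from one
    # customer do not look like a widespread issue.
    distinct = set(reports)
    counts = Counter(symptom for _customer, symptom in distinct)
    return sorted(s for s, n in counts.items() if n >= threshold)


reports = [
    ("cust-1", "app hangs on launch"),
    ("cust-2", "app hangs on launch"),
    ("cust-2", "app hangs on launch"),  # same customer calling back
    ("cust-3", "app hangs on launch"),
    ("cust-4", "login fails"),
]
print(recurring_symptoms(reports))  # ['app hangs on launch']
```

A symptom that crosses the threshold is a candidate for a problem record and root cause investigation rather than yet another workaround.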
Proactive Problem Management
Most organizations have well-designed reactive problem management; what they lack is proactive problem management. Very few service organizations realize the value of preventing problems before they occur.
ITIL states that “proactive problem management is concerned with identifying and solving problems and known errors before further incidents related to them can occur again”. In other words, proactive problem management prevents the recurrence of incidents, not problems. This makes it tricky: if you go too far in trying to prevent future problems, you will start to encroach on resource availability, capacity management, risk mitigation, and change management.
A cost-benefit analysis needs to be performed before adopting proactive problem management. If implementing the process will cost more than the business would lose when the problem actually occurs, it makes no financial sense to go ahead with it.
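At its simplest, the comparison might look like the following. It assumes you can estimate how many incidents the problem would cause over a period and what each one costs; the function name, its inputs, and the sample figures are illustrative, and a real analysis would also weigh harder-to-quantify factors such as damage to brand image.

```python
def worth_preventing(prevention_cost: float,
                     expected_incidents: float,
                     cost_per_incident: float) -> bool:
    """Return True if preventing a problem is expected to cost less than
    handling the incidents it would cause over the same period."""
    if min(prevention_cost, expected_incidents, cost_per_incident) < 0:
        raise ValueError("costs and incident counts must be non-negative")
    expected_incident_cost = expected_incidents * cost_per_incident
    # A tie is treated as "not worth it": there is no net saving.
    return prevention_cost < expected_incident_cost


print(worth_preventing(10_000, 12, 2_000))  # True: 10,000 < 24,000
print(worth_preventing(50_000, 12, 2_000))  # False: 50,000 > 24,000
```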
From the above discussion, you can see that proactive and reactive problem management are two sides of the same coin. A problem is detected and you fix it: this is reactive. When you fix a problem, you also prevent further Incidents from recurring: this is proactive problem management. Problem management is a critical piece of the IT service delivery jigsaw, and ITIL recommends a simple and effective process for it. This process is not prescriptive; it can be customized to fit the needs of your service delivery process.