ITIL V3 Service Operation disconnect between Incident and Problem Management

Submitted by skeptic on Mon, 2009-09-07 08:40

There seems to be a major disconnect between ITIL V3 Incident and Problem Management.

Problem Management says in 4.4.5.1 that a problem may be detected by any of

"...immediately obvious at the outset that an incident ... has been caused by a major problem...

Analysis of an incident by a technical support group which reveals that an underlying problem exists...

Automated detection...automatically raise an incident ...reveal the need for a Problem Record"

...amongst other sources.

All perfectly sensible. So where - anywhere - in the description of Incident Management in 4.2.5 does it say to spawn a problem record?

The only hint I can find is under Incident Closure 4.2.5.9

"...[If] it is likely that the incident could recur... raise a Problem Record in all such cases..."

All incidents seem to magically get resolved without recourse to a Problem. The Problem happens afterwards if we think another Incident might happen.

Figure 4.2 flowchart shows NO link between Incident and Problem Management (though Figure 4.4 does show Incident as an input to the Problem flowchart)

In 4.2.6 (interfaces) all it says about Problem Management is

"Incident management forms part of the overall process of dealing with problems...incidents are often caused by underlying problems which must be solved to prevent the incident from recurring..."

So Problem Management says it can be invoked any time in an incident but Incident Management says Problems happen after the Incident is resolved, only if we think it might happen again.

[Update: see We should create the problem record right up front in an incident]

Published in The Skeptical Informer, September 2009, Volume 3, No. 7

Comments

Submitted by skeptic on Mon, 2009-09-07 09:08.

How do we miss these things?

I don't know how many times I've read in and around those two processes in that book. So too did a huge number of contributors and reviewers. Someone started building process flowcharts before the books were published. NOBODY spotted a hole like this?

Submitted by Charles T. Betz on Tue, 2009-09-08 02:18.

This is what data analysts do

Incident vs. Problem from a data perspective has always troubled me. As concepts, they are at the least tightly coupled, with probably a common conceptual parent.

Thinking about such matters is what the good folk who belong to the Data Management Association, go to Wilshire conferences, read The Data Administration Newsletter (tdan.com) and hang out over on dm-discuss do. I can name any number of folks who, had they been involved, would not have let this happen. They would have had their logical data model close to hand and the manuscript would not have been released unless consistent with a well-formed model.

ITIL may have been written by capable professional practitioners. But it was NOT written by professional analysts/architects/systems thinkers. And, as I have said ever since I ran into ITIL 6 years ago, data architects in particular seem to have been far, far away.

Perhaps for the next version of ITIL the OGC will hire someone with credentials in this area.

See

www.dama.org
www.tdan.com
www.wilshireconferences.com
http://groups.yahoo.com/group/dm-discuss/summary

FFI.

Charles T. Betz
http://www.erp4it.com

Submitted by skeptic on Tue, 2009-09-08 04:57.

formal analysts

I agree about the obvious need for formal analysts but I don't see that the data perspective would have helped here. Problem and Incident entities can be modelled to perfection, and a relationship in place between them, but it does not inform use cases and process flows. The relationship can be "spawned from" but that doesn't tell us WHEN.

Submitted by Charles T. Betz on Tue, 2009-09-08 12:25.

CRUD

A formal CRUD (Create, Read, Update, Delete) matrix analysis would have caught this. You're right in that it would call for an analyst familiar with both process and data. But in my experience data people understand process more than process people understand data.

Too many structured/IE artifacts like CRUD matrices were thrown out with the bathwater when OO/Agile came along. That kind of traceability is desperately needed at the higher, architectural levels.

Charles T. Betz
http://www.erp4it.com
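
To make the CRUD idea concrete, here is a minimal sketch in Python. The step names loosely follow the section titles quoted in the post, but the matrix itself, the operations assigned to each step, and the helper names are all illustrative assumptions, not anything taken from the ITIL books. The point is only that a simple matrix answers the WHEN question: which Incident steps are allowed to create a Problem record?

```python
# Hypothetical CRUD matrix: process step -> {entity: operations}.
# The operations per step are an illustrative reading of the post,
# not a transcription of ITIL V3.
CRUD_MATRIX = {
    "Incident: Logging": {"Incident": "C"},
    "Incident: Initial Diagnosis": {"Incident": "RU"},
    "Incident: Investigation and Diagnosis": {"Incident": "RU"},
    "Incident: Closure": {"Incident": "RU", "Problem": "C"},
    "Problem: Detection": {"Incident": "R", "Problem": "C"},
    "Problem: Investigation and Diagnosis": {"Problem": "RU"},
}


def steps_that(op, entity, process):
    """Return the steps of `process` that perform `op` (C, R, U or D) on `entity`."""
    prefix = process + ":"
    return [
        step
        for step, row in CRUD_MATRIX.items()
        if step.startswith(prefix) and op in row.get(entity, "")
    ]


def steps_that_do_not(op, entity, process):
    """Return the steps of `process` that never perform `op` on `entity`."""
    prefix = process + ":"
    return [
        step
        for step, row in CRUD_MATRIX.items()
        if step.startswith(prefix) and op not in row.get(entity, "")
    ]


if __name__ == "__main__":
    print("Incident steps that create a Problem:",
          steps_that("C", "Problem", "Incident"))
    print("Incident steps that cannot create a Problem:",
          steps_that_do_not("C", "Problem", "Incident"))
```

With the matrix filled in this way, only Closure creates a Problem, and the diagnosis steps show up as gaps, which is the hole the post describes.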

Submitted by skeptic on Tue, 2009-09-08 05:21.

ITIL V2 Incident link to Problem

I got worried so I went back to ITIL V2 Service Support. Sure enough

5.6.2 Classification and Initial Support

"informing Problem Management of the existence of new Problems and of unmatched or multiple Incidents"

See also Annex 5D, p93, the incident process flow diagram with a branch at Diagnosis to Problem Management (even if the Y and N on the decision branch are back to front)

Still not crisp (no clear task to spawn a Problem in 5.6.3 Investigation and Diagnosis) but much clearer than V3

Submitted by mbuzina on Tue, 2009-09-08 11:33.

Probably more interprocess links missing

Hi Rob,

I guess in this case you are right to require an activity to create a problem record. I am guilty of omitting a direct step myself in most of my designed processes. But I usually add this as an additional activity to the list of things to do in incident management.

I believe that there is more to a process than a "swimlane diagram & some roles", so I routinely add additional activities which are not necessarily connected to the normal process flow (see http://buzina.wordpress.com/2009/02/27/itsm-series-1-incident-management/).

But you are right, this is so important that it should be visible in the flow chart. I did exactly that in a recent project setting up IPC processes for a small service unit within a large IT organization (yes, strange that they should have separate processes, but in this case it made at least some sense).

Advocatus Diaboli: Why should you mention that incident management should create problem records? This is a task for everyone, that if a problem occurs, it should be logged (as in the description of problem management).

Submitted by skeptic on Tue, 2009-09-08 19:52.

such a fundamental pair of processes

It is true that Problem can be logged any time (I have a favourite spiel about "Incident matching is everyone's duty"). But Incident Management mentions it at the end, at the wash-up, but not in earlier steps. My experience and understanding is that the primary point that Incident Management will identify a Problem is either at initial matching or later diagnosis. I now feel really uncomfortable that there is something I don't understand about such a fundamental pair of processes. If the books make ME feel that way after years of immersion, what do they do to a beginner?

Submitted by Charles T. Betz on Tue, 2009-09-08 20:41.

process vs. capability

"Why should you mention that incident management should create problem records? This is a task for everyone, that if a problem occurs, it should be logged (as in the description of problem management)."

The above statement confuses the process with the capability. The Incident Management capability is distinct from the end to end process of Resolve Incident. The data entity known as Problem should be originated in relatively few well understood, end to end processes.

The capability is responsible for things like defining and platforming the process, measuring & governing it, & improving it. Like their comrades in Change Management, the one thing they don't do is enter every last Problem record.

Process vs. function redux...

Charles T. Betz
http://www.erp4it.com

Submitted by JamesFinister on Tue, 2009-09-08 21:43.

Tension

I genuinely find myself torn in two here. One part of me says that a rigorous analytic approach would avoid a lot of the current issues in v3, another part of me feels both that we are not yet mature enough to correctly carry out the analysis, and that if we did we would end up with a sterile solution.

The simple yardstick I used to apply was that anyone, at any time, could say "I think this might be evidence of a problem" but only the problem manager could say "We will allocate resource to diagnosis and treatment"

Submitted by Tim Malone (not verified) on Tue, 2009-09-08 22:43.

Agreed

To make it work practically though, perhaps restrict the ability to create problem records to anyone from the specialist resolution groups (2nd line and above) and the team leads on the Service Desk?

This at least helps to reduce the "not my problem" attitude and lack of communication between the different lines of support.

Submitted by mbuzina on Wed, 2009-09-09 12:03.

Devil's Advocate

Hi Charles,

The sentence was stated to be from the devil's advocate, so I do not believe this to be a good answer to the original question: "Why is there no better link between Incident & Problem management?"

But I still don't get your answer. I understand the following:
* Logging a problem is one of the (many) capabilities that "participants" in incident management have (I write participants, since this capability is not granted to a process, but to roles or functions).
* At the same time, this capability (so probably the "role") is responsible for defining, measuring, etc., the process, but they are not the sole originators of their records

Both of these statements are true in my personal view.

But that problem records should originate only from a few well understood end-to-end processes does not get my seal of approval. Problems arise anywhere and anyone should be able to log one (step 1). And yes (as JamesFinister points out), it is the responsibility (in step 2) of the problem manager to assign resources and to identify inconsistent problem records (double records, unclear data, etc.), and once the inspection is done (yes, this is a problem) the problem (error) record should be more visible.

This difference may only result from my wider understanding of problem records. If your problem record starts to exist in my step 2, and the first record would be something like an RfP (Request for Problem, just joking; Request for Process, a lousy name, I know it collides with Request for Proposal) which alerts the problem manager to a possible problem, then we only have a naming inconsistency.

Greetings,
Marc

http://buzina.wordpress.com

Submitted by Charles T. Betz on Wed, 2009-09-09 13:07.

It's all in the semantics

The idea that a given entity should only be created by relatively few processes is well established in the schools of EA thought I follow (see Spewak). I think our disconnect is that the Resolve Incident (or Resolve Problem) process may be *invoked* from anywhere, but once it starts it is a single process. Sometimes there is no antecedent process worth noting. Other times, we will be able to note that (e.g.) the Problem arose from a Change (perhaps via Incident).

"Logging a problem" is *not* a capability in my vernacular. It is a step in a flow. A capability is a larger grained collection of resources and practices that enables achievement of objectives. Aka Function.

Roles may or may not be "owned" by a capability, since "all the world's a stage." Problem manager might be a role sourced from the Incident/Problem Management capability. But Problem Reporter (or whatever you may call it) may live anywhere; similarly, resources dispatched to resolve the problem are sourced broadly.

I *do* think we are able to rigorously analyze this stuff and come up with generative insights. It's not sterile.

Charles T. Betz
http://www.erp4it.com

Submitted by DavidM on Wed, 2009-09-09 00:06.

Not sure there's a "problem" :-)

Sorry, hard to resist. I suspect the documentation could be clearer, but I think I understand the overall intent.

The focus for Incident Management is the rapid restoration of service -- not fixing or resolving underlying cause.

At the 100K meter level (besides being close to low orbit :-)), Problem Management is about identifying incident cause, and taking appropriate actions (either fix, generate a workaround, and/or RFC as appropriate).

Section 4.2.4.3 Major Incidents includes this: "If the cause of the incident needs to be investigated at the same time, then the Problem Manager would be involved as well but the Incident Manager must ensure that service restoration and underlying cause (investigation) are kept separate." (parenthetical mine -- missing the entire section)

The way I interpret... PM follows IM on the reactive side (and also has a proactive component, but that isn't the subject :-)). PM takes input from IM records, and directly from IM if there is a major incident. So, PM sets its own agenda and schedule using IM records in the absence of a major incident.

Clearly, some "cause" determination requires more "incident" information than others, so I believe this is OK.

SLA terms should cover some of this: if a workaround can't be developed at the IM level within X period, then escalate to PM. Escalation terms and conditions are clearly part of SLM negotiations.

Could this be documented in the Service Operation book? Sure. Does it have to be? I'm not sure. Probably depends on how "cookbook-ish" we think ITIL should be.

Bottom line: Without diving into this in more detail (something I won't have time to do until next week), I'm OK with this view, for now.

David

Submitted by skeptic on Wed, 2009-09-09 05:24.

a one-ended pipe

David (and James - this applies to your comment too)

I don't think I'm being over analytical. I don't want to see every possible link to every other process in every conceivable circumstance.

This is not a trivial link. If there is one profound insight that ITIL offers in this area, it is the separation of incidents from problems. Over and over I have that discussion with organisations who knew something wasn't quite right but couldn't pin it down until introduced to this concept. Every time a light goes on. I'm in the midst of doing it right now for a client. (That's why I was re-reading the book, getting the "holy word" on the subject.)

If you are going to separate incidents from problems, it is essential and fundamental to describe how they connect. Vague allusions and inferences don't cut it. I can live with "some assembly required" but if one already has to be an expert in ITIL in order to decode the books then what the **** are they for? Are they like the sacred texts of certain religions that people are supposed to read only under the advice and guidance of a priest? Or are they intended to actually be useful to the public?

This is a profound and basic linkage that is only described indirectly and badly under Incident Management, and in fact worse than in ITIL V2.

The comment I made on Twitter is that it is a one-ended pipe. Problems are coming out in Problem Management but they aren't going in at Incident Management. Misleading and confusing.

Submitted by aroos on Wed, 2009-09-09 06:53.

Fatal flaw

I agree that the incident - problem concept is a profound insight of original ITIL. V3 tinkers with this important insight in several ways.

1) The concept of incident has been changed and the ORIGINAL V3 definition in the book makes no sense. Remember that a V2 incident included any event that was not normal and might cause a reduction in quality. The abbreviated version of V3 incident used in training and subsequent books requires that an incident is a real failure. This leaves a lot of grey area where a customer report is not an incident or a service request. See http://www.itskeptic.org/itil-v3-incident-definition-camels-and-committees

2) The separation of problem control and error control is lost while it is very important. A lot of known errors are not fixed and should be managed.

3) Proactive problem management is a key activity and it is missing from V3. See http://www.itskeptic.org/proactive-problem-management-description-does-not-

4) The problem management process graph does not make sense, see http://www.itskeptic.org/node/1345

What I have seen in practice is that Problem Management is a difficult process to master. We have heard that David Cannon interviewed a lot of companies to gather best practices in this area, http://www.itskeptic.org/node/419#comment-5421. I have done the same thing and have observed that nearly all companies I have interviewed seem to struggle with the concepts and practices of problem management. Many think that incidents become problems as soon as they have been transferred to 2nd level. It looks like the authors of the SO book may have gathered bad practices from the field and inserted them in the book.

In my opinion this is a fatal flaw in V3. If you have not studied V2 Problem Management then V3 makes no sense. The only way anyone can understand the incident - problem separation is using V2 concepts. I suppose most of ITIL trainers have learned the original concept and are using it but in fact V3 Service Operation book is worthless.

Submitted by JamesFinister on Wed, 2009-09-09 08:12.

Spot on

I think that sums up my views and experience exactly as well. Getting your head around problem management is a key indicator of ITSM maturity. As Rob said earlier, getting the difference is what my Dutch colleagues would refer to as an "eye-opener". The pragmatic question is what documentation/guidance will help organisations get that difference.

Submitted by Charles T. Betz on Wed, 2009-09-09 13:12.

Binary, or continuum?

Is it possible that Incident and Problem are merely two points on a continuum? The continuum would consist of some synthetic scale based on fundamental variables of business impact, assessed likelihood of recurrence, etc... response would scale proportionally...

Charles T. Betz
http://www.erp4it.com

Submitted by JamesFinister on Wed, 2009-09-09 16:21.

Worth consideration

And CSI would be part of that as well, but do we have a model for defining such a continuum? Does it, possibly and don't quote me, fit on to some kind of life cycle model?

James

Submitted by skeptic on Wed, 2009-09-09 20:15.

Entanglement

Nope, don't buy it. Entanglement of incident and problem as variants of the same thing is a huge problem that muddies roles, processes and most of all degrades quality of service to the user. That separation is one of the few genuine BIG ideas of ITIL

Submitted by Charles T. Betz on Thu, 2009-09-10 01:14.

But they do share a common supertype

There's no question in my mind that incident and problem share at least one common supertype. What to call it might be controversial. At most abstract, they can be seen as subtypes of Activity.

This does not prevent the real-world benefits of separation cited.

Charles T. Betz
http://www.erp4it.com

Submitted by skeptic on Thu, 2009-09-10 09:46.

common supertype

Philosophically speaking, don't all things share a common supertype?

Submitted by mbuzina on Thu, 2009-09-10 12:27.

You just invented the "OBJECT"

Hi Rob,

Congrats, you just invented the object class.

As to Charles's comment: To me they do not need a common supertype, but both have a similar relationship to the "issue" type. The relationship Incident <-> Issue means "Incident" is the activity to quickly provide a workaround for "Issue" while "Problem" is the activity to finally resolve the "Issue".

Even this is a bit muddy, since Problem contains an object called "Known Error", which may again be a subtype of "Issue".

Incident <- Activity
Problem <- Activity
Change <- Activity

Known Error <- Issue
Known Error <(causes)> Issue

Incident <(works around)> Issue
Problem <(identifies)> Known Error (given Issues as Input, basically the as(Known Error) operation)
Problem <(resolves)> Known Error

No way to do proper modelling in Blog Comments ;-)
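
For readers who want something more tangible than arrows in a blog comment, the model above can be sketched in a few lines of Python. This is only one possible reading of the notation: the class and relationship names come from the comment, while the attribute names, method names and the example data are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    description: str


@dataclass
class KnownError(Issue):
    # Known Error <- Issue, and Known Error <(causes)> Issue
    causes: List[Issue] = field(default_factory=list)
    resolved: bool = False


@dataclass
class Activity:
    title: str


@dataclass
class Incident(Activity):
    # Incident <(works around)> Issue
    works_around: Optional[Issue] = None


@dataclass
class Problem(Activity):
    known_error: Optional[KnownError] = None

    def identify(self, issues: List[Issue], description: str) -> KnownError:
        """Problem <(identifies)> Known Error, given Issues as input."""
        self.known_error = KnownError(description, causes=list(issues))
        return self.known_error

    def resolve(self) -> None:
        """Problem <(resolves)> Known Error."""
        if self.known_error is None:
            raise ValueError("no Known Error has been identified yet")
        self.known_error.resolved = True


if __name__ == "__main__":
    issue = Issue("mail is slow")
    incident = Incident("restore mail service", works_around=issue)
    problem = Problem("find out why mail keeps slowing down")
    known_error = problem.identify([issue], "mail server disk fills up")
    problem.resolve()
    print(incident, problem, sep="\n")
```

Incident and Problem share the Activity supertype here but remain separate classes, so an Incident never "becomes" a Problem; both simply point at the same Issue from different directions.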

Submitted by DavidM on Thu, 2009-09-10 14:36.

Exercise for the reader ???

As I've mentioned in other posts here, my only religion in the tech arena is client outcomes and success. I also admit to not being big on dogma.

I didn't think you were being overly analytical, I merely suggested I could live with it.

re: separation of Incident and Problem. The SO volume is quite clear on that score (from the section on Major Incidents):

A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents. A definition of what constitutes a major incident must be agreed and ideally mapped on to the overall incident prioritization system – such that they will be dealt with through the major incident process.

People sometimes use loose terminology and/or confuse a major incident with a problem. In reality, an incident remains an incident forever – it may grow in impact or priority to become a major incident, but an incident never ‘becomes’ a problem. A problem is the underlying cause of one or more incidents and remains a separate entity always!

The connection part is the formal suggestion that IF there is a Major Incident (defined as "The highest Category of Impact for an Incident. A Major Incident results in significant disruption to the Business"), then both the Incident and Problem Manager get involved immediately.

The implication being that except for the case of a major incident (where the cause has to "be investigated at the same time"), that incident and problem management processes are on separate schedules and potentially priorities (given that prioritization is also part of the PM process).

Problem Management process flow clearly takes input from incident management (and several other sources; see figure 4.4 -- the text says reactive, but the caption doesn't, that's another issue). So this means that for both normal and major incidents, a link is defined. It might not be with specific language that says "this is how you do it" and I'm OK with that in the context of a "descriptive framework." In fact, with some clients, if the HowTo were part of the text, it might cause more harm than good, because they have something that works and if the "ITIL Way" was different....

I guess what I'm saying is this: For ITIL to be descriptive, some level of interpretation is required. This means that in some areas, the way to do that (stay descriptive) is to do something I hated, both from professors and texts: Get as close to "command" as possible without crossing the line and then, "The rest is left as an exercise for the reader." Maybe it's taken me this long to understand WHY the authors and professors (really the same people, just different roles :-)) took that approach. I didn't like it then, and not thrilled with it now, but at least I think I understand it.

David

Submitted by skeptic on Thu, 2009-09-10 19:37.

they forgot

Sorry, still not buying it. Digging for inferences in the Major Incident process is not how a book should be used to explain the key relationship between two key processes. The book should clearly explain all the points within Incident where a Problem can be spawned, and explain how and under what circumstances that happens. It doesn't. C'mon! They forgot!

Submitted by DavidM on Fri, 2009-09-11 04:00.

Author style???

OK, so the real issue appears to be that you don't like the way they wrote the text and not that something is or isn't linked -- at least that's the way I read your latest. If that isn't what you meant, never mind what follows. :-)

Given that the text says there is confusion between a major incident and a problem, I might have included it with the information in Major Incident, too. As an author I'd want to make absolutely certain that there was NO possible way to confuse problem with the most likely candidate for confusion, the major incident.

David

Submitted by DavidM on Wed, 2009-09-23 04:18.

Another approach

Sorry that it took me this long to be able to articulate the issues in this way.

Both Incident Management and Problem Management are processes. Processes have 4 common characteristics: they are measurable, are done on behalf of a customer/stakeholder, produce a specific output, and have inputs/triggers.

Inputs to problem management include incident records, Event Management output, Access Management and more. So, the linkage is there as inputs to the process. I get the impression that there is the possibility that some of the argument for the type of linkage suggested would be necessary if Problem Management was a Function.

Spent some time earlier today talking with Dave Cannon about some of this (notion of function v process) and he concurred. He also suggested that one of the areas that should be a concern is the explicit absence of proactive problem management -- coming in the revision.

David

Submitted by skeptic on Wed, 2009-09-23 04:43.

One of the outputs of incident process is a Problem

The Mandate for Change explicitly says "no new concepts" but evidently that doesn't include missing concepts like proactive problem management. Wonder if they'll put error control back while they're at it?

Strictly speaking and never mind what ITIL says, Problem Management is a function. Problem Resolution is a process. It is a bit late for ITIL to start getting exact about the word "process" :)

Likewise Incident Management, but whatever. One of the outputs of the Incident Resolution process is a Problem. At what points in the process can this output occur? That is the question not properly answered by the book that started this whole discussion and it seems an important one.

Submitted by DavidM on Wed, 2009-09-23 12:42.

Output of Incident Management is resolution

We may have to agree to disagree. If an incident and a problem are separate entities, and remain so always, then one of the outputs of incident management is not, cannot be, a problem. To do so would create a linkage that should not exist.

One of the outputs of incident review may indicate that this is (or is likely to be) a recurring incident, and if so create a problem record for investigation. But this is done in conjunction with Problem Management.

Incident records are an input to problem management (along with a lot more including the Service Desk, Supplier Proactive Problem Management, etc).

This is one of the challenges I had with V2 that V3 resolves. Lifecycle arranges the processes in an ordering or sequence. So, could the wording be improved? Sure. Is there an issue regarding "linkage" between incident and problem? Not the way I view it.

SO Section 4.4.5 mentions Proactive Problem Management and it's also mentioned in Figure 4.4, but the detailed section isn't there, so it's not a new concept. Adding error control would be (and I don't think it's necessary, but that can be a topic for another thread or post :-))

David

Submitted by DavidM on Thu, 2009-09-24 03:13.

Strictly Speaking it's really about ITSM

I think part of the problem is that we're attempting to be much too literal. The real subject is less ITIL and more specifically ITSM. In other words, it's not about "doing" ITIL as much as it is USING ITIL to achieve ITSM. It's guidance, not "writ" (holy or otherwise :-)).

Incident records are just one source of input to Problem Management. Incident and Problem Management aren't sequential with Incident Management throwing something over the wall to Problem Management (and the team that will work the reactive part of the problem management process waiting). In reality it's asynchronous operation between the two processes (Incident and Problem Management).

David

Submitted by skeptic on Thu, 2009-09-24 03:35.

deficiency

No matter how you frame it, there is a process for resolving problems. That process has a number of triggers (including triggers from the separate proactive problem discovery process).

One input is a trigger from the incident process which passes incident information.

There is an incident resolution process whose objective is to restore service. One secondary output from that process is a trigger (and information) to initiate the problem resolution process.

That trigger may be secondary to the incident resolution but it is still an extremely important linkage. Outside of actual service restoration it is the second-most important thing the incident process does.

And yet the documentation of the incident process mentions it in only one of several places where it can happen.

And that doesn't strike you as a deficiency in the documentation?
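
The argument can be sketched as code. The following Python is a hypothetical illustration only: the step names echo those used in this thread (initial matching, diagnosis, closure), but the flags, the function name and the structure are assumptions, not ITIL's own process definition. It shows an incident resolution flow in which the Problem trigger is a secondary output that can fire at more than one step, rather than at closure alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    summary: str
    # Illustrative flags, one per point at which a Problem might be spotted.
    matches_other_incidents: bool = False        # seen at initial matching
    cause_unknown_after_diagnosis: bool = False  # seen at diagnosis
    likely_to_recur: bool = False                # seen at closure
    service_restored: bool = False
    problem_triggers: List[str] = field(default_factory=list)


def resolve_incident(incident: Incident) -> List[str]:
    """Restore service (primary output) and record every step that
    raised a trigger for problem resolution (secondary output)."""
    checkpoints = [
        ("initial matching", incident.matches_other_incidents),
        ("investigation and diagnosis", incident.cause_unknown_after_diagnosis),
        ("closure", incident.likely_to_recur),
    ]
    for step, should_trigger in checkpoints:
        if should_trigger:
            incident.problem_triggers.append(step)
    incident.service_restored = True
    return incident.problem_triggers


if __name__ == "__main__":
    incident = Incident(
        "mail is slow again",
        matches_other_incidents=True,
        likely_to_recur=True,
    )
    print(resolve_incident(incident))
```

Read this way, the complaint is that the book documents only the last checkpoint and leaves the first two to be inferred.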

Submitted by JamesFinister on Wed, 2009-09-23 06:13.

Process and Function

I thought it was said somewhere in V2, if not V3, that there is a distinction between problem management qua a process and problem management qua a function? I certainly remember teaching in V1 days that when a problem manager is involved in resolving major incidents they are not doing problem management.

I'll have to go and look it up.

Submitted by aroos on Wed, 2009-09-23 06:36.

Both a function and a process

I think one could have a Problem Desk and a problem management process. Proactive problem management should be an activity of the desk. In real life I have seen Service Desk managers and IM process managers doing the proactive problem management.

Aale

Submitted by skeptic on Wed, 2009-09-23 11:32.

"management" is not a process

"management" is not a process. Maybe it is sometimes an activity rather than a function but it is never a process. Managers do not execute repeatable defined transactions with inputs and outputs - at least good ones don't. They own, manage, measure and improve. I was overjoyed when they called it Request Fulfilment.

Incident resolution is a process. So too is problem resolution. And change review. Maybe so is Availability Planning
JvB! It's late, help me out here - you've done this analysis so well...

Submitted by jvbon on Wed, 2009-09-23 11:55.

the biggest flaw in itil.....

... is the misconception of process and function. If you want to learn more, please read the article I wrote on this issue in the "IT Service Management - Global Best Practice" guide, volume I.
The result of this flaw is that ITIL cannot be implemented, which is a generally accepted fact. Instead, ITIL can be perceived as a reference model, with useful guidance - but you need an implementation framework (not a model!) to make it work. Currently, and afaik, there is only one implementation framework available in the market.
To answer your question: Availability planning is not a process in terms of incident/change/problem management being processes. It is a highly polluted term that is already covered by some of the core processes (in casu incident and problem....). You can even entirely lose the term if you have covered your basic processes.
A second big flaw in ITIL is that it has an inconsistent terminology, causing many semantic problems. The cause of this is the general lack of architecture in the model. This can all be solved, but it requires that you adopt a good architecture, rebuild your framework, and as a consequence change some of the guidance of ITIL. I've read many posts in this blog environment in recent months that clearly illustrate how hopelessly lost many people are when they stick to the ITIL terminology.

Submitted by aroos on Wed, 2009-09-23 14:48.

Good article

Jan's article is good, read it.

Aale

Submitted by supportthought on Fri, 2009-09-11 14:34.

Book of results

I am no expert on most things discussed in this comment thread but as someone who has built a few Service management organisations and assisted in the adoption of Service management best practices I will say this:

Why on earth would the ITIL V3 book infer connection between two of the most important processes in the entire framework? Defining the procedures, expectations, etc. for Incident<>Problem interaction is not trivial and can have the greatest impact on overall service improvement, surely it would behoove the ITIL books to explain how they feel this should be done ideally.

There are plenty of other areas of the book that I may disagree with, but at least some guidance is provided. I would think it's a given that we pick and choose what to use and what to ignore; here we have no choice.

ITIL books are not a method but rather a goal, in this case they provide neither, which isn't very helpful.

On the other hand... ITIL has so many of these types of issues that I have stopped thinking about them. I'm glad it's Skep's cross to bear, not mine :)

Submitted by aroos on Sat, 2009-09-12 06:22.

Very good point

The Problem - Incident interaction is difficult and we definitely need a best practice for this area. My feeling is a lot of people are not even aware of the difficulty.

Aale

Submitted by david.stucky on Fri, 2009-09-11 18:03.

Problem Identification

I fully agree that the guidance is less than clear on this point. But it's not complicated. Just a little bit of reasoning and awareness of generic process structure pretty clearly tells us that the first and probably most helpful point at which problems are ID'd comes during the Initial Diagnosis step of Incident Management.

In short, this step in Incident Management provides the best opportunity to prod Problem Management into action...

You can read more about that here.

Submitted by skeptic on Fri, 2009-09-25 00:53.

obvious but essential

Here's another piece of Incident Response that seems to me obvious but essential to describe if ITIL is actually guidance. It's from the OGC Change Log, issue 178:

In 4.4.5.9 it is suggested that related Incident records that are still Open should now be closed. However, the situation of related CLOSED incidents also needs to be considered - these may need to be revisited, and perhaps re-opened. This is particularly the case where problem resolution overcomes the need for users to resort to a sub-optimal work-around. (This might in some cases be handled by a message broadcast on the Intranet.)
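
As a rough sketch of what that change log entry asks for: when a problem is resolved, related open incidents are closed and related closed incidents that relied on a workaround are flagged for a second look. The record fields (`status`, `used_workaround`) and the function name are hypothetical, invented for illustration; they are not taken from ITIL or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    id: str
    problem_id: str          # hypothetical link to the causing problem record
    status: str              # "open" or "closed"
    used_workaround: bool = False


def on_problem_resolved(problem_id, incidents):
    """Return (ids_to_close, ids_to_revisit) for incidents linked to the problem."""
    to_close, to_revisit = [], []
    for inc in incidents:
        if inc.problem_id != problem_id:
            continue
        if inc.status == "open":
            # Still-open incidents can now be closed.
            to_close.append(inc.id)
        elif inc.used_workaround:
            # Closed, but the user may still be living with a workaround.
            to_revisit.append(inc.id)
    return to_close, to_revisit
```

Whether "revisit" means re-opening the incident or just telling the users (the Intranet broadcast) is a local policy decision that the sketch leaves open.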

Submitted by Atul Kherde (not verified) on Thu, 2009-10-08 12:39.

Orientation of IM / PM.

My own humble two cent thoughts:

Incident Management: should be more concerned about 'getting it rolling first'. In other words, IM should help the user to do what they want to do, and keep the focus on resolving the user's difficulty. 'Repairing the infrastructure' can be a secondary objective. e.g. If a user is not getting printouts, then IM should FIRST help the user to get a printed copy - even if by a workaround of redirecting to another printer, and SUBSEQUENTLY try to fix whatever is wrong with the errant printer - if possible. The incident was about "not being able to get a printout", so the moment the user gets the printout, the incident is closed.
In short, IM is an outward looking process, which is primarily tasked with helping users for higher productivity. IM might also provide information to PM regarding any problems, etc. but that is secondary.

Problem Management: On the other hand, the PM process is concerned with keeping the infrastructure healthy by identifying and resolving problems lurking in the infrastructure. The PM process scans through the incident database to look for trends, frequency (repeating incidents) and high impacts to identify problems. In the above example, if the printer repeatedly fails, it will be captured in the frequency analysis and someone will start thinking, "Why is this printer failing again and again?" As the root cause is unknown, it can be logged as a problem for further processing through the Problem Control. The moment the root cause is KNOWN, it doesn't remain a problem anymore, but becomes a KNOWN ERROR, and is handled by the Error Control processes.
In short, PM is an inward looking process by which the infrastructure stays healthy by identifying and eliminating problems.
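
The frequency analysis described above can be sketched in a few lines. This is only an illustration under assumed inputs: the tuple layout, the `threshold` default and the function name are made up here, and a real incident database would also need time windows and impact weighting.

```python
from collections import Counter


def recurring_candidates(incidents, threshold=3):
    """Return items whose incident count reaches the threshold.

    incidents: iterable of (incident_id, affected_item) tuples (assumed layout).
    """
    counts = Counter(item for _, item in incidents)
    return sorted(item for item, n in counts.items() if n >= threshold)
```

Each item returned is a candidate for a problem record with an as yet unknown root cause, like the printer that fails again and again.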

ITIL v2/v3 was written by a large number of authors as a collaborative effort. Let us take it in a generic sense, the way it was meant to be, and not a LEGALISTIC sense where every comma and fullstop matters. My personal opinion is that rather than harping and carping about "what the book says", we should capture the generic sense from these books and use it to implement 'sensible' processes in client environments as long as we stay within the generic boundaries of the processes.
Which is why it is said, ITIL has to be adopted and then ADAPTED. If everything was supposed to go by the book (like aircraft maintenance procedures) then there was ABSOLUTELY no need and scope for ADAPTATION.
Another important perspective: Many people might object that my response is more v2 than v3. So, whattt? v2 and v3 are not black and white..! Personally to me, as long as the purpose is served, any discussion about whether something falls in v2 or v3 seems futile. These are not two different things!

Thanks for reading patiently.

Atul

Submitted by AndyIvey (not verified) on Thu, 2009-10-08 16:18.

IM/PM so you're not working all AM/PM

That’s pretty much how I’ve always understood it. IM puts the fire out (workaround or solution) and PM (if necessary) will clean up the mess, investigate cause, and prevent future fires. A lot of clients want to push the high severity incidents straight to problem for “added visibility.” The thing these environments all have in common is poor event correlation at the service desk. They need a second process (PM) to cover a gap in their incident process.

This reminds me of the difficulty so many ITSM tools have with logging calls and marking an incident as the “lead”. This is where clients that ship high severity incidents to PM have the advantage. They can overcome tool limitations and log every call as an “incident” and then link them all to a single, lead problem.

Submitted by skeptic on Thu, 2009-10-08 21:00.

ITIL to become a religious text

Atul, I think we are all pretty clear on the distinction between Incident and Problem.

I've raised the possibility before that ITIL is just a casual fireside chat not to be taken too seriously. But that's not how it is sold. If ITIL is only broadly applicable in spirit and not necessarily in the detail, then how can we have four-tier certification and product compliance? No, that's a cop-out, sorry. An excuse for a sloppy product.

The last thing we want is for ITIL to become a religious text, comprehensible only to learned scholars who make a living interpreting it to the lesser mortals. ITSM isn't brain surgery and it isn't mysticism.

ITIL is guidance. The interface between incident and problem is fundamental. It should be described properly. It's not.

Submitted by Hazen (not verified) on Mon, 2009-11-23 02:20.

Well, here's the intent: If

Well, here's the intent:

If this issue happens again it may constitute a problem; i.e. services will be impacted as a result of this incident, and we cannot prevent this incident's recurrence, therefore problem management is to maintain knowledge of this incident and the scenario in which it occurred to prevent the scenario from arising again. This is seen when dealing with compound issues that have 2 or more root causes.

Submitted by skeptic on Mon, 2009-11-23 03:24.

far more accurate and complete problem reporting

Not necessarily. What you describe is only one manner in which to use problems, and I think it has issues. If you only open problems when there is a potential recurrence, then you are using the incident record to cover some instances of problems and the problem record to cover others. If you only use the problem record to track the instances you describe then I'd say your problem reporting is pretty useless.

I think it is fairly common for a problem to be created from every incident where there is a problem (i.e. an underlying cause to be "fixed") identified or even suspected. This gives far more accurate and complete problem reporting.

ITIL's failure to make these considerations clear is a deficiency which will lead to many sites going down the wrong paths. Nice future work for consultants trying to sort it out I guess.

Submitted by RichC (not verified) on Fri, 2010-06-18 14:02.

Misuse of Problem

I know this is quite old, but I had to respond to this last post. We were taught explicitly that you do *NOT* want to create a Problem for every Incident. Every Problem was created due to an Incident, but not every Incident has a Problem spawned as a result. I can't tell you how many times that was repeated.

For example, an application failing because of an issue with a server may only generate an incident. That organization may have plans to deal with that failure, execute them and restore service. There is no underlying "Problem". Another organization could have something similar happen, but identify that the outage experienced is not acceptable and that an underlying Problem is that there's not sufficient redundancy built into their design. Changing their architecture could eliminate that service outage. Another organization could identify that, after service was restored, the root cause was a bad driver on a system and that they use that same driver on 500 other systems. That would then spawn a Problem.

Problem Management's job is to remove the cause of Incidents. It's not to "fix" the specific singular cause of one incident. A failed hard drive is not something that Problem Management would deal with. A cracked screen on someone's laptop because they dropped it is not something that a Problem would be generated for.

Submitted by skeptic on Fri, 2010-06-18 20:56.

problem records are incomplete

"create a Problem for every Incident" is NOT what I said, or at least not what I meant. Sorry if you inferred it that way. I'd have thought it is pretty clear that I understand "Problem Management's job is to remove the cause of Incidents. It's not to fix the specific singular cause of one incident." I certainly didn't say that. Are you confusing problem management with problem records?

If an incident was resolved without invoking the problem team, but it was caused by an underlying problem now resolved, however you define a problem, then I think it should either be linked to an existing problem record that caused it and that problem closed, or a new problem record should be opened and then closed recording how the problem was resolved. If you choose to regard a failed hard drive (I'm assuming you mean on a desktop/laptop) as not a problem then that is fine so long as the scope of the definition of "problem" is well understood and consistent.

A better example would be a failed network or storage component that causes an incident, or several incidents all related to a master incident. Level 2 support identify the cause, swap the device, and notify the Service Desk to contact the users and close the incident(s). There was a problem but no problem record. A week later it happens again but this time there is no spare (guess why). They open a Problem record and make whatever workaround they can until the spare arrives. In order to report on how often that device is failing, or to recognise a pattern of failure, you now need to report across both incident and problem records: the problem data on its own has little meaning.

If you don't open a problem record every time there is an occurrence of whatever-you-define-to-be-a-problem then your problem records are incomplete. I don't think problem records exist solely as work tickets for the problem team - I see them as information for other processes, including CSI.
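
To make the reporting gap concrete, here is a minimal sketch of the failed-component example. The data and names are hypothetical: two failures of the same device, where only the second one got a problem record.

```python
from collections import Counter

# Each tuple: (device, problem_record_opened). Hypothetical data mirroring
# the example: week 1 a spare was swapped in and only incidents were logged;
# week 2 there was no spare, so a problem record was opened.
occurrences = [
    ("switch-A", False),
    ("switch-A", True),
]


def failures_seen_in_problem_records(occurrences):
    """Count failures per device using problem records alone."""
    return Counter(dev for dev, has_problem in occurrences if has_problem)


def actual_failures(occurrences):
    """Count every failure per device, with or without a problem record."""
    return Counter(dev for dev, _ in occurrences)
```

With this data the problem records show one failure of switch-A while the device actually failed twice, so any report on failure frequency has to span both incident and problem records.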