Separation of incident and call

Submitted by skeptic on Sat, 2009-09-12 19:36


From time to time, a consultant is in the position of explaining and justifying fundamentals. Recently I was describing how incidents are not the same thing as calls: a call is not a new incident if the same user has already called about the same incident, and it is more effective to record the call history on the existing incident. I went to three sources of "best practice" for support, and there isn't any.

ITIL V3, ISO20000-1 and ISO20000-2, COBIT 4.1 ... nada.

The nearest I could find was COBIT DS 8.2: "Establish a function and system to allow logging and tracking of calls, incidents, service requests and information needs." with the implication that these are different entities.

Why is that? Is this not a fundamental good, best or generally-accepted practice?

Published in The Skeptical Informer, September 2009, Volume 3, No. 7


Comments

Submitted by mbuzina on Thu, 2009-09-17 21:06.

Where does it say Incident = Call???

Hi skep, I have not read the whole mile long discussion here, but I must say I can compare your question to someone looking for an explicit proof that an ice cream is not a car.

No documentation I have read (Cobit, ISO 20K, ITIL 1, 2 & 3) talks much about calls (the DS 8.2 is the closest). So why should an incident be a call? Why do people just assume an incident to be something just because some vendors have named the records in their system calls (requests? tickets? anything else anyone?). Why do you have to explain to your client that apples and grapes are not the same, and then have to find best practice to prove it? Just go to chapter 12.4.2.1 in CS (common sense) and you may find it.

Whoever said that ITIL, COBIT or anything else contains a model of the objects involved in ITSM? I know it is time for one, but there still is none. If we get to a single (object / data) model, things may become easier, but the vendors will be comparable and they won't like that at all.

Regards,
Marc - http://buzina.wordpress.com

Submitted by skeptic on Thu, 2009-09-17 21:51.

We take our fundamental knowledge for granted and so did the ITI

Thanks, we're getting to a very fundamental question here. So fundamental that I've moved it up to a blog post.

Submitted by truthbetold (not verified) on Sun, 2009-09-13 21:48.

Incident vs call

ITIL v3

Incident = Service Outage

One Incident per outage (not call)

Also in v3, Service Request calls no longer need to clog the incident mgt process, (outage) metrics, etc.

Hence, after v2, a "ticket" is no longer always an incident either, but may be an incident, an SR, or a new requirement (RFC)... does this help?

It still may be helpful to coordinate service request (Serv Cat) stuff with Serv Desk/Incident flow, work, etc.

e.g., you want to know if a user is just ordering (via an SR) a new printer because they are sick of "printer outages" from ink cartridges forever being low, default paper size setting errors, etc...i.e., user needs training in this case to not give up on the incident mgt process too soon...

Make sense?

Submitted by skeptic on Sun, 2009-09-13 22:26.

much of ITSM is passed along by folksong

Thanks, I've got a reasonable grasp of the theory and principles of ITSM.

What I'm looking for is an explicit statement in the books somewhere that calls are different entities to incidents, that when someone calls to say "what the *** are you doing about my service problem?" that this is a note on the history of the existing incident record, not a new incident immediately resolved.

We all know it but it's not official best practice apparently

George Spafford said on Twitter "It's also indicative of how much of ITSM is passed along by folksong and is not formally codified" which is powerfully true!

Submitted by DavidM on Mon, 2009-09-14 23:25.

The dilemma

I find myself asking this question: If ITIL is supposed to be a descriptive framework, how much explicit 'how to' detail is sufficient, and at what point does the amount of detail cross the line and make ITIL either quasi-prescriptive or fully so?

A review is explicitly part of closing every incident (and problem). Does it make a difference if two different calls lead to two tickets for whatever reason (they occur at the same time, so there isn't a record yet, or the description supplied by the two callers makes it appear that the two "incidents" are the same)? The review would consolidate the calls into a single incident -- and over time that would help the Service Desk and Incident Management do a better job in the future, and that IS explicitly found in ITIL V3.

Please understand, I agree that there isn't anything specific about tickets, or about whether the two calls should (or shouldn't) be different records/aspects of the same incident. But, as noted above, does there have to be? What are the bounds on descriptive versus prescriptive?

David

PS Rob, this isn't for you: An outage is NOT an incident (and vice versa); an incident IS the unplanned interruption or degradation in performance of an IT service -- and that is explicit.

-d-

Submitted by vinodka on Tue, 2009-09-15 03:17.

Outage vs Incident

Hi David,

A clarification: When you say an Outage is NOT an incident (and vice versa), are you including planned outages in that scope? If not, it IS an incident. Correct?
So, to clarify a little further, can't we say an unplanned outage (full or partial) is an Incident? When I say full or partial, I mean interruption or reduction in quality.
I would think so- since in the definition of Incident (as per V3), ITIL says: a failure of a CI that has not impacted a service IS ALSO an incident.
I feel that definition removes any key, apparent difference that can exist between an "unplanned outage" and an "Incident".

Vinod

Submitted by DavidM on Tue, 2009-09-15 04:56.

Incident defined

Hi Vinod,

The formal definition of an incident, according to ITIL V3 is:

(Service Operation) An unplanned interruption to an IT Service or reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet affected Service is also an Incident. For example Failure of one disk from a mirror set.

The second and third sentences get confusing because it's based on Event Management and the definition of an event:

(Service Operation) A change of state that has significance for the management of a Configuration Item or IT Service.

So it's really up to the organization to determine when the failure has significance... but that's something we can parse another day. :-)

An outage that occurs when the terms of the SLA say the service isn't guaranteed to be available isn't an incident -- it's NOT an unplanned interruption.

Further note that the definition is broader than mere "outage": it also says that degraded performance is an incident. This specifically addresses the notion of service warranty expressed in terms of availability, capacity, continuity, and security.

Not to split hairs, though that's what we're doing... :-) degraded performance (below SLA targeted levels -- during the time the service is supposed to be available) is not an outage, but it is still considered to be an incident.

So, the best thing to do is not equate the term outage with incident. There is a specific definition for an incident. We get into difficulties when someone uses what they think is a synonym for a specific term or definition, thereby making an assumption about the conveyed meaning.

An incident in ITIL is specific (definition above). It is not defined in terms of outage -- but an unplanned interruption or...

One of the purposes of ITIL V3 was to clean up the language and definitions (forget for a moment that this is one of the things they want to clear up with the next edition of V3 :-)). When we try to invent our own terms for things specifically in the ITIL glossary, we miss the point.

Does this help?

David

Submitted by aroos on Tue, 2009-09-15 05:38.

Remember the full definition

We have had this discussion earlier but the fact is that all subsequent V3 publications have cut the third sentence out of the incident definition statement. It changes the content of the term.

This is the way V3 defines incident: An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item that has not yet impacted service is also an incident, for example failure of one disk from a mirror set. Incident Management is the process for dealing with all incidents; this can include failures, questions or queries reported by the users (usually via a telephone call to the Service Desk), by technical staff, or automatically detected and reported by event monitoring tools.

If you leave the last sentence out, you get a clearer definition, but what then are those calls where the user just thinks that the system is broken?
Aale

Submitted by DavidM on Tue, 2009-09-15 16:36.

Would have been happier with specific mention of Event

I think I understand the intent of the way ITIL V3 defines an Incident -- and that was specifically to include Events and Event Management as part of the proactive process of resolving Events that could lead to incidents before the customer/user is aware of the issue.

However, the current wording stinks. If, instead of trying to get cute, they'd referred explicitly to an "event", things might have been better. For example:

...unplanned interruption or.... An Event that has not caused a visible unplanned interruption or degraded service quality is (or could be -- and leave it up to the organization???) also an incident.

One of the critical aspects of Event Management is knowing what "normal" is so that when something happens that has "significance" it can be identified as deviating from the baseline (aka normal).

Also, I think (and you've hit part of the problem that I hope will be addressed with the planned revision) there is confusion between "incident" and "Service Request." The queries and questions aren't incidents; they are service requests.

David

Submitted by skeptic on Tue, 2009-09-15 19:57.

half-assed descriptions

As I've blogged elsewhere, there aren't 2 processes (incident and request), there is one process with at least a dozen categories.

ITIL V2: one process: incident, everything else ignored
ITIL V3: two processes: incident, and everything-else

Both are half-assed descriptions of the true situation. Maybe ITIL V4 will do it properly.

An Incident is a deviation from expected/agreed service.
A Fault is something broken.
An Event can alert us to either or both.
Both can lead to a Problem.
A Problem might generate 20 Incident reports and two Fault reports - perfectly normal.

Clear. Easy to grasp. Reflects reality.
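
The relationships above can be sketched as a small data model. This is only an illustration under assumptions: the class names `Category`, `Report` and `Problem`, and the sample descriptions, are hypothetical and do not come from ITIL, COBIT or any tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """Hypothetical report categories, using the definitions given above."""
    INCIDENT = "deviation from expected/agreed service"
    FAULT = "something broken"


@dataclass
class Report:
    """One Incident report or Fault report."""
    category: Category
    description: str


@dataclass
class Problem:
    """An underlying cause that many reports can point at."""
    description: str
    reports: list[Report] = field(default_factory=list)

    def count(self, category: Category) -> int:
        """Number of linked reports in the given category."""
        return sum(1 for r in self.reports if r.category is category)


# One Problem collecting 20 Incident reports and two Fault reports.
problem = Problem("Failing storage controller (made-up example)")
problem.reports += [
    Report(Category.INCIDENT, f"User {n}: application is slow") for n in range(20)
]
problem.reports += [
    Report(Category.FAULT, f"Disk {n} failed in mirror set") for n in range(2)
]
```

Keeping the category on the report rather than on the Problem lets one Problem collect both kinds without either being counted as the other.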

Submitted by DavidM on Wed, 2009-09-16 03:41.

What's reality?

I don't know that "fault" is needed if the definition of an incident changes to also reference Events.

An unplanned interruption is an incident.
Degraded performance == incident.

Both are, as you say, deviations from expected or agreed (I'd also add perceived) service levels.

I don't see fault as necessary (in that it's most likely the cause of one or more incidents, aka a problem, and therefore covered). In fact, I think the addition of the term might be confusing...

... and reality is still covered. :-)

Bottom line: the customer/user doesn't care about the distinction if things don't meet expectations.

re: Request Fulfilment (the process to handle everything else that isn't an incident). Without digging into it in more detail, I can accept it as the catchall for everything else. Note: other than queries for advice (and in some organizations that could also be covered), each service request (i.e., everything that isn't an incident) is also covered by an SLA and subject to review by CSI and SLM, with a SIP if necessary.

(Alphabet soup, anyone ?? :-))

David

Submitted by skeptic on Wed, 2009-09-16 09:09.

A fault is not an incident

Don't be such a traditionalist. There's either one kind of request (including requests for restoration of service) or there are lots - there certainly aren't two for any reason other than historical.

Read the post I referred to: I believe there is one entity, a ticket or request, with many categories, and there are variations on a core process for each category or type.

Specific to fault: a fault is "most likely the cause" of an incident but not necessarily: that's exactly why they had to invent the clumsy V3 definition - to cover the times when it isn't.

Here's the reasoning they used to lump faults in with incidents: "If an event detects a fault, we need to run around being urgent until we fix it. Let's see, the process for running around being urgent is the incident process, therefore a fault is an incident." Wrong. There are several running-around-being-urgent processes for several categories of ticket or request or whatever we call the master entity.

AND THEY ARE DIFFERENT PROCESSES. We do not respond to a fault and an incident in the same way. For a start, a fault has no user and is not subject to an SLA. Incident prioritisation doesn't work for a fault: remember, no service is impacted. Escalation communication matrices are also different: "Hello Mister Business Owner, it's 2:30 am and we are calling you to tell you that we have an internal fault that hasn't impacted any of your services yet but it is really serious and we think it might". Yeah, right.

C'mon, throw off those mental chains! A fault is not an incident.

Submitted by DavidM on Thu, 2009-09-17 15:11.

Where's the beef? :-)

I said we don't need a "fault" -- and frankly I think it's very confusing in the context of Incident.

An Incident is the unplanned interruption or degradation in performance.

I don't have any problem with the definition, as it is.

An Event is a change in state that has significance...

Again, NO problem with the definition.

My issue is that I would have rather seen them reference Event as part of the rest of the definition of Incident.

As I said, to include "Fault" in the context of Incident is wrong, not needed, confusing... and I think we agree. I know we do about this: A fault isn't an incident!

In fact, if anything I'd relate "fault" closer to problem (since it's likely to cause incidents :-)) than anything else.

BUT: Fault IS defined with "See Error" and Error is defined as (in the Service Operation volume):

A design flaw or malfunction that causes a Failure of one or more Configuration Items or IT Services. A mistake made by a person or a faulty Process that affects a CI or IT Service is also an Error.

So, in the words of Clara Peller: "Where's the beef?" :-)

David

(for those who don't remember or don't know:
http://www.youtube.com/watch?v=Ug75diEyiA0 )

-d-

Submitted by skeptic on Thu, 2009-09-17 19:45.

Perhaps Fault and Error are the same thing

An Event is just something of interest that happened. The vast majority of them do NOT lead to an Incident or Error or Fault or Problem.

An Error is (by that definition) a state of a thing - no action associated with it. There should be a related Problem which involves the action.

A Fault is a type of Response, a work ticket, a job to be acted on just like an Incident or Request is. There is a response process associated with it. It usually leads to a Problem (there may be a few rare scenarios where it does not - where we choose not to fix the Error identified by the Fault report).

Perhaps Fault and Error are the same thing, but I remain convinced Fault and Incident are NOT the same thing: muddying the two Response processes of Fault and Incident together as ITIL V3 did only makes for a very unclear and confusing definition of what an Incident is, and it confuses and complicates the communication, prioritisation and escalation.

Submitted by JamesFinister on Tue, 2009-09-15 19:17.

Event

Don't have the books in front of me - but does an event get resolved as such, or do they just end? I remember an old debate about whether we resolved incidents but removed problems.

Submitted by DavidM on Tue, 2009-09-15 19:53.

Event Management is about reporting

Diagnosis is not an activity that is part of Event Management. Notification, detection, filtering, correlation, trigger, response selection, review, closure, yes; diagnosis, no. How the event is to be handled is part of response selection.

Whether it's reported to Incident Management, Problem Management or both or something else happens is part of the response selection activity.

David

Submitted by skeptic on Tue, 2009-09-15 06:46.

Incident as "a deviation from expected service"

It's a mess. My definition of Incident as "a real or perceived deviation from expected service" is nice and clear and crisp. It even covers users thinking it has deviated when it hasn't: that is an incident that is resolved by explaining that it isn't an incident :)

"An unplanned interruption... or reduction in the quality" is a set of scenarios that is a subset of "a deviation from expected service". A user may see something that deviates from what is agreed in the service description in the Catalogue or in the agreed SLA, but does not fit "An unplanned interruption... or reduction in the quality". How about a planned interruption that went on too long, a planned interruption that did not follow the agreed communication process, an application that is not returning the results expected, a request not fulfilled properly, a service desk response SLA breached...

To introduce "Failure of a configuration item that has not yet impacted service is also an incident" is to completely destroy the generally accepted meaning of incident. No it bloody isn't: as we've discussed previously, that is a problem or a fault or something else, but it shouldn't be an incident if there is no detectable impact on services.

Likewise I agree with David, don't equate Outage with Incident: they are two intersecting sets of scenarios. Each includes scenarios that are not in the other: a planned outage, a poor transaction response time.

Submitted by DavidM on Tue, 2009-09-15 20:01.

Ah, Perception!!!??? :-)

Rob,

You've raised a very interesting point. Typically the way things SHOULD work is the various targets are defined in the SLA supporting this service. However, if the performance or availability (etc) don't meet expectations, what's happening? I've been in 1 organization where users thought things should be faster because of the way their management set expectations. Actually performance was well within the negotiated targets in the SLA. However, perception is reality.

After some discussion, I suggested the client generate an incident. The solution didn't involve any changes to the infrastructure. Instead we generated an RFC to add or change training about the service to better (properly) set and manage expectations.

David

Submitted by skeptic on Tue, 2009-09-15 20:08.

inward-facing technology focus

Exactly. That's an incident. See my other comment: incidents and faults are not the same thing but parts of ITIL V3 are still written with an inward-facing technology focus, not an outward-facing service focus

Submitted by Platos dITILectic on Fri, 2009-09-25 14:14.

inward-facing technology focus

Valid point on many levels. For example, ITIL does introduce/get into (3) Service Provider types, but doesn't take this far enough...i.e., one could argue the need for three Service Strategy (SD, etc?) books, one each from the perspective of the three Service Provider types....once ITIL v3 introduces these three, much of the discussion afterwards goes back to generalizing on guidance for the "service provider," and only occasionally making specific correlations back to the specific provider types. (My experience is that good providers handle my calls differently when they know they are in direct competition for my money/business.)

Still, one might find it helpful to examine how to talk about best practice for "call" handling in an outward-facing service focus context from the perspective of different service provider types; or indeed, in the case of CMMI-Svrs, from the perspective of many different types of businesses in the entire service industry...

Or, you could state generally, that if there is a business need (or agreement with the business...i.e., SLA) for how to answer/respond to, etc calls, and not just incidents, faults, events, problems, etc, then the provider should have the processes (and hopefully tools) in place to do so.

But obviously, we get into the old argument about how much is too much prescriptiveness...being too specific or trying to cover too many specific situations creates a scope nightmare for any publisher in these areas...and also scares away the larger numbers looking for guidance generally...or those not wanting/willing to wade through the mass of "specific-to-others-but-not-me" scenarios before finding the one diamond that fits their setting...

Submitted by DavidM on Wed, 2009-09-16 03:31.

Semantics -- again :-)

Back to the concept of working and the way things were written (and probably edited).

We're actually in agreement, though I would have used different language to say the same thing. :-)

David

Submitted by vinodka on Tue, 2009-09-15 07:54.

"Best practice definition"

Yeah, I agree with the perspective of both you and David. I used to keep that perspective, explanation and interpretation till V3 came out.
However, my debate was basically from the ITIL V3 definition of Incident.

As a trainer, consultant etc., when you work in the industry, the client organizations (fortunately or unfortunately) have "ITIL Experts" who will go with the ITIL definition (and their/their trainer's interpretation of it!).
It is next to impossible to convince them, because they have the "strong" backing of the "best practice definition" in ITIL! Many a time I have landed in messy situations for giving logical / practical explanations that might not be as "theoretical" as ITIL, drawing comments like 'You don't know ITIL' or 'You are deviating from best practice!', to the extent of eroding the trust the client has in your "knowledge" or "expertise".

Incident vs Problem debates are always endless, so if on top of that we add messy definitions of Incident and Problem, it is fun time!

I hope the new "Editions" address these critical issues with best practice definition/documentation.

Vinod

Submitted by DavidM on Tue, 2009-09-15 19:37.

Experts and everyone else

I'm also an ITIL Trainer and ITSM consultant (been in the ITSM space almost since before it had a name, but that's another story :-)).

I don't have any problem suggesting the Experts might want to take another look. We spend the time working on what is behind the ITIL definition and also point out how their interpretation might not jibe. I've NEVER gotten a comment that even suggests that I (a) don't know what I'm talking about, (b) don't know "X", or (c) am deviating from best practice -- in over 25 years of being an independent "ITSM" consultant.

I also have not found incident-problem debates to be endless. I explain it this way:

Everyone is familiar with the difference if we take it out of an IT context. Try this approach: Question, if someone sneezes or coughs, what's the cause, what's the problem? Answer: You don't know if it's allergies, bacterial, or viral without doing some investigation (potentially including tests like blood work). Any doctor will tell you that the symptom is just one potential manifestation of the real problem. So, as the doctor, what do you do? You treat the symptom (incident) in the best way you can, given the information available from the patient, until the real cause (problem) comes back after you've done the investigation and tests. Effectively the Doctor is involved in Incident Management; the lab (or other diagnostic tools, like MRI, CAT scan, X-Ray, etc) is part of Problem Management. The two work hand-in-hand to get the patient healthy.

Put another way: In medicine: The symptom is never the cause. Same thing is true in IT, the Incident (symptom) is never the Problem (cause). I also refer them to the quote I posted here from the Service Operation volume that states rather emphatically, that problem and incident are separate entities and remain so, always!

From there it's always been a, "Next," and we're on to the next topic.

David

Submitted by skeptic on Tue, 2009-09-15 20:20.

ITIL analogy

You've prompted me to publish one of the backlog of about 40 blog post ideas I have: a similar analogy for ITIL you might like

Submitted by JamesFinister on Tue, 2009-09-15 11:29.

All too true!

I've been in a classroom with one of the original authors when a delegate couldn't understand he was telling the person who wrote the book that he was wrong about what ITIL meant

James Finister
Wolston Limited
www.wolston.net
www.coreITSM.com
http://coreitsm.blogspot.com/

Submitted by DavidM on Tue, 2009-09-15 19:41.

Oy! :-)

Jim,

I've been in the room and had the same experience both as a delegate and as the author!

The word "Chutzpah" comes to mind. :-))

David

Submitted by JamesFinister on Tue, 2009-09-15 19:55.

You know what I meant

Delegate: "But ITIL" (V1 in those days, but obviously we didn't know it was only V1) "says ... whatever"

Well Known ITIL Celebrity: "Yes, I know it does, I wrote that. I was wrong."

Delegate: "But ITIL says.."

WKIC*: "Yes, but I wrote it and I was wrong. It won't work"

Delegate: "But ITIL says.."

WKIC: "ITIL didn't say anything, I said it, and I was wrong."

Delegate: "But ITIL says..."

I think the exchange lasted twenty minutes.

*Name of WKIC available on request. I did worry in the early days of the Foundation exam that he could never remember the right answer to one question on the dummy paper, even though it was his own question, proving that the best of us can be tripped up on the process v function issue.

Submitted by doc (not verified) on Mon, 2009-09-14 09:56.

call != incident?

V3 Srv. Operation: "A telephone call to the Service Desk from a User. A Call could result in an Incident or a Service request being logged."
This might be the closest to what you are talking about, Skep.

If a mail server is down, and you receive 200 calls from different people, are they all incidents? In my practice I always insisted on creating 1 incident and 200 service call records related to this one incident.

A Service Desk with less nervous or slower customers could receive only 50 calls about the mail service. If all calls are incidents then the first SD will have much better statistics, although they may restore the mail service in the same time as the second one...
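
The "1 incident, 200 call records" approach can be sketched as a tiny data model. This is a minimal illustration, assuming hypothetical class and field names (`Call`, `Incident`, `log_call`) that are not taken from ITIL or from any particular service desk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Call:
    """One contact from one user, always linked to an incident."""
    caller: str
    received_at: datetime
    note: str = ""


@dataclass
class Incident:
    """One record per service disruption, however many people call about it."""
    summary: str
    calls: list[Call] = field(default_factory=list)

    def log_call(self, caller: str, note: str = "") -> Call:
        """Attach a new call record to this incident's history."""
        call = Call(caller=caller, received_at=datetime.now(), note=note)
        self.calls.append(call)
        return call

    @property
    def affected_callers(self) -> set[str]:
        """Distinct users who have called about this incident."""
        return {c.caller for c in self.calls}


mail_down = Incident(summary="Mail server is down")
for n in range(200):
    mail_down.log_call(caller=f"user{n}")

# The same user ringing back adds to the call history, not to the incident count.
mail_down.log_call(caller="user0", note="Any update?")

call_count = len(mail_down.calls)
caller_count = len(mail_down.affected_callers)
```

Under this model a desk that takes 200 calls and a desk that takes 50 both report a single incident for the outage, so call volume is tracked without distorting the incident statistics.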

Submitted by skeptic on Mon, 2009-09-14 10:35.

associate calls to different users and link them to one incident

You're lucky your software allows you to associate calls to different users and link them to one incident - many don't.

But which user is the incident associated with? Do all users have to agree service is restored, or only one? How do you know for sure all 200 users are experiencing the same problem?

Submitted by supportthought on Mon, 2009-09-14 14:49.

best Practice

This is the best way to do incident Mgmt, 1 incident many calls, many users. 1 closure many call backs.

Good training, people, and leadership is how you manage to make sure all calls are truly related to a major incident, the call back is your confirmation.

Any other method will overstate your incident numbers or create a tangled web of relating incidents and not reporting them as separate.

I have also seen this done with separate incidents logged and then related to a master incident, but I always felt this was a tool inadequacy rather than best practice.

Submitted by skeptic on Mon, 2009-09-14 19:57.

major inadequacy of the description of ITSM

If you are right, and I think you are, then this is a major inadequacy of the description of ITSM by ITIL, COBIT and ISO20000, right in the heartland of arguably the most important process of them all. And if so, it sure looks like it might be there because so few tool vendors would comply

Submitted by ianclayton on Mon, 2009-09-14 20:53.

My definition of ITSM

Skep

I blogged to the ITSM topic at www.itpreport.com last week and in that post (lengthy though that may be) I said:

"The Product Management profession morphed into two discrete, complimentary and sometimes-conflicting responsibilities: product marketing and product planning. Together they created the ‘goods-service continuum’ concept to help describe the amount of service or ‘people power’ in a product. Ranging from pure goods such as food items, to the subject matter expertise of a lawyer or teacher, a product was placed along this continuum by design...

As the world’s economy shifted from the Henry Ford days of mass production, World War Two and subsequent manufactured products to services, service management was born as a necessary and specialized extension to product management...

A number of IT organizations have begun the journey to transform themselves into a customer focused service provider. Those that have are following a blueprint suggested by the term ‘IT Service Management (ITSM)’, which you could be forgiven for thinking means they are applying the product management styled service management concepts to the IT environment...

You would most likely be wrong... "

In my humble opinion ITSM SHOULD represent how IT applies service management concepts, methods and best practices universally available to the service industry at large, to the task of providing IT information systems as services. We should not be reinventing this wheel...

Submitted by doc (not verified) on Mon, 2009-09-14 11:31.

associate calls to different users and link them to one incident

The incident is associated with the first caller. He confirms the restoration. Upon incident resolution (and also closure) all "callers" get a notification. If they disagree they can reopen the incident or open a new one, through the customer portal, mail or another phone call.

An experienced SD will connect a call correctly with the proper incident in 99% of cases, especially if it has a higher priority. If not, customers will be very fast to correct it.
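
The closure flow described here (first caller confirms, every linked caller is notified, anyone linked can reopen) might look like the sketch below. All names (`Incident`, `link_call`, `resolve`, `reopen`) and the notification text are assumptions made for illustration; a real tool would send mail or portal messages rather than return strings.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    summary: str
    callers: list[str] = field(default_factory=list)  # in order of first contact
    status: str = "open"

    def link_call(self, caller: str) -> None:
        """Link a caller to this incident; repeat calls do not add them twice."""
        if caller not in self.callers:
            self.callers.append(caller)

    @property
    def owner(self) -> str:
        """The first caller, who confirms restoration."""
        return self.callers[0]

    def resolve(self, confirmed_by: str) -> list[str]:
        """Resolve on the first caller's confirmation and notify every caller."""
        if confirmed_by != self.owner:
            raise ValueError("only the first caller confirms restoration")
        self.status = "resolved"
        return [f"To {c}: '{self.summary}' is resolved" for c in self.callers]

    def reopen(self, caller: str) -> None:
        """Any linked caller who disagrees can reopen the incident."""
        if caller not in self.callers:
            raise ValueError("caller is not linked to this incident")
        self.status = "open"


incident = Incident("Mail server is down")
for name in ["ann", "bob", "cho", "bob"]:
    incident.link_call(name)

notices = incident.resolve(confirmed_by="ann")
incident.reopen("bob")  # bob still cannot send mail
```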

Submitted by aroos on Mon, 2009-09-14 11:16.

The same caller

In Skep's original example it was the same caller calling about the same incident. In that case it is just another call, not a new incident although you should be able to count these calls. With different people calling they are also different incidents.

Submitted by aroos on Mon, 2009-09-14 05:54.

Good point

It is so obvious that I had to check the V2 book myself and yes, it is missing.

Let's hope ITIL V4 (or V3.1) will finally make a logical set of definitions.