Debate around the definitions of Incident and Problem never seems to end.
Here's my take on the fundamental issue that fuels the endless arguments: we have two entities trying to do three jobs.
When a service breaks, we have to deal with three things in support:
I'm getting lots of positive feedback about my series of articles for The ITSM Review, which uses a train crash in Cherry Valley, Illinois as a case study for understanding incident and problem management. (It is part of a wider theme in my articles for The ITSM Review: using railroad examples for service management.)
It always mystifies me that people (and ITIL) don't grok this simple model: incident management is about users, problem management is about causes.
In ITIL, we don't separate Incidents from Problems properly. This leaves both with muddy, confused definitions. Join me as I try one more time to make this clear.
Calling all you ITIL theorists, philosophers, pontificators and pundits. Marty is back: our follower from the real world, trying to make sense of ITIL on its home ground, the operations of big iron batch computing. Marty asks: what happens after a service is restored? What does ITIL call the function of undoing the damage done while a service was unavailable? I have a view, of course, but I'm going to stay quiet for a while and hear what everyone else thinks. So have at it.
When a train rolls by, the guys on shovels and brooms, track gangs, crews on the ground, crews on other trains, clerks, station-masters, everyone stops and watches the train and waves to the crew on board. Lazy? Hell no.
Complex systems are by definition broken. They will always break, and sometimes they will break even when everybody did what they were supposed to. Fixing the problem won't necessarily reduce the risk of another incident.