The importance of IT drills to enhance performance

The #1 way to reduce the mean time to restore service (MTRS) after incidents is seldom mentioned, and not at all in ITIL: training drills. If IT were the military, we'd knock a server over once a month just to see what people do.

When I worked at one dysfunctional site, we got quite good at recovering from total disaster, and especially good at crisis communications, both to the business and among ourselves. But not because we drilled. Sadly, the real world was our practice ground.

Most good IT shops practice disaster recovery once a year. They also practice building evacuation. Why don't we practice IT procedures? In particular, we should rehearse:

  • Major Incident response, several times a year. ITIL makes a remarkably casual reference to major incident management (MIM) and offers almost no guidance at all. There is some great MIM material from Braun Tacon of Nike. Just do it, I guess.
  • Release and Deployment procedures including...
  • Release and Deployment rollback
  • Priority 1 incident response
  • Functional and hierarchical escalation to third-party suppliers
  • VIP service requests

...and my personal favourite

  • Determining the service impact of an incident or change, a.k.a. On-Demand CMDB


"Train Like You Fight, Fight Like You Train"

It's not just a slogan; it's the way things work.

Unfortunately, most IT organizations spend most of their time (for various and sundry reasons, some good, some bad...) handling business-as-usual (BAU) activities. Most "failures" and major incidents happen infrequently enough that the pain and consequences associated with them are often easily forgotten once things "return to normal".

Being vigilant requires a proactive orientation, and doing the work to prepare adequate response plans falls into this category. But once a plan is prepared, unless it is practiced, you're taking it on faith that it will hold up under the weight of the circumstances you will face during the event.

As someone who is also responsible for training people in the Incident Command System (ICS) and the National Incident Management System (NIMS), I can say that IT would be well served by leveraging what already exists and incorporating that thinking, rather than trying to invent IT-centric approaches. I don't take issue with the MIM material you reference, but there's a lot more that could be addressed.


Leverage non-IT best practices... how dare you?

Are you crazy? Leverage best practices from people outside of IT? Have you run too many marathons?

Next thing you know, we will be looking at how shipping companies track requests across multiple providers as a model for managing cloud service requests, or investigating how engineering design firms work off integrated specs to see how we could deliver highly tuned SLAs/OLAs, or, dare I say it... looking at how marketing companies are leveraging social media to engage consumer influence, to improve the way we communicate IT offerings and support.

NO! If IT did not invent it, then it will not work!

That's it... I need to make some badges. Someone has to stop this madness.

It was never a question...

Am I crazy?

Yes, I am. Certifiably tweaked like that!


P.S. "Badges? We don't need no stinking badges!" -- see what I mean! :-)

Two excellent points

You make two excellent points, Ken:

1) a plan is only a plan until it is tested and then rehearsed until it works

2) there is a whole body of incident response knowledge out there outside IT, as usual. I did research a lot of that MIM material myself, and I envy you your practical experience outside IT. I liked Braun's paper because (a) it is addressed to an IT audience and (b) it looks much like what I came up with myself :)

The world outside of IT: look out the window every now and then

Folks, now you've gone and done it and let the cat out of the bag regarding NIMS for incident management. I mean, it did help manage the BP spill (properly, if you actually know a bit about how it works), oh, and Katrina... a bit hard to practice that one, but they try.

Now that you have gone and done that, I suppose I have to disclose that there are similar sets of practices (best practices, perhaps, or industry-wide accepted practices) for the following:

- change and project management
- problem management (that includes control barrier analysis; google that one!)
- service request management (oh, check out 311 service requests; bing that one)
- service (oh, I mean product) management
- capacity management (finite capacity scheduling; I think a small company named GM might use that)
- availability management (hmm... let me see, oh yeah, your local utility company)
- product lifecycle (oh oh, that's product-oriented marketing: product management again)
- configuration management (do not get me started on the US Army or US Marine Corps logistics approach: how do they get all those spare parts, boots and supplies to the theater?)
... enough already

Is there a light bulb flickering out there in ITSM/ITIL evangelist land? I hope so. These places are where I went to step back and write the UNIVERSAL Service Management Body of Knowledge... and there is so, so much more...

Lean, outside-in, UX, CEX... so should service experience management be nicknamed SEX?

Look in the mirror: are you an insider...?
