Use SLA response metrics that matter

Using indirect KPIs is always a dangerous distorter of behaviour. If you want the SLAs to ensure the appropriate resources are applied, and to drive the size and location of the teams and of the spare part/hot swap stock, then write the SLAs so that they define exactly those things: the resources to be applied by priority of incident for that service, and the size and location of teams and spare part/hot swap stock by priority of service. Don't make the behavioural causal chain any longer than it need be, or you'll get all sorts of unintended consequences.

This post is in response to this excellent comment. What I've seen happen is that everyone drops everything to work on a priority 1 incident and the smaller priority 2 gets worse and worse, especially as the priority 1's arbitrary resolution deadline approaches and a resolution is no closer.

In the satirical Introduction to Real ITSM [now available on Amazon :-D] I said "Real ITSM SLAs don’t define response times. Based on priority, they define how many people will be assigned to an incident; how many hours a day it will be worked on; and what gets overruled to work on it." So a lowest priority incident gets .0025 hours per day of a Peon resource who gets pulled from reading the paper, while a higher priority incident gets 2 staff assigned for 8 hours a day, managed by the Service Desk team leader, and they can be pulled from staff meetings.

So if the agreed number of people are working on that incident, the rest of us can concentrate on everything else that is still going on. Of course, as the User Priority rises (the spat dummies index), management will reassign more resource anyway, but that is another driver to meet a different KPI.

We have this fixation that SLAs need to be written in terms of a small set of accepted metrics. They don't. Write them in terms of what matters.


Matter to who

I am not sure I understand where your satire finishes. If I were a commercial customer of a service provider, the type of SLA in the last two paragraphs would be a bit scary.

It's essentially saying: as long as I show up, I get the gold medal. It's an SLA which will mostly deliver lowest common denominator service. I always have this battle with marketing folks. They get goaled on the number of marketing programs they launch into the market, no matter whether those programs have any impact on generating demand or, more importantly, the right type of demand.

How can you have an SLA which does not reflect the intended goal of the service? The goal of an incident management service is to deliver a workaround, not to deliver 2 bodies to work on it and resolve it whenever it can be resolved. And how do you then place an SLA around what type of skills and experience these 2 resources have, to ensure optimal resolution time?

I agree that you need to set SLAs that matter, but just because some of the existing SLAs are imperfect does not make them wrong.

Brad Vaughan

SLAs are about setting expectations

SLAs are about setting expectations. Saying priority 1 incidents will be fixed within 4 hours is (a) a promise you cannot know you can keep, (b) less likely to be met than fixing priority 4 incidents within 4 hours, and (c) making a rod for your own back when you can't.

I'd rather say that:
Priority 4 will have one person assigned to it within four hours. They will work on it as they can. If they can't fix it within 3 days, we will ask you if we need to escalate it, in which case it will become a higher priority.
Priority 3 will get one person assigned to fix within four hours. They will bring in additional resource as they need. If they can't fix it within two days they will contact....
Priority 1 will get the following team of 5 assigned full-time within one hour, including a Major Incident Manager, plus approximately 50% of the service desk manager's time. They will bring in additional resource as they need. They can pull resource from projects, training, meetings...

Or whatever. We can still measure time to resolve by priority as a KPI; we just don't commit to it in an SLA. Users understand that it takes however long it takes. They are happy just to know the scale of the response they are going to get.
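The response commitments listed above could be written down as data rather than as resolution deadlines. What follows is only a minimal Python sketch: the class name, field names and helper functions are made up for illustration, and only the figures (people, hours, days) come from the list above. Priority 2 is left out because it is not spelled out there.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ResponseCommitment:
    """What the SLA promises for one priority: a scale of response, not a fix time."""
    people_assigned: int                # minimum number of people put on the incident
    assign_within: timedelta            # how quickly they must be assigned
    review_after: Optional[timedelta]   # when we go back to the user about escalating
    can_pull_from: tuple = ()           # activities staff may be pulled out of


# Hypothetical encoding; the numbers are the ones quoted in the list above.
SLA = {
    4: ResponseCommitment(1, timedelta(hours=4), timedelta(days=3)),
    3: ResponseCommitment(1, timedelta(hours=4), timedelta(days=2)),
    1: ResponseCommitment(5, timedelta(hours=1), None,
                          ("projects", "training", "meetings")),
}


def sla_met(priority, people_assigned, time_to_assign):
    """The SLA is met if enough people were assigned soon enough.

    Time to resolve is deliberately not an input: it stays a KPI.
    """
    c = SLA[priority]
    return people_assigned >= c.people_assigned and time_to_assign <= c.assign_within


def needs_escalation_check(priority, age):
    """True once an open incident is old enough that we should ask the user."""
    c = SLA[priority]
    return c.review_after is not None and age >= c.review_after
```

With this shape, a priority 1 that has its team of 5 in place after 45 minutes is within SLA however long the fix takes, and a priority 4 still open after three days triggers a conversation rather than a breach.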

You don't place an SLA around who gets assigned (or you could, I suppose), and we don't do that now either. We assign people now, and users know we do the best we can as IT professionals to find the right person.

As well as expectations, SLAs are about transparency. "This is what you are gonna get". SLAs aren't (in general) written to deceive or obscure what is going on; they are there to make it clear. Except in Real ITSM :-D Different priorities get different levels of attention anyway, so make it clear. People are a bit less keen to demand priority 1 if they know what it will set in action.

As professionals we do try to close every incident as quickly as we can, i.e. right now we resolve it whenever it can be resolved. We're not saying we are going to extend the resolution time any more than it is now; we will always endeavour to keep those resolution times down. I just think it is silly to make promises. Setting a three-day expectation on a priority 4 just gives level 1 an excuse to not even start until tomorrow, especially if they are all bunched up over there trying to fix a priority 2.

Again it's about violent disagreement

So most of what you say is on the money.

No commercial service provider (that plans to stay in business) commits to resolution time by default, except in a well-defined, quantifiable space such as a physical HW failure. Most SLAs are based around response time and escalation. But that is the focus; it does not go any deeper (as far as the SLA goes). The definitions of the processes used in the services organization for, say, Tier 2 vs Tier 3 vs Tier 4 vs Executive Escalation etc. are just defined in an addendum to the SLA, which can change more flexibly. So to a certain extent the contractual document used to define the relationship with the customer has most of the aspects you desire.

It would be foolish to define the resource allocations too clearly in an SLA-type document (or even in a legal contract), because
1. you would need to measure and prove compliance
2. incidents in complex IT infrastructure environments are too variable to create a generic resource plan. Why assign 5 people to a task that has the optimal chance of resolution with 2? Too many cooks and all that.
If you go down to that detail, you are artificially defining resource plans.

The option of resolution times is mostly applied by customers who want to ensure the provider shares the risk in solving the problem. This is when SLAs become more than expectations. The risk is normally capped. It can be beneficial to the provider if there is also a benefit in overachieving the SLA, but WARNING: it is hard to manage this successfully and maintain a long-term healthy relationship. I have successfully managed SLAs with individual resolution times, but the customer understood that you are managing it at a macro level and the goal is not to deliver 100% compliance. It was about measuring a goal, trying to achieve it, and adjusting it annually or quarterly to reflect the desire to improve processes proactively and reactively. Resolution times were just part of the picture; it also included volume of calls etc. At the macro level, this type of risk management is possible.

My preferred method for SLAs at a macro level is to manage the number of problems resolved at Tier 1, Tier 2, Tier 3 etc. for a certain priority level. This provides some direct linkage to the goal of resolution time (assuming, for example, that an incident only stays at Tier 1 for 1 hour) and also helps bring successful problem management, service desk and availability management into line. We tend to do these types of things based on baselining with annual improvement targets.
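The tier-based measure described above could be computed along the following lines. This is a rough Python sketch under assumptions: the incident record layout, the function names and the sample figures are all invented for illustration and do not come from any real service desk tool or contract.

```python
from collections import Counter


def tier_resolution_share(incidents, priority):
    """Return {tier: fraction of incidents of this priority resolved at that tier}."""
    tiers = Counter(
        i["resolved_tier"] for i in incidents if i["priority"] == priority
    )
    total = sum(tiers.values())
    if total == 0:
        return {}
    return {tier: count / total for tier, count in sorted(tiers.items())}


def meets_target(share, baseline_tier1, improvement):
    """Compare this period's Tier 1 share with the baseline plus the agreed improvement."""
    return share.get(1, 0.0) >= baseline_tier1 + improvement


# Hypothetical sample data: each record says at which tier the incident was resolved.
incidents = [
    {"priority": 2, "resolved_tier": 1},
    {"priority": 2, "resolved_tier": 1},
    {"priority": 2, "resolved_tier": 1},
    {"priority": 2, "resolved_tier": 2},
    {"priority": 3, "resolved_tier": 1},
]

share = tier_resolution_share(incidents, priority=2)
on_track = meets_target(share, baseline_tier1=0.60, improvement=0.10)
```

Here three of the four priority 2 incidents were closed at Tier 1, so the Tier 1 share is 0.75, which clears an assumed baseline of 60% plus a 10-point improvement target. Nothing in the measure commits to a resolution time for any single incident.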

So the net-net is I agree with you, but I disagree on the detail of the resource definition. You need to start with defining the service delivery framework to set expectations, but try to move towards an SLA that links to the overall goal of service delivery (speed of resolution, accuracy of data, reliability of service) so that you are continuing to improve the quality of service. Don't start at the service delivery framework and dig into the detail of the service delivery process (people, skills, tools being used etc.).

Brad Vaughan

Getting to WE

I agree with you, Brad. It's amazing how often we rush to develop SLAs when not enough time has been spent understanding service delivery requirements, agreeing on realistic expectations and establishing an accurate baseline of current performance. In fact, simply agreeing on the current level of performance (and how it is measured) is often a breakthrough.

I have had the unpleasant experience of having clients get into my knickers --- a bad trip indeed. You can't improve overall service quality by putting requirements on paper; sooner or later it's going to take MONEY. Often our desire to 'maximize resources' hides the frightening reality that the Customer cannot afford what they want. NOW what?

This is why it is so important that there be a business lane associated with your ITSM road map: you cannot set shared expectations without one, and without shared expectations you're back to US and THEM.

The real voyage of discovery consists not in seeking new landscapes,
but in having new eyes.
In seeing the universe through the eyes of another,
of a hundred others.
In seeing the hundreds of universes each of them sees.

-- M. Proust

Getting to WE is the goal.

John M. Worthington
MyServiceMonitor, LLC
