Dead cat syndrome

[Hi! If you came looking for insight into the rebounding world economy, you are looking for Dead Cat Bounce. This post is about IT project management. But thanks for dropping in! If you are interested in IT, please take a look around.]

Operational readiness of new and improved services ensures a smooth transition from Project to Production. ITIL talks about it in a number of places, but I think Operational Readiness needs to be recognised as a practice in its own right, like any other ITIL "process". OR is not (just) about being a gatekeeper to Prod: it's about ensuring readiness throughout the lifecycle. OR provides a positive benefit for the customers, projects, development, and operations.

One of the things ITIL V3 improves is the whole development/production interface, introducing radical concepts like production readiness, acceptance, evaluation... oh and testing. Heady stuff. But something that was omitted from ITIL V3 was documentation of Dead Cat Syndrome :) Service Transition alludes to it in a few places, but it is such a chronic condition in the industry that it needs to be described explicitly.

In many organisations, putting a new project into production is akin to lobbing a dead cat over a wall. No operating model, few or no operational procedures developed, minimal last-minute training for the service desk and operations. Supplier contracts don't align with service commitments. There are no service commitments: no SLA exists. The project disbands the moment the system goes live. If you are lucky someone is still around to answer questions.

IT Operations needs to put controls in place to prevent this. Projects benefit from these controls by having a better definition of the end goal and a better end product.

Dave in the comments below said "consider both Operational Acceptance (does the service work as a service) and Organisational Readiness (are we ready to operate it)". I like this yin-and-yang view. There are indeed both sides: the readiness of the service to be operated, and the readiness of the organisation to operate it.

When is a project ready for production hand-over? In some organisations this is when it has passed testing. Testing addresses what ITIL v3 calls Utility and Warranty: it does what it should, and it does so reliably/dependably (see Service Strategy (the “leaf book”), p 33).

But there is more to consider. Without these further considerations, many projects are as welcome in production as a dead cat, and those in IT feel they have as much say in receiving the system as a neighbour does in receiving the cat over the wall.

Some organisations accept a project after it has been in production for a Warranty period. (Not the same meaning of “warranty” as ITIL Version 3. ITIL calls it Early Life Support, a toothless phrase that does not imply the same commitment to standing behind the product that “warranty period” does. If it doesn't work, send it back. In fact ELS sounds to me more like the project is already dying before it goes live.) That is, the project team supports the system and resolves incidents and problems for a defined period after go-live. This is an excellent idea, but on its own it only means we are deferring the cat-toss for a month.

ITIL v3 talks about operational readiness: testing that the system can be deployed and deployment can be verified, the service can be monitored and measured, it can be operated, users and service providers can access it, and it meets service levels (see Service Transition (the “peas book”) p 101). This is a most important concept, and operational readiness should be a formal test criterion, as it is in ITIL v3. One way to think of this is that the IT operations group perform their own acceptance testing, as well as the users doing so.

There is an element of locking the stable door in operational readiness testing. We need to work back to the start of the lifecycle and influence operational robustness (not an ITIL v3 term), i.e., how the system was designed and built...

Okay, now the cat is at least alive. But it will still be a mangy cat unless we look at the last consideration for an acceptable project deliverable: the organisational infrastructure. In terms of my favourite mantra, we have addressed process and technology, but what about people?

No system should be accepted into production until it has:

  • A mechanism for the transfer of IP from the project into production support (level 1 and 2). Normally, this is done by seconding production people onto the project, then returning them to their production teams at the end of the warranty period.
  • Underpinning contracts negotiated with third party suppliers (usually software and hardware vendors) that match up with...
  • SLAs agreed with the business.
  • Administration roles agreed and demarcated between IT application support and the business’s own systems administrators, defined all the way down to discrete functions such as user provisioning, password resets, adding values to reference data, help with using the system, fixing user data errors, backups, scheduling jobs, regular maintenance.
  • Functional and hierarchical escalation paths agreed between all IT, business and service provider groups. Functional escalation means Level 1, 2 and 3 support arrangements decided and implemented, with OLAs (i.e., internal SLAs between support groups) on how incidents and problems get passed around.
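
If you want to track the list above per project, a minimal sketch follows. It is only an illustration, not anything from ITIL: the class name `AcceptanceChecklist` and the field names are made up here, one boolean per bullet, and the only logic is "everything ticked, or here is what is missing".

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AcceptanceChecklist:
    """One flag per prerequisite in the list above (names are illustrative)."""
    knowledge_transfer_in_place: bool = False
    underpinning_contracts_match_slas: bool = False
    slas_agreed_with_business: bool = False
    admin_roles_demarcated: bool = False
    escalation_paths_agreed: bool = False

    def outstanding(self) -> List[str]:
        """Return the names of prerequisites not yet met, in list order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready(self) -> bool:
        """True only when every prerequisite has been ticked off."""
        return not self.outstanding()


# Example: a project that has done the knowledge transfer and agreed SLAs,
# but nothing else yet.
checklist = AcceptanceChecklist(
    knowledge_transfer_in_place=True,
    slas_agreed_with_business=True,
)
print(checklist.ready())
print(checklist.outstanding())
```

The point of returning the outstanding items rather than a bare yes/no is that the project gets told what is still missing while there is time to fix it.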

Now we have a healthy cat.

How to get to know the cat and have some say in whether it gets chucked over the wall or not?
There is a tribal effect in any group of humans. Once the group reaches somewhere between 20 and 100 people it will start to fracture into two groups. (Personally I reckon 80 is the magic number, at least with Kiwis.) The separation between development/solutions and production/operations is a natural plane to break along. So, in any reasonably-sized IT shop there is usually an us-and-them mentality between these two groups. Once they are on separate floors or in separate buildings it is pretty much a given that there is some gap in communication.

In order to have any control over what is chucked over the wall, it is essential to bridge this divide. IT should establish liaison relationships with architects and with project teams. I have never yet found a relationship between production’s infrastructure architects and development’s solutions architects that rose above the dysfunctional. It always seems to degenerate into butting antlers. Liaison people are communicators and negotiators, not pontificators.

The main objectives of liaison are to:

  • Educate architects, designers and developers on operational requirements;
  • Influence design standards; and
  • Review projects at an early stage to gain agreement on operational design.

Most ITIL processes seem to have their proactive and reactive aspects. Think of this as proactive IT operations: influencing the design of systems during their genesis to ensure a satisfactory outcome for everyone. A good place for it to reside is with the Availability Manager.

Conversely, projects cannot follow a strategy of ignoring operational requirements - it won’t make the problem go away. Project teams should reach out in return and encourage corresponding communications with production, to ensure these requirements are addressed early in the project, when changes are cheaper and easier.

So, operational acceptance should depend on: proper testing, warranty period, operational readiness, operational robustness, organisational infrastructure, and liaison relationships.

More requirements to worry about and a higher bar to clear might seem like the last things a project manager wants to hear. But, by meeting these criteria, a PM knows exactly what he or she is shooting for. Moreover, there will be minimum fuss at the end of the project and a quality product will be delivered. Hand a happy purring cat to willing owners instead of tossing a feline corpse over the wall and running away.

I don't feel that ITIL has this problem clearly nailed yet. In fact I accuse Service Design of actually fostering it.

When a project is being designed, production readiness should be designed in. Along the way the project should be coached to prepare for Go Live. (For more detailed suggestions see this previous post.) If they don't listen or don't want to play or don't play nicely, then there should be production-readiness criteria, and if they don't meet them they don't go into production. (The satirical Real ITSM calls it the Service Porthole they have to squeeze through to get aboard.) If this gets steamrolled then there should be waivers signed: "Operations will make best endeavours... but accept no responsibility if..."
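
The gate-then-waiver logic in that paragraph can be sketched in a few lines. This is a rough illustration under assumptions of my own: the names `Decision`, `Waiver` and `gate` are hypothetical, and the rule that a waiver must name every unmet criterion is an assumption added here, not something ITIL prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    ACCEPT = "accept into production"
    ACCEPT_UNDER_WAIVER = "go live on best endeavours, waiver signed"
    REJECT = "do not go into production"


@dataclass
class Waiver:
    signed_by: str              # hypothetical: whoever overrode the gate
    unmet_criteria: List[str]   # the criteria the signer accepts as unmet


def gate(unmet_criteria: List[str], waiver: Optional[Waiver] = None) -> Decision:
    """Decide whether a project passes the production-readiness gate."""
    if not unmet_criteria:
        return Decision.ACCEPT
    # Assumption: a waiver only counts if it names every criterion still unmet.
    if waiver is not None and set(unmet_criteria) <= set(waiver.unmet_criteria):
        return Decision.ACCEPT_UNDER_WAIVER
    return Decision.REJECT


# Example: no SLA exists, and the go-live is pushed through anyway.
unmet = ["slas_agreed_with_business"]
print(gate(unmet))
print(gate(unmet, Waiver(signed_by="project sponsor", unmet_criteria=unmet)))
```

Making the waiver list the specific gaps keeps the override honest: whoever signs knows exactly which parts of the cat are dead.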

Nobody deserves a Dead Cat.


See also these free checklists

My Dead Cat Syndrome presentation on TFT14
