On-Demand CMDB reprise

We could be overdoing our solutions to reporting requirements in IT Operations, whether it be CMDB, historical trending or service level reporting. Consider the option of on-demand operational data, and a specialist team to provide it.

As suggested in an earlier post, I want to propose the idea of on-demand operational data. We’ll discuss it in the context of Configuration Management and CMDB, but it applies just as much to any data requirement. For the audience reading this, that includes areas like service level reporting, capacity trending, support responsiveness, and many others.

On-demand configuration
We will start with CMDB. Instead of chasing the rainbow of building a consolidated, federated, integrated, reconciled CMDB, we could assemble the configuration data on demand in response to a requirement.

Let me say from the outset that this is an idea rather than proven practice, but then ITIL V3 includes relatively untested ideas like the SKMS, so I am in good company.

The idea of on-demand data first came to me in the satirical book Introduction to Real ITSM:

Known elsewhere as “assets” or “configuration”, Real ITSM manages Stuff. Records should be kept of all Stuff except where records are not kept… Management or auditors will periodically demand lists of Stuff. Given that their requirements are entirely random and unpredictable, the most efficient way to deal with this is to respond on an ad-hoc basis by reallocating staff to collate the data. This is known as on-demand processing. The technology of choice is MS-Excel.

That proposition was made tongue-in-cheek, but let us consider it seriously. Does this scenario sound familiar?

The Service Desk and Support staff just will not record incidents in the correct category. Incidents get left in the first category chosen even when it emerges they were actually something quite different. There are multiple categories being used for the same thing depending on who you ask. The taxonomy is so complex that new staff use “Other” for weeks while they come to grips with the job before they are pressured into learning the proper categorisations. Level 2 insist on using their own subset of categories. And the external service providers never specify category at all. The Incident Manager spends hours every week trying to keep it clean. Every few months there is a housekeeping sweep to fix it all up. It comes up at every Service Desk team meeting, over and over, but the message never seems to get through.

Or this one?

The Security Manager wants to introduce an authentication “dongle” that plugs into the USB port. He wants to know how many desktops have a USB port accessible on the front. But there is no record of which desktops have a spare USB port at all, let alone on the front. This identifies a deficiency in the asset database, so a new field is added and a project is launched to capture the information from thousands of desktops and laptops across the organisation.

Here is a hypothetical future scenario for the real geeks reading this:

The grid computing network reconfigures itself under load. Because the network is in mid-reconfiguration when the updates are sent to the CMDB, and because the updates do not use two-phase commit (there is no multi-phase commit in the CMDB architecture), the updates keep getting lost, and the vendors seem incapable of adding store-and-forward to the update mechanism. So we never really know in real time which servers are running which services.
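For readers who haven't met the term, store-and-forward just means the sender keeps each update until the receiver confirms it, rather than firing it off and hoping. A minimal Python sketch, assuming a hypothetical `send` function that returns True when the CMDB acknowledges an update; no real vendor API is implied, and the server and service names are made up:

```python
from collections import deque


class StoreAndForwardSender:
    """Hypothetical store-and-forward wrapper around a CMDB update call.

    Each update is held in a local queue and only removed once the
    receiver acknowledges it, so an update sent mid-reconfiguration is
    retried later instead of being lost.
    """

    def __init__(self, send):
        # send(update) -> bool: True means the CMDB acknowledged the update
        self._send = send
        self._pending = deque()

    def submit(self, update):
        self._pending.append(update)
        return self.flush()

    def flush(self):
        # Deliver in arrival order; stop at the first failure so order is kept
        while self._pending:
            if not self._send(self._pending[0]):
                return False
            self._pending.popleft()
        return True

    def pending(self):
        return len(self._pending)


# Simulated CMDB endpoint that is unreachable until the network settles
received = []
network = {"stable": False}


def fake_cmdb_send(update):
    if not network["stable"]:
        return False
    received.append(update)
    return True


sender = StoreAndForwardSender(fake_cmdb_send)
sender.submit({"server": "grid-07", "service": "billing"})  # held locally
network["stable"] = True
sender.flush()  # now delivered
```

The cost of this is exactly the point of the scenario: someone has to build, test and keep maintaining it inside the CMDB's update path.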

In all three of those scenarios, consider what would happen if we created the configuration data when we needed it, in response to a particular situation, instead of trying to maintain it all the time in a CMDB.

Formalising what we do anyway

This is nothing new; it is what we do now. We create data ad-hoc anyway when we have to. If the data is not there or not right and management wants the report, we gather it up and clean it up and present it just in time, trying not to look hot and bothered and panting.

How much better if we had a team, expert in producing on-demand configuration information? They would have formal written procedures for accessing, compiling, cleaning and verifying data, which they would practice and test. They would have tools at the ready and be trained in using them. Most of all they would “have the CMDB in their heads”: they would know where to go and who to ask to find the answers, and they would have prior experience in how to do that and what to watch out for. Instead of ad-hoc amateurs responding to a crisis, experts would assemble on-demand data as a business-as-usual process.

They would understand basic statistical sampling techniques. When management wants a report on the distribution of categories of incidents, they would sample a few hundred incidents, categorise them properly according to what the requirements are this time (after all how often does an existing taxonomy meet the needs of a new management query?) and respond accordingly.
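The sampling approach is simple enough to sketch. This is a minimal Python illustration, not a prescribed method: it assumes the incidents can be exported as a list of records, and that the team writes a `classify` function (hypothetical here) encoding this particular query's categories. It uses the normal approximation for the margin of error and ignores the finite-population correction, which makes the margin slightly conservative. The incident data is synthetic.

```python
import math
import random
from collections import Counter


def estimate_category_shares(incidents, classify, sample_size, z=1.96, seed=None):
    """Estimate each category's share of all incidents from a random sample.

    classify maps one incident record to a category under *this* query's
    taxonomy. Returns {category: (share, margin_of_error)}, using the
    normal approximation (z=1.96 is roughly 95% confidence).
    """
    rng = random.Random(seed)
    sample = rng.sample(incidents, sample_size)  # sampling without replacement
    counts = Counter(classify(incident) for incident in sample)
    result = {}
    for category, count in counts.items():
        share = count / sample_size
        margin = z * math.sqrt(share * (1 - share) / sample_size)
        result[category] = (share, margin)
    return result


# Synthetic data for illustration only: 30% of these made-up incidents mention a password
incidents = [
    {"id": i, "summary": "password reset" if i % 10 < 3 else "printer jam"}
    for i in range(10_000)
]


def classify(incident):
    # Re-categorise against the taxonomy this query needs, not the one in the tool
    return "access" if "password" in incident["summary"] else "other"


shares = estimate_category_shares(incidents, classify, sample_size=400, seed=1)
for category, (share, margin) in sorted(shares.items()):
    print(f"{category}: {share:.1%} +/- {margin:.1%}")
```

Categorising 400 records by hand against a fresh taxonomy is a day or two of work for someone who knows the environment; the arithmetic afterwards is trivial.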

They would be an on-call team, responsive to emergency queries. “The grid computing system has died and the following servers are not dynamically reconfiguring. Which services are impacted and which business owners do we call on a Saturday?” They may not know the answers off the top of their heads but they will know - better than just about anyone - where and how to look to get the answers, and how long that is going to take.
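Finding the people and the dependency data is the human part. Once the team has assembled it, walking from the failed servers to the affected services and their owners is a few lines of code. A sketch in Python, where the server names, services, owners and the shape of the `DEPENDENTS` map are all hypothetical:

```python
from collections import deque

# Hypothetical, hand-assembled dependency data: "X": [things that depend on X]
DEPENDENTS = {
    "srv-101": ["grid-scheduler"],
    "srv-102": ["grid-scheduler", "report-db"],
    "grid-scheduler": ["Risk Analytics"],
    "report-db": ["Month-End Reporting"],
}

# Hypothetical service owners to phone on a Saturday
SERVICE_OWNERS = {
    "Risk Analytics": "Head of Risk",
    "Month-End Reporting": "Financial Controller",
}


def impacted_services(failed, dependents, service_owners):
    """Breadth-first walk from failed components to every service that depends on them."""
    seen = set(failed)
    queue = deque(failed)
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    # Keep only the nodes that are services with a known owner
    return {name: service_owners[name] for name in seen if name in service_owners}


print(impacted_services(["srv-102"], DEPENDENTS, SERVICE_OWNERS))
```

The code is only as good as the `DEPENDENTS` map, which is exactly the knowledge the on-demand team would carry in their heads and know where to find.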

As I said in that earlier post about on-demand CMDB, we geeks want to automate everything, but in reality it is only cost-effective to automate tasks that are
a) repetitive and
b) STABLE. In a constantly changing world, the cost of changing the software usually outweighs the benefits it returns.
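The "repetitive and stable" test can be put as rough arithmetic: automation pays off only if the manual effort it replaces exceeds what it costs to build plus what it costs to keep changing. A back-of-envelope Python sketch; every figure is a made-up illustration, and it deliberately ignores the automation's running costs and any value from speed:

```python
def automation_pays_off(runs_per_year, manual_cost_per_run, build_cost,
                        changes_per_year, cost_per_change, years):
    """Compare doing a task by hand with automating it, over `years`.

    Deliberately crude: ignores the automation's running costs and any
    value from doing the task faster. Returns (manual, automated, pays_off).
    """
    manual = runs_per_year * manual_cost_per_run * years
    automated = build_cost + changes_per_year * cost_per_change * years
    return manual, automated, automated < manual


# Illustrative figures only: a weekly task over a three-year horizon
# Stable environment: one change a year
print(automation_pays_off(52, 500, 20_000, changes_per_year=1, cost_per_change=2_000, years=3))
# prints (78000, 26000, True)

# Churning environment: a change every month
print(automation_pays_off(52, 500, 20_000, changes_per_year=12, cost_per_change=5_000, years=3))
# prints (78000, 200000, False)
```

Same task, same build cost; only the rate of change differs, and that alone flips the answer.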

So why do we delude ourselves that a CMS will be stable? All this geekbuzz about federation scares me. If we are constantly integrating and de-integrating operational tools and chasing new technologies like The Cloud and virtual this and that, then the already nutty cost of a CMDB goes up another order of magnitude.

Learn to love people. Humans are infinitely adaptable, self-correcting and autonomous.

And in the new flat world, they are cheaper too. Maybe we only need one on-demand CMDB team member onsite, and much of the data crunching can be outsourced to India or China or Egypt or Romania or Kenya.

What do you think? Could it work? If so, wouldn't it be much cheaper and more responsive to a changing environment? (Last time I raised this, I got one for and one against).

[Updated: see also the implications]


My latest summation of On-Demand CMDB

My latest summation of On-Demand CMDB, or wetware CMDB:

A couple of people designated as owners, accountable for configuration data and process. They design, implement, rehearse and optimise a formal process for determining impact. Give them the mandate to prise the data from the geeks who currently hoard it. Give them tools for discovery, sampling, data analysis, reporting and whatever else they determine they need after trying the process, maybe even a CMDB.

Unknown unknowns

What still worries me is that what catches us out are the dependencies we didn't foresee. The point of a CMDB should be to do a meta analysis that isn't obvious. How do I know in advance that moving a scanner with a static IP address will stuff the entire network? OK, that one is inexcusable, but it has happened to me.

Strangely, I agree with keeping the data with the geeks; the question is how we wake the geeks up to notice that our change might impact them. We don't want to always go down to the lowest unit of information if we don't need to.

professional configuration data analyst

The data is staying with the geeks. What could be more geeky than a professional configuration data analyst working in the belly of IT Operations?

How many CMDBs provide that kind of "meta analysis" or "gotcha detection"? An experienced human is far more likely to produce that kind of alert.

Computers aren't very smart

Computers aren't very smart, but they are good at dealing with lots of data. So in larger, more complex environments even an experienced geek will miss a lot. Identifying all the combinations of what could go wrong when one small change is made is like playing chess. So our geeks need the help of some software, and that is one of the reasons why we want a CMS as sold to us by ITIL and others.
So we want it - but does anyone provide it yet? I don't think so. That's partly why CMDB brings up such strong feelings.

In addition to the technical impacts of a change, we now want to be able to understand the business impact. A link to the Service Catalogue can help with that, if you have a decent one. Strong feelings also in evidence there. The other approach, which can help and has other benefits, is to change our profession at geek level to a more business-aligned one. Even more strong feelings about that, I imagine.
Consider the pin pulled on that little shrapnel grenade.

CMDB is so 2007

Pin pulled indeed :) Spot on.

In the most complex and critical of sites I can believe that a practiced professional configuration team with good analytical tools might be sufficiently more effective with a CMDB to justify the expense of that CMDB... but I gotta think it is pretty rare. Does the cost of what they miss without one justify the seven-figure bill to design, install, populate, maintain and audit a CMDB? And is there nothing more important to spend the money on? Not often. That's why there are so few CMDBs. "#bigfoot" as they say on Twitter.

Now we are in an even worse situation. Just maybe now and then a CMDB is the right thing to do. But the "experts" have moved on. CMDB is so 2007. Now it's CMS, which doesn't even exist. I'll blog on that one day. No rush - it won't exist for years.

Wasted money

Seven figures goes a long way these days; it may almost be cheaper to duplicate devices, data and networks to remove all those dependencies we have such a hard time tracking down and managing.


I mean yes, subject to quite a lot more definition and sanitised packaging for management.

Definition because your "idea" needs more flesh to determine if we're talking something viable or if there are gaping holes in the assumptions. (Just as the SKMS "idea" needs more flesh.)

By packaging for management, I'm thinking "don't worry, we have a full suite of federated CMDBs, but we haven't wasted money on predefined reports - we'll cook up and tweak the reports on demand". Whereas the more honest situation might be "we'll auto-discover and manually discover and slice-and-dice and correlate and de-duplicate and (whisper) extrapolate the missing necessary data on demand".
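The "correlate and de-duplicate" step is the part most easily scripted. A minimal Python sketch, assuming each source can export records with a serial number field; the source names, field names and the rule of matching on a normalised serial number are all assumptions, and conflicting values are reported for a person to resolve rather than silently overwritten. The "(whisper) extrapolate" step is not covered:

```python
def merge_sources(sources):
    """Correlate and de-duplicate CI records from several sources.

    sources: {source_name: [record, ...]}, each record a dict with a 'serial' key.
    Records are matched on a normalised serial number (an assumption; real
    matching rules vary). Values from earlier sources win; every disagreement
    is returned in `conflicts` for a human to look at.
    """
    merged, conflicts = {}, []
    for source, records in sources.items():
        for record in records:
            key = record["serial"].strip().upper()
            target = merged.setdefault(key, {"serial": key, "sources": []})
            target["sources"].append(source)
            for field, value in record.items():
                if field == "serial":
                    continue
                if field not in target:
                    target[field] = value
                elif target[field] != value:
                    conflicts.append((key, field, target[field], value, source))
    return merged, conflicts


# Hypothetical extracts from two sources
sources = {
    "discovery": [
        {"serial": "ab123 ", "hostname": "pc-01", "usb_front": True},
    ],
    "asset_register": [
        {"serial": "AB123", "hostname": "PC-01-OLD", "owner": "Finance"},
        {"serial": "CD456", "hostname": "pc-02", "owner": "Sales"},
    ],
}
merged, conflicts = merge_sources(sources)
print(conflicts)
```

The conflict list is where the on-demand team earns its keep: the script can say the two sources disagree about a hostname, but not which one is right.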

Picking up Antonio's notes on non-exact numbers, and in response to Jim on the original thread, who really believes that the CMDB/CMS is 100% accurate? And repeatable, scalable and reliable? Not me. You may not admit it to management, but on-demand reporting could be just as accurate as pregenerated stuff, or more so.

How do you validate auto-discovery?

One of these days I'll blog on "How do you validate auto-discovery?"

The ITIL books and I say that auto-discovery is suitable only for initial population and for ongoing audit of CMDB data (Service Transition top of second column page 69). Those who advocate automated maintenance of data don't understand the distinction between asset database and CMDB, and don't understand the requirements of Configuration Management and its integration with the Change process.

But even if you use it for audit, it is not a gold-standard benchmark to test against.

Then we get on to the issue of what percentage of CIs are really under Change control.

And how reliable reconciliation is.
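Because discovery is not a gold standard, an audit against it is best treated as a list of questions rather than a list of corrections. A hedged Python sketch of that comparison, assuming both the recorded data and the discovery snapshot can be reduced to `{ci_id: {attribute: value}}` dicts; the CI names and attributes are hypothetical:

```python
def audit(recorded, discovered):
    """Compare recorded CI data with an auto-discovery snapshot.

    recorded, discovered: {ci_id: {attribute: value}}
    Discovery is not treated as the truth: every difference is returned
    for a person to investigate, and nothing is overwritten.
    """
    only_recorded = sorted(recorded.keys() - discovered.keys())
    only_discovered = sorted(discovered.keys() - recorded.keys())
    mismatches = []
    for ci in sorted(recorded.keys() & discovered.keys()):
        # Only compare attributes that both sides actually report
        for attr in sorted(recorded[ci].keys() & discovered[ci].keys()):
            if recorded[ci][attr] != discovered[ci][attr]:
                mismatches.append((ci, attr, recorded[ci][attr], discovered[ci][attr]))
    return {
        "only_recorded": only_recorded,
        "only_discovered": only_discovered,
        "mismatches": mismatches,
    }


recorded = {
    "srv-1": {"ip": "10.0.0.5", "os": "RHEL 5"},
    "scanner-3": {"ip": "10.0.0.9"},
}
discovered = {
    "srv-1": {"ip": "10.0.0.5", "os": "RHEL 6"},
    "srv-2": {"ip": "10.0.0.7"},
}
result = audit(recorded, discovered)
print(result)
```

A CI that appears only in the records (`scanner-3` here) may be gone, or may just have been switched off during the scan; deciding which is a human's job.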


I'm pretty sure it can work

I must admit that when I read your book, this was one of those "Knowledge Pearls" to read between the lines. I suddenly realized that this is using statistical sampling, and that this is the correct way to infer knowledge from fuzzy data.

Then I discussed this with a colleague, a business intelligence expert, who laughed at me and said it was a good joke, but privately I suspected that it could be possible.

Now, reading your... let's say more serious approach to the idea, I'm pretty sure.

Who wants to be the first to test it?

The only problem is that we geeks don't like "non-exact numbers": an approach that leaves you with 99.5% confidence means 0.5% uncertainty... so we don't like statistics. We like "real numbers".

But, as all politicians already know, real numbers are not cost-effective, and statistics are really cost-effective.
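How big does the sample have to be for a given confidence? For estimating a proportion, the usual normal-approximation formula is n = z^2 * p * (1 - p) / e^2, where z comes from the confidence level and e is the margin of error you will tolerate; p = 0.5 is the worst case when you have no idea of the answer in advance. A small Python sketch with an optional finite-population correction, using the standard library's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, p=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    Normal approximation; p=0.5 is the worst case when you have no prior idea.
    If population is given, apply the finite-population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size(0.95, 0.05))                     # 385
print(sample_size(0.95, 0.05, population=10_000))  # 370
print(sample_size(0.995, 0.05))                    # 788
```

So moving from 95% to 99.5% confidence at +/-5% roughly doubles the sample, from 385 to 788 records: still hundreds of incidents to re-categorise, not the whole database.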


BTW: I see this approach as more usable in the "incident re-classification" use case than in the "CMDB populating" use case.

Antonio Valle
G2, Gobierno y Gestión de TI
