Availability Management Is Harder Than It Looks


Everyone understands the need for available services. I mean, we all want our power to be there, right? OK, maybe some things are best done in the dark, but I still want to be sure the lights come on when I flip the switch. We New Yorkers expect most things 24×7, but I understand that folks in other places cope without services like buses and trains at night, and agree to services that run only during working hours.

That’s the idea those ITIL® guys have had in their books since 1990 when they talk about availability management. Of course they had to make it look clever – and nothing looks smarter than a formula – so they gave us the following for availability.

\[
\text{Availability (\%)} = \frac{\text{Agreed Service Time} - \text{Down Time}}{\text{Agreed Service Time}} \times 100
\]

It looks nice and easy. Keep it going all the time and you’ve delivered 100% availability and everyone is smiling. Lose it for an hour during an 8-hour working day and it comes down to 87.5%, which will get you a lot fewer smiles!
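For anyone who wants to check the arithmetic, here is a minimal Python sketch of that formula. The function name `availability_percent` and its parameter names are my own choices for illustration, not anything taken from the ITIL books.

```python
def availability_percent(agreed_hours, downtime_hours):
    """Availability as a percentage of the agreed service time."""
    if agreed_hours <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_hours <= agreed_hours:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    # (Agreed Service Time - Down Time) / Agreed Service Time x 100
    return (agreed_hours - downtime_hours) * 100 / agreed_hours


print(availability_percent(8, 0))  # 100.0
print(availability_percent(8, 1))  # 87.5
```

Note that the only inputs are the agreed hours and the downtime counted against them. Everything that follows is about how those two numbers get decided.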

How could anything get confusing with something that simple and obvious? Well, let’s look at that.

Beauty – and Availability – Are in the Eyes of the Beholder

In most organizations, customers and providers will sit down together and decide on the “agreed service hours” for the service desk. Committing to deliver a service 24×7 is expensive, so the folks setting the “agreed hours” in a meeting room may well be tempted by the savings that a ‘9-5 on working days’ package can offer them. The sales team might think differently when the system goes down at 6pm on the last day of the working week, with several important deals nearing closure (I know my colleagues at SysAid would be fuming!). The figures published at the end of the month might show 100% availability, but the users will not feel that way. If the attitudes and productivity of key staff are important to an organization, then it needs to ensure that:

  • These key staff are part of the discussions when ‘agreed service times’ are being set
  • What was agreed is publicized clearly, not just discovered during a service outage

Fail to do that and the availability you are judged on will not be the figure you calculate.
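To see how the month-end report can say 100% while the sales team fumes, here is a rough Python sketch that counts only the part of an outage falling inside the agreed hours. It assumes a ‘9-5 on working days’ agreement with Monday to Friday as working days and ignores public holidays. The function name `counted_downtime_hours` and the dates are hypothetical.

```python
from datetime import datetime, time, timedelta


def counted_downtime_hours(outage_start, outage_end,
                           day_start=time(9), day_end=time(17)):
    """Hours of an outage that fall inside agreed service hours (Mon-Fri)."""
    total = timedelta()
    day = outage_start.date()
    while day <= outage_end.date():
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            window_start = datetime.combine(day, day_start)
            window_end = datetime.combine(day, day_end)
            # Overlap between the outage and this day's agreed window
            overlap_start = max(outage_start, window_start)
            overlap_end = min(outage_end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


# Friday evening outage, 6pm to 9pm: nothing counts against the figures
print(counted_downtime_hours(datetime(2024, 5, 31, 18), datetime(2024, 5, 31, 21)))  # 0.0
# Outage from 4pm to 7pm on the same Friday: only one hour counts
print(counted_downtime_hours(datetime(2024, 5, 31, 16), datetime(2024, 5, 31, 19)))  # 1.0
```

Three hours of real pain, zero hours of reported downtime. That gap is exactly what the people affected need to know about in advance.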

Not All Downtime Is Equal

The simple formula tacitly assigns the same value to every minute the service is not available. Real life in real organizations is rarely like that. For a company that does a monthly invoice run on the last day of the month, losing its finance system on that day is far more disruptive than losing it on the first day of the next month. Yet both situations would account for (say) 8 hours of downtime in the availability figures. Or think about a retail sales system supporting a shop. Losing 2 hours on a weekday morning might have a very different effect from losing 2 hours on a weekend afternoon.

For the figures to reflect the pain, a significant slice of research, discussion, consultation, and agreement will be needed. Maybe a weighting factor can be worked out. But the more you talk and work on it, the more complicated it can get. And even then it needs to be documented and explained to those who might be affected. And, perhaps hardest of all, the customer and user have to take an active part by supplying their input. This cannot be worked out only by the service provider.
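As one illustration of what a weighting factor could look like, here is a minimal Python sketch. The weights and the 160 agreed hours in the month are made-up numbers, and the function name `weighted_availability` is hypothetical. Real weights would have to come out of the discussion with customers and users described above.

```python
def weighted_availability(agreed_hours, outages):
    """Availability (%) where each outage is a (hours, weight) pair.

    A weight of 1.0 means an ordinary hour; a higher weight means the
    outage hurt more than its clock time suggests.
    """
    weighted_downtime = sum(hours * weight for hours, weight in outages)
    # Design choice for this sketch: never report less than 0%
    weighted_downtime = min(weighted_downtime, agreed_hours)
    return (agreed_hours - weighted_downtime) * 100 / agreed_hours


# 8 hours lost on an ordinary day, weight 1.0
print(weighted_availability(160, [(8, 1.0)]))  # 95.0
# The same 8 hours lost on invoice-run day, assumed weight 3.0
print(weighted_availability(160, [(8, 3.0)]))  # 85.0
```

The same outage now produces two different figures, which is the point, but it also shows how quickly the number stops being self-explanatory unless the weights are documented and agreed.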

What Does “Available” Really Mean Anyway?

Before worrying about when the service is available, we should really clear up what “available” means. For a component like a disk drive or a monitor, I think it’s obvious – it either works or it doesn’t. For elements like the network, it is less clear – does a degraded service count as available? After all, it is probably still usable, just takes longer and stretches the users’ patience to cope with it. Things get even less clear when we consider “service availability”.

Services are very complex nowadays, with a range of features and components delivering facilities to a range of users. Let’s take a relatively simple example – a company’s staff expenses submission, approval, and payment service. It provides a range of facilities:

  • Entering expenses claims
  • Routing them and processing approvals, queries, etc.
  • Processing payments
  • Reporting on expenses and progress against budgets, etc.
  • Archiving old records

Clearly not all of those are equal. In fact, for many staff the third one, processing payments, is most important and only the first three matter at all. So…do all parts of the service have to be there for it to be officially ‘available’? Someone needs to decide beforehand, otherwise reported availability means very little. If archiving is the only part that works, it’s clear the service isn’t really available. Likewise, if archiving is the only part that does not work, most customers would consider the service available. But the crossover point between available and down can be very hard to set, because different groups of customers and users will see it in different places.
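One way to make that decision explicit is to write down which parts count as essential and test against that list. The Python sketch below assumes, purely for illustration, that the first three facilities are essential and the last two are not. The component names and the `service_available` function are hypothetical.

```python
# Assumption for this sketch: only the first three facilities are essential
ESSENTIAL = {"enter_claims", "approvals", "payments"}


def service_available(component_status, essential=ESSENTIAL):
    """True only if every essential component is reported as working.

    component_status maps a component name to True (working) or False.
    A component missing from the map is treated as not working.
    """
    return all(component_status.get(name, False) for name in essential)


all_but_archiving = {
    "enter_claims": True, "approvals": True, "payments": True,
    "reporting": True, "archiving": False,
}
only_archiving = {
    "enter_claims": False, "approvals": False, "payments": False,
    "reporting": False, "archiving": True,
}

print(service_available(all_but_archiving))  # True
print(service_available(only_archiving))     # False
```

The code is the easy part. Agreeing on what goes into the essential list, and for which group of users, is where the real work lies.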

Communication Can Squash Confusion

There are many more points of potential confusion around availability definition, measurement, and reporting. Hopefully these few examples give you an idea of how something that sounds simple at first sight can get more and more complicated the deeper you go into putting it in place.

What should be clear is that in order to get meaningful results, and to remove the arguments afterwards, time spent in discussion with customers and users – establishing what an available service and downtime mean to them – is time well spent!
