Business continuity, disaster recovery (BC/DR)

Business Continuity Management, Disaster Recovery, and ITSM Are Not Mutually Exclusive

We have all had, or have heard about, IT horror stories that have caused businesses to go into unplanned panic (and most likely to be adversely affected in terms of business operations, revenues, and customer sentiment). I know that I have seen my share during my IT support career, and I am sure that my colleagues would argue that I have caused more than my fair share too.

Unfortunately, the unexpected does happen and therefore needs to be expected, even when the probability is low. Planning for it is good risk management and will usually involve one or both of a business continuity (BC) plan and a disaster recovery (DR) plan – often abbreviated to BC/DR.

Real-World BC/DR and What Are You Doing About It?

Sadly though, these business-critical plans are often in a less-than-perfect state when the BC/DR crisis occurs, and you’ll hear things like this:

  • “The annual DR test was undertaken but the plan was not updated to reflect changed circumstances between then and now.”
  • “The DR plan is somewhere, but we are not sure exactly where.”
  • “The BC plan was thoroughly tested but we did not include nor test for this sort of thing.”
  • “The BC plan is only IT-specific; we assumed someone else would get the business-side of things up-and-running again.”
  • “We have no DR plan – we have never needed one.”

These quotes might make you smile but don’t sit there thinking that BC/DR is something that someone else in the IT team does – it can very easily be an IT service management (ITSM) responsibility. And, as this blog shows, there is a big overlap between BC/DR and ITSM. But first I need to state that…

Business Continuity is More Than Disaster Recovery

It is really important to start with this point. According to the Business Continuity Institute, BC is “the capability of the organization to continue delivery of products or services at acceptable predefined levels following a disruptive incident.” However, as a simple New Yorker, I prefer this simpler BC definition, that:

“Business continuity is the practice of keeping the business in business.” 

It’s one of those cool quotes that you can’t quite remember where you stole, sorry borrowed, it from.

DR can be viewed as merely a subset of overall BC. It is the saving of data with the sole purpose of being able to recover it in the event of a disaster. I am sure that many people would disagree with this, but it is my blog and therefore my definition. So BC trumps DR by being the bit that actually gets IT and the business back up and running again. But as a compromise to those who might think DR is more than my definition, I will mostly use the BC/DR acronym from now on.

Unfortunately, whatever we call it, many organizations feel that BC/DR is an unnecessary expense and ignore the marketing value of statements such as “we know how to remain in business” or service-related statements around availability and quality of service. BC/DR is also certainly a less costly alternative to the potential business disruption (and associated loss of revenue and customers), regulatory fines, reputational damage, or the ultimate penalty of going out of business.

Thus BC/DR is important. And it should also be important to those who work in ITSM – after all they are responsible for making sure that IT and business services are available and delivered at an acceptable quality.

The BCM Lifecycle Mapped to ITSM

BC/DR, or business continuity management (BCM), shouldn’t be a totally alien concept to anyone who has taken any of the ITIL exams – with IT service continuity management (ITSCM) being one of the disciplines in the ITIL service design area that should interface with many other ITSM capabilities. (Editor: my head is hurting from all these acronyms Joe.)

The “Blended BCM with ITSCM” diagram below shows how other ITSM disciplines overlap with the BCM lifecycle, including aspects of service strategy, service transition, and service operation.

[Diagram: Blended BCM with ITSCM]

This mapping to ITIL and its five publications can also be extended using plan-do-check-act (PDCA), an iterative four-step management method used in business for the control and continuous improvement of processes and products.

Plan: Service Strategy. Your strategy not only needs to encompass service delivery success; it also needs to plan for service failure. Something is going to force a continuity situation at some point. You cannot prevent it but you can be ready by knowing the answers to questions such as:

  • What is your business?
  • What are your key products or services?
  • How do you “do” business, i.e. make money?
  • Who do you do it for?
  • How long can you remain out of service?
  • What is the impact of no service to your customers, employees, or suppliers?
  • What happens if a supplier has an issue that affects your business’ operations?
  • What are the requirements of regulatory agencies?
  • If you have to invoke your BC/DR plans, are you ready?
  • After a test or event, how can you improve for the next one? 

Plan: Service Design. The service design processes should enable the creation of your continuity options and definitive plans based on variables such as:

  • The services and SLAs in scope
  • Vendor and licensing contracts
  • Internal processes and information requirements
  • Security obligations
  • Employee skills and availability
  • Data backup and restoration
  • Communication and escalation plans
  • Ongoing maintenance and support
  • Salvage options for IT and business
  • Insurance policies that can be invoked
  • Time of day or period of event and its potential impact on business operations
  • Reciprocal plans with other businesses

Armed with this information, the business and IT can decide which BC/DR options to use, test, and maintain and how these options should change over time.  Creating an impact versus option matrix is a simple tool to support decision-making here.
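
If it helps to picture an impact versus option matrix, here is a rough Python sketch of one way it could look. Everything in it is an assumption for illustration only: the service names, the tolerable outage hours, the option names, and the cost figures are all made up, and comparing recovery time against tolerable outage is just one possible way to score an option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    max_outage_hours: float  # hypothetical: how long the business can tolerate an outage


@dataclass(frozen=True)
class Option:
    name: str
    recovery_hours: float  # hypothetical: estimated time to restore service
    annual_cost: int       # hypothetical figure, illustration only


def build_matrix(services, options):
    """Return {service: {option: True/False}}; True when the option recovers in time."""
    return {
        s.name: {o.name: o.recovery_hours <= s.max_outage_hours for o in options}
        for s in services
    }


def cheapest_fit(service, options):
    """Pick the lowest-cost option that recovers within the tolerable outage, or None."""
    fits = [o for o in options if o.recovery_hours <= service.max_outage_hours]
    return min(fits, key=lambda o: o.annual_cost) if fits else None


services = [Service("Order entry", 4), Service("Payroll", 72), Service("Trading desk", 0.5)]
options = [Option("Restore from backup", 48, 10_000), Option("Warm standby site", 4, 60_000)]

matrix = build_matrix(services, options)
for s in services:
    choice = cheapest_fit(s, options)
    print(s.name, matrix[s.name], "->", choice.name if choice else "no option fits")
```

A service where no option fits is the useful finding here: it tells the business and IT that either the tolerable outage or the list of options needs another look.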

Do: Service Transition. This is where the agreed BC/DR options are tested and readied for invocation when needed. This should include the following variables:

  • Scope and assumptions
  • Responsibilities – internal and external
  • Requirements related to accommodation, IT, and supplies
  • Communication and escalation plan and process
  • Change control and governance
  • Associated documentation and checklists
  • Storage location of the continuity plans and the information they require
  • Knowledge database of people, both business and personal

Then there is the need to test-test-test the BC/DR plan. This should include activities such as:

  • Half-day workshops involving portions of the business, suppliers, even customers
  • Testing at various times: during business, during batch runs, weekends, and high-volume or critical periods
  • Ensuring that everyone who needs to know, knows the part they must play to quickly resume normal business
  • What happens if normal service cannot be resumed? It’s the secondary BC/DR plan for when the primary BC/DR plan doesn’t work as it should – the Plan B
  • Escalation processes for decisions, money, resources, and regulatory agency assistance – you never know what resources will be needed above and beyond what you expect to need

The goal of this testing is to ensure that the strategy, plans, people, IT, vendors, etc. are all fit-for-purpose and in use.

Do: Service Operation. What happens when you have to invoke the BC/DR plan? If all of the above have been successfully tested and documented for real-world use, then you do it as planned.

Since the service desk is already known to be a point of contact for IT assistance or requests, using the service desk as a central location for BCM teams is a worthwhile consideration.

Also don’t forget that business-driven change might have an impact on your BC/DR plans – so make BC/DR part of your change management and request for change process.

Check and Act: Continual Service Improvement. What happens after a test or real invocation? This is very important. You need to review what occurred, which entails answering the following questions:

  • Was the BC/DR plan successful?
  • Did the people, processes, and technology perform as expected?
  • What issues were found?
  • What has to change, by when, and who owns this action?
  • How will you know that the change will help?

If an issue that impacts the business was discovered, then resolving and re-testing as soon as possible is paramount. It is also important to maintain this record of performance to help ensure that future events or tests do not have the same issues and instead improve the reliability of the plan.
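
One simple way to keep that record is a list of review actions with an owner and a due date, which can then be checked for anything overdue. The Python sketch below is only an illustration under my own assumptions: the `ReviewAction` fields, the sample issues, the owners, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReviewAction:
    issue: str              # what went wrong in the test or invocation
    change: str             # what has to change
    owner: str              # who owns the action
    due: date               # by when
    retested: bool = False  # set once the fix has been re-tested


def overdue(actions, today):
    """Actions past their due date that have not yet been re-tested."""
    return [a for a in actions if not a.retested and a.due < today]


# Hypothetical entries from a post-test review.
actions = [
    ReviewAction("Call tree had stale phone numbers", "Refresh contact list",
                 "Service desk lead", date(2024, 3, 1)),
    ReviewAction("Restore took longer than planned", "Re-run restore drill",
                 "Backup admin", date(2024, 6, 1), retested=True),
]

late = overdue(actions, today=date(2024, 4, 1))
for a in late:
    print(f"OVERDUE: {a.change} (owner: {a.owner}, due {a.due.isoformat()})")
```

Keeping the `retested` flag separate from the due date reflects the point above: an action is not done until the fix has been tested again.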

Some Final Tips for BCM and DR 

I know that this has been a long blog, with a lot to take in, but there are a few more things to consider:

  • Make BC/DR a culture thing
  • Ask WHY when you prioritize processes (as employee-related tasks are often pushed down the list when they actually need to be recovered first)
  • “Communication, communication, communication” is the mantra for BC/DR
  • Baseline and benchmark what you do
  • Use insurance to mitigate cost
  • Use the ISO 22301 international standard to assess and test yourself
  • Remember to include municipal services and vendors
  • Backup is not enough if you cannot restore
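
On that last tip, a restore test can be as simple as restoring into a scratch location and comparing checksums with the original. The Python sketch below assumes a plain file copy as a stand-in for your real backup and restore procedure, and the file names are made up; it only shows the idea of proving a restore rather than trusting that the backup job ran.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_source(source: Path, backup: Path, scratch_dir: Path) -> bool:
    """Restore the backup into scratch_dir and compare it with the source by checksum."""
    restored = scratch_dir / source.name
    shutil.copy2(backup, restored)  # stand-in for your real restore procedure
    return sha256_of(restored) == sha256_of(source)


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    source = tmp_path / "customers.csv"        # hypothetical business data
    source.write_text("id,name\n1,Ada\n")
    backup = tmp_path / "customers.csv.bak"    # hypothetical backup copy
    shutil.copy2(source, backup)
    scratch = tmp_path / "restore_test"
    scratch.mkdir()

    good_restore = restore_matches_source(source, backup, scratch)
    backup.write_text("corrupted")             # simulate a damaged backup
    bad_restore = restore_matches_source(source, backup, scratch)

print(good_restore, bad_restore)
```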

So how are your BC and DR looking in light of my blog?


Posted by Joe the IT Guy

Native New Yorker. Loves everything IT-related (and hugs). Passionate blogger and Twitter addict. Oh...and resident IT Guy at SysAid Technologies (almost forgot the day job!).