Measuring risk

How to Assess the Risk of a Change with 5 Simple Questions

One day real soon, machine learning and other forms of artificial intelligence (AI) will deliver on what they promise and truly help IT service management (ITSM) pros to better analyze and assess the risk of making a change. Until then, we still need to rely upon good old personal and peer experience.

This experience includes knowing how best to assess the inherent risk in each change request, and how to right-size the checks and balances needed to make sure that the risk is documented and properly allocated.

Please read on to learn the five simple risk assessment questions that will help with this.

Change and Risk Assessment

A change manager should reflect on, and communicate, the risks they identify to those involved in sponsoring, supporting, and putting a change into effect. There are many questions they could ask to collect contextual information about the circumstances surrounding a change, and the risk, but five of the simplest, and some might say the most effective, include:

  1. How many teams within the organization are involved in implementing the change?
  2. Has this type of change been attempted before, and with what degree of success?
  3. How extensively has the change been tested and what evidence exists to back this up?
  4. Is a service outage required to put the change into effect and, if so, for how long?
  5. If the change fails, what’s the length of any service outage to reverse (back out) the change?

Let me unpack each of these questions to better explain how they help. And as I do this, I’ll also include a range of possible responses, and a points system to arrive at some form of overall score of the risk.

The Scoring System

The scoring system can start very simply, with up to five possible answers to each question. Scores start at 2 for the best possible answer and increase in increments of 2 up to 10 for the worst. The scores for each question are then totaled to arrive at a final ‘total risk factor’ score, which for five questions ranges from 10 to 50.

Q1. The Number of Teams

The number of teams involved (in a change) can suggest complexity, and difficulty, from the perspective of the tasks involved, the challenge of maintaining communications, and coordinating the collaborative effort. Thus, the more teams involved, the greater the risk that one of these facets may fail. The range can be straightforward.

In terms of possible scoring ranges, an example is that one team involved scores just 2 points. If there are two teams involved it’s 4 points, and so on until five or more teams involved equates to a score of 10 points. The score therefore increases as the number of teams involved, and thus the associated risk, increases.

This example can be visualized as:

Number of teams involved | Score
1                        | 2
2                        | 4
3                        | 6
4                        | 8
5 or more                | 10
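
The table above maps directly onto a small function. This is only a minimal Python sketch of the example scoring; the function name `score_teams` is hypothetical and not taken from any ITSM tool.

```python
def score_teams(team_count: int) -> int:
    """Return the Q1 risk score: 2 points per team, capped at 10 for five or more."""
    if team_count < 1:
        raise ValueError("at least one team must be involved in a change")
    # Five or more teams all score the maximum of 10 points.
    return min(team_count, 5) * 2


for teams in (1, 2, 3, 4, 5, 8):
    print(teams, score_teams(teams))
```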

Q2. The Prior Experience

As an example here, a type of change that has been performed before, successfully, scores 2 points. One that was previously successful after encountering issues scores 4 points, one that was unsuccessful after issues scores 6 points, and so on through to 10 points for no previous experience. You might also wish to expand on these levels.

This example can be visualized as:

Level of prior experience/change success        | Score
Previously successful                           | 2
Previously successful after encountering issues | 4
Unsuccessful after issues                       | 6 (or 8)
No previous experience                          | 10

Q3. The Level of Testing

Testing is of course a major factor in success/failure, but sometimes circumstances mean that a change must progress without any serious chance to test. An example scoring mechanism is that an untested change scores 10 points. Only unit tested, 8 points. Unit tested and quality assured, 6 points, and one that is unit tested, quality assured, and user-acceptance tested, 2 points. Again, extra levels (and possible scores) can be added to suit your organization.

This example can be visualized as:

Level of testing                            | Score
Unit tested, quality assured, and UA tested | 2
Unit tested and quality assured             | 6 (or 4)
Unit tested                                 | 8
Untested                                    | 10

Q4. The Expected Service Outage

It’s always worth asking whether a service outage is required to implement a change, or whether it can be implemented without interruption.

In the example scoring system, no outage scores best at just 2 points. Less than 2 hours, 4 points. Between 2 and 3 hours, 6 points. Between 3 and 5 hours, 8 points, and more than 5 hours scores the maximum of 10 points.

This example can be visualized as:

Expected service outage | Score
No outage               | 2
< 2 hours               | 4
2-3 hours               | 6
3-5 hours               | 8
> 5 hours               | 10
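
If outage estimates are recorded in hours, the bands above could be applied automatically. The Python sketch below is one possible reading: the function name `score_outage` is hypothetical, and because the 2-3 and 3-5 hour bands meet at 3 hours, it assumes a boundary value stays in the lower band.

```python
def score_outage(hours: float) -> int:
    """Return the Q4 risk score for an expected service outage given in hours."""
    if hours < 0:
        raise ValueError("outage duration cannot be negative")
    if hours == 0:
        return 2   # no outage
    if hours < 2:
        return 4   # less than 2 hours
    if hours <= 3:
        return 6   # 2 to 3 hours (assumption: exactly 3 hours stays in this band)
    if hours <= 5:
        return 8   # 3 to 5 hours (assumption: exactly 5 hours stays in this band)
    return 10      # more than 5 hours


for hours in (0, 1.5, 2, 3, 4, 6):
    print(hours, score_outage(hours))
```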

Q5. The Reversal Time

Finally, for now, can the change be reversed out (backed out) should issues arise? When asking about this, it’s also important to check how long any backout effort might last.

An example scoring system is: No outage and less than an hour to back out scores the minimum 2 points – which is good. No outage but more than an hour to reverse the change scores 4 points. An outage of less than one hour scores 6 points. An outage of more than one hour scores 8 points, and no back out option scores the maximum 10 points.

This example can be visualized as:

Reversal time                                                  | Score
No outage and less than an hour to back out                    | 2
No outage but more than an hour to reverse                     | 4
Outage of less than one hour with minimal reversal time        | 6
Outage of more than one hour with significant reversal time    | 8
No back out option                                             | 10

A Worked Example

So, a change that involves two teams (4 points), has been successfully completed previously (2 points), is fully tested (2 points), but will require a service outage of between 2 and 3 hours (6 points), and would need an outage of more than an hour with significant reversal time to back out (8 points), scores a total of 22 points.
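
The arithmetic of this worked example can be checked with a few lines of Python. This is just an illustrative sketch; the dictionary keys are descriptive labels made up for the example, and the scores come from the example tables in this article.

```python
# Scores for each of the five questions, taken from the example tables.
worked_example = {
    "teams involved: 2": 4,
    "prior experience: previously successful": 2,
    "testing: unit, QA, and UA tested": 2,
    "expected outage: 2-3 hours": 6,
    "reversal: outage of more than one hour": 8,
}

# The 'total risk factor' is simply the sum of the individual scores.
total_risk_factor = sum(worked_example.values())
print(total_risk_factor)  # 22
```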

While this is in no way an exact science, and something that needs to be tweaked over time, this allows you to quickly understand the potential risks of your changes and to react accordingly.

Understanding Your Organization’s Risk-Point Range

How much risk, and how many risk points, your organization can handle in any available change slot (i.e. a time when a change can be applied) can vary. The total score can be matched to differing levels of change management assistance and scrutiny. For example, a total score of less than 15 might be subject to a low-risk procedure, 15 to 25 might be moderate, anything above that up to, say, 40 might be high, and a greater score very high.
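
As a rough illustration, the example thresholds above could be turned into a lookup like the Python sketch below. The function name `risk_band` is hypothetical, and the cut-offs (15, 25, and 40) are only the illustrative figures from this article; your organization would set its own.

```python
def risk_band(total_score: float) -> str:
    """Map a total risk factor score to an example level of scrutiny."""
    if total_score < 15:
        return "low"
    if total_score <= 25:
        return "moderate"
    if total_score <= 40:
        return "high"
    return "very high"


# The worked example's total of 22 points falls in the moderate band.
print(risk_band(22))  # moderate
```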

Consider Weighting Each Response

The bolder among us could also add a weighting factor to each of the questions and use these weights to add bias to any question. For example, past results may have indicated the testing question is especially important for the application affected, so we apply a weighting of 1.5 to its score. This means multiplying whatever it scored by 1.5.
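
A weighted total might be calculated along these lines. This Python sketch is an assumption about how the weighting could be applied, not a prescribed method: the function name `weighted_total` and the question labels are hypothetical, and any question without an explicit weight is treated as having a weight of 1.0.

```python
def weighted_total(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Sum the question scores, multiplying each by its weight (default 1.0)."""
    return sum(score * weights.get(question, 1.0) for question, score in scores.items())


# Example scores for a change, with the testing question weighted at 1.5.
scores = {"teams": 4, "experience": 2, "testing": 2, "outage": 6, "reversal": 8}
weights = {"testing": 1.5}

print(weighted_total(scores, weights))  # 23.0
```

Bear in mind that weighting changes the range of possible totals (here the maximum rises from 50 to 55), so any score thresholds would need to be revisited alongside the weights.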

Of course, your organization could use more questions than these five, or trade one for another. For example, it might be important to ask if the change requires vendor involvement and assistance, and to what extent. Or perhaps if it will affect any one of your more critical business applications.

These five questions will help your organization to apply an appropriate level of governance to any requested change – the higher the score, the more that is needed. They will also provide it with a rationale for follow up questions, and for requiring additional checks and balances, approvals, and so on for any change request.

 

Posted by Joe the IT Guy

Native New Yorker. Loves everything IT-related (and hugs). Passionate blogger and Twitter addict. Oh...and resident IT Guy at SysAid Technologies (almost forgot the day job!).