Make Risk Management Real-Time
Risk management processes really slow us down. The problem is, we need them.
Or do we?
Facebook famously has the motto, “Move fast and break things”. That shows what they think of their users – that it doesn’t matter if things that we rely on break.
But what if you are not a social media site, but an airplane manufacturer? Or a healthcare provider? Or a bank? Then you cannot afford to break things.
My 2005 book High-Assurance Design was based on my experience as CTO of Digital Focus*, a company that built complex but highly trustworthy things very fast (typically in three months), for companies such as Capital One Bank, McKesson Pharma, FedEx, and many others.
The problem with risk management processes is that they have historically been implemented as add-on “controls”. These usually take the form of additional checks, or “gates” during which people “attest” that they feel that certain risks have been addressed.
These separate steps can greatly slow things down, because they usually are done in either a queued manner or a scheduled manner. In the queue approach, approval requests circulate – we wait until everyone has signed off. In the scheduled approach, everyone attends a meeting during which they attest.
These approaches are not very compatible with today’s need to go-as-fast-as-you-can.
Fortunately, there is an alternative. Amazon uses it. So do others who have led the way in going really fast.
I call the approach “real time risk management” because it doesn’t wait: the risk controls are embedded in the work itself, so they operate in real time.
The Example of “Continuous Delivery”
In the realm of software development, there is an approach known as continuous delivery, which is a form of real time risk management, even though it is usually not explained that way.
In a traditional approach to software development, software is first created and then tested. When testing finishes, a “configuration control board” usually meets, and its members attest that the software has been tested and meets its requirements. Only then is it deployed for users to use.
Traditional approach for managing risks.
In the continuous delivery approach, there is no attestation meeting, and there is no testing phase. Instead, tests are automated, and they are run frequently throughout development. When all of the tests pass, the software is considered to be deployable for users to use.
Continuous delivery: develop tests throughout the workflow.
For that to work, all of the risks must be covered by the tests.
That last sentence is really important. It is what makes continuous delivery a legitimate risk management approach. Unfortunately, most people who use continuous delivery do not understand this, and they leave out assessment of risk coverage, which is the part in red below.
Real-time risk management: develop tests throughout the workflow, and review coverage of tests as they are developed.
Instead, they manage risk coverage – in this case that takes the form of test coverage – in an ad hoc way. In the most common case, they substitute for test coverage something called “code coverage”, which is a very low level form of assessment and is wholly inappropriate for assessing risk of a multi-component system.
Using code coverage as a metric for assessing risk or quality is kind of like assessing whether you built an automobile correctly by measuring how tight all the screws are – never mind if parts are put on backwards or have holes in them, or some things are actually broken, or all the screws are tight but some parts are actually missing!
No wonder software is so unreliable today: the vast majority of software teams are not measuring how thorough their tests are, except at the most granular level. Above that level, it is entirely ad hoc.
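To make the difference between code coverage and risk coverage concrete, here is a minimal Python sketch. It assumes that each automated test is tagged with the risks it is meant to address. The risk names, the test names, and the `uncovered_risks` helper are all hypothetical, invented for illustration, and are not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class AutomatedTest:
    """An automated test, tagged with the risks it is meant to address."""
    name: str
    risks_addressed: set[str] = field(default_factory=set)


def uncovered_risks(identified_risks: set[str], tests: list[AutomatedTest]) -> set[str]:
    """Return the identified risks that no test claims to address."""
    covered: set[str] = set()
    for test in tests:
        covered |= test.risks_addressed
    return identified_risks - covered


# Hypothetical risks and tests, for illustration only.
risks = {"integration", "performance", "security"}
tests = [
    AutomatedTest("test_components_talk_to_each_other", {"integration"}),
    AutomatedTest("test_response_time_under_load", {"performance"}),
]

print(sorted(uncovered_risks(risks, tests)))  # ['security']
```

A suite like this one could exercise every line of code and still leave the security risk untested. The gap only becomes visible when the tests are reviewed against the list of risks.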
That’s a great dysfunction. I have written many articles about this (e.g. here and here). And I once took the time to write up a case study of an instance of when I helped people to do this properly (here).
Real-Time Risk Management
This approach is generalizable to any complex endeavor, as what I call real time risk management. The basic idea is simple and has two facets to it:
1. Replace all after-the-fact risk controls or checks with checks that occur in the course of the work.
2. Devise a way to assess the risk coverage of the checks that are done in the course of the work.
If you have confidence in the checks, that means that when all the checks pass, there is no need for a final test phase or attestation: you are done!
The challenge is in how to engineer facets 1 and 2 above. If you can do that, then your risks are managed in real time, in the course of the work, rather than as separate steps or phases. It means that when your product development team says they are done, you can believe them.
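The two facets can be sketched in a few lines of Python. This is only an illustration under simple assumptions: `run_step` and `is_done` are hypothetical names, each check is modeled as a function that returns True or False, and the coverage judgment is reduced to a single flag supplied by whoever reviews coverage.

```python
from typing import Callable

Check = Callable[[], bool]


def run_step(work: Callable[[], None], checks: list[Check]) -> bool:
    """Facet 1: do one unit of work, then immediately run its embedded checks."""
    work()
    return all(check() for check in checks)


def is_done(step_results: list[bool], coverage_judged_sufficient: bool) -> bool:
    """Done means every embedded check passed AND coverage was judged sufficient (facet 2)."""
    return all(step_results) and coverage_judged_sufficient


# Hypothetical work steps, for illustration only.
built: list[str] = []
results = [
    run_step(lambda: built.append("frame"), [lambda: "frame" in built]),
    run_step(lambda: built.append("legs"), [lambda: "legs" in built, lambda: len(built) == 2]),
]

print(is_done(results, coverage_judged_sufficient=True))   # True
print(is_done(results, coverage_judged_sufficient=False))  # False
```

There is no separate test phase in this sketch. The only thing evaluated at the end is whether the checks that already ran can be trusted to cover the risks.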
Streamline™ is designed to help you to do this.
Streamline’s primary planning view is the Roadmap view. It graphically depicts the major things you are trying to create. We call them capabilities. In the example we often use, the project is about designing a robot product that can be used for search-and-rescue. The simplistic roadmap for that consists of five capabilities: Walk, Hop, Run, Jump, and Skip. The Run capability depends on the Walk capability (see the gold arrow between them), in the sense that Run will build on the Walk capability. Here’s a closeup view of the capabilities:
There is risk involved in each of these. One can right-click on any capability and select Manage Risk. A panel then opens on the right-hand side of the view (see figure below).
In this panel one can define risk mitigation strategies. One can list risks, and write short notes about how the risks will be mitigated.
The panel provides a pull-down menu for different types of risk, in order to prompt the user to consider those types. It has a similar pull-down for mitigation strategies. The nature of the options in the latter pull-down is that they are all real-time oriented. We’ll explain that momentarily.
This side panel is for making quick notes about risk mitigation strategies. For a more thorough approach, one can click on the Manage Risks button in the Roadmap view. A separate tab then opens, displaying a fuller view of the risks for the roadmap (see figure below).
In this view one can add more detail. These details are important, because they include columns for:
Where performed.
How implemented.
Sufficiency criteria.
This is where one has the opportunity to define approaches that are “real time”. For example, let’s look at the table for the Can Run capability. There are four rows in the table. The first one addresses an Integration risk. The strategy is to “Cover all Run functions that rely on Walk”.
We can then fill in the three columns, Where performed, How implemented, and Sufficiency criteria:
Where performed: In any of the test lab rigs.
How implemented: By the Walk team, such that any team can run the tests, using their own robot prototype. Tests are created concurrent with each robot feature that is developed. As new tests are added, Judy reassesses (see sufficiency criteria).
Sufficiency criteria: Judy (from the Run team) assesses whether the tests are sufficient.
Notice that none of these are after-the-fact checks. In fact, they are written in a way that requires that tests are developed continuously and made available right away, so that the Run team can have access to them. This means that the Run team will never be waiting for the Walk team. There is no phase separating the two teams. The Run team can start developing its Run capability right away, and run whatever Walk tests are available, as soon as they are available. When the Walk team finishes, the Run team will have been able to run each of the Walk tests, and can literally finish at the same time, or almost at the same time.
The sufficiency criteria are written such that a member of the Run team assesses if the tests are sufficient, and does that as tests are made available. That is, in effect, an independent assessment with respect to the Walk team. It means that the Run team can make sure that there are enough tests to ensure that Run’s dependencies on Walk are well covered by the tests.
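One way to picture such a row, and the continuous reassessment it calls for, is as a small record. The Python sketch below is hypothetical and does not represent Streamline’s data model or API: the field names simply mirror the columns described above, and the test names and methods are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MitigationStrategy:
    """One row of a risk table; field names mirror the columns in the text."""
    capability: str
    risk_type: str
    strategy: str
    where_performed: str
    how_implemented: str
    sufficiency_criteria: str
    tests: list[str] = field(default_factory=list)
    tests_reviewed: int = 0          # how many tests the assessor has reviewed
    judged_sufficient: bool = False  # verdict from the most recent review

    def add_test(self, name: str) -> None:
        """The providing team publishes a new test as soon as it exists."""
        self.tests.append(name)

    def needs_reassessment(self) -> bool:
        """True when tests were added since the last independent review."""
        return len(self.tests) > self.tests_reviewed

    def record_review(self, sufficient: bool) -> None:
        """The independent assessor records a verdict on the current tests."""
        self.tests_reviewed = len(self.tests)
        self.judged_sufficient = sufficient


row = MitigationStrategy(
    capability="Can Run",
    risk_type="Integration",
    strategy="Cover all Run functions that rely on Walk",
    where_performed="In any of the test lab rigs",
    how_implemented="By the Walk team; tests created concurrent with each feature",
    sufficiency_criteria="Judy (from the Run team) assesses whether the tests are sufficient",
)

row.add_test("walk_forward_on_flat_ground")  # hypothetical test name
print(row.needs_reassessment())              # True
row.record_review(sufficient=False)          # more tests are still needed
row.add_test("walk_then_accelerate")         # hypothetical test name
print(row.needs_reassessment())              # True
```

Each new test triggers another review, so the sufficiency judgment keeps pace with the work instead of waiting for a phase at the end.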
This approach is very different from the traditional approach, which would be to first complete Walk, run it through a testing phase, and eventually make the Walk capability available so that the Run team can start.
Which is a faster approach? I think it is obvious.
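A rough back-of-the-envelope comparison shows why. The durations in this Python sketch are hypothetical, chosen only to illustrate the arithmetic, and the model is deliberately simple: it ignores coordination overhead and assumes the Run team needs only a short lag after the last Walk test becomes available.

```python
def sequential_weeks(walk_dev: int, walk_test_phase: int, run_dev: int) -> int:
    """Traditional: finish Walk, test it in a separate phase, then start Run."""
    return walk_dev + walk_test_phase + run_dev


def overlapped_weeks(walk_dev: int, run_dev: int, lag: int) -> int:
    """Real time: both teams start together; Run finishes shortly after Walk.

    `lag` is the time the Run team needs after the last Walk test is available.
    """
    return max(walk_dev + lag, run_dev)


# Hypothetical durations in weeks, for illustration only.
print(sequential_weeks(walk_dev=6, walk_test_phase=2, run_dev=6))  # 14
print(overlapped_weeks(walk_dev=6, run_dev=6, lag=1))              # 7
```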
Streamline provides the planning framework, with valuable prompts, to support and encourage the real-time approach that I have explained. There is no other tool that does.
Why This Is Powerful
This is how to move fast, but maintain a high level of assurance. It is how to move fast and not break things. It is how to avoid teams waiting for each other. It is how teams can work concurrently on overlapping things, but ensure that they are continuously aligning their work.
In the school of thought known as Lean Manufacturing, there is a concept of “wastes”, and waiting is one of them. Streamline helps you to avoid the waste of waiting. Waiting is a killer. Approval queues, review meetings, and waiting for others to finish are wasteful. Replacing them with real time approaches for managing risk is how to eliminate the waste of waiting. It is how one can enable work to overlap – work that otherwise would have been done in sequence.
Don’t wait.
See how Streamline can help your teams to avoid waiting.
* Digital Focus, a profitable $12M/year company, was acquired in 2006 and merged into Command Information, a deeply funded IPv6 startup that later failed.