Friday, May 14, 2010

The Importance of Technology

You can drive a nail into a tree with your hand, if you are committed to striking it constantly for a few decades. Likewise, you can do risk-management without a single technological tool at your disposal. But just like a nail goes in more easily with a hammer, so too do good expert tools assist in making risk-management relatively painless. Having said that, risk-management isn’t about technology, and the primary stress is always on process. All the tools do is increase the efficiency of the processes.

While our research into “safety” drove our understanding of the current models and their serious flaws, those understandings drove our development of tools to support risk-management approaches. What is interesting is how often the incremental development of the tools, in turn, drove our understanding of the opportunities to do risk-management well. In most cases a tool is created to address a specific purpose, and so it was with the primary tool we created to manage risk; but as it developed it exposed other avenues to consolidate the theories we had developed.

Some of the key considerations that our research raised and our development efforts reflected were:

  • A need to simplify the actual mechanics of the data recording process, since we knew the data gathering and recording processes were always a challenge;
  • A need to spread the effort requirements to engage more people in the development of risk aversion, by having the tools assist them in communicating;
  • A need to monitor both data quantity and data quality, to ensure the reporting capability reported against clean data, or at least had the chance to explain the data faults (a minimal sketch of such a check follows this list);
  • A need to enforce value returns by focusing away from traditional distractions toward cost-provable processes that can be measured concretely;
  • A need to impose an integral relationship model that would provide opportunities for deep analysis without subjective effort, and allow for engagement of the cyclic improvement model; and
  • A need to get the right information (data) to the right people.
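
As a rough illustration of the data quantity and quality monitoring mentioned above, here is a minimal sketch in Python; the record fields, thresholds, and names are hypothetical and not drawn from any actual tool:

```python
from datetime import date, timedelta

# Hypothetical inspection records; field names are illustrative only.
records = [
    {"id": 1, "asset": "Crane-07", "inspector": "F. Smith", "date": date(2010, 5, 1), "findings": "guard rail loose"},
    {"id": 2, "asset": None,       "inspector": "F. Smith", "date": date(2010, 5, 3), "findings": ""},
    {"id": 3, "asset": "Crane-07", "inspector": None,       "date": date(2009, 11, 2), "findings": "ok"},
]

REQUIRED_FIELDS = ("asset", "inspector", "date", "findings")
STALE_AFTER = timedelta(days=90)

def quality_report(records, today=date(2010, 5, 14)):
    """Report data quantity and flag records with missing or stale fields."""
    issues = []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            issues.append((rec["id"], "missing: " + ", ".join(missing)))
        if rec.get("date") and today - rec["date"] > STALE_AFTER:
            issues.append((rec["id"], "stale: recorded more than 90 days ago"))
    return {"total_records": len(records), "records_with_issues": issues}

print(quality_report(records))
```

The point is not the specific checks, but that the reporting layer can say how much data it has and which of it cannot be trusted, rather than silently reporting over faults.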

A few of the things our development revealed as it proceeded were:

  • The actual practical nature of risk factor linkage across inventory, activity, and reactive entities, leading to the theory we developed about Risk Factor Linkage and Analysis; and
  • The degree of expectation we could have of dependent systems to provide good inputs, revealing methods to overcome the information technology barriers in large organisations.

Ultimately, without good tools, the cost of doing risk-management will always exceed the benefits, because the process will become clumsy, the returns will be unreliable, and your focus will end up in the wrong domain. With good tools, there is nothing to prevent risk-management becoming a profit-driver.

Thursday, May 13, 2010

Due Diligence

The apparent inability to change fatality and injury rates in any real way, combined with the increasing trend toward criminal prosecution post-incident, has led to a focus on due diligence. The problem with this focus is not the focus itself – we should all be dutifully diligent – but how to achieve it. Here again, everything we have learned in more than a decade has shown us that traditional safety fumbles this entirely.

In a traditional safety program the problem with proving due diligence is twofold: first, the proof simply doesn’t exist, because traditional safety has an abysmal lack of provable value, almost no precision focus, and is so highly reactive it has to draw any proof of diligence from a sea of assumptions; and, second, relying on luck isn’t a sign of diligence of any kind, but rather a sign of total ignorance.

What we see increasingly is that in situations where due diligence becomes a question, usually about the time a prosecution begins, the fumbling begins to generate the proof. While not criminally driven, this is essentially a fraud, because what it entails is creating something concrete from something ephemeral. If you are dutifully diligent, you can tell me today, with fair accuracy, exactly how many of your employees are competent to do their respective job tasks, and you can show it to be true by way of a provable process and certification. This requires several conditions: first, that such a comprehensive list exists; second, that it is up to date (or can be shown to have been updated at some specific date); third, that it is right (certificates match claims); and fourth, that you can generate it reliably. If any one of those conditions fails, you are generating the proof by way of magic, or, at the very least, bending it into shape after the fact. Even assuming your end report on the matter is honest, the problem is that you cannot prove to have been diligent prior to the event that required such a proof, because you can’t prove you knew the state of your organisation before that event.
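
To make the four conditions concrete, here is a minimal sketch, in Python, of the kind of time-stamped competency check they imply; the occupations, certificates, and dates are invented for illustration and are not drawn from any real system:

```python
from datetime import date

# Hypothetical data: occupations map to required certifications; employees
# hold certificates with expiry dates. None of these names come from a real system.
required_certs = {"welder": {"CWB-W47.1", "fall-arrest"}}

employees = [
    {"name": "Fred Smith", "occupation": "welder",
     "certs": {"CWB-W47.1": date(2011, 3, 1), "fall-arrest": date(2010, 2, 1)}},
]

def competency_status(employee, today=date(2010, 5, 14)):
    """Return which required certificates are missing or expired for an employee."""
    needed = required_certs.get(employee["occupation"], set())
    missing = sorted(needed - employee["certs"].keys())
    expired = sorted(c for c in needed & employee["certs"].keys()
                     if employee["certs"][c] < today)
    return {"employee": employee["name"],
            "competent": not missing and not expired,
            "missing": missing, "expired": expired,
            "as_of": today.isoformat()}  # the dated state is what proves you knew it then

for e in employees:
    print(competency_status(e))
```

If a report like this can be produced on any given day, and the underlying records are maintained, the proof of the state of the organisation exists before any event demands it.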

Real due diligence is about consistency, accuracy, and explicability. In a court environment the test of proof is going to demand it, and ignorance is no defence against a lack of diligent effort.

What we know is that risk-management itself isn’t any better at delivering proof of due diligence than any other method. What counts in this regard are the supporting tools, which is where the development side of our research and development efforts focused. It was apparent from the outset that data had to be capable of proving diligence, and it was equally clear that even the best tools would face the same two problems: data quality concerns, and data management concerns. Put in basic terms, even with the best tools, low quality (bad) data and poorly maintained (unreliable) data kill the proof of diligence.

The difference between a risk-management model and its associated tools, and traditional safety, is that risk-management at least makes due diligence possible. By focusing efforts, and by having the tools maintain relationships once they are defined, we ensure consistency. By ensuring consistency and guiding the recording of data, we ensure accuracy (so far as is mechanically possible) as long as people follow the defined processes. And by having consistent and accurate data, and by wrapping the tools in processes that are clear and repeatable in terms of results from inputs, we can explain that data. In essence, risk-management processes and tools, combined, impart the three key gauges of proof of due diligence.

If you risk-manage your company, you use tools that make it easy, and if you execute that process wilfully your outcomes are naturally both safety (a side-effect of good operational risk-management process) and proof of due diligence – because you are being dutifully diligent in your process.

What is a depressing reality in our experience thus far is that the attitude remains, “serious accidents won’t happen to us,” and by the time that fallacy is shorn away, it is far too late to reverse engineer an enterprise to be naturally diligent.

Wednesday, May 12, 2010

Rolling the Dice

Two incidents from just before Christmas in 2009 illustrate the fact that companies are rolling the dice daily, betting worker lives against some fantastically ill-defined maximisation of returns.

In the first case, Canada Post demanded employees take down a set of Christmas lights from the top of a file cabinet, stating it was a safety issue. The associated press release stated safety of employees was of paramount importance, while the union rightly observed management needed to become more concerned about high-criticality hazards. While not obviously wagering lives here, the fact is they have done so, because they directed resources (including a chest-thumping press release!) toward this bizarre event. While the argument that Christmas lights present a possible hazardous condition is sound, every penny spent on this nonsense is one less spent managing risks that will eventually kill someone. This misdirection of effort is an egregious affront to the very idea of management, and shows that Canada Post has no understanding of resource application for value return. They will have spent a thousand dollars on this, at least, draining valuable budget that could be applied effectively. That they congratulated themselves in a press release only further shows the low or absent management quality. A good manager would never allow such an issue to be treated with such false bravado.

In the second case, four workers in Ontario, Canada, fell to their deaths on Christmas Eve from a faulty swing stage. While it will be months before an inquest determines details, what was apparent early on was that the employees were improperly trained, the swing stage was improperly maintained, and basic safety equipment was not being used. One has to wonder what the company responsible for this travesty was hoping to achieve. Were they really that utterly ignorant of the risks to their workers, or were they convinced that the risks were acceptable? If the former, they are criminally negligent as well as incompetent; if the latter, they are simply criminal. In this case they killed four people, wilfully, by way of ignorance, even if in the end it turns out the safety equipment was ignored by the workers. For some reason those workers had a cultural bias toward getting a job done, rather than doing it safely. The saddest aspect is that in killing four people, they devastated four families, which represents social costs that go beyond any compensatory penalties they will ever face. This type of incident is why criminal prosecution is possible in workplace injury and death cases, because nothing short of hard time would ever send a message about this kind of disregard for human life.

The lack of significant decreases in fatalities over the decades is indicative of how many companies are rolling the dice rather than stacking the odds in their favour. A large part of this is because safety, being treated as an activity, simply doesn’t return on the investment. You cannot prevent or suppress the impact of accidents when your entire model for doing so is focused on counting irrelevancies that have no provable relationship to the underlying causes of accidents. Theory states that the more proactive measures one performs (meetings, inspections, etc.), the larger the drop in the accident rate. Data, of course, shows no correlation, since there isn’t a connection between counting and changing the conditions that increase risk:

[Chart: Rolling the Dice 2]
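
For anyone who wants to test the claim against their own figures, a minimal sketch of the comparison follows; the monthly numbers below are invented placeholders, not our research data:

```python
# Hypothetical monthly figures: count of proactive activities (meetings,
# inspections) versus recordable incidents. Values are invented for illustration.
activities = [40, 55, 38, 60, 52, 47, 65, 43, 58, 50, 62, 45]
incidents  = [ 3,  2,  4,  3,  1,  4,  3,  2,  4,  1,  3,  2]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient; near zero means no linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(f"activity/incident correlation: {pearson(activities, incidents):+.2f}")
```

Run against real data, a coefficient hovering around zero is exactly what “no connection between counting and changing conditions” looks like.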

Our research has shown that doing more of the same inapt counting actually erodes worker trust in the safety program and its purpose, and consequently workers begin to view all initiatives connected to “safety” as more of the same drivel. This, in turn, increases risks dramatically, because positive initiatives end up buried in noise. And when focus is lost by the workers, systemic failures become commonplace. This “safety exhaustion” exists at so many levels in most organisations that it is a large contributor to the decline in reasonable investment in actual practical operational risk-management.

What may be sadder than the static rates themselves is that there is no real mystery why these rates remain unchanged. It all comes down to failed safety programs, failing because of a combination of flawed assumptions about safety and flawed processes. The key issue is that safety is treated as something to be done rather than as the outcome of a process, meaning people try to manage the end-result directly instead of focusing on the process that produces it.

Amongst the flawed assumptions about safety is the extension of the mistaken focus on outcomes to provide metrics. Total Recordable Incident Rate (TRIR) exemplifies the problem, because it focuses on outcomes of failures as a measure of success. To put it in painful perspective, we have developed an entire industry around a statistic that rewards us for failing less, without ever asking why we are failing. It is similar to grading on a bell curve, and generating statements like, “Company A is the safest in the world this instant because they have only killed five people this year!” Shockingly, that kind of ridiculous metric is exactly what we use.
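
For context, TRIR is commonly computed by normalising recordable incidents to 200,000 hours worked (roughly 100 full-time employees for a year). The sketch below, with invented figures, shows how two very different operations can share an identical “success” score while the metric says nothing about why either failed:

```python
def trir(recordable_incidents: int, hours_worked: float) -> float:
    """Total Recordable Incident Rate, normalised to 200,000 hours
    (about 100 full-time employees over a year)."""
    return recordable_incidents * 200_000 / hours_worked

# Two hypothetical companies with identical TRIR, and nothing said about why
# incidents occurred, which controls failed, or how much luck was involved.
print(trir(5, 1_000_000))    # 1.0
print(trir(50, 10_000_000))  # 1.0
```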

Another of the assumption failures is a little harder to come at, because traditional safety programs simply have no comparative advantage over risk-management: you cannot compare luck to management. Activity counting in traditional safety relies upon irrational aggregation, with no relationship defined between the activities and the functional operational control measures. So, in essence, we reward ourselves for doing thirty extra inspections in a given quarter, without any way to assess the validity of those inspections. This lack of relationship between the underlying risks and their controls, and the activities, means we will never have cause-effect clarity in traditional safety. In risk-management, though, the entire structure of the process is about defining and managing the relationships in a way that provides insight. In a risk-management model we still do inspections, but our inspections might point at an asset, which points at inherent risks, which point to known controls. So, every inspection (asset or generic) shows a chain of insight, and makes it possible to analyse the control measures. The assumption of a relationship is not the clarification of a relationship, and as long as traditional models focus on outcomes they avoid relationship definitions and accountability.
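
A minimal sketch of that chain of insight, using hypothetical class and control names rather than any real schema, might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class Control:
    name: str

@dataclass
class Risk:
    description: str
    controls: list = field(default_factory=list)   # known controls for this risk

@dataclass
class Asset:
    name: str
    inherent_risks: list = field(default_factory=list)

@dataclass
class Inspection:
    asset: Asset    # every inspection points at something; it never floats alone
    result: str

# Hypothetical example: an inspection traces through to the controls it exercises.
guard = Control("machine guarding")
entanglement = Risk("entanglement in rotating parts", controls=[guard])
lathe = Asset("Lathe-03", inherent_risks=[entanglement])
inspection = Inspection(asset=lathe, result="guard interlock defeated")

for risk in inspection.asset.inherent_risks:
    for control in risk.controls:
        print(f"Inspection of {inspection.asset.name} bears on control "
              f"'{control.name}' for risk '{risk.description}'")
```

Because the inspection is linked to an asset, and the asset to its risks and controls, the inspection result becomes evidence about a control measure rather than just another tally.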

Extending from the false assumption that counts somehow imply management is the assumption that reactive, traditional safety somehow implies management. Most traditional investigations, like their activity counterparts, have absolutely no control failure trace. And even when Root Cause Analysis is done, the problem is that there is still no direct relationship defined between those efforts and what must be managed to avoid and suppress risk. I can say that the direct root cause of an accident was the inattention of the employee, but I cannot manage such a condition in and of itself; what I can manage – the controls that might impart more employee focus – is not traced by the traditional safety reliance on form-filling. Risk-management, being more orange than apple, has an integral ability to do just that, while maintaining all the features of traditional investigation models.

Assumptions are dangerous because they fail the instant any aspect of the assumption is even moderately flawed. But even pretending that the assumptions are valid, and this pretence is enormous, the flawed process destroys the integrity of traditional safety programs. The focus on outcome, even to the point of using it to measure the efficacy of the process itself, creates a scenario where participants begin to behave against the interest of safety and in the interest of deflection. This leads to poor implementation at every juncture, because the inability to prove the value proposition one way or another makes it acceptable to:

  • create paper, regardless of its meaning;
  • redirect focus in knee-jerk fashion to high-visibility “controls” that are often ineffective, and chosen because they appear easy to achieve;
  • ignore the quality of primary control processes, which are harder to do and deliver value over a longer horizon; and
  • ignore the directive management control mechanism entirely.

We have not yet encountered a company that can actually prove whether high priority corrective actions are being done. Even worse, there are no corrective actions for high-consequence events, because the recognition of consequence is tied to outcomes that have already occurred.

When you try to manage an outcome rather than a process, your efforts are entirely wasted, since an effect is not a cause. And contrary to what some might wish, rolling the dice is not a process, it is a risk. Choosing risk that is undefined, uncontrolled, and ultimately unacceptable is not management; management is about using defined process to ensure outcomes. The only thing rolling the dice has in common with management is that it ensures an outcome: the eventual failure conditions that will kill workers. All the lip service to the idea that there is no price on a human life flies out the window alongside the improperly harnessed employee when an enterprise approaches safety as something to be done, rather than something to be achieved.

Tuesday, May 11, 2010

The Immense Cost of Reactionary Behaviours

One of the realities we came across in our research and development phase was a mercenary one: cost is a major barrier to change.

One of the clearest misconceptions we run into is the idea that the cost of risk-management is higher than staying the course. Consequently, risk-management becomes an “expensive option,” despite the fact that in more than a decade we have yet to have a single prospective client company show us a valid cost assessment of their current approaches. In most cases the best that can be done is to add the cost of training, the cost of insurance premiums, and the cost of known losses; and even then none of the three numbers can reliably be gathered by most companies. To put it bluntly, no one seems capable of defining the cost of current safety programs, and yet the cost factor is cited as an excuse for not making changes.

Part of the problem, of course, is that “safety” has been a moving target for decades, always introducing the same basic luck-based ideas with new terminology, always failing to impact the bottom line positively, and never really doing more than obscuring accident rates by happenstance. All this time, all this new investment, has made even the best management shy of investing anything concrete in making changes, since they distrust the changes of the past. Any new method is viewed as another faddish distraction from the basic fact that safety seems to be a crap shoot, and this attitude often persists without anyone even analysing the vastly different methodology of risk-management, or assessing the cost-specifics it can define.

One would think that even the basic fact that risk-management can be measured for cost would appeal, but there we often find another barrier: some management groups would prefer not to know how much “safety” is costing them beyond the direct, unavoidable measures. They know that realising the actual cost of the current reactionary model would cripple them with shareholders, who would be stunned to find out that all those resources invested accomplish nothing provable in practical terms. Indeed, our research demonstrates no correlation between any traditional safety activity and accident rates, which raises the question: why are companies spending money on something completely ineffective?

A pretence we hear a lot from companies is that their employees matter, usually twinned with the grandest lie of all, which is that you cannot put a price on human lives. While admirable ideals, the realities are provably different. The simplest calculation to determine the value of a human life is to add the cost of all “safety” in an enterprise, then divide it by the number of times a person has died. That will show you exactly what a life costs, since you will have poured that much money into outcomes that led directly to death. Of course, the fallacy in that calculation is that it really should be the total cost divided by the number of times you could have killed someone, since the purity of luck is the only reason you haven’t.
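
Worked through with invented figures (these are placeholders, not data from any client), the arithmetic looks like this:

```python
# Invented figures purely to illustrate the arithmetic in the paragraph above.
total_safety_spend = 12_000_000   # all "safety" spending over the period
fatalities = 3                    # actual deaths
potential_fatalities = 45         # times a death could plausibly have occurred

implied_price_per_life = total_safety_spend / fatalities
honest_price_per_life = total_safety_spend / potential_fatalities

print(f"implied price per life (actual deaths):     ${implied_price_per_life:,.0f}")
print(f"implied price per life (near-death events): ${honest_price_per_life:,.0f}")
```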

Not making changes because of cost ignores two basic facts we have discovered:

· You cannot proclaim the cost of productive change to be beyond the acceptable range if you can’t honestly define the cost of doing nothing; and

· You cannot assign a cost calculation to any system that relies on luck to avoid disaster.

The immense cost of reactionary behaviours is obvious post-incident, but within months of those expensive incidents the momentum for change is always lost because any investment is directed to avoiding the same scenario, without ever actually understanding the underlying failures. By the time the inquests determine cause, they have generally obscured the practical realities by drowning them in politics. And even when they are pristine, those recommendations come far too late to make adjustments.

Risk-management solves the timeline problems, allows fine cost awareness, and, done well, discards the luck-based approach entirely. Cost is no reason to resist the benefits: unlike what exists in the industry today, risk-management can be managed for cost, and can also prove its value on a per-cost basis, which makes it an actual management process rather than a reactionary one, where costs simply cannot ever be controlled.

Monday, May 10, 2010

Management Versus Reactionary Behaviour

To be blunt, if your safety program is “managing crisis” it is neither managing nor imparting any value. One of the key discoveries we embedded in our solution-set relates to the recognition that you react to crisis situations, and you manage in order to avoid crisis. If you are managing, you do not experience crisis, even if you manage the outcomes of a negative incident.

Management is about improving decisions over time, thereby reducing the time spent managing small issues so that larger targets can be focused upon. Whenever a company has reams of paper related to safety, there is instant proof in those papers that risk-management isn’t being done; otherwise there would be no static cache of paper choking actual work time.

When you manage training, for example, what you are doing is imposing controls to avoid or suppress risk, creating productive opportunities, or doing both simultaneously. Mentoring exemplifies a value proposition for training controls that is often lost when monitoring is substituted for mentoring. When Fred the welder spends half his day filling out forms to say he stood by welder apprentice Sally, where was the value? Even if Fred is an excellent mentor – especially so – is it not preferable to have him spend 90% of his time mentoring rather than a fifty-fifty split between that and filling out paperwork?

When you manage productive timelines, what you actually do is set those targets, recognise them, and achieve them by proper process of operations. You don’t panic and turn off all safety guards to double speed, unless you haven’t been communicating well, and have no actual management process. Does it not make more sense to communicate effectively and manage rather than react in chaos?

When supervisors are overwhelmed by paper are they managing or shuffling? Over time good supervisors build trusted workers, with higher reliable skill sets, because they spend an inordinate amount of time present. They have the time to stand with an employee and deal with them behaviourally before stress flares, because they are not pushing shreds of paper into some black box. They have the time to manage rather than react.

In traditional safety, though, we almost never see anything managed. We see crisis reactions, panicky attempts to bury stupid decisions, and repeating cycles of destruction. We see it because you cannot manage an outcome, only a process, and the idea that safety is itself something more than an outcome is ingrained in traditional programs.

Reaction leads to poor feedback, worse communication, and a lack of analysis – a repeating cycle of risk encounters.

Management engenders feedback, through immediate contact; good communication, since the opportunity is two-way; and a contextual basis for analysis, since all discrete data will be objectively related (assuming the correct tools).

The most common barrier to introducing risk-management into a company is always a lack of management. The deficit has sometimes been so severe as to make one question whether some companies even understand the idea of operational management, let alone risk-management.

Sunday, May 09, 2010

Death of a Thousand Paperclips

When we talk to clients about the almost universal problems they have gathering basic data (for any purpose), we sometimes refer to this as the “death of a thousand paperclips” effect. Put succinctly, it refers to a condition where an enterprise spends its time generating enormous reams of paper (whether in files on a machine or printed) that cannot be associated. Very often they have all the necessary data, but none of it is consolidated in any form that can be accessed, so the majority of their effort ends up being spent accessing data of questionable quality, which is also difficult to relate contextually.

Part of the problem is that right now traditional safety isn’t participating in management; it is counting activities and generating paper. This is true even to the point where training generates certificates without ever having a way to indicate the value of those papers. Into this mix we have seen numerous additional paper efforts, all of them predicated on the belief that the past paper was somehow inherently bad, rather than recognising that it became bad because it was unmanaged. So, we see client companies with massive competency documents describing what a welder should know, who still cannot reliably tell us what Fred Smith actually does. This disconnect exists because nothing is related in any way that is reliable; it shows what one used to call a lack of bottom. In plain terms, without a foundation process no amount of tweaking ever produces related value, only larger stacks of irrelevancies.

Form-filling software is state of the art in the traditional safety world. (And had we been less interested in actual management, we would have taken that path to glory.) The problem is form-filling generates forms, not value; and value is the measure by which all processes are actually judged. When you have a stack of forms ten miles high, you can neither find what you need quickly, nor discover relationships easily.

This is another point where the tooling behind a solution-set becomes part of the value proposition, and it is where the payoff comes in the models we developed: risk factors and controls link across a broad array of profiling, active, and reactive systems to produce opportunities to link and analyse data, address issues of data quality, and generate reams of paper that communicate rather than frustrate.

When we ask a company which has suffered a workplace fatality to be objective, given the current state of traditional safety they will begin to count things. They have to, since they have no other choice. They will fill out forms, counting the fields they can complete, producing an inscrutable but perfect form. They will then attempt to subjectively qualify the fatality, because really they have nothing to draw upon to test any hypothesis. They cannot, for example, really analyse this fatality in the context of fifty similar near misses to identify what control failed and how it failed. Nor can they instantly analyse the employee’s training, relating the risk factors encountered in the fatal event against the ones their training protected against. Nor can they analyse what training the occupation should have required versus what the employee actually had. All that can happen with traditional forms is that, in an emotionally distressful instant, checkboxes can be checked, counted, and filed.

Real value is lost when outcomes are believed to be systems, because when the actual underlying systems fail, the outcome is shattered. In essence, the focus on outcome obscures the cause in favour of a focus on the effect. No one can be objective when the outcome becomes the exclusive focus.

Our systemic approach though isn’t about safety at all, but about risk-management. Even in the worst scenarios it is easy to be objective (generating value) when your entire process is able to answer questions that matter. If someone is dead, that won’t change; but the next person who is killed could be saved if the system can answer exactly what controls failed to allow the death, exposing faulty controls we can universally rectify, or an absence of control, or unknown risks. Even in the purely reactive conditions imposed by an accident, a risk-management model is focused not on the fatality, but on the mechanics of how to prevent the next probable fatality. It is less interested in blame than focused on avoiding repetitive incidents. Our system can tell you what the person was doing, whether they should have been doing it, what risk they encountered, what the expected severity of that risk was, what specific controls attached to that risk failed, and even go so far as to tell you whether the dead worker should have been doing the job at all given their training controls.
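
As a hedged sketch of how such questions can be answered mechanically once the relationships exist – the records, field names, and certificates below are illustrative, not our product’s schema – consider:

```python
# Hypothetical linked records for a single incident; names and fields are
# illustrative, not a real system's schema.
incident = {
    "worker": "J. Doe",
    "activity": "rigging a swing stage",
    "authorised_activity": True,
    "risk": {"description": "fall from height", "expected_severity": "fatal",
             "controls": ["fall-arrest harness", "anchor inspection"]},
    "failed_controls": ["fall-arrest harness"],
}

training = {"J. Doe": {"WHMIS"}}                          # certificates actually held
required = {"fall from height": {"fall-arrest harness"}}  # training the risk demands

def incident_answers(inc):
    """Traverse the incident's links to answer the questions that matter."""
    risk = inc["risk"]
    untrained = required.get(risk["description"], set()) - training.get(inc["worker"], set())
    return {
        "what was being done": inc["activity"],
        "should they have been doing it": inc["authorised_activity"],
        "risk encountered": risk["description"],
        "expected severity": risk["expected_severity"],
        "controls that failed": inc["failed_controls"],
        "training gaps for this risk": sorted(untrained),
    }

for question, answer in incident_answers(incident).items():
    print(f"{question}: {answer}")
```

None of these answers require subjective judgement at the moment of crisis; they fall out of relationships that were defined and maintained long before the incident.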

Death of a thousand paperclips kills the ability of traditional safety to impart a defence against disaster. No matter how you treat an accident under the traditional approaches, you can give no objective weight to any outcomes, because the circumstances of the outcome draw all the focus. Subjective analysis is, simply, not real analysis.

Risk-management tooling is the critical difference in making risk-management effective, and it is where our research and development effort shows real value. It is never good enough to pay lip service to an idea that can be qualified, quantified, and cyclically improved.

Saturday, May 08, 2010

The Paradigm Shifts Toward Risk-Management

If we were to cite a single overwhelming challenge in becoming a commercial powerhouse, it would be this: our solution-set requires a paradigm shift in thinking before any paradigm shift in operational actions can come about.

Basically, we are in the unenviable position that we are too small to get the message out to a broad enough audience in a short enough time to generate the momentum that comes from a sudden grasp of the opportunities found in risk-management. Perhaps worse, we are trapped in a cycle where momentum is regularly drained by having to redirect limited educational resources toward operational imperatives. When starting a process with a new client has to be deferred to help an existing client find some way to aggregate data from their clumsy information systems, the limitation on our internal resources shows.

If we were a Google-size company, the reality would be different, because scale actually does drive adoption – we would be able to educate through multiple points of contact, on an immediate broad scale, and draw on the necessary commercialisation resources to create buy-in prior to deployment, rather than running these simultaneously and inefficiently. Infrastructure is key to mass adoption and capitalising on a massive untapped revenue stream.

The related problem with this paradigm shifting is that unlike many such shifts, this is a discrete two-stage shift: not only do we need to shift the thinking of the client, but we need to shift their understanding of technical tools. We need them to recognise the serious problems of existing information sources, commit to changing those to better meet their needs, and at the same time commit to using the new tools correctly.

Somewhere in our development we ran into a real conundrum related to dependent systems. We found, for example, that almost no one with a Human Resources (HR) system manages to keep it updated, since it tends to be overly complex, and those who do cannot ever get data back from it in a reasonable timeframe. This meant that for our tools to work right, we frequently needed to ask clients why their current systems were hampering them. Try shifting tool use to a new and more powerful level in the face of statements like, “We can’t get a list of employees from our HR system.” Invariably the statement is true only because it actually means, “We don’t know how, and no one will show us without extra costs.”

We can be as smart as we wish, we discovered, and even develop the three-step plan toward making the shift to risk-management, but ultimately size hampers adoption. It is very easy to say:

1. View your “safety” as a by-product of normal operational management;

2. View your information systems as service-points; and

3. Rectify problems with any uncooperative systems before proceeding with the new approach.

Now, try applying those three simple steps when, after step one, the overwhelming majority of the information systems they already depend upon can’t even reliably generate the same list twice. When our tools are deployed, and we hear that our minuscule nod to Human Resources profiling exceeds the value of the enormous system they spent millions on (since from our end they get out what they put in, and more), there are obvious barriers to moving step three forward. Abandoning the sixteen-million-dollar HR system that hasn’t worked right in years is seldom an option, even if it is obviously a barrier to efficiency unrelated to anything to do with risk-management.

Pushing this back in the cycle exposes our main challenge, of course, and HR provides us the perfect example. We have over one hundred various requests to enhance and expand our HR profile model, almost all beginning with statements along the line, “if it just did this we wouldn’t need our old HR system at all.” Try being a gnat in terms of resource scale (or something smaller than a gnat), and hearing that, knowing full well there is no way you can expand fast enough to meet those requests without crippling your ability to deliver what you already have.

To shift someone to risk-management, away from traditional safety, isn’t so much costly as it is complex by virtue of dependencies. Often our first request, “give us a basic employee list,” stymies adoption for months. The second, “tell us what their occupations are,” invariably slams the process into reverse for a while when clients discover they simply can’t fulfil that request from any system they have online.
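
As a small sketch of why even the first request is revealing, here is the kind of sanity check one might run on a hypothetical HR export; the format and field names are assumptions for illustration:

```python
import csv
import io

# A hypothetical HR export; in practice this would come from the client's own system.
export = """employee_id,name,occupation
1001,Fred Smith,welder
1002,Sally Jones,
1001,Fred Smith,welder
"""

rows = list(csv.DictReader(io.StringIO(export)))
ids = [r["employee_id"] for r in rows]
duplicates = sorted({i for i in ids if ids.count(i) > 1})
missing_occupation = [r["name"] for r in rows if not r["occupation"].strip()]

print(f"records: {len(rows)}, unique employees: {len(set(ids))}")
print(f"duplicate ids: {duplicates}")               # the 'same list twice' problem
print(f"no occupation recorded: {missing_occupation}")
```

If a check this simple turns up duplicates and gaps, the dependent system cannot yet serve as a reliable service-point, and that is where adoption stalls.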

Of course, if we had the scale to educate ahead of the deployment curve, expand to encompass the minor variations that prevent simply replacing antiquated systems wholesale, and the reach to attack multiple client targets simultaneously, this two-stage shift would become a single, longer curve.