Friday, September 03, 2010

What is the Value of the Near Miss?

The “near miss” is a terribly misunderstood event. This post is about the near miss as seen from my view, which is not only purely practical but also intentionally divests itself of most of the obscuring terminology used in “safety systems.” I abhor it when people disguise desired outcomes as systems to escape the scrutiny of logic.

In my universe a near miss is probably better called a “no loss event,” but the terms are interchangeable. The near miss has enormous value as a tool to recognise control failures and to modify controls against a given risk; it is a practical opportunity rather than a threat.

An example of a near miss is easy enough to conjure from the real world. Calgary (Alberta, Canada) is a windy city, and high-rise buildings are bombarded by gusts that carry an array of damage potential. Just this year, in one case, a falling piece of unsecured building material killed a child. Other incidents of falling objects abound, some from that same worksite. Eschewing focus on any of those events specifically, let us consider just the idea of a falling object. Let us identify a risk, cite a control or two, and then explore the value of the near miss opportunity.

The risk, of course, we can just generically phrase as the “risk of falling objects.” This generic risk covers everything from wrenches to hunks of plywood. One applicable control we will call “secure all objects,” and another we will call “ensure safety netting below the immediate work site.” Again, we have chosen generic and simple controls, since our point is illustration.

Outcomes can range, so let us generalise those on a spectrum (see the previous blog post and its link to understand where these spectrum points come from):

  • Fatality or Catastrophic Loss;
  • Major Injury or Major Property Loss;
  • Minor Injury or Minor Property Loss; and
  • Near Miss or Non-Loss Event.

Let’s take a moment to shorten the terms just for ease of reference: Fatality; Major Loss; Minor Loss; and Near Miss.

Now, let us unwind a scenario to describe each, while recognising that, though these are generic and imagined, each has actually occurred in Calgary in 2010:

  • Fatality: A tool falls off a high-rise site, striking a child and killing them.
  • Major Loss: A tool falls off a high-rise site and strikes a vehicle, damaging it beyond repair so that it is written off.
  • Minor Loss: A tool falls off a high-rise site, and strikes concrete, destroying the tool.
  • Near Miss: A tool falls off a high-rise site and lands in ploughed, undeveloped ground; because it lands in mud, not even the tool is destroyed.

In every case here the “risk of falling objects” was encountered. Regardless of the cause of the encounter, the encounter is fundamentally the same. In every case an object fell, yet the effect of that risk encounter is, in each case, of different severity.

Details are irrelevant to the analysis of the controls that failed (in our example, at least). Either or both of our cited controls failed to some degree. For argument’s sake, let us say that there was a failure to secure all objects, and that while netting was in place it did not contain the falling object. Via our investigation of the incidents, we determine for our purposes here that:

  • None of the four tools was secured with a safety line, and, to keep the example simple, all were identical heavy riveting tools.
  • In each case the netting in place was insufficiently preventative and tore through under the tool’s weight.

Now, while there is some edge of surrealism to the example, because we’re pretending the details were similar or identical, the illustrative value is clear: it is possible to have identical causes, identical control failures, and vastly different outcomes. The only variation across our scenarios is that in one case a human life was lost, in another an expensive asset, in the third the tool is destroyed, and in the final one nothing is lost. (Note that we’re pretending time was not lost; in a near miss circumstance time will always be lost, meaning near misses do cost us something, which is why “non-loss event” is not a strictly accurate name. But our pretence is intact enough to observe the basic fact of variant outcomes.)
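
To make the point concrete, here is a minimal sketch in Python (the record structure and field names are invented for illustration; nothing in this post prescribes a data format). The near miss record carries exactly the same cause and control-failure information as the fatality record, so it supports exactly the same analysis.

```python
from dataclasses import dataclass
from typing import List

# Severity spectrum from the list above, ordered least to most severe.
SEVERITY = ["Near Miss", "Minor Loss", "Major Loss", "Fatality"]

@dataclass
class RiskEncounter:
    risk: str                   # the generic risk encountered
    failed_controls: List[str]  # controls that failed to some degree
    outcome: str                # one entry from SEVERITY

FAILED = ["secure all objects", "safety netting below the work site"]

events = [
    RiskEncounter("falling objects", FAILED, "Fatality"),
    RiskEncounter("falling objects", FAILED, "Major Loss"),
    RiskEncounter("falling objects", FAILED, "Minor Loss"),
    RiskEncounter("falling objects", FAILED, "Near Miss"),
]

# Every record, the near miss included, carries the same cause and the same
# control failures; only the outcome differs.
assert all(e.failed_controls == FAILED for e in events)

# So the near miss supports the same control analysis as the fatality, and
# discarding it simply discards evidence about this risk.
worst = max(events, key=lambda e: SEVERITY.index(e.outcome))
print("Worst credible outcome for this control failure:", worst.outcome)
```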

The value of a near miss is now obvious if you pretend the order of occurrence of these events is reversed from our list above: the near miss happened before the other outcomes were seen. And therein lies the value of the near miss as an opportunity, because if you recognise it for what it is and analyse it properly, you will become aware that only the landing point separated this near miss from a fatal outcome. Extrapolating to that possibility, you can then focus on the nature of the control failures and harden those controls.

In our example, pretending the near miss came first, and pretending the safety professional isn’t just churning paper, there are two clear recommendations: secure the heavy tool with a separate tether; and double the strength of the netting (or add secondary netting, etc.). The recognition of this near miss as an opportunity to harden controls is the difference between next week’s repeat having a more severe outcome or a lesser one. In no way does the control improvement guarantee nothing can go wrong, but in every way it shows due diligence, productive focus, and risk management.

Near misses save lives, ultimately, by improving practical control measures.

The problem with this is that most people know it, but no one uses the opportunity to generate a competitive advantage. It is apparently easier to wait until someone dies than it is to try to prevent the problem occurring. And the usual excuse? Because accidents happen. Just like the digging company that repeatedly relies upon line identifiers who miss critical lines, people excuse control failures until the outcome severity forces recognition of the management failure.

One of the key problems getting a near miss to be recognised as an opportunity is that status quo “safety systems” rely upon statistical outputs and counts that can be negatively impacted by recognising too many opportunities. This negligence is an affront to the idea of operational risk management. Not respecting the near miss as the opportunity to harden controls is the quickest way to ensure that when the luck runs out, the cost is catastrophic.

The Risk Encounter Outcome Model

Today’s post is short, and to the point. In examining ways to visualise risk as a component of operational processes, I’ve formed a model that allows you to visually represent severity of outcome, relative risk exposure, and control coverage. While not for the faint of heart, the paper is posted online at our website here. I would précis it on the blog, but it’s one of those papers best just read in a quiet hour and mulled upon, and a précis won’t give much in the way of insight beyond the paper.

Tuesday, June 22, 2010

The “We’re Not As Bad As…” Syndrome

While chatting with my de facto boss today, we were discussing an often heard phrase that flies from the mouths of otherwise intelligent people, usually when they are preparing to explain why they don’t need to do risk management, or why their “safety programme” sufficiently protects them. To paraphrase, the rush of words comes out something like, “We’re not as bad as…” and is followed by a list of companies much worse than their own. The rationale is, evidently, that as long as you can name someone who is worse than you, you are in no need of cyclic improvement of any kind. And thanks to BP, short of killing a half million people directly, I suppose deniers can enjoy complete indifference about their long-term risk management prospects.

What struck in my head after our chat was that this thought process is so incredibly common in life, where an astounding number of people are willing to maintain a crumbling state of being just because someone is observably worse than them. It strikes me that as the world regresses, I will eventually hear someone who is smoking a cigar say, “Well, I only have tongue cancer, but that’s okay because my Uncle Bill is coughing up his entire lung.” Or, perhaps more depressing, I really do some day expect to hear a pretender to the title “safety professional” actually say something like, “Well, we only killed one person last year; our biggest competitor killed three!” (After 10+ years doing this work, I have, actually, read several remarks that come frighteningly close to that, but never spoken directly with anyone who has had the gall to say it aloud.)

This idea that maintaining a static state is enviable is problematic from several perspectives, but ignoring all perceptive aspects of the problem it represents, applying basic logic reveals it for the lie it is. Why? Well, simply because business is about progressive revenue enhancement, which requires growth, and growth is dynamic by nature. Hence, any business that is truly static cannot grow, and so operationally the idea of not managing change is an impossible one to attain – though, the world knows, folks will try. To grow a business, you must be operationally flexible, and to be operationally flexible requires dynamic change management – and that eschews the idea of a static state. Anyone who promotes the status quo, then, in any operational domain really represents a pure liability.

Where the thoughts led me today was really more about how helpless people seem to be, and how irrational the acceptance of unnecessary loss is. You see it in politics where compromise has replaced actual leadership, and in everyday life where we negotiate the least evil available rather than strive for better. In all these avenues of life, where risk management is a genuine reality, what we are really seeing is the price of an unhealthy misunderstanding about what risk is, and how one must manage it to achieve value. The simplest explanation for the frozen state of thinking is fear, but experience suggests that perhaps the real cause is not fear so much as laziness. We seem, as a collective, too lazy to challenge ourselves.

This hits home right now with us, at Pragmatic Solutions Ltd, because we are in an interesting position in terms of growth. We recognise we have a need for new expertise, and new resources in the business, but we have found it almost impossible to open that conversation effectively with partners who could actually enhance the business commercially. There are plenty of talking heads, many of them promoting themselves as experts in one domain or another, but the more they talk the more they tend to expose the depth of ignorance in their thinking. While that kind of statement can seem bitter, it isn’t bitterness but logic that implies the truth in it, because what we have found is that many of these promoters have a very static, patterned approach to their thinking that is not apt to recognise anything of value outside some narrow range. Specifically in the contacts we have made to try to jump-start some pursuit of a valid partner to take forward the ideas we espouse, what we run into is a lack of imagination. If the business prospect isn’t comparable to some pre-existing one, it is dismissed intellectually, and we get suggestions to reshape the product concept to be more like some existing product. While this would be a suitable suggestion if the product were like another, or immature, it becomes a complete barrier to communication given that our ideas are intentionally fresh. Exactly why would we mutate a new idea into the old form that has already failed miserably, just to commercialise it? If that was our intent, we would never have spent the enormous effort to develop, prove, mature and integrate the new thinking patterns. How do you communicate the power of an idea to people who are intent on erasing the value it represents to make it a metaphor for past failures?

These people are, of course, really representative of the same attitude this blog observed earlier; though, rather than “we’re not as bad as…” they are claiming some dynamic authority due to an adherence to status quo thinking: something like, “we are better than the following thinkers, because we project it to be so.” Yet underneath, the same lack of imagination and the same odd adherence to old patterns of business approach and thinking are extant. It is no wonder that innovation is so nearly dead, given that the people who should be innovating are so busy trying to repackage newness in some form that buries the newness. This is, of course, risk avoidance by way of dismissal; and it is a distressing thing to a risk manager to see that judgement of value has fallen into a tired cycle, rather than a cycle of improvement and advancement.

Research and Development is what creates progress, and yet, commercially speaking, almost no entities exist where there is a true appreciation for the pursuit of new value. This, like the attitude that maintaining sameness will change outcomes, is a very disturbing thought process.

Sunday, May 30, 2010

Deploying a Pragmatic Solution

Developing a pragmatic solution to any problem takes time, and deploying one is challenging. The business challenges to being on the forefront of a market are often overwhelming, but for more than fourteen years, Pragmatic Solutions Ltd. has been travelling a path to a destination. That destination has changed over time, and the path has changed with it, because just as our research into risk-management showed us the requirement of functional dexterity in that domain, we long ago realised the same flexibility is a necessity to be the harbingers of a new paradigm.

Traditional Safety has spent decades wallowing in variants of the same approach to new problems, always achieving mediocre results through luck and generating spectacular failures by way of inevitability. You cannot treat symptoms to cure an illness, unless you consider death a cure. Traditional safety will never create safer workplaces, because it has never grasped the problems, and spends more time playing with paper than solving real problems.

While it took us ages to create what we have as a solution-set for generating safer outcomes, we are heartened that it often takes an intelligent observer only a short exposure to the fruits of our efforts to have their “Eureka!” moment. It is frequently painful to see the realisation that there is a better way, a dynamic process that is not indifferent to the well-being of people, and to see it combined with the realisation that to make the change requires a degree of commitment and capability that seems impossible. The number of times smart people have told us, outright, that their organisation doesn’t appear to have the chops to play the tune we wrote is shocking.

What we have to deploy is not another safety pig with a different shade of lipstick, and there is no magical transition from pig to princess promised. What we developed was a systematic approach to changing how operational risk-management is done, showing how it should be done to integrate the scope of operations entirely, and ensuring that the outcomes sought are measured by something that can be quantified. We rejected the same-old-same-old because we knew that industry, business at large, needed to find a way to produce an outcome that wasn’t luck-based. Repeatable success, cyclically improved, with an ever-increasing productive knowledge base, was the goal – the tools, and the methods, prove it possible.

We realised that safety is the outcome of imparting a culture of operational risk-management; and that rather than being something you can pay lip service to, it is an idea that must be lived. There is something honourable in forging ahead on new ground, even with the incredible challenges presented by the inertia in the domain of safety.

And here on the cusp of leaving our research and development phase behind, following the governing principles of risk-management, we recognise that there also comes a time when the core of a thing – our company – needs to reach out and surrender the lead to someone who can commercialise the opportunity. Who that is remains to be seen, but the search is on, because at the end of it all we still know the product-set is right, the problems we solve are real, and the value-proposition is undeniable.

Saturday, May 29, 2010

The Acceptance of Failure

One of the most discouraging realities of a decade and more of research is that it tends to confirm executive management has accepted failure in the safety realm, and seeks not to resolve problems, but to mask them since they seem intractable. The traditional safety purveyors have made it worse by promising change, then resorting to “tried and true” methods that have never delivered positive change, reinforcing the view that safety is a black hole for productivity, investment, and time. It is not a wonder that management rejects safety as a real practical priority when their direct experience has been so negative, and it is no surprise that the idea of a shift in thinking meets resistance, since for years the same-old-same-old has been represented as just that – a shift in thinking.

The problem with accepting failure where safety is concerned is that without change, the cost of doing business increases over time until its weight collapses the reason for doing business. It would be far better, and safer, to embrace a real shift in thinking, but the barriers to that are real, and the challenges when one does shift to the new opportunity are real.

Risk-management is not a standalone solution, a tool that just drops in and solves all the problems, because it depends on so many other systems functioning. It depends on being fed good data, in a timely way, and upon being able to have the range of expertise in an organisation act to increase its value. Even for companies who can access their own data effectively (though we have yet to encounter one), accessing employee expertise becomes an almost insurmountable barrier to adoption. Traditional safety practitioners are highly resistant to being exposed by the evidence the system will generate, managers are hesitant to trust another system when so many betray their interests, and workers are suspicious after so many years of being fed lies about something they understand – safety in the workplace.

To reject failure means to recognise that the claimed problems are often the symptoms of the actual problems, and that to treat them means to treat the problems that lie beneath those symptoms. It means to understand that operational risk-management requires cultural dexterity, commitment, and a desire to continuously develop new opportunities for productivity. There is no status quo condition for risk-management, and that dynamic of continuous change and continuous cyclic improvement can be daunting to many.

Ultimately, though, one has to wonder if the failure you know is actually safer than the pursuit of excellence. The answer to that question may be irrelevant, given the evidence of actions, which speak louder by far than any words. And yet with the trend toward prosecution for failures to act reasonably, it is clear that something must change, and that at some point smart executives will demand more than raw counts on which to base their perception of workplace safety. The question is not if this happens, since it inevitably will, but how long it will take before those who cling to antiquated ideas about safety are pushed into the light. How many more bodies need to be piled at the gates of industry before they recognise that far too many of those bodies could have gone home, if only the risks that took them had been managed?

Thursday, May 27, 2010

Closing the Loop

The risk-management method is about loop closure as much as it is about cyclic improvement. One of our more peculiar discoveries over the years has been that the most common single point of failure in all systems, inside the safety domain and out, is best described as a “loop closure failure.” This means, in plain terms, that almost all of them fail not through bad intentions, or because of ignorance, but because of a lack of stamina. Closure in the context of systems is not a fantastical term, but a specific one: any system intended to provide itself a feedback loop must close that loop, or it will fail.

In terms of traditional safety this is shown everywhere you look, but probably most notably by the fact the same cause-effect conditions cause the vast majority of accidents. It shows practitioners of traditional safety are not learning from their aggregate data, because they have no objective mechanisms to educate them, and because they have no aggregate potential across entirely disparate data sources. They cannot, for example, warn that a site is at greater risk because the people on that site have training deficits, because they often do not know, until after the fact, what the site purpose is, who is there, or what they are doing. This reactive stance is a choice, which extends from allowing chaos instead of governance, and failing to manage. Yet, this is no surprise, since traditional safety is concerned with raw counts rather than analysis. That some people can actually provide safety via the traditional model is a shocking testament to individual insight.

When risk-management is embraced, the closure of its feedback loop is where the maximum value is generated, because it can educate, inform, and provide the grist for the decision-making mill that renders better decisions. The challenge is that so few people have been trained to close the loop for any system, to actually follow-through, that the risk-management model can face a distinct and immediate challenge that has nothing to do with its features or scope, and everything to do with how unprepared people seem to be to manage.

Management is almost a lost art, because it has been packaged and those packages ignore that solid management is an analytical process that helps make decisions. Management is not housekeeping, though housekeeping requires management; any more than management is a decision, though it generates them. Management is about perceiving opportunity based upon inputs, about steering resources to achieve outputs, and about ensuring that the decisions create more output than the required input.

Part of the real problem with closing the loop for managers today is bad information, which is often triggered by a reliance on statistical calculations and other formulae. The often repeated idea that business management graduates are awful managers holds precisely because too many of them end up in a management position where the only tools they know are those formulaic approaches. They apply them correctly, without recognising the inputs are skewed by bad communication, misconceptions, and so on. The old rule that garbage in equals garbage out is true, and management by statistic ensures failure.

To succeed requires knowing the inputs, knowing the path they take, and knowing that when the outputs appear they must be returned to the cycle. Closing the loop for risk-management returns value beyond the investment many times over.

Wednesday, May 26, 2010

Directives Matter

If metrics that lie kill people, and proof of due diligence is considered a requirement of modern business, directives are what save lives and prove diligence. And yet, our research taught us that perhaps traditional safety exists in such an operational vacuum as to not require even this basic communication process to function.

In traditional safety, directives are what are commonly called “corrective actions.” Corrective actions provide a fix for a problem where there is risk of loss. One would think that given the possible consequences, including serious harm and personal liability, this element of a safety program would garner serious attention. However, our research demonstrated – and continues to demonstrate – that corrective actions are frequently never identified as part of an accident investigation; and when they are identified, the corrective action is a mere observation or re-statement of the problem, not a call to any action, corrective or otherwise.

Because a corrective action is critical to fix problems, one generally hopes it meets a few minimum standards of communication. These are: it needs to actually describe the expected action; it needs to indicate who is responsible for completing the action; and it needs to indicate a due date for completion. To engage any level of acceptable proof of due diligence, it also needs to actually be confirmed as complete at some point, preferably in the same decade it is initiated. Also handy would be an understanding of the context for the corrective action, which is basically an answer to the question of what triggered it.
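
As a rough illustration only (the record and field names below are invented, and no particular system is assumed), a completeness check against those minimum standards might look something like this in Python:

```python
from datetime import date

# Hypothetical directive (corrective action) record; field names are invented.
directive = {
    "context": "Inspection found the machine guard removed on press #4",
    "action": "Refit the guard and add an interlock check to the daily pre-start",
    "accountable_person": "J. Smith",
    "due_date": date(2010, 6, 15),
    "completed_on": None,
    "confirmed_by": None,
}

# The minimum standards named above, checked in order.
checks = {
    "Action Indicated": bool(directive["action"]),
    "Accountable Person Indicated": bool(directive["accountable_person"]),
    "Due Date Indicated": directive["due_date"] is not None,
    "Completed": directive["completed_on"] is not None,
    "Confirmed": directive["confirmed_by"] is not None,
}

for standard, met in checks.items():
    print(f"{standard}: {'yes' if met else 'NO'}")
```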

Sampling real data provides some unfortunate insight about how corrective actions are viewed:

  • Total Directives: 3,868 (100%)
  • Action Indicated: 2,216 (57.3%)
  • Accountable Person Indicated: 2,452 (63.4%)
  • Due Date Indicated: 1,837 (47.5%)
  • Completed: 1,377 (35.6%)
  • Confirmed: 1,048 (27.1%)

Based upon those numbers, drawn from actual client sources, it is clear how relevant traditional safety feels directives are. When only 57.3% of all “actions” actually contain an action, or any text at all to describe their purpose; when merely 63.4% have been assigned to an accountable person; and when less than half (47.5%) have a due date indicated, one has to question the purpose of the effort. Even when the effort is made, when only 35.6% are ever indicated as completed, and a mere 27.1% are ever confirmed, where is the proof of due diligence?

These numbers prove that corrective actions are more paperwork exercises than real efforts, and that evidently no follow-through is considered necessary. They also indicate how completely the communication loop is failing, with no accountability at all.

Whether you call them corrective actions or the more appropriate name of directives, it is fairly certain an executive who saw numbers this abysmal would have a few questions about why they are even created, and a few serious concerns about whether detected problems are ever corrected at all.

Monday, May 24, 2010

Metrics That Lie

British Petroleum (BP) blew up the Texas City facility several years ago, registering billions in productive losses and eventually receiving a fine from OSHA amounting to 50 million dollars. In the five years after the incident, another three people died under their watch, and they were fined another 87 million dollars for their failure to fix original problems; and recently a jury awarded 100 million dollars to workers exposed to toxic chemicals.

There are a few possible conclusions you can draw from this, and none of them are particularly kind to BP; but regardless of which conclusions you draw, there are a few you cannot propose. One conclusion that isn’t true is that every manager at BP is an evil sociopath. But if you accept that most managers don’t feel particularly pleased to have people die on their watch, why then did the lessons of Texas City not modify behaviours effectively enough to avoid repeated, similar problems?

Part of the problem is Senior Executives not getting the right information, which creates greater risk. Because the performance metrics they see focus on recordable events and loss rather than risk-management, the Board of Directors and Senior Executives frequently do not have the right information to provide effective oversight, resulting in unacceptable risk and significantly increased liability.

Typical reports to senior management are too superficial to allow meaningful involvement or critical thinking about what the reports imply. Crippled by this dearth of valid, approachable information, executives end up making choices that are, in retrospect, poor choices by any measure. Traditional safety programs that rely upon counts have no representation of implementation in their outputs, and so the executive relying on this is essentially incapable of making the decisions to enhance safe operations. Even when not faked, these counts obscure operational context to a degree where they are meaningless as analytical tools.

Metrics need to be transparent. Without the ability to see beneath raw counts, and gain a contextual understanding of what those numbers mean, risk is being misidentified, mistaken as to degree, and will mislead executives. The right information allows executives to assess risk and control measures, reducing liability and loss. Importantly, it allows them to speak directly to what matters, moving away from the superficial and meaningless attempts to demonstrate commitment to safety (the infamous safety moment that begins every meeting).

Actions speak louder than words, and the executive armed with a good metric set will always make better informed decisions. One wonders whether, had BP executives had those metrics prior to Texas City, the damage from the event would have been mitigated, or the event avoided entirely. Regardless of that, though, it is clear that if they had those metrics now, the problems wouldn’t be ongoing. Traditional safety approaches are why they don’t, and why they may not ever have the advantage of a clear understanding of their risks.

Metrics that lie kill people.

Sunday, May 23, 2010

Implementing Risk-Management

There are many reasons to implement risk-management, but there is only one sure way to guarantee that you implement risk-management well. Three keys exist to unlock the potential of risk-management in any enterprise, regardless of its scale or the nature of its business, and they are required to ensure the cultural shift entailed in maintaining the benefits of such a paradigm shift.

Unlike traditional safety, which dictates to workers, risk-management is far more participatory. It requires employee engagement at a level that frightens many, despite it being proven to work and to be more cost-effective. This employee engagement takes place at a most basic level by treating employees as risk resources, respecting that they know many of the risks they face, and understanding that communicating their value will cause them to seek additional knowledge that is formative in creating better risk profiles and better control mechanisms. The employee who is engaged in this model invariably responds in the immediate term and with largely positive inputs, and that response is maintained as long as the relationship and communication are maintained. The biggest risk in this is not time-loss, as many fear, or cost increases, but that the employee inputs will not be reflected in the actionable choices that are communicated as the process of transition occurs. Workers will rapidly divest interest if their input is seen as irrelevant and the process of adoption is unresponsive to their concerns. Contrary to management fears that this distributes too much control, the reality is that to do this well, to engage and maintain engagement, requires stronger management control, and that control is generally accepted more by the workers since the control is not dictatorial, but representative of their collective interests in going home at the end of every day. The higher the risk-profile of the employee, the stronger their engagement will be, since their awareness is enhanced by their perception of personal risk.

The engagement of employees is a culture-changing engagement, and it leads to participatory management by its nature. This terminology aside, what we really gain is that there becomes a clearer separation between practical management activities and productive management activities. As risk-management takes hold, employees will form risk-awareness teams, or control development teams, providing the raw material that managers can shape into productive policies, safe practices, and whatever other tools assist in communicating effectively. The manager role then becomes more focused on enhancing productive returns than on managing immediate concerns. This participatory model flows upward, creating a stronger hierarchy of control at the management level, easier communication, and a higher rate of acceptance of management decisions flowing downward. It will never ease dissatisfaction with decisions that have a negative impact, but it does dramatically reshape the way decisions are viewed, when, for example, the inputs that informed the decision were largely generated by the successive layers of operational employees. If a team of ten welders defines a need for a new control to prevent injuries, and that control is shaped and approved by them, there is a higher likelihood of immediate acceptance. More importantly, because of how those decisions are made, because the cultural shift has embedded the idea that change is acceptable and even desired, imposition of controls does not seem arbitrary – they can and will be changed to reflect the balanced needs of the workers who must implement them. To maintain the participatory management process requires a cultural shift toward significant improvements in communication, with less focus on autocratic deployment of decisions and more focus on allowing dynamic communication. The higher the management skill sets, the more effective the productive returns, and the more freeing this aspect of risk-management will be. If the management sees the change in culture as positive, it will be maintained; if fiefdoms and competitive management are the norm, the risk-management model will fail because of competing and divergent interests.

Perhaps the most vital of the keys, since it makes the decision-model function effectively, is related to communication of metrics. Traditional models fail ludicrously because they provide faulty performance metrics by their nature, making it practically impossible to manage “safety.” Risk-management eschews the reliance on raw counts in favour of extension of analysis via linkages. If the metrics are communicated effectively, the decisions that come from the analysis of those metrics will be rational, will hold over the term of their value, and will be easily conveyed downward from the executive level. When managers understand the directives, and workers understand them, then the enterprise functions with a focused effort; when communication falters, and dictums replace decisions, the resistance to implementation renders the process for risk-management defective. The executive branch benefits from risk-management less because of the model itself than because of its associated mechanisms for communication. Good metrics flowing upward, with the ability to ask questions that have achievable answers, define a better foundation to make complex decisions. It also shows the level of diligence that is unarguable in worst-case scenarios, because it is possible to observe that based upon the best possible information, the best possible decisions were made, taking into account the agreement of all operational levels of the employment pool. In other words, decisions were neither arbitrary nor based on known faulty information. Decisions then become decisions of the company for its benefit, rather than made by the executive without context.

Implementation of a risk-management system involves alignment of organizational behaviours with the operational objectives. This model creates and sustains positive change, because it involves a comprehensive communication model. It, in fact, embeds directive management mechanisms in operations. This imparts a closed loop operational system as a side-effect as certainly as it provides safety as a side-effect. But it all falters if employees disengage, if mid-level management fails the communication requirement, and if the metrics that guide executive decisions are communicated poorly.

Friday, May 21, 2010

Risk-Management is Cyclic Improvement in Action

Part of the mystery of why traditional safety fails so miserably to generate improvements is less a real mystery than a consequence of a basic misunderstanding about how to achieve an outcome. It has been observed that safety is an outcome, not a process, and it is obvious that to get to a destination you must travel some path. Taking that analogy to the safety domain, you could say that the problem of traditional safety is that it consistently takes the same path and hopes to find itself at a different destination. Risk-management differs in that it not only takes the best path to a destination, but at its core it recognises that over time the destination might differ, and so too might the best path to any given destination. This idea of cyclic improvement over time, usually just referred to as cyclic improvement, is at the heart of the value proposition of risk-management.

Rejecting traditional models and focusing on risk-management is the only sensible approach to this topic, but one last meaningful lesson can be taken from traditional methods. That lesson is that a pig remains a pig whether you put lipstick on it or not, a fact confirmed by traditional safety at every turn. Semantics do not define value; they only ever describe it, and no amount of terminology can be twisted to produce value that is not extant.

Risk-management is about focusing on how risk interacts in an operational system, whether the system be a workplace, a society, or any other definable system.

The risk-focused view of the model steps beyond the traditional definition of hazard (though it maintains the scope to address all traditional hazards). We still have a trio of risk groups: Physical (Unsafe Condition); Worker Behaviour (Unsafe Act); and Organisational Behaviour (Unsafe Behaviour). We still identify and qualify the risks, and we still provide a critical scoring system to rank the risks comparatively. Where the model differs significantly is that it breaks from the hierarchical structure of most hazard-centric models and creates controls as separate profiles that can apply to and address any number of risks, reducing the management of controls significantly and installing the concept of the multiple-level relationship into the model.
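
A minimal sketch of that structural difference, in Python with invented identifiers and scores (the post does not define a schema): instead of nesting controls under a single hazard, each control is its own profile, and a link table relates it to any number of risks.

```python
# Hypothetical profiles: risks and controls are kept separate,
# related many-to-many rather than nested in a hazard hierarchy.
risks = {
    "R1": {"group": "Physical", "name": "falling objects", "score": 9},
    "R2": {"group": "Worker Behaviour", "name": "working under suspended loads", "score": 8},
}

controls = {
    "C1": {"name": "secure all objects"},
    "C2": {"name": "safety netting below the work site"},
    "C3": {"name": "exclusion zone at ground level"},
}

# One control can address many risks, and one risk can rely on many controls.
links = [("R1", "C1"), ("R1", "C2"), ("R1", "C3"), ("R2", "C3")]

def controls_for(risk_id):
    return [controls[c]["name"] for r, c in links if r == risk_id]

def risks_for(control_id):
    return [risks[r]["name"] for r, c in links if c == control_id]

print(controls_for("R1"))  # every control mitigating the falling-objects risk
print(risks_for("C3"))     # every risk leaning on the exclusion zone
```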

In “Covenants of the Rose” (2004), Larry L Hansen wrote, “Accidents are patterned and predictable performance symptoms, the final visible evidences of systemic failings and organizational deficiencies.” This is a core recognition of the risk-centric model, and it embraces the idea by defining an array of potential linkages (relationships) for the risk to participate across an integrated system that includes profiles, activities and reactive events. These linkages are the foundation for the analysis potential presented, and this analysis is what leads to avenues for cyclic improvements.

The idea behind cyclic improvement in risk-management is twofold: it recognises that imperfections exist in every iteration of any risk-management process and could be improved; and it recognises that changing contexts for encounters with risks will demand modification of controls over time, regardless of current efficacy. This boils down to a basic approach of risk-management, which is to attempt to manage based upon present knowledge bases, in the most efficient way, without entrenching that process. Or, harkening back to our analogy, risk-management is about travelling a path that can change both to accommodate a new destination, or to accommodate the discovery of new mechanisms to make that path more efficient.

One of the lynchpin elements of the risk-management system is its linkage model, which is the heart of its performance benefits. Where most business groups operate in information silos that tend to be highly partitioned, operational risk-management requires a high degree of accessible integration to execute the benefits of linkage analysis.

An example of how powerful this model is can be written in a straightforward way: In a risk-management model, it is possible (assuming all data is of reasonable quality, and that it exists) to analyse the risks an employee faces daily and measure the frequency of exposure, thus producing a matrix to show the stress status of controls (which controls are relied upon most to prevent risk encounter and harm). By knowing the control spread, and which are most critical to prevent the most dire outcomes, one can generate specific inspection routines and focus meetings and planning documents to address the risks that are encompassed. Testing employee awareness comes down to generating a document to show the control requirements and the risks, and comparing that to knowledge gained by way of training controls, meetings they attended, inspections they have done, etc. When that employee changes to some new occupation, based upon linkages that run through both occupation job task definitions and training controls, it is possible to focus additional training to address only the risks not previously known, reducing budgetary costs while ensuring focused awareness. More interestingly, because of how risk-management links to all integrated modules of the broad data-set, you can even generate a best-fit for opening occupations, identifying existing resources that might require little or no additional training, and so might be underutilised – improving productive capacity by maximising the return on investment.
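
As one narrow illustration of those linkages (the structures and names below are invented; the paragraph above describes no schema), the occupation-change case reduces to a set difference: the risks linked to the new occupation, minus the risks already covered by the employee’s completed training controls.

```python
# Hypothetical linkage tables, purely for illustration.
occupation_risks = {
    "welder": {"arc flash", "metal fragments", "falling objects"},
    "rigger": {"falling objects", "suspended loads", "pinch points"},
}

training_covers = {
    "arc flash safety":    {"arc flash"},
    "dropped objects 101": {"falling objects", "metal fragments"},
}

employee_training = {"arc flash safety", "dropped objects 101"}

def training_gap(new_occupation):
    """Risks of the new occupation not already covered by completed training."""
    covered = set().union(*(training_covers[t] for t in employee_training))
    return occupation_risks[new_occupation] - covered

# A welder moving into rigging only needs training on the genuinely new risks.
print(training_gap("rigger"))  # {'suspended loads', 'pinch points'}
```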

The cyclic improvements are not only made to control and risk profiles over time, but the linkages assist in creating a self-evolving system where systemic failure potential is reduced. This occurs because the risk-management model recognises all operational activities are integrated by a desire to reduce risk exposure, improving productive returns.

A system that will self-improve always beats a static record-keeping system.

Thursday, May 20, 2010

Reporting the Right Events in the Right Way

Just reporting the right events, whether they be incidents or activities, is only a small aspect of getting better data on which to base decisions. Reporting them in the right way is critical to developing a responsive improvement cycle.

It is fairly simple to recognise a three-stage reporting model for “events” that centres around risk encounters. In practical terms the life-cycle of a risk is the essential mechanic that governs this model. We identify the risk, we record when it is encountered without harm, and we record when it is encountered and harm is incurred.

Risk identification occurs at any point in time when a set of circumstances (unsafe condition, unsafe act, organizational behaviour) creates or has the potential to create a situation in which harm might occur. This process results in profiling the risk, which consists not only of generically describing it, but of using some scale to identify its potential impact. Done well, this is a powerful first step to prevent the manifestation of impacts that harm the company or workers; but identification is not worthwhile unless the process is adhered to in a consistent way, which is what makes risks capable of comparative analysis. In a simple example, a risk of a paper cut is likely to have a fairly low harm factor, whereas a risk of being crushed by a dump truck is probably going to present serious harm to whoever encounters it. If your risk profiles are not comparable (ranked on the same scales), there is no way to direct resources by priority based upon critical impact.

Recording risk identifications is about ensuring that we know about a risk factor before it has manifested, preventing any impact on the company or a worker. Whether we detect the risk encounter via an inspection or an event investigation, what we are doing by recording it is giving ourselves a broader base to analyse it. Recognising the risk encounter builds a record of its occurrence rates and their context, and that allows preventative enhancement. It focuses personnel on the common risks, and helps them ask two important questions: why is this risk so commonly encountered; and what will happen if it manifests harmfully? Knowing the answers to those questions means resources are applied to manage threat conditions rather than arbitrarily. Extending the example of paper cuts and dump trucks crushing folks, we might find that paper cut risks occur with significantly greater frequency. Basing operations on a purely traditional model, pretending near misses are recorded dutifully, we would eventually have that count reach some number that triggers resources poured into developing awareness. Meanwhile, the two near misses with the dump truck will be ignored by raw count. But comparing the impact, the potential for harm, it is immediately evident that the first near miss with the dump truck is likely to garner immediate attention, and controls will be enhanced to avoid that risk developing into a full-blown fatal incident. The defensible provision of ranking in this regard is part of the process that makes near miss recording so valuable.
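
A toy comparison of the two views, with made-up numbers (neither the records nor the 1-to-5 impact scale comes from this post): ranking by raw count puts the paper cut first, while ranking by worst credible outcome puts the dump truck first, which is where the model says the attention belongs.

```python
# Made-up near-miss records: (risk, near-miss count this period, potential impact 1-5).
encounters = [
    ("paper cut",             40, 1),  # frequent, trivially harmful
    ("crushed by dump truck",  2, 5),  # rare, potentially fatal
]

# Traditional view: whatever is counted most often gets the attention.
by_raw_count = sorted(encounters, key=lambda e: e[1], reverse=True)

# Risk-managed view: worst credible outcome first, frequency as the tiebreaker.
by_potential = sorted(encounters, key=lambda e: (e[2], e[1]), reverse=True)

print("Raw-count priority:     ", [name for name, _, _ in by_raw_count])
print("Impact-ranked priority: ", [name for name, _, _ in by_potential])
```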

High impact incidents, where harm is incurred, are always reported, and the process is fairly straightforward with some variations for context within an organisation. The majority of reports are always generated in the field, and flow upward to safety personnel and then beyond. At this stage the classification model comes into play heavily, with a special focus on the outcome potential. At a human level we know what can kill us is always treated with more serious regard than what can make us sneeze a single time. What is vital in this process model is that the process doesn’t end when legislation allows, because the feedback cycle is where the actual constructive value extends from. The analysis on the controls that failed becomes a foundation element for analysis of systemic failures, and it is the mechanism whereby control enhancement is triggered. If the process ends when the report is filed, the process fails.

The primary goal of reporting is to record, and the primary goal of risk-management is to analyse the recorded data. Form-filling tools produce paper, and risk-management systems produce improvements by way of cyclic improvement. Only cyclic improvement creates positive change, cost suppression opportunities, and drives productive improvements.

Wednesday, May 19, 2010

Scope of Discovery

The mercenary of our group has been heard to say, quite often, “One of the problems with idiotic traditional systems is scope of discovery is wrong.” Pressed to explain, he is apt to add, “When you analyse something, anything, you can only do an appropriate analysis if your underlying data was discovered on a broad applicable spectrum.” Luckily, translation of the idea is available for mere mortals who speak something like plain English.

Scope of discovery in traditional safety systems is wrong, because the statistical metrics that govern “whether you are safe” depend upon avoiding recording conditions that skew them. Consequent to that, traditional safety counts what benefits them more than what will harm them, since they are almost exclusively measured by post-event metrics. The problem, what makes this wrong, is that when you make subjective choices about what to record, you create a pool of data that, when analysed, is ignoring what is often the largest part of the data-set that should be analysed.

A case in point is the classification of incidents. The metrics that are used to declare a company safe will degrade that rating significantly if your near miss recording exceeds a certain ratio. While ignoring near miss recording is then almost a matter of commercial survival in some sectors, doing so actually trades the development of a safer workplace for the perception of safety (by way of statistics). You cannot fix what you have never seen.

This might not matter if the difference between a near miss and a fatality were not often a matter of centimetres or seconds. The intention of recording near misses in a risk-managed environment is to identify the cause of the incident, basically to identify a control failure. Those failures define how to apply resources to better controls, and without the ability to analyse them, we cannot do that. By not recording them for fear of statistical self-destruction, we have no process that will avoid these failures creating unsafe conditions that increase risk until an encounter becomes a serious one, perhaps even a fatal one.

Good safety needs to be a side-effect of intentional management, not luck. This requires more data of a higher quality to perform better aggregate analysis, and if scope of discovery is being repressed the result is a skewed database. You will be analysing risk based not upon risk reality, but upon the encounter of risks where distinct failures created negative outcomes. There is no way to control preventatively with any effectiveness, since the best you will do is create a reactive control modification. What is required is a scope of discovery that provides a massive aggregate pool that can be used to execute predictive analysis.

Workers routinely identify risks in the workplace, and if you capture those identified risks you can control for them. The control mechanisms may or may not be efficacious, but the only way to determine it is to monitor effectiveness. Post-incident control failure analysis is important, but if the only incident types one ever analyses are negative impact ones (injury or fatality), then you are placing faith in the controls rather than assessing them. If you also analyse controls via inspection processes, and include a wide range of near miss and even better “risk identification” events in your analysis pool, you are creating a method to objectively create proactive preventative control improvements.

Scope of discovery is the key to better analysis and the provision of better safety.

In an asset inspection, if you check that a guard is being properly maintained, you are confirming control. If over a year the inspection is indicating the control is not being maintained, you have an opportunity to analyse that data effectively. If in 50% of the cases of asset inspections that control is failing, you have a serious risk pocket.

Now, if you have recorded a dozen near misses where that same control failed, you have a pool of data that can be analysed. Yes, there is no loss of property or personnel, but the reality is that if in a specific organisational location (the welder shop floor in some shop, for example) half your near misses are indicating the same control failure, you can project with fair accuracy that at some point a near miss will become something more. Maybe the metal fragment the guard is intended to deflect down to a safe pan will hit another machine and damage it, incurring property loss, or maybe that same fragment will cut an employee, blind one, or kill one.

When that near miss finally becomes an injury or fatality, the problem is that, unless you have this other data, you will be looking at a failed control in total isolation. The guard was down, the employee was negligent, case closed. Except, if half the inspections done are showing the guard is out of place, and you have a dozen near misses leading to this event, the context is probably different. Why is that guard such a problem? Is the control ineffective? Could you have implemented a modification that avoided the costly injury or fatality?
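
A rough sketch of that aggregation (the records and thresholds below are invented, and no particular system is assumed): once inspections and near misses both link to the same control, the risk pocket is visible before any loss occurs.

```python
from collections import Counter

# Invented records, all linked to the same control by an identifier.
inspections = [
    {"control": "machine guard, welder shop", "in_place": False},
    {"control": "machine guard, welder shop", "in_place": True},
    {"control": "machine guard, welder shop", "in_place": False},
    {"control": "machine guard, welder shop", "in_place": True},
]
near_misses = Counter({"machine guard, welder shop": 12})

def failure_rate(control):
    checks = [i for i in inspections if i["control"] == control]
    return sum(not i["in_place"] for i in checks) / len(checks)

control = "machine guard, welder shop"
rate = failure_rate(control)

# Arbitrary illustrative thresholds: flag the control before a loss occurs.
if rate >= 0.5 or near_misses[control] >= 10:
    print(f"Risk pocket: '{control}' failing in {rate:.0%} of inspections, "
          f"with {near_misses[control]} near misses recorded against it.")
```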

By limiting scope of discovery you create a false sense of safety, and you ensure your eventual control enhancements are done in a vacuum. Preventative measures are only possible if you have more data of better quality, and if your analysis crosses the limitations of cost-only incidents. The dozen near misses in our example would have alerted you to the likelihood that this control would eventually be in a failure state, but if the only time you hear about a control is after it has failed, you will never be capable of taking preventative measures.

Risk-management is about embracing the range of your available data to ensure risk-awareness is real, and objective metrics exist to assist in focusing resources preventatively.

Monday, May 17, 2010

Breathing Deeply

When we created the solution-set as it stands today, we were distinctly aware that it wasn’t going to be easy to commercialise. The problem wasn’t the product, or even the vast market for it, but the educational curve attached to using the systems well.

When you have the scale of an IBM, Microsoft, or Google, you have the resources to infuse the market with educational context, the manpower opportunity to develop the expert knowledge bases to attack multiple avenues of revenue at once, and the momentum to deliver the product concept as part of broader integrated offerings. When you consist of four bodies, you often consider it pure bliss to have enough resources to last to the next quarter and pay your individual bills. Worse than that, you can’t really hire the expertise to identify the best growth avenue, and even if you do, you usually can’t call upon a cash reserve to execute the plan.

As we researched our solution concepts, and developed the tools to deliver them, we often stopped to take some deep breaths. We asked ourselves, regularly, whether the fight was worthwhile. That the answer was consistently that it was worth the fight was surprising, given we have a spectrum in the four bodies that varies from a true altruist (he actually thinks saving people’s lives is worthwhile) to a true mercenary (who thinks life is cheap and it’s all about cash).

The problem with breathing deeply is that sometimes the air stinks.

In 2008 when we took a deep breath, we discovered the investment community had about as much interest in an actual product concept as it had in anything deemed hard. Of course, we saw how that worked out when the markets crumbled, losing billions that were invested in emptiness; and we saw how that played out in 2009, when desperate corporate bailouts were done to prevent the people who caused the problem from suffering and dragging everyone else down with them. Throughout that cycle, we noticed a distinct odour of misdirected fear in the air.

In 2010, of course, there isn’t as much air around as there was in the past, apparently. Now, taking a deep breath requires the kind of faith that it takes to leap off a cliff because everyone else thought it was a good idea. The lemming effect that led the investment bankers to abortive doom (abortive, since they were largely bailed out by the small guys), seems still to be in effect.

Then again, part of the problem with our product-set is that it is a concrete product-set. It can be explained, and the explanation is scary to people who like double-digit returns on the quarter but can’t conceive of an emerging market that really is one. They can’t think beyond what exists, which makes explaining the opportunity to the standard investment group complex. The real killer, of course, is that to accelerate adoption requires exposure, education, and massive deployment to support revenues quickly. The crux of the problem is that with most investment groups, education kills the interest, since it has no direct revenue stream according to the standard wisdom.

What is odd is that in 2009, we became distinctly aware that the real champion of our product-set would eventually be in the technology domain. They are the only companies that seem to have any grasp of emerging markets, and because of the tool elements they have the component parts to engage the core on levels that would drive revenues indirectly until the direct market matured.

Microsoft, for example, is large enough to meld this kind of risk-management tool into their back-end services, expose its interface through SharePoint, and even link it to their web strategies. Google not only has the delivery infrastructure (the product has been a cloud application for longer than the idea of the cloud has even existed), but the broad reach to actually push this to enterprises on volume levels that would reduce the cost of entry to almost nothing.

Being the size of a flea, though, the best idea in the world has no fast uptake opportunity; and that deflects investment about the same way flea powder deflects fleas.

Amusement aside, the real challenge of 2010 is no longer about the product-set, which while dynamic and cyclically improving is a fixed value point, but about how to develop a mechanism to accelerate its introduction, such as functional pre-screening and integrated vendor management. Is it pursuing partners crazy enough to recognise the potential value of this opportunity, seeking a partner or buyer with resources, or reshaping the product and directing the knowledge we gained over the years into providing something more traditional?

Sunday, May 16, 2010

Data Quality is Critical

One of the revelations of our research was that the quality of data in traditional safety programs is abysmal. When we asked safety personnel for their investigation documents, we would get documentation so incomplete as to make it impossible to identify any human involvement at all. Across the legacy data we handled, an average of 15% of serious injury incidents did not even identify the injured party. In other cases, multiple reports on the same accident would contradict each other on basic levels such as classification choices, with no additional documentation available to ascertain which was correct (both were probably incorrect). The incredible data quality deficit was made worse, by an order of magnitude, by the massive amount of static documentation that was often available.

The data quantity should probably have been less surprising, given that traditional safety operates on counts alone most times, but it was amazing to encounter piles of documents about safety meetings, some with identified hazards listed, and have no associated documentation to indicate if the assigned corrective actions were ever completed. It was usually impossible to even ascertain who was at a meeting, or who was assigned to ensure the corrective action. Even when follow-up might have been done, the scale of the paper made locating the proof impossible.

The problem with generating paper is that paper is impossible to track effectively. You cannot ensure it is kept, you often cannot relate it to anything else, and its fundamental fault is that every page is discrete and in no way connected to the next. Even conscientious feedback gets lost when the paper is never collated, reviewed, classified, or audited.

Traditional safety is less sensitive to bad data than risk-management because it is about counts. Showing fifty inspections is fifty activities counted, and no one ever really asks you to prove they were worthwhile. In risk-management, fifty inspections with no connective substance will expose themselves as irrelevant instantly.

One of the greatest barriers so far to deploying risk-management for even the best intentioned companies has been that when they can get their data (not always as easy as it should be), they can almost never get quality data. The employee list will contain names no one has ever heard of, be missing people, have multiple spellings of the same name, and so on. The occupations list will be a third the size of the employee list, with obvious spelling mistakes and no real relationship between people and occupations (on average, about 60% of a client’s listed occupations have no one apparently employed to do them). This low quality is fatal to any system that relies upon delivering value propositions.
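
The checks involved are not sophisticated; a few lines of scripting expose most of the rot. The sketch below uses invented record structures and field names purely for illustration, but it flags exactly the two faults described above: occupations with no one apparently employed in them, and the same person recorded under multiple spellings.

    # A minimal sketch of basic data-quality checks, using hypothetical field
    # names; real client extracts vary wildly in structure.
    from collections import Counter

    employees = [
        {"name": "Fred Smith", "occupation": "Welder"},
        {"name": "Fred  Smith", "occupation": "Welder"},   # duplicate with stray whitespace
        {"name": "Sally Jones", "occupation": "Apprentice Welder"},
    ]
    occupations = ["Welder", "Apprentice Welder", "Rigger", "Crane Operator"]

    # 1. Occupations listed but with no one apparently employed in them.
    used = {e["occupation"] for e in employees}
    orphan_occupations = [o for o in occupations if o not in used]

    # 2. Probable duplicate employees (same name once whitespace is normalised).
    normalised = Counter(" ".join(e["name"].split()).lower() for e in employees)
    duplicates = [name for name, count in normalised.items() if count > 1]

    print("Occupations with no employees:", orphan_occupations)
    print("Probable duplicate employee records:", duplicates)

Even checks this crude, run against real extracts, tend to surface the problems described above within minutes.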

Of course, in the grander scheme, the low-quality data exposed by trying to transition to risk-management should raise flags. Management quickly becomes disillusioned, if they care at all, when they discover that Human Resources cannot produce a list of employees that makes any sense at the click of a button, or that all training records are in paper form and George down the hall might keep a spreadsheet covering the folks he knows about, which contradicts the one Bob keeps.

One of the saddest experiences we have had is having to say to companies they have no reliable data sources, because as soon as those words are used, they have one of two reactions: they bury their heads in the sand and pretend this isn’t a problem in their normal operations; or, they see it and think they cannot afford to fix the problem.

Data quality is critical for risk-management to function operationally; but it is vital to recognise that it is just as critical to day-to-day operations. If your information systems cannot provide reliable basic data about operations, you need to rectify that whether you want to transition to risk-management or not. The end result is that the decisions you make are based on real, quality information rather than obscure guesses.

Saturday, May 15, 2010

Your Company Can’t Do Risk-Management

What our research has shown is that most companies cannot do risk-management, whether by their own design or using our solution-set. In fact, most companies cannot ever achieve the risk-management paradigm shift. This proclamation would be depressing but for the explanation of why it is the case, which exposes that the actual problem isn’t the risk-management solution-set but the choices being made. So, here are the top ten reasons your company can’t do risk-management, presented as statements. If management says even one of these, they are never going to successfully transition to real management, and will remain in chaos mode forever.

  1. Our workers don’t know safety.
  2. Our workers are safe only when we’re watching them.
  3. Our safety program is world-class.
  4. Our safety personnel are experts.
  5. Our accident rates are below the industry average.
  6. We believe anything more than zero-incident rates is unacceptable.
  7. We are constantly doing more safety activities.
  8. We are always training.
  9. We are receiving safety awards for our excellent workplace safety.
  10. We are a safe company.

While harsh to say, the instant any one of those statements is made with anything other than a sarcastic edge, management is choosing to maintain the status quo. Real effort is not likely to be expended, or maintained, to make any changes, because things are “good enough.”

The problems with those statements help expose why they are dangerous:

  1. If a manager says workers don’t know safety, then what they are saying is that the people who provide all operational productivity do not care about their own wellbeing. Saying that masks the real statement, which is usually that “workers are doing things that negatively impact our productivity.” This attitude that workers are ignorant is not only false, but makes the entire provision of safety a farcical effort: without engagement there is no uptake. Workers need to be viewed as resources, assessed on a per-worker basis over time, and brought collectively to a standard of safe operations that imparts higher overall productivity by way of risk-aversion and effective control compliance. Not believing workers can provide value to the safety effort undermines their contribution.
  2. If a manager says workers are safe only when we’re watching them, then what they are actually saying is our workers are poorly trained, poorly deployed, and poorly managed. Well-trained workers doing jobs for which they are qualified are not inherently safer when watched. Yes, people become complacent, but that is about communication, not monitoring. Monitoring is the data driver that makes for good feedback and communication, not an answer to worker complacency. Believing that watching is managing is a display of ignorance about how people function, and says a great deal about management but nothing about the frontline workers.
  3. If a manager says the safety program is world-class, what they mean is that it is good enough for their needs. The reality is even risk-management doesn’t allow resting on laurels. Processes that impart safety are never wisely measured as world-class, because the sad fact is the world isn’t a safe place. The manager who says, “good, yes; great, never” is always going to be world-class by default, because they are always forcing change management into operations, seeking efficiencies, and maximising the risk-aversion of their workers. Just like safety is an outcome of process, so too is world-class a default outcome of pursuit of better process.
  4. If a manager says safety personnel are experts, what they may as well be saying is, “I don’t want to know; tell the person I hired.” There is no such animal as a safety expert, because individual people are incapable of a broad enough view to be mechanically objective. We see what our experience allows. This doesn’t degrade the value of safety expertise, but it does qualify that the safety expert is not valuable beyond the scope of their knowledge and experience. Far better to have a safety professional who actually manages the processes that underlie and produce safe conditions. The expertise there is the same expertise that a good human resources manager has, because so many processes that affect safety are about human relationships to processes.
  5. If a manager says accident rates are below the industry average, they have basically shrugged. It is fairly easy to be above average in most aspects of life, given that mediocre is the modern standard for performance for most purposes. Does that satisfy anyone? Does it engender growth, revenue generation, or profitability? Does being better than the other losers really imply a pursuit of excellence? Sadly, it seems to for many; but it is observable that the companies that really excel never express such attitudes. They are far more likely to proclaim they are not yet good enough.
  6. If a manager says we don’t accept anything more than zero-incident rates, they are being wilfully ignorant. Business requires operations of some sort, and operational activities require risks to be undertaken. To believe, even for an instant, that it is possible to exist in a zero-incident state forever is foolish. Worse, focusing on the zero in that declaration will often create conditions where suppression of incidents is commonplace, increasing risk until inevitably the exposure exceeds the chances of avoidance, after which the massive impact of the accident will eradicate the entire organisation. For an example, look no farther than Bear Stearns, a company that assumed enormous risks while consistently expressing how risk-aware and risk-averse it was, bilking investors out of billions. Lip service does not create risk-aversion; it enhances risk.
  7. If a manager says we are constantly doing more safety activities, they are really saying, “I get lovely reports with many numbers that mean nothing, but damn they look important.” Corner such a manager and ask them a question like, “How does doubling the number of safety meetings impart more safe outcomes?” Their answer, if they are conscientious, will rightly observe that better communication can actually do that. Now, say to them, “Prove it has.” Suddenly, those obscure counts, unrelated to anything (or, worse, often showing no impact on accident rates at all), mean very little. Smart managers see through the numbers to the realities, and question efficacy because they realise scarce resources are being applied to activities that may return no value.
  8. If a manager says we are always training, a commendable idea, ask them, “Why?” Better than 90% of the managers we have asked that question achieve glassy-stare status in seconds, and the best ones always admit they haven’t a clue. They seldom even know, at the executive level, what people are being trained to do. When they find out, they often sit in stunned silence as they observe that traditional safety, taking the easy path as it so often does, frequently applies pointless, easy training without ever checking its value, while lagging far behind on more complex critical training needs. Training is an ongoing process, but it also has to be explicable. It had better matter, since it is one of the highest-cost aspects of control imposition faced by any company.
  9. If a manager says we are receiving safety awards for our excellent workplace safety, unless they are smirking, they really need to educate themselves about what it takes to get a safety award. If you are in an industry that kills a handful of people every few months, you might end up at the top of that heap and be awarded for only killing Bill in shipping. There isn’t anything enviable in being the best of the worst, or in a certificate awarded on the basis of statistical tricks that have no connection to any reality. Try getting an award if you report every actual recordable in an industrial setting. Your statistics will betray you and make you look awful, and you can easily be beaten by a company that has killed a handful of people and not bothered to diligently report its recordable events. The fact that the near misses you recorded meant you killed no one matters not a whit to the statistical formula.
  10. If a manager says we are a safe company, what they mean is “We haven’t killed anyone recently, or injured anyone recently.” Safe is a purely subjective term. A quality management team at the executive level will always balk at making such a statement, because an easier and more truthful one sounds so much better: “We try to be a safe company, and it takes a lot of effort to maintain that.”

What it takes to do risk-management, producing real safety from the process, is recognising that attitude is almost everything. You have to want to become safer, stay safer, and reduce your costs to do that. You need to commit to the long-term value of being able to measure your progress, focusing your resource applications to produce improvements, and understand that at the end of the day you never end the effort. Risk-management becomes a profit-driver because it is a productivity-enhancer, which just has the odd side-effect of producing safer workplaces.

Friday, May 14, 2010

The Importance of Technology

You can drive a nail into a tree with your hand, if you are committed to striking it constantly for a few decades. Likewise, you can do risk-management without a single technological tool at your disposal. But just like a nail goes in more easily with a hammer, so too do good expert tools assist in making risk-management relatively painless. Having said that, risk-management isn’t about technology, and the primary stress is always on process. All the tools do is increase the efficiency of the processes.

While our research into “safety” drove our understanding of the current models and their serious flaws, those understandings drove our development of tools to support risk-management approaches. What is interesting is how often the incremental development of the tools drove our understanding of the opportunities presented to do risk-management well. In most cases a tool is created to address a purpose, and so it was with the primary tool we created to manage risk; but as it developed, it exposed other avenues to consolidate the theories we had developed.

Some of the key considerations that our research raised and our development efforts reflected were:

  • A need to simplify the actual mechanics of the data recording process, since we knew the data gathering and recording processes were always a challenge;
  • A need to spread the effort requirements to engage more people in the development of risk aversion, by having the tools assist them in communicating;
  • A need to monitor both data quantity and data quality, to ensure the reporting capability reported against clean data, or at least had the chance to explain the data faults;
  • A need to enforce value returns by focusing away from traditional distractions toward cost-provable processes that can be measured concretely;
  • A need to impose an integral relationship model that would provide opportunities for deep analysis without subjective effort, and allow for engagement of the cyclic improvement model; and
  • A need to get the right information (data) to the right people.

A few of the things our development revealed as it proceeded were:

  • The actual practical nature of risk factor linkage across inventory, activity, and reactive entities, leading to the theory we developed about Risk Factor Linkage and Analysis; and
  • The degree of expectation we could have of dependent systems to provide good inputs, revealing methods to overcome the information technology barriers in large organisations.

Ultimately, without good tools, the cost of doing risk-management will always exceed the benefits, because the process will become clumsy, the returns will be unreliable, and your focus will end up in the wrong domain. With good tools, there is nothing to prevent risk-management becoming a profit-driver.

Thursday, May 13, 2010

Due Diligence

The apparent inability to change fatality and injury rates in any real way, combined with the increasing trend toward criminal prosecution post-incident, has led to a focus on due diligence. The problem with this focus is not the focus itself – we should all be dutifully diligent – but on how to achieve it. Here again, everything we have learned in more than a decade has shown us that traditional safety fumbles this entirely.

In a traditional safety program the problem with proving due diligence is twofold: first, the proof simply doesn’t exist, because traditional safety has an abysmal lack of provable value, almost no precision focus, and is so highly reactive it has to draw any proof of diligence from a sea of assumptions; and, second, relying on luck isn’t a sign of diligence of any kind, but rather a sign of total ignorance.

What we see increasingly is that in situations where due diligence becomes a question, usually about the time a prosecution begins, the fumbling begins to generate the proof. While not criminally driven, this is essentially a fraud, because what it entails is creating something concrete from something ephemeral. If you are dutifully diligent, you can tell me today, with fair accuracy, exactly how many of your employees are competent to do their respective job tasks, and you can show it to be true by way of a provable process and certification. This requires several conditions: the first being that such a comprehensive list exists; the second, that it is up to date (or can be shown to have been updated at some specific date); the third, that it is right (certificates match claims); and the fourth, that you can generate it reliably. If any one of those conditions fails, you are generating the proof by way of magic, or, at the very least, bending it into shape after the fact. Even assuming your end report on the matter is honest, the problem is that you cannot prove to have been diligent prior to the event that required such proof, because you can’t prove you knew the state of your organisation before that event.
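
As a purely illustrative sketch, the four conditions amount to something a system should be able to answer on demand with a dated report. The record structures, field names, and certificates below are invented, but the shape of the check is the point: the list is generated from live data, stamped with the date it was run, and compares claims against certificates rather than assuming them.

    # A sketch of the competency check implied by the four conditions above.
    # All record structures, names, and certificates are hypothetical.
    from datetime import date

    # Required certificates per occupation (assumed for illustration).
    required_certs = {"Welder": {"CWB Level 1", "Fall Protection"}}

    # Employee records, mapping each held certificate to its expiry date.
    employees = [
        {"name": "Fred Smith", "occupation": "Welder",
         "certs": {"CWB Level 1": date(2011, 3, 1), "Fall Protection": date(2009, 1, 1)}},
    ]

    def competency_report(as_of):
        """List each employee's missing or expired certificates as of a given date."""
        report = {"generated": as_of.isoformat(), "gaps": []}
        for e in employees:
            needed = required_certs.get(e["occupation"], set())
            missing = sorted(c for c in needed if c not in e["certs"])
            expired = sorted(c for c in needed if c in e["certs"] and e["certs"][c] < as_of)
            if missing or expired:
                report["gaps"].append({"name": e["name"], "missing": missing, "expired": expired})
        return report

    print(competency_report(date(2010, 5, 13)))

Because the report is regenerated from the same data each time and carries its own date, it is the kind of artefact that can demonstrate the state of knowledge before an event rather than after it.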

Real due diligence is about consistency, accuracy, and explicability. In a court environment the test of proof is going to demand it, and ignorance is no defence against a lack of diligent effort.

What we know is that risk-management itself isn’t any better at delivering proof of due diligence than any other method. What counts in this regard are the supporting tools, which is where the development side of our research and development efforts focused. It was apparent from the outset that data had to be capable of proving diligence, and even then it was doubly clear that the best tools would face the same two problems: data quality concerns, and data management concerns. Put in basic terms, the fact is that even with the best tools, low quality (bad) data and poorly maintained (unreliable) data kill the proof of diligence.

The difference between a risk-management model and its associated tools, and traditional safety, is that risk-management at least makes due diligence possible. By focusing efforts, and by having the tools maintain relationships once they are defined, we ensure consistency. By ensuring consistency and guiding the recording of data, we ensure accuracy (so far as is mechanically possible) as long as people follow the defined processes. And by having consistent and accurate data, and by wrapping the tools in processes that are clear and repeatable in terms of results from inputs, we can explain that data. In essence, risk-management processes and tools, combined, impart the three key gauges of proof of due diligence.

If you risk-manage your company, you use tools that make it easy, and if you execute that process wilfully your outcomes are naturally both safety, a side-effect of good operational risk-management process, and proof of due diligence, because you are being dutifully diligent in your process.

What is a depressing reality in our experience thus far is that the attitude remains, “serious accidents won’t happen to us,” and by the time that fallacy is shorn away, it is far too late to reverse engineer an enterprise to be naturally diligent.

Wednesday, May 12, 2010

Rolling the Dice

Two incidents from just before Christmas in 2009 illustrate the fact that companies are rolling the dice daily, betting worker lives against some fantastically ill-defined maximisation of returns.

In the first case, Canada Post demanded employees take down a set of Christmas lights from the top of a file cabinet, stating it was a safety issue. The associated press release stated that the safety of employees was of paramount importance, while the union rightly observed that management needed to become more concerned about high-criticality hazards. While not obviously wagering lives here, the fact is they have, because they directed resources (including a chest-thumping press release!) toward this bizarre event. While the argument is sound that Christmas lights present a possible hazardous condition, every penny spent on this nonsense is one less spent managing risks that will eventually kill someone. This misdirection of effort is an egregious affront to the very idea of management, and shows that Canada Post has no understanding of resource application for value return. They will have spent a thousand dollars on this, at least, draining valuable budget that could have been applied effectively. That they congratulated themselves in a press release only further shows the low or absent management quality. A good manager would never allow such an issue to be treated with such false bravado.

In the second case, four workers in Ontario, Canada, fell to their deaths on Christmas Eve from a faulty swing stage. While it will be months before an inquest determines the details, what was apparent early on was that the employees were improperly trained, the swing stage was improperly maintained, and basic safety equipment was not being used. One has to wonder what the company responsible for this travesty was hoping to achieve. Were they really that utterly ignorant of the risks to their workers, or were they convinced the risks were acceptable? If the former, they are criminally negligent as well as incompetent; if the latter, they are simply criminal. In this case they killed four people, wilfully, by way of ignorance, even if in the end it turns out the safety equipment was ignored by the workers; for some reason those workers had a cultural bias toward getting the job done rather than doing it safely. The saddest aspect is that in killing four people they devastated four families, which represents social costs that go beyond any compensatory penalties they will ever face. This type of incident is why criminal prosecution is possible in workplace injury and death cases, because nothing short of hard time would ever send a message about this kind of disregard for human life.

The lack of significant decreases in fatalities over the decades is indicative of how many companies are rolling the dice rather than stacking the odds in their favour. A large part of this is because safety, being treated as an activity, simply doesn’t return on the investment. You cannot prevent or suppress the impact of accidents when your entire model for doing so is focused on counting irrelevancies that have no provable relationship to the underlying causes of accidents. Theory states that the more proactive measures one does (meetings, inspections, etc.), the larger the drop in the accident rate. The data, of course, show no such correlation, since there is no connection between counting and changing the conditions that increase risk:

[Chart: Rolling the Dice 2]

Our research has shown that doing more of the same inapt counting actually erodes worker trust in the safety program and its purpose, and consequently workers begin to view all initiatives connected to “safety” as more of the same drivel. This, in turn, increases risks dramatically, because positive initiatives end up buried in noise. And when focus is lost by the workers, systemic failures become commonplace. This “safety exhaustion” exists at so many levels in most organisations that it is a large contributor to the decline in reasonable investment in actual, practical operational risk-management.

What may be sadder than the static rates themselves is that there is no real mystery why these rates remain unchanged. It all comes down to failed safety programs, failing because of a combination of flawed assumptions about safety and flawed processes. The key issue is that safety is treated as something to be done rather than as the outcome of process, which means people end up trying to manage the end-result rather than focusing on the processes that produce it.

Amongst the flawed assumptions about safety is the extension of the mistaken focus on outcomes to provide metrics. Total Recordable Incident Rate (TRIR) exemplifies the problem, because it uses the outcomes of failures as a measure of success. To put it in painful perspective, we have developed an entire industry around a statistic that rewards us for failing less, without ever asking why we are failing. It is similar to grading on a bell curve, generating statements like, “Company A is the safest in the world this instant because they have only killed five people this year!” Shockingly, that kind of ridiculous metric is exactly what we use.
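
For readers unfamiliar with the metric, TRIR is commonly calculated as recordable incidents multiplied by 200,000 (roughly 100 full-time workers for a year) and divided by hours worked. The toy figures below are invented, but they show how the arithmetic rewards the company that reports least, not the one that fails least:

    # TRIR = recordable incidents x 200,000 / hours worked (the common formulation).
    # The figures below are invented purely to illustrate the perverse incentive.
    def trir(recordables, hours_worked):
        return recordables * 200_000 / hours_worked

    diligent = trir(recordables=40, hours_worked=400_000)  # reports every recordable, kills no one
    sloppy = trir(recordables=6, hours_worked=400_000)     # under-reports; two of the six were fatalities

    print(f"Diligent reporter TRIR: {diligent:.1f}")  # 20.0 -- looks 'worse'
    print(f"Sloppy reporter TRIR:   {sloppy:.1f}")    # 3.0  -- looks 'better'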

Another of the flawed assumptions is a little harder to get at, because traditional safety programs simply have no comparative advantage over risk-management: you cannot compare luck to management. Activity counting in traditional safety relies upon irrational aggregation, with no relationship defined between the activities and the functional operational control measures. So, in essence, we reward ourselves for doing thirty extra inspections in a given quarter, without any way to assess the validity of those inspections. This lack of relationship between the underlying risks and their controls, and the activities, means we will never have cause-effect clarity in traditional safety. In risk-management, though, the entire structure of the process is about defining and managing the relationships in a way that provides insight. In a risk-management model we still do inspections, but our inspections might point at an asset, which points at inherent risks, which point to known controls. So, every inspection (asset or generic) shows a chain of insight, and makes it possible to analyse the control measures. The assumption of a relationship is not the clarification of a relationship, and as long as traditional models focus on outcomes they avoid relationship definitions and accountability.
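
The chain described above is easier to see in a toy data model. The sketch below is only a minimal illustration, with invented asset, risk, and control names; the point is that an inspection resolves to the risks and controls it speaks to, instead of being a free-floating count:

    # A minimal sketch of the inspection -> asset -> risks -> controls chain.
    # All names and structures here are invented for illustration only.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Control:
        name: str

    @dataclass
    class Risk:
        name: str
        controls: List[Control] = field(default_factory=list)

    @dataclass
    class Asset:
        name: str
        risks: List[Risk] = field(default_factory=list)

    @dataclass
    class Inspection:
        asset: Asset
        failed_controls: List[Control] = field(default_factory=list)

    guardrails = Control("guardrails in place")
    harness_check = Control("harness inspection current")
    fall_risk = Risk("fall from height", controls=[guardrails, harness_check])
    stage = Asset("swing stage, east elevation", risks=[fall_risk])

    inspection = Inspection(asset=stage, failed_controls=[harness_check])

    # Because the inspection traces to specific controls, control performance
    # can be analysed over time rather than merely counted.
    for risk in inspection.asset.risks:
        for control in risk.controls:
            status = "FAILED" if control in inspection.failed_controls else "ok"
            print(f"{inspection.asset.name} / {risk.name} / {control.name}: {status}")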

Extending from the false assumption that counts somehow imply management is the assumption that reactive, traditional safety somehow implies management. Most traditional investigations, like their activity counterparts, have absolutely no control failure trace. And even when Root Cause Analysis is done, the problem is that there is still no direct relationship defined between those efforts and what must be managed to avoid and suppress risk. I can say that the direct root cause of an accident was the inattention of the employee, but I cannot manage such a condition in and of itself; what I can manage, namely the controls that might impart more employee focus, is not traced by the traditional safety reliance on form-filling. Risk-management, being more orange than apple, has an integral ability to do just that, while maintaining all the features of traditional investigation models.

Assumptions are dangerous because they fail the instant any aspect of the assumption is even moderately flawed. But even pretending that the assumptions are valid, and this pretence is enormous, the flawed process destroys the integrity of traditional safety programs. The focus on outcome, even to the point of using it to measure the efficacy of the process itself, creates a scenario where participants begin to behave against the interest of safety and in the interest of deflection. This leads to poor implementation at every juncture, because the inability to prove the value proposition one way or another makes it acceptable to create paper regardless of its meaning; to redirect focus in knee-jerk fashion to high-visibility “controls” that are often ineffective and chosen because they appear easy to achieve; to generally ignore the quality of primary control processes, which are harder to do and deliver value over a longer horizon; and to entirely ignore the directive management control mechanism.

We have not yet encountered a company that can actually prove whether high-priority corrective actions are being done. Even worse, high-consequence events often have no corrective actions at all, because the focus stays on the outcome rather than on recognising the control failures behind it.

When you try to manage an outcome rather than a process, your efforts are entirely wasted, since an effect is not a cause. And contrary to what some might wish, rolling the dice is not a process; it is a risk. Choosing risk that is undefined, uncontrolled, and ultimately unacceptable is not management; management is about using defined process to ensure outcomes. The only thing rolling the dice has in common with management is that it ensures an outcome, which is the eventual failure condition that will kill workers. All the lip service to the idea that there is no price on a human life flies out the window alongside the improperly harnessed employee when an enterprise approaches safety as something to be done, rather than something to be achieved.

Tuesday, May 11, 2010

The Immense Cost of Reactionary Behaviours

One of the realities we came across in our research and development phase was a mercenary one: cost is a major barrier to change.

One of the clearest misconceptions we run into is the idea that the cost of risk-management is higher than staying the course. Consequently, risk-management becomes an “expensive option,” despite the fact that in more than a decade we have yet to have a single prospective client company show us a valid cost assessment of their current approaches. In most cases the best that can be done is to add the cost of training, the cost of insurance premiums, and the cost of known losses; and even then none of the three numbers can reliably be gathered by most companies. To put it bluntly, no one seems capable of defining the cost of current safety programs, and yet the cost factor is cited as an excuse for not making changes.

Part of the problem, of course, is that “safety” has been a moving target for decades, always introducing the same basic luck-based ideas with new terminology, always failing to impact the bottom line positively, and never really doing more than obscuring accident rates by happenstance. All this time, and all this new investment, has made even the best management shy of investing anything concrete in making changes, since they distrust the changes of the past. Any new method is viewed as another faddish distraction from the basic fact that safety seems to be a crap shoot, and this attitude often persists without any analysis of the vastly different methodology of risk-management, or any assessment of the cost-specifics it can define.

One would think that even the basic fact that risk-management can be measured for cost would appeal, but there we often find another barrier: some management groups would prefer not to know how much “safety” is costing them beyond the direct, unavoidable measures. They know that realising the actual cost of the current reactionary model would cripple them with shareholders, who would be stunned to find out that all those invested resources accomplish nothing provable in practical terms. Indeed, our research demonstrates no correlation between any traditional safety activity and accident rates, which begs the question: why are companies spending money on something that is completely ineffective?

A pretence we hear a lot from companies is that their employees matter, usually twinned with the grandest lie of all, which is that you cannot put a price on human lives. While admirable ideals, the realities are provably different. The simplest calculation to determine the value of a human life is to add the cost of all “safety” in an enterprise, then divide it by the number of times a person has died. That will show you exactly what a life costs, since you will have poured that much money into outcomes that led directly to death. Of course, the fallacy in that calculation is that it really should be the total cost divided by the number of times you could have killed someone, since the purity of luck is the only reason you haven’t.
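
To make the arithmetic explicit, here is that calculation with invented figures; the numbers mean nothing in themselves, only the shape of the division does:

    # The back-of-envelope calculation described above, with invented figures.
    total_safety_spend = 2_500_000   # total cost of "safety" over the period (assumed)
    fatalities = 2                   # deaths in the same period (assumed)
    potential_fatalities = 50        # deaths plus near misses that could have killed (assumed)

    print("Implied price per life (deaths only):", total_safety_spend / fatalities)                    # 1,250,000.0
    print("Implied price per life (including near misses):", total_safety_spend / potential_fatalities)  # 50,000.0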

Not making changes because of cost ignores two basic facts we have discovered:

  • You cannot proclaim the cost of productive change to be beyond the acceptable range if you can’t honestly define the cost of doing nothing; and
  • You cannot assign a cost calculation to any system that relies on luck to avoid disaster.

The immense cost of reactionary behaviours is obvious post-incident, but within months of those expensive incidents the momentum for change is always lost because any investment is directed to avoiding the same scenario, without ever actually understanding the underlying failures. By the time the inquests determine cause, they have generally obscured the practical realities by drowning them in politics. And even when they are pristine, those recommendations come far too late to make adjustments.

Risk-management solves the timeline problems, allows fine cost awareness, and divests the luck-based approach entirely when done well. Cost is no reason to resist the benefits, because unlike what is extant in the industry today, risk-management can be managed for cost, and can also prove its value on a per-cost basis, which makes it an actual management process rather than a reactionary one, where costs simply cannot ever be controlled.

Monday, May 10, 2010

Management Versus Reactionary Behaviour

To be blunt, if your safety program is “managing crisis” it is neither managing nor imparting any value. One of the key discoveries we embedded in our solution-set relates to the recognition that you react to crisis situations, and you manage in order to avoid crisis. If you are managing, you do not experience crisis, even if you manage the outcomes of a negative incident.

Management is about improving decisions over time, thereby reducing the time spent managing small issues so that larger targets can be focused upon. Whenever a company has reams of paper related to safety, there is instant proof in those papers that risk-management isn’t being done; otherwise there would be no static cache of paper choking actual work time.

When you manage training, for example, what you are doing is imposing controls to avoid or suppress risk, creating productive opportunities, or doing both simultaneously. Mentoring exemplifies a value proposition for training controls that is often lost when monitoring is substituted for mentoring. When Fred the welder spends half his day filling out forms to say he stood by welder apprentice Sally, where was the value? If Fred is an excellent mentor, especially so, is it not preferable to have him spend 90% of his time actually mentoring rather than splitting it fifty-fifty between that and filling out paperwork?

When you manage productive timelines, what you actually do is set those targets, recognise them, and achieve them by proper process of operations. You don’t panic and turn off all safety guards to double speed, unless you haven’t been communicating well, and have no actual management process. Does it not make more sense to communicate effectively and manage rather than react in chaos?

When supervisors are overwhelmed by paper, are they managing or shuffling? Over time, good supervisors build trusted workers with more reliable skill sets, because they spend so much of their time present. They have the time to stand with an employee and deal with them behaviourally before stress flares, because they are not pushing shreds of paper into some black box. They have the time to manage rather than react.

In traditional safety, though, we almost never see anything managed. We see crisis reactions, panicky attempts to bury stupid decisions, and repeating cycles of destruction. We see it because you cannot manage an outcome, only a process, and the idea that safety is itself something more than an outcome is ingrained in traditional programs.

Reaction leads to poor feedback, worse communication, and a lack of analysis – a repeating cycle of risk encounters.

Management engenders feedback (through immediate contact); good communication, since the opportunity is two-way; and a contextual basis for analysis, since all discrete data will be objectively related (assuming the correct tools).

The most common barrier to introducing risk-management into a company is always a lack of management. It has sometimes been so severe as to beg the question of whether any companies even understand the idea of operational management, let alone risk-management.

Sunday, May 09, 2010

Death of a Thousand Paperclips

When we talk to clients about the almost universal problems they have gathering basic data (for any purpose), we sometimes refer to this as the “death of a thousand paperclips” effect. Put succinctly, it refers to a condition where an enterprise spends its time generating enormous reams of paper (whether as files on a machine or printed) that cannot be associated. Very often they have all the necessary data, but none of it is consolidated in any form that can be accessed, so the majority of their effort ends up being spent accessing data of questionable quality, which is also difficult to relate contextually.

Part of the problem is that right now traditional safety isn’t participating in management; it is counting activities and generating paper. This is true even to the point where training generates certificates without there ever being a way to indicate the value of those papers. Into this mix we have seen numerous additional paper efforts, all of them predicated on some belief that the past paper was somehow inherently bad, rather than that it became bad because it was unmanaged. So, we see client companies with massive competency documents describing what a welder should know, who still cannot reliably tell us what Fred Smith actually does. This disconnect exists because nothing is related in any reliable way; it shows what one used to call a lack of bottom. In plain terms, without a foundation process no amount of tweaking ever produces related value, only larger stacks of irrelevancies.

Form-filling software is state of the art in the traditional safety world. (And had we been less interested in actual management, we would have taken that path to glory.) The problem is form-filling generates forms, not value; and value is the measure by which all processes are actually judged. When you have a stack of forms ten miles high, you can neither find what you need quickly, nor discover relationships easily.

This is another point where the tooling behind a solution-set becomes part of the value proposition, and it is where the payoff comes in the models we developed: risk factors and controls suddenly link across a broad array of profiling, active, and reactive systems, producing opportunities to link and analyse data, address issues of data quality, and generate reams of paper that communicate rather than frustrate.

When we ask a company which has suffered a workplace fatality to be objective, given the current state of traditional safety they will begin to count things. They have to, since they have no other choice. They will fill out forms, counting the fields they can complete, producing an inscrutable but perfect form. They will then attempt to subjectively qualify the fatality, because they really have nothing to draw upon to test any hypothesis. They cannot, for example, analyse the fatality in the context of fifty similar near misses to identify what control failed and how it failed. Nor can they instantly analyse the employee’s training, relating the risk factors they encountered that killed them against the ones their training protected against. Nor can they analyse what training their occupation should have required versus what they actually had. All that can happen with traditional forms is that, in an emotionally distressing instant, checkboxes get checked, counted, and filed.

Real value is lost when outcomes are believed to be systems, because when the actual underlying systems fail, the outcome is shattered. In essence, the focus on outcome obscures the cause in favour of a focus on the effect. No one can be objective when the outcome becomes the exclusive focus.

Our systemic approach though isn’t about safety at all, but about risk-management. Even in the worst scenarios it is easy to be objective (generating value) when your entire process is able to answer questions that matter. If someone is dead, that won’t change; but the next person who is killed could be saved if the system can answer exactly what controls failed to allow the death, exposing faulty controls we can universally rectify, or an absence of control, or unknown risks. Even in the purely reactive conditions imposed by an accident, a risk-management model is focused not on the fatality, but on the mechanics of how to prevent the next probable fatality. It is less interested in blame than focused on avoiding repetitive incidents. Our system can tell you what the person was doing, whether they should have been doing it, what risk they encountered, what the expected severity of that risk was, what specific controls attached to that risk failed, and even go so far as to tell you whether the dead worker should have been doing the job at all given their training controls.
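
For illustration only, the kind of answers described above fall out of linked records almost mechanically. The structures, names, and values below are invented, but they show how those questions become lookups rather than reconstructions:

    # A compressed sketch of post-incident questions answered from linked data.
    # All structures, names, and values are invented for illustration.
    incident = {
        "worker": "J. Doe",
        "task": "swing stage dismantling",
        "risk_encountered": "fall from height",
        "expected_severity": "fatality",
        "controls_defined": ["guardrails in place", "harness worn", "harness inspection current"],
        "controls_failed": ["harness worn"],
    }
    worker_record = {
        "occupation": "Labourer",
        "authorised_tasks": ["material handling"],
        "training": ["WHMIS"],   # no fall-protection training on record
    }

    answers = {
        "Was the task within the worker's authorised duties?":
            incident["task"] in worker_record["authorised_tasks"],
        "Which defined controls failed?":
            incident["controls_failed"],
        "Did any recorded training address the risk encountered?":
            any("fall" in course.lower() for course in worker_record["training"]),
    }
    for question, answer in answers.items():
        print(question, answer)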

Death of a thousand paperclips kills the ability of traditional safety to impart a defence against disaster. No matter how you treat an accident under the traditional approaches, you can give no objective weight to any outcome, because the circumstances of that outcome attract all the focus. Subjective analysis is, simply, not real analysis.

Risk-management tooling is the critical difference between risk-management that is effective and risk-management that is merely nominal, and it is where our research and development effort shows real value. It is never good enough to pay lip service to an idea that can be qualified, quantified, and cyclically improved.