Wednesday, November 15, 2017

Possibility or Probability

That there is a possibility does not imply that there is a probability.
Possibilities are open-ended variables, while probabilities are grounded in facts. A possibility is often applied to safety as if it were a fact, with an undisputed path for an event to occur. Possibility is the mere assertion that an event could occur, while probability is an analysis of facts to establish the likelihood that an event will occur. When possibilities alone are applied to regulatory compliance, operations gradually become a system failure, or a dysfunctional operation; when probabilities are applied, operations improve their safety, run as functional systems, and operate with an effective SMS.

There is a possibility that all marbles remain in the bowl, but a low probability.
Probabilities are levels of the likelihood of one possibility, or the confidence level of a prediction that an event will occur in the future.
There is only one possibility applied, but this possibility is applied against different criteria of likelihood. Likelihood is defined in many shapes and forms. One method is to define likelihood as ten independent levels based on the time frame between events.


Likelihood levels could be defined as follows:
A) Inconceivable
The times between events are imaginary, theoretical, virtual, or fictional.
B) Rarely
The times between events are beyond the factors applied in operational problem-solving calculations.
C) Remotely
The times between events are separated by breaks, or spaced farther apart than normal operations could foresee.
D) Randomly
The times between events are without definite aim, direction, rule, or method.
E) Variable
The times between events are indefinable.
F) Occasionally
The times between events are inconstant.
G) Often
The times between events are protracted and infrequent.
H) Frequently
The times between events are reliable and dependable.
I) Regularly
The times between events are short, constant, and dependable.
J) Systematically
The times between events are methodical, planned, and dependable, without defining the operational system or processes involved.
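The ten levels above form an ordered scale from A (lowest) to J (highest), which can be sketched in code. This is a minimal illustration only; the numeric time-between-events thresholds below are hypothetical assumptions, since the text does not assign numeric boundaries to the levels.

```python
# The ten likelihood levels as an ordered scale (A lowest, J highest).
LIKELIHOOD_LEVELS = [
    ("A", "Inconceivable"), ("B", "Rarely"), ("C", "Remotely"),
    ("D", "Randomly"), ("E", "Variable"), ("F", "Occasionally"),
    ("G", "Often"), ("H", "Frequently"), ("I", "Regularly"),
    ("J", "Systematically"),
]

def likelihood_from_interval(mean_days_between_events):
    """Map a mean interval between events to a level; thresholds are hypothetical."""
    thresholds = [36500, 7300, 3650, 1825, 730, 365, 90, 30, 7]  # days, descending
    for (code, name), limit in zip(LIKELIHOOD_LEVELS, thresholds):
        if mean_days_between_events > limit:
            return code, name
    # Anything with very short intervals falls in the highest level.
    return LIKELIHOOD_LEVELS[-1]

print(likelihood_from_interval(5000))   # an intermediate band
print(likelihood_from_interval(1))      # the highest band
```

The point of the sketch is only that each level is a band on one continuous axis (time between events), so a single observed interval always resolves to exactly one level.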

When examples are applied to these likelihood levels they come alive and become practical in a Safety Management System.

The likelihood is inconceivable.
As an example, on each departure there is a possibility for the airplane to experience an engine failure just after liftoff. However, the probability that this event occurs is based on data applied to a likelihood level. By applying the possibility to each level of likelihood, and picking one level based on data, it can be established what effect the possibility of an engine failure has on operational safety at each likelihood level. In this example the possibility is applied to the likelihood level of Randomly.

D) Randomly – an engine failure just after liftoff occurs randomly. The intervals between engine failures are without definite aim, direction, rule, or method.
However, when an engine failure is applied as a possibility only, it becomes the possibility of an inconceivable event, and there could be no safety in aircraft operations.

The same scenario holds water when applying one possibility to regulatory compliance. It could be said, and it has been said, that there is a possibility for an operator to be non-compliant with one regulation, and that a regulatory non-compliance finding could therefore be issued. When one possibility is applied without a link to likelihood, all operations are non-conforming to regulatory requirements, and all operations become non-conforming to safety.


CatalinaNJB

Thursday, November 2, 2017

SMS: An Umbrella Or A Wheel

There are many names associated with the Safety Management System (SMS). A Safety Management System is often described as an additional layer of safety, without addressing what other layers of undefined processes it is an addition to. This statement is widely accepted as fact without analyzing those other underlying processes. Many of the steady improvements in the accident rate over the lifespan of aviation were attributable to improvements in technology, such as the introduction of more reliable engines and navigation systems. Pilot error, or human factors, was assigned as the root cause each time there was an accident. This root-cause statement included a statement that a person had failed to comply with a regulation or standard which had been arbitrarily implemented by the State.
An umbrella is a shield of protection and not a system of safety.
More than once a new regulation or standard would be arbitrarily implemented after a major accident. Assigning the blame to the flight crew was an easy way out, without accountability to the operational processes. Continuous safety improvement in aviation became difficult when the root cause of an accident was assigned to one person only. The task of continuous safety improvement had become the task of finding a flight crew member who would never be involved in a future accident. Since this is an impossible task, the Safety Management System was developed to make aviation a more perfect operation with an assigned safety operational confidence level.

This new approach to managing organizational factors, human factors, supervision factors and environmental factors was looked upon as an additional layer of safety on top of what the aviation industry was already doing. However, the aviation industry had not been doing anything other than complying with regulatory and standards requirements. When the SMS program was presented as an additional layer of safety, everyone assumed that by complying with this highest layer of the hierarchy all other regulations and standards would take care of themselves. It had become an assumption that they were self-regulated by the Safety Management System.

While the assumption of being self-regulated is a misconception of the SMS, both the operator's SMS and the aviation authority's oversight system are parts of a complete package. The concept of an SMS is that the operator has processes in place for the safe operation of an aircraft or airport, and processes in place to ensure regulatory compliance. When there is an audit by the aviation authority, the ideal outcome is zero findings, or an operational zero tolerance for compromising aviation safety.

An SMS is a formal means for operators to demonstrate their management capability to meet their obligation to operate at the highest level of safety in the public interest. While the operator's SMS and the authority's oversight system are highly complementary and interactive, they are separate and essential components of the regulatory safety management strategy. In other words, SMS is a parallel and supporting system to the operations.

SMS is to strengthen each spoke of the wheel.
SMS has been described as an umbrella over the operational safety management system. An umbrella is a tool that covers or protects from above. When applied to the SMS, the umbrella is an overarching system encompassing all other systems within the organization. If it were the case that the SMS is an umbrella, it would take precedence over all systems within an organization. This would cause mass confusion and an inability to manage operations. With the SMS established as an umbrella, anyone can use the safety card to disable organizational safety management.

On the other hand, when the SMS is looked at as a parallel system to operations, and as a wheel with spokes, it becomes manageable and practical in the application of safety. As a parallel system and a wheel with spokes, the operator may choose to strengthen the wheel by applying more resources to one spoke or another. SMS is not an overarching operational umbrella system. SMS is a system that receives data from operational practices and applies this data to each one of the spokes in the wheel, strengthening the wheel and the operational confidence level of safety management.



CatalinaNJB

Wednesday, October 18, 2017

Accountability

Accountability is to comply, without supervision, with regulatory requirements, standards, policies, recommendations, job descriptions, expectations, or the intent of job performance, and for personnel to be actively and independently involved. Accountability is an element of a just culture, which is an organizational culture where there is Trust, Learning, Accountability and Information Sharing.

Accountability is a process in motion.
Accountability is a process in motion and not a static state of virtual events. Accountability is different from responsibility, since it is the behavior that triggers events in a form that produces the most positive result. When a person gets their driver's license they have a responsibility to stay on the correct side of the yellow line that divides oncoming traffic. This personal responsibility does not leave the person even when the person is not driving a vehicle. It is a responsibility of the license itself. The same holds for a pilot's or an aircraft mechanic's license, where the responsibility follows the license. On the highway it does not make safety-sense to divide oncoming traffic with a 6-inch yellow line. However, it is accountability that makes this possible. A driver of a vehicle is not accountable to everyone on the road, but only to the first approaching vehicle, then accountable to the next vehicle, and then the next, and so on. Accountability is action in motion. Everyone expects that the other driver comprehends the responsibility and is accountable to safety. When driving down a two-lane highway there must be Trust involved.

Trust is the first element of a just culture. Without trust there is nothing. A pilot is trusted to become a part of the operations, trusted with a single engine bush-plane or a multi-million dollars airplane and carrying one or several hundred passengers onboard who trust the pilot and the flight crew. Without trust there are no flights.

Learning is voluntary.
Learning is the second element of a just culture. Trust has given a person an opportunity to apply their skills and knowledge, but they continue to learn and excel in performance. At times this learning curve levels off, while at other times the learning curve is steep. A steep learning curve may come from new challenges, but also from learning from incidents. Incidents are not a requirement for learning, and every effort is made to ensure that every flight goes right, in the sense that everyday work achieves its objectives.

The third element of a just culture is Accountability. Accountability is applied to trust and learning. Accountability is to be accountable to safety by staying on the correct side of the yellow line painted on a highway. If a driver had not learned what the yellow line communicates, the driver could be zigzagging across the line, and if there were no trust, the opposing traffic could not maintain safe separation.

Information sharing is applicable to the process.
The fourth element is Information Sharing. After trust is established, learning is ongoing, and accountability has a track record, Information Sharing is implemented. This information sharing, whether internal, with stakeholders, or with customers as advertising of operational safety, is an operational tool for continuous or continual safety improvements. One fabulous way to improve safety is to share ideas across the board and then implement the best ideas in your own operations. One reason that ideas or demands from a regulatory authority do not work in operations is that a regulatory body implements ideas in the concept of reactive accountability, outside a just culture.

Accountability is the backbone of a successful SMS. It is not to be held accountable in a traditional reactive concept, but to be held accountable in a proactive concept. Where there is proactive accountability there is the ability to succeed under varying conditions, so that the number of intended and acceptable outcomes is as high as possible. Accountability is to harness human factors and human resilience.


CatalinaNJB

Saturday, October 7, 2017

Discussing Numbers

Analysis of data in aviation is the application of historical data to predict future events. This data is applied to predict hazards, but it is not available as a tool to predict incidents or accidents. Incidents or accidents cannot be predicted, since there is no capability within a data analysis to assign the time and location of a future incident or accident.

Reporting hazards is SMS painting a picture of the operations.
The concept of predicting any hazard is the same as predicting the hazard of in-flight icing conditions. When an airplane is flying into icing conditions, the hazard of icing is predictable based on the area weather forecast and knowledge, but it cannot be pre-assigned to one specific future incident or accident, since the variations are unpredictable.

A high volume of reports for one operator does not necessarily imply that this operator is a higher risk than another operator. Raw report counts are often assumed to indicate higher risk and are assigned a value without further analysis of the data itself. SMS has become a tool where competitors can claim a competitive advantage in a contract bid because they have fewer reports than their competition. Anyone unfamiliar with the function of an effective SMS could easily buy into this scam that fewer reports equal lower operational risk. This is the opposite of what SMS is. Without data, there is a zero confidence level in the safety culture, or in how healthy operational safety is. On the other hand, with collected data the confidence level of how safe operations are can be established by a statistical process control analysis. However, a high volume of reports does not automatically ensure safe operations; it provides more data for an SMS analysis to implement processes for continuous safety improvements.

Let’s take a moment and analyze reports from three small airports. These airports are similar in size, operations and movements and are within 200 NM of each other. The first airport reported 7 events over the last 15 years, but did not make any report submissions in the last 5 years. The second airport reported 66 events during the same period, with the last reported event this year. The third airport reported 264 events during the same timeframe as the other two. At first glance, it appears that airport #3 is a high-risk airport compared to the other two. This raw data does not tell a story or provide any valuable information for statements about an operational confidence level. Only by analyzing the data is it possible to paint a picture of operational safety and make statements related to a confidence level of safety in operations.

Analyzing an SMS is more than discussing numbers.
An initial analysis of the operations shows that airport #1 quit reporting 5 years ago. Based on trends, this implies that the airport stopped all reporting, not that the events stopped happening. The reason is not known until a further inquiry into airport operations is conducted.
Airport #2 is continuing to report, and the number of reports has steadily decreased over the last 15 years. Airport #3 shows a steady reporting chart where the number of annual reports is variable but moves above and below the average line in the chart. Airport #3 therefore has a healthier reporting culture than the other two airports.
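The kind of "above and below the average line" behavior described here can be checked with a count-based (c-chart) control limit calculation from statistical process control. This is a minimal sketch; the annual counts below are hypothetical, constructed only so that they sum to airport #3's 264 reports over 15 years.

```python
# c-chart sketch: mean count, lower/upper control limits (mean +/- 3*sqrt(mean)).
import math

def c_chart_limits(annual_counts):
    """Return (mean, LCL, UCL) for a count-based control chart."""
    mean = sum(annual_counts) / len(annual_counts)
    spread = 3 * math.sqrt(mean)
    return mean, max(0.0, mean - spread), mean + spread

# Hypothetical annual report counts for airport #3 (sum = 264 over 15 years).
airport_3 = [18, 15, 21, 14, 19, 17, 20, 16, 18, 17, 19, 15, 20, 17, 18]

mean, lcl, ucl = c_chart_limits(airport_3)
in_control = all(lcl <= c <= ucl for c in airport_3)
print(f"mean={mean:.1f}, limits=({lcl:.1f}, {ucl:.1f}), in control: {in_control}")
```

A series that varies around the average but stays inside the limits, as here, is the statistical picture of a steady, healthy reporting culture; a run of zeros, as at airport #1, would fall outside that picture rather than inside it.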

When analyzing facts, as opposed to opinions, airport #3 is an airport with accountable processes, which airport users can rely on for information and data in their decision-making processes. Airport #3 can state with a high level of confidence that they have operational processes in place that paint an accurate picture of their airport operations. The other two airports have no supporting documentation for their opinions that the SMS pictures painted are accurate pictures of their operations.

Data, information, knowledge and comprehension of operational processes are vital components for continuous safety improvements. Analyzing an SMS is more than discussing numbers, where the group or person with the louder voice and better vocabulary wins the argument. When applying strategies and solutions to airport safety, it is not the number of events that becomes the issue, but the comprehension of airport operations.


CatalinaNJB

Sunday, September 24, 2017

The Safety Card

The Safety Card is played when data does not support the opinion of a decision maker, or when safety is not comprehended. The Safety Card is played when safety becomes the driving force of operations without considering the Residual Risk, which is the risk level that remains after all selected risk control techniques have been implemented, or the Substitute Risk, which is the safety risk level of new hazards identified by the introduction of a risk control. The Safety Card is also played when safety is not defined or measured, or when operational pressure is applied by a third party or social media.

The Safety Card is effective when applied to one event only.
After major aircraft accidents, there is a public outcry, and rightfully so, for airlines to improve safety.  The aviation authorities are scrambling to make new rules to protect the flying public and everyone is alleging that flying is safer than driving a vehicle. Ever since the first flight new rules and regulations have been put in place to improve safety and make flying the utopia of safe travel. But it’s not certain that more regulations make flying safer.

A quote from Transport Canada:
"Traditionally, in rail and in other safety-critical industries, safety had been pursued through compliance with prescriptive rules and regulations. In the 1990s, however, advancements in safety research demonstrated that organizations could be compliant with prescriptive regulations, yet still be unsafe. More specifically, compliance did not necessarily mean effectively managing risks."
Leonardo da Vinci was a pioneer in aviation and 400 years ahead of his time. Below are two of his quotes: “For once you have tasted flight you will walk the earth with your eyes turned skywards, for there you have been and there you will long to return.”, and “Anyone who conducts an argument by appealing to authority is not using his intelligence; he is just using his memory.”

When combining these quotes, they become a description of aviation safety and the Safety Management System as we know it. Regulatory compliance is not safety risk assessment, and it takes intelligence to assess risks, manage, lead, and continuously improve aviation safety. Regulatory compliance relies on memory, while intelligence is to lead with operational safety processes and the ability to learn, understand, and deal with new or trying situations. When memory is applied to SMS, the task of memorizing regulations does not challenge operations or assess risks; when intelligence, or human factors, is applied, operations are challenged and safety risk levels are assigned.

Customer Satisfaction is loyalty, safety and accountability to the flying public
SMS is data collection, and learning and understanding what story the data is telling. Aviation safety is to apply the data collected, which is the product of elements with a purpose: to generate information, acquire knowledge, and develop comprehension for training, competencies and communication within a Safety Promotion System. The public opinion of aviation safety is based on emotions about the outcome of a flight, not on input processes. This is how it must be addressed by the public, who should not have to analyze any data to raise their voice and opinion of safety when flying. An airline has only one option when it comes to managing safety in flying, which is to view its operations from the point of view of a passenger and the public opinion. An effective SMS is one where the safety policy and primary objective is to provide a high-quality level of customer service and to apply this as a tool for excellence in the level of safety. It is impossible to provide a high-quality level of customer service without excellence in the operation of a safety management system.

When applying this concept of a customer-satisfaction-based approach to safety, there could be a conflict between the quality level accepted by a customer and operational control. Opinion-based demands from third parties, customers, social media or an aviation authority could develop unintended hazards and affect safety decisions. Several years ago, and long before SMS became regulated, or accepted as a value-added level of safety in aviation, an operator developed a customer-satisfaction-based safety management system. The concept of this system was to measure the level of safety from the point of view of customer satisfaction and apply data-based decision tools to operational control. This system functioned for several years until it was decided to apply safety as the primary driving force of operations. While customer satisfaction could be measured, analyzed and defined, the concept of safety could not easily be defined or comprehended. The Safety Card was applied equally to all aspects of operations without defining safety critical areas to measure. This opinion-based decision to change a word from “customer” to “safety” caused a drift in operational control and a drift in process effectiveness. Introducing the word “safety” into operations does not improve safety unless decisions are based on factual data.

CatalinaNJB

Friday, September 8, 2017

Risk Matrix Differently

Traditionally, the risk matrix in aviation is a method to assess a safety risk level and a decision tool to reject or accept that risk level on its own merits. If the risk matrix is in the green area the risk is accepted, and if the risk is in the red area it is not accepted. When the risk matrix is in the yellow area, something must be done to move the risk to an acceptable green level. The risk matrix is applied to aircraft performance criteria or airport physical characteristics, and the decision is a go or no-go decision. The traditional risk matrix does not guide the decision towards the next process, but ends the decision-making process by rejecting or accepting the risk. The decision-making tool of a risk matrix may be red, green and yellow, but the process itself is just black and white.

A risk assessment is not always perfect.
As the name suggests, the risk matrix is a tool to develop a vision of the risk level, based on certain established criteria. These criteria are generally defined as Likelihood, Severity and assumed Exposure.
Without exposure to the risk there is no likelihood that the risk affects safety, and the severity is eliminated. The exposure level is assumed to be one (1) at the time when likelihood and severity become a factor. An airplane sitting on the runway ready for takeoff is not exposed to an engine failure after takeoff at that time and location, but is systematically preparing for the reaction to an engine failure after takeoff if the exposure becomes a factor. When the flight crew reviews their departure emergency procedures, they are assessing the likelihood of exposure for that particular flight and making a decision to reject or accept the risk level before initiating the takeoff roll. At the time of initiating the takeoff roll, the flight crew has accepted that the likelihood of exposure to an engine failure is zero. The crew has just made a go or no-go decision, or a green or red decision, and it has become a black and white process. If this risk level process were true, there would never be an engine failure after takeoff.
However, since airplanes still have engine failures after takeoff, the assessment that places the likelihood of exposure into the green box, this risk level acceptance, is false.

The different levels in the risk matrix are the likelihood levels and the severity levels. The FAA has defined these levels for application of aviation safety risk levels.

Likelihood Levels
Likelihood is placed into five categories, with a definition for each category. Likelihood level A is category frequent, defined as expected to occur routinely. Level B is category probable, defined as expected to occur often. Level C is category remote, defined as expected to occur infrequently. Level D is category extremely remote, defined as expected to occur rarely. The last likelihood level is level E, category extremely improbable, defined as so unlikely that it is not expected to occur, but it is not impossible.

Severity Levels
Severity is placed into five categories of severity with a definition for each category. Severity level 5 is category minimal and defined as negligible safety effects. Level 4 is minor and defined as physical discomfort to persons, slight damage to aircraft. Level 3 is major and defined as physical distress or injuries to persons, substantial damage to aircraft. Level 2 is hazardous and defined as multiple serious injuries; fatal injury to a relatively small number of persons (one or two); or a hull loss without fatalities. The last severity level is level 1 catastrophic and defined as multiple fatalities (or fatality to all on board) usually with the loss of aircraft.
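These five likelihood categories and five severity categories combine into a 5x5 matrix lookup. Which cells count as green, yellow or red varies by program and is not specified in the text, so the scoring rule below is an illustrative assumption, not the FAA's actual cell assignments.

```python
# Sketch of a 5x5 risk matrix lookup using the likelihood (A-E) and
# severity (1-5) categories described above. The color bands are assumed.
LIKELIHOOD = {"A": "frequent", "B": "probable", "C": "remote",
              "D": "extremely remote", "E": "extremely improbable"}
SEVERITY = {1: "catastrophic", 2: "hazardous", 3: "major",
            4: "minor", 5: "minimal"}

def risk_cell(likelihood, severity):
    """Return an assumed risk color for a (likelihood, severity) pair."""
    # Score each axis 1..5: likelihood A=5 .. E=1, severity 1=5 .. 5=1.
    l_score = 5 - "ABCDE".index(likelihood)
    s_score = 6 - severity
    total = l_score + s_score
    if total >= 8:
        return "red"
    if total >= 5:
        return "yellow"
    return "green"

print(risk_cell("A", 1))  # frequent + catastrophic
print(risk_cell("E", 5))  # extremely improbable + minimal
```

The sketch shows the structural point made in this post: the matrix is a lookup that terminates in accept or reject, with nothing in the cell itself directing the operator to a next action.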

Traditional Risk Matrix with unconditional decisions.
Risk Matrix
When an operator unconditionally accepts these acceptable, green risk matrix levels, they accept the risk that there will be multiple serious injuries; fatal injury to a relatively small number of persons (one or two); or a hull loss without fatalities. The definition of extremely improbable is not only applicable to the opinion of likelihood, but also to the process itself and the collection of data. Since the assessment of likelihood is a subjective opinion and not based on data analysis, the definition itself of being extremely improbable is false.
Extremely improbable is only true as a probability analysis based on data, not as a definition of a subjective likelihood level. For the definition of extremely improbable to be true, it would be necessary to conduct comprehensive research of all operations globally for that aircraft type since its first flight. The likelihood of extremely improbable is only true for the first flight of that aircraft type. If there was even one malfunction of that type, the definition becomes invalid. However, that an operator still accepts the risk level is an operational decision based on their safety operational confidence level. A confidence level above zero is only possible by operating with an SMS and applying statistical process control (SPC). Everything else is an opinion level.

Risk Matrix Differently Tool
An effective risk matrix should include more than unconditional rejection or acceptance of a risk, and should guide the operator towards further actions. This risk matrix is similar to the one above, but it is different because it provides a course of action before rejecting or accepting the risk.

The likelihood levels are based on research and data collected, and defined by the times between event intervals. If an operator does not have data to support a likelihood analysis, other data may be available to borrow from similar operators, from NTSB sites, TSB sites, ICAO sites or other global civil aviation authority sites. This likelihood level analysis is not specific to one operator, but to all operations with the same type of airplane. It becomes specific to the operator when enough data is collected to conduct a true analysis. For example, when data has been collected for 5 years and the operation continues with the same processes, a prediction for the next 5 years becomes available. However, when there are changes to the operations or processes, the collected data no longer represents the prediction. One cannot predict the future unless variables are eliminated, but one can accept a risk level based on a true safety operational confidence level. An operator with a true confidence level of 95% that their operations are failure-free is operating at a higher confidence level, and more safely, than one with an opinion-based 100% confidence level.
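As a sketch of how such a data-based, rather than opinion-based, confidence level could be computed, assume event occurrences follow a Poisson process; this distributional assumption is mine, the text names no distribution.

```python
# Data-based confidence sketch: if events arrive at a constant rate
# (Poisson assumption), the probability of a failure-free period of
# length t is exp(-rate * t).
import math

def failure_free_confidence(events_observed, years_observed, years_ahead):
    """Probability of zero events in the next `years_ahead`, given the past rate."""
    rate = events_observed / years_observed  # estimated events per year
    return math.exp(-rate * years_ahead)

# E.g. 1 event observed in 5 years of data, predicting the next year:
conf = failure_free_confidence(1, 5, 1)
print(f"{conf:.1%}")  # about 81.9%
```

The honest output of such a calculation is always below 100%, which is exactly the contrast the paragraph draws: a defensible 95% beats an indefensible 100%.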

A different Risk Matrix with action.
A different risk matrix tool guides the operator to an action. This action could be to Communicate the issue, Monitor the issue, Pause operations, Suspend operations or Cease operations. Before a judgement and decision for rejection or acceptance is made, this risk matrix has guided the operations to an action.

A risk level to Communicate is green, and acceptable. But it is not unconditionally accepted; it is communicated within the organization and to affected personnel. The operations do not have to be interrupted, but an issue, or hazard, has been discovered and communicated.
The next level is to Monitor the issue. This does not imply skipping Communication; it is to monitor and communicate.
The next level is to Pause. A pause could be for an hour, or a day, depending on the hazard. This Pause level gives the operator an opportunity to assess both aircraft performance, or airport capability, and the capability of the flight crew. A Suspend level is to stop activities while a comprehensive assessment of the risk level and its mitigation is conducted. The final level is the Cease level, a level where the risk is transferred. None of these safety risk levels are unconditionally rejected or accepted, or stand-alone risk levels. When a risk level of Cease is defined, the operator continues to assess the Suspend, Pause, Monitor and Communicate levels.
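The action ladder described above, where each higher level keeps the lower-level actions active, can be sketched as follows; the numeric level encoding (1 through 5) is an illustrative assumption.

```python
# Action-based risk matrix sketch: each level keeps all lower-level
# actions in force, matching the cumulative behavior described in the text.
ACTIONS = ["Communicate", "Monitor", "Pause", "Suspend", "Cease"]

def actions_for_level(level):
    """Return all actions in force at a given level (1=Communicate .. 5=Cease)."""
    return ACTIONS[:level]

print(actions_for_level(3))  # Communicate, Monitor and Pause are all active
```

The design choice is the point: because the levels nest, an operator at Cease is still communicating, monitoring, pausing and suspending, rather than making a single terminal reject decision.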

The Risk Matrix Differently is a tool to apply SMS principles of continuous or continual improvements without getting locked into rejecting or accepting a risk level.



CatalinaNJB


Thursday, August 24, 2017

Safety Critical Areas and Safety Critical Functions

In the production of aircraft parts there are parts and systems that are more important for maintaining safety than other systems. Not all systems are equally important for the safe operation of an aircraft, and these more important systems are the safety critical areas. Within these systems there are parts with identified functions that have a higher probability of causing a catastrophic outcome of the flight when they malfunction.

Safety critical tools are vital to safety performance.
As with parts, within flight operations there are operational systems that are safety critical areas for the safety of a flight. Within these areas there are safety critical functions, or processes, that are critical to operations. Not all flight operational systems and processes are critical for the safety of a flight. In an SMS world, the task becomes to identify which are the vital few safety critical areas and functions of flight operations, and which are the trivial many.

It is commonly said, accepted in the aviation industry, and demanded by the public that regulatory requirements are the minimum requirements for the safe operation of an aircraft. Nothing is farther from the facts than this statement, since regulatory compliant pilots, aircraft and operators have experienced catastrophic accidents ever since the first flight in 1903. If regulatory requirements were minimum safety requirements, there would be no accidents. Regulations are the risk level accepted by a Governing State for a Certificate to be issued to an operator, with an expectation that catastrophic accidents could happen within undefined intervals. The intent, or design, of regulations is not to set up for failure, or accidents, but regulatory compliance itself does not prevent accidents. Regulatory compliance is the authority for an Operator to provide a service to the flying public. However, there is one exception to this: where a Safety Management System is required by regulation, the accountability and responsibility for safety is placed on the Operator. For an Operator, it is not acceptable to operate within a culture that accepts a catastrophic accident at any interval, or to operate with a risk level that accepts accidents. “We don’t manage Risks; we lead personnel, manage equipment and validate operational design for improved performance above the safety risk level bar.”

The flying public expects that safety critical areas are identified at the onset.
Safety critical areas and safety critical functions are the safety risk level bar which must be exceeded for continuous safety improvements in operations. The demanding task becomes to define and decide on which systems are safety critical, which processes are safety critical functions, and what is not safety critical in operations. The purpose of defining safety critical areas and functions is to operate an SMS that is compatible with safety, and not a bureaucratic system for the purpose of supporting the SMS design. If what we do does not promote or improve safety, we are just spinning our wheels, and reactive processes become the determining factor for safety improvements. A proactive safety management system is one that defines safety critical areas and functions.

Safety Critical Factors in aviation are Human Factors, Organizational Factors, Supervision Factors and Environmental Factors. Derived from these Safety Critical Factors are SMS processes as tools for continuous safety improvements. Collected data are analyzed in statistical process control software, such as SPC for Excel (spcforexcel.com), using Pareto charts, attribute control charts or variable control charts. If an Operator has not collected enough data to analyze processes, it is possible to “borrow” data and analyze it as applicable to the Operator. These tools enable each Operator to define, within their Enterprise, what the Safety Critical Areas and Safety Critical Functions are, and to analyze the collected data to implement safety changes. Unless SMS is transformed into action, it is nothing but a check-box tool in support of defined processes.
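A Pareto analysis of the kind mentioned above can be sketched in a few lines. The report-to-factor assignments below are hypothetical examples, not real report data; the point is only how counts are ranked and cumulated to surface the vital few factors.

```python
# Pareto-analysis sketch: count reports by Safety Critical Factor,
# rank them, and show the cumulative share each factor contributes.
from collections import Counter

# Hypothetical factor assignment for nine reports.
reports = ["human", "human", "organizational", "human", "environmental",
           "supervision", "human", "organizational", "human"]

counts = Counter(reports).most_common()
total = sum(n for _, n in counts)
cumulative = 0
for factor, n in counts:
    cumulative += n
    print(f"{factor:15s} {n:3d}  {cumulative / total:6.1%} cumulative")
```

In this toy data the top factor alone carries more than half the reports, which is exactly the "vital few versus trivial many" separation the post argues an SMS should act on.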

 
CatalinaNJB