Friday, September 8, 2017

Risk Matrix Differently

Traditionally, the risk matrix in aviation is a method to assess a safety risk level and a decision tool to reject or accept that risk level on its own merits. If the risk falls in the green area it is accepted, and if it falls in the red area it is not accepted. When the risk falls in the yellow area, something must be done to move it to an acceptable green level. The risk matrix is applied to aircraft performance criteria or airport physical characteristics, and the decision is a go or no-go decision. The traditional risk matrix does not guide the decision towards the next process, but ends the decision-making process by rejecting or accepting the risk. The decision-making tool of a risk matrix may be red, green and yellow, but the process itself is just black and white.

A risk assessment is not always perfect.
As the name suggests, the risk matrix is a tool to develop a vision of the risk level based on certain established criteria. These criteria are generally defined as Likelihood, Severity and assumed Exposure.
Without exposure to the risk there is no likelihood that the risk affects safety, and the severity is eliminated. The exposure level is assumed to be one (1) at the time when likelihood and severity become factors. An airplane sitting on the runway ready for takeoff is not exposed to an engine failure after takeoff at that time and location, but is systematically preparing its reaction to an engine failure after takeoff should the exposure become a factor. When the flight crew reviews their departure emergency procedures, they are making an assessment of the likelihood of exposure for that particular flight and a decision to reject or accept the risk level before initiating the takeoff roll. At the time of initiating the takeoff roll the flight crew has accepted that the likelihood of exposure to an engine failure is zero. The crew has just made a go or no-go decision, a green or red decision, and the process has become black and white. If this risk level assessment were true, there would never be an engine failure after takeoff.
However, since airplanes still have engine failures after takeoff, the assessment that placed the likelihood of exposure in the green box is false, and so is the acceptance of that risk level.

The different levels in the risk matrix are the likelihood levels and the severity levels. The FAA has defined these levels for the assessment of aviation safety risk.

Likelihood Levels
Likelihood is placed into five categories, each with a definition:

- Level A, Frequent: expected to occur routinely.
- Level B, Probable: expected to occur often.
- Level C, Remote: expected to occur infrequently.
- Level D, Extremely Remote: expected to occur rarely.
- Level E, Extremely Improbable: so unlikely that it is not expected to occur, but it is not impossible.

Severity Levels
Severity is placed into five categories, each with a definition:

- Level 5, Minimal: negligible safety effects.
- Level 4, Minor: physical discomfort to persons; slight damage to aircraft.
- Level 3, Major: physical distress or injuries to persons; substantial damage to aircraft.
- Level 2, Hazardous: multiple serious injuries; fatal injury to a relatively small number of persons (one or two); or a hull loss without fatalities.
- Level 1, Catastrophic: multiple fatalities (or fatality to all on board), usually with the loss of aircraft.
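
To make the mechanics concrete, here is a minimal sketch of a matrix lookup in Python. The color assigned to each likelihood/severity cell below is an illustrative assumption, not the FAA's published matrix; an operator would substitute the matrix accepted by their regulator.

```python
# Illustrative risk matrix lookup. The color assigned to each
# likelihood/severity cell is an assumption for demonstration only.

LIKELIHOOD = ["A", "B", "C", "D", "E"]   # frequent ... extremely improbable
SEVERITY = [1, 2, 3, 4, 5]               # catastrophic ... minimal

# Hypothetical cell colors: rows = likelihood A..E, columns = severity 1..5.
MATRIX = {
    "A": ["red",    "red",    "red",    "yellow", "green"],
    "B": ["red",    "red",    "yellow", "yellow", "green"],
    "C": ["red",    "yellow", "yellow", "green",  "green"],
    "D": ["yellow", "yellow", "green",  "green",  "green"],
    "E": ["yellow", "green",  "green",  "green",  "green"],
}

def assess(likelihood: str, severity: int) -> str:
    """Return the risk color for a likelihood/severity pair."""
    return MATRIX[likelihood][severity - 1]

# Example: a remote (C) but hazardous (2) event lands in the yellow band,
# so the traditional matrix demands mitigation before acceptance.
print(assess("C", 2))  # -> "yellow"
```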

Traditional Risk Matrix with unconditional decisions.
When an operator unconditionally accepts these acceptable, green risk matrix levels, they accept the risk that there will be multiple serious injuries; fatal injury to a relatively small number of persons (one or two); or a hull loss without fatalities. The definition of extremely improbable is not only applicable to the opinion of likelihood, but also to the process itself and the collection of data. Since the assessment of likelihood is a subjective opinion and not based on data analysis, the definition itself of being extremely improbable is false.
Extremely improbable is only true as a probability analysis based on data, not as a definition of a subjective likelihood level. For the definition of extremely improbable to be true, it would be necessary to conduct comprehensive research of all operations globally for that aircraft type since its first flight. The likelihood of extremely improbable is only true for the first flight of that aircraft type; if there has been even one malfunction of that type, the definition becomes invalid. However, that an operator still accepts the risk level is an operational decision based on their safety operational confidence level. A confidence level above zero is only possible by operating with an SMS and applying statistical process control (SPC). Everything else is an opinion level.

Risk Matrix Differently Tool
An effective risk matrix should include more than unconditional rejection or acceptance of a risk; it should guide the operator towards further actions. This risk matrix is similar to the matrix above, but it is different in that it provides an answer of action before rejecting or accepting the risk.

The likelihood levels are based on research and collected data, and defined by the time intervals between events. If an operator does not have data to support a likelihood analysis, data may be borrowed from similar operators, from NTSB sites, TSB sites, ICAO sites or other global civil aviation authority sites. This likelihood level analysis is not specific to one operator, but to all operations with the same type of airplane. It becomes specific to the operator when enough data is collected to conduct a true analysis. For example, when data has been collected for 5 years and operations continue with the same processes, a prediction for the next 5 years becomes available. However, when there are changes to the operations or processes, the data collected no longer supports the prediction. One cannot predict the future unless variables are eliminated, but one can accept the risk level based on a true safety operational confidence level. An operator with a true, data-based confidence level of 95% that their operations are failure-free is operating at a higher confidence level, and more safely, than one with an opinion-based 100% confidence level.
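
As an illustration of a data-driven likelihood level, the sketch below uses hypothetical intervals between past events and assumes event arrivals are roughly Poisson to estimate the probability of a failure-free next period. The numbers, and the Poisson assumption itself, are illustrative only, and the estimate holds only while the underlying processes remain unchanged.

```python
# A minimal sketch of a data-driven likelihood estimate, assuming event
# arrivals are roughly Poisson. All numbers below are hypothetical.
import math

# Observed intervals (in flight hours) between five past events.
intervals = [2100.0, 1850.0, 2400.0, 1990.0, 2260.0]

mean_time_between_events = sum(intervals) / len(intervals)
rate = 1.0 / mean_time_between_events  # events per flight hour

# Probability of a failure-free next period of t hours under the
# Poisson assumption: P(0 events) = exp(-rate * t).
t = 500.0
p_failure_free = math.exp(-rate * t)
print(f"MTBE: {mean_time_between_events:.0f} h, "
      f"P(failure-free next {t:.0f} h) = {p_failure_free:.1%}")
```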

A different Risk Matrix with action.
A different risk matrix tool guides the operator to an action. This action could be to Communicate the issue, Monitor the issue, Pause operations, Suspend operations or Cease operations. Before any judgement and decision for rejection or acceptance is made, this risk matrix has guided the operation to an action.

A risk level to Communicate is green, and acceptable. But it is not unconditionally accepted; it is communicated within the organization and to affected personnel. Operations do not have to be interrupted, but an issue, or hazard, has been discovered and communicated.
The next level is to Monitor the issue. This does not imply skipping the Communication level; it is to monitor and communicate.
The next level is to Pause. A pause could be for an hour or a day, depending on the hazard. The Pause level gives the operator an opportunity to assess aircraft performance or airport capability, as well as the capability of the flight crew. The Suspend level is to stop activities while a comprehensive assessment of risk level and mitigation is conducted. The final level is the Cease level, a level where the risk is transferred. None of these safety risk levels are unconditionally rejected or accepted, and none are stand-alone risk levels; when a risk level of Cease is defined, the operator continues to assess the Suspend, Pause, Monitor and Communicate levels.
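
A minimal sketch of this action ladder, assuming risk levels are simply ranked 1 (lowest) to 5, might look as follows; the cumulative return value reflects that a higher level never cancels the actions below it.

```python
# A sketch of the action-ladder idea: each risk level triggers an action
# while still carrying every action below it. The 1..5 ranking is a
# hypothetical simplification.

ACTIONS = ["Communicate", "Monitor", "Pause", "Suspend", "Cease"]

def actions_for(level: int) -> list[str]:
    """Return the cumulative actions for a risk level 1 (lowest) to 5."""
    return ACTIONS[:level]

# A risk assessed at the Cease level still keeps the operator
# communicating, monitoring, pausing and suspending.
print(actions_for(5))
# -> ['Communicate', 'Monitor', 'Pause', 'Suspend', 'Cease']
```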

The Risk Matrix Differently is a tool to apply the SMS principles of continuous or continual improvement without getting locked into rejecting or accepting a risk level.



CatalinaNJB


Thursday, August 24, 2017

Safety Critical Areas and Safety Critical Functions

In the production of aircraft parts there are parts and systems that are more important to maintaining safety than others. Not all systems are equally important for the safe operation of an aircraft, and the more important systems are the safety critical areas. Within these systems there are parts with identified functions that have a higher probability of causing a catastrophic outcome of a flight when they malfunction.

Safety critical tools are vital to safety performance.
As with parts, within flight operations there are operational systems that are safety critical areas for the safety of a flight. Within these areas there are safety critical functions, or processes, that are safety critical to operations. Not all flight operational systems and processes are critical for the safety of a flight. In an SMS world, the task becomes to identify the vital few safety critical areas and functions of flight operations and the trivial many areas and functions.

It is commonly said, accepted in the aviation industry and demanded by the public that regulatory requirements are the minimum requirements for the safe operation of an aircraft. Nothing is farther from the facts, since regulatory compliant pilots, aircraft and operators have experienced catastrophic accidents since the first flight in 1903. If regulatory requirements were minimum safety requirements there would be no accidents. Regulations are the risk level accepted by a governing State for a Certificate to be issued to an operator, with an expectation that catastrophic accidents could happen at undefined intervals. The intent, or design, of regulations is not to set up for failure, or accidents, but regulatory compliance itself does not prevent accidents. Regulatory compliance is the authority for an operator to provide a service to the flying public. However, there is one exemption to this: where a Safety Management System is required by regulation, the accountability and responsibility for safety is placed on the operator. For an operator, it is not acceptable to operate within a culture that accepts a catastrophic accident at any interval, or to operate with a risk level that accepts accidents. “We don’t manage risks; we lead personnel, manage equipment and validate operational design for improved performance above the safety risk level bar.”

The flying public does not accept anything less than that what is safety critical is identified at the onset.
Safety critical areas and safety critical functions are the safety risk level bar which must be exceeded for continuous safety improvements in operations. The demanding task becomes to define and decide what systems are safety critical, what processes are safety critical functions, and what is not safety critical in operations. The purpose of defining safety critical areas and functions is to operate an SMS that is compatible with safety, and not a bureaucratic system for the purpose of supporting the SMS design. If what we do does not promote or improve safety, we are just spinning our wheels, and reactive processes become the determining factor for safety improvements. A proactive safety management system defines safety critical areas and functions.

Safety Critical Factors in aviation are Human Factors, Organizational Factors, Supervision Factors and Environmental Factors. Derived from these Safety Critical Factors are SMS processes as tools for continuous safety improvements. Data collected are analyzed in a Statistical Process Control software, SPC for Excel (spcforexcel.com), using Pareto charts, attribute control charts or variable control charts. If an operator has not collected enough data to analyze processes, it is possible to “borrow” data and analyze it as applicable to the operator. These tools are for each operator to define, within their enterprise, what the Safety Critical Areas and Safety Critical Functions are, and to analyze collected data for implementing safety changes. Unless SMS is transformed into action it is nothing but a check-box tool in support of defined processes.
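
The post references SPC for Excel; as an illustrative equivalent, the sketch below runs a simple Pareto analysis in Python on hypothetical event counts to separate the vital few factors from the trivial many.

```python
# An illustrative Pareto analysis of hypothetical safety-event counts,
# separating the "vital few" categories from the "trivial many".

events = {
    "Human Factors": 42,
    "Organizational Factors": 18,
    "Supervision Factors": 9,
    "Environmental Factors": 6,
}

total = sum(events.values())
cumulative = 0.0
for category, count in sorted(events.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += count
    print(f"{category:<24} {count:>3}  cumulative {cumulative / total:.0%}")
# Categories that together reach ~80% of events are the "vital few"
# safety critical areas to target first.
```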

CatalinaNJB

Thursday, August 10, 2017

SMS And Captain’s Authority

There are several accident reports of Captains making one single decision that led to a fatal accident. The first officer or other flight crew members may have attempted to communicate with the Captain, but without success. Often investigations assume that if another flight crew member had interfered with the Captain’s duties the accident would have been avoided. When sitting at an office desk with 20/20 hindsight these accidents could have been averted, but at the time and location of the event the Captain and first officer were not performing anything other than what they were trained for.

Training is more than the official training where check-boxes are filled in. Training also includes normal operations, organizational expectations of priorities, and unwritten rules. Air Florida 90 departing Washington National Airport VA, United 173 on approach to Portland OR, Air Ontario 1363 departing Dryden ON, Uruguayan Air Force 571 in the Andes Mountains and KLM 4805 departing Los Rodeos Airport are all examples of a Captain’s decision as the final link of an accident. When a Captain is about to make a fatal decision, a lower ranking flight crew member may view it as a responsibility under a Safety Management System program to make safety decisions and interfere with the Captain’s duties, or physically take control of the aircraft.

Major accidents have generated great safety improvements.
The Captain of an aircraft is the person acting as pilot-in-command, with responsibility and authority for the operation and safety of the aircraft during flight time. Flight time is the time from the moment an aircraft first moves under its own power for the purpose of taking off until the moment it comes to rest at the end of the flight.

A Safety Management System does not override this regulatory requirement. The purpose of the Safety Management System is to operate with an additional layer of safety and to improve safety by continuous or continual improvements. Continuous improvement is making changes to current processes for improvement, while continual improvement is achieved by identifying process capability and making changes to the capability of operations, or processes, to produce a more desired outcome. The beauty of an SMS is that the Safety Management System contains a process for ensuring that personnel are trained and competent to perform their duties and that they are accountable for safety. The Captain must always be trained to be competent to make final decisions and perform duties as the final authority. This authority cannot be removed from the Captain. Accountability within an SMS world is for a person, without supervision, to comply with regulatory requirements, standards, policies, recommendations, job descriptions, expectations or the intent of job performance, and for personnel to be actively and independently involved. Derived from accountability comes a Just Culture: an organizational culture where there is trust, learning, accountability and information sharing.

When an enterprise expects a lower ranking crew member to interfere with the Captain’s duties based on that person’s opinion, the enterprise has trained neither the Captains nor the other flight crew members to perform their duties. The Captain’s duties include the authority for the operation and safety of the aircraft, which includes analyzing any information available for decision making. The Captain is the ultimate authority for the safe operation of an aircraft, and interfering with this authority is a regulatory non-compliance. Any air operator should have a training program in place where lower ranking flight crew members have an opportunity to volunteer safety information to the Captain at any time during flight time, without the authority to take operational control of the aircraft. When an enterprise widely accepts that a lower ranking officer has the authority to interfere with the Captain’s duties, there is no opportunity for safety improvement, since the enterprise is relying on the non-captain to make decisions.

Major Accidents Generate Safety Improvements

After Air Florida 90 departing Washington National Airport VA, airlines began enacting policies to ensure that at least one more seasoned crew member was on board at all times. They also began reappraising the traditional unwritten rule that the captain could not be questioned. From that point onward, first officers were encouraged to speak up if they believed a captain was making a mistake. Applying this concept is SMS in an undocumented format, where the Captain has access to information from flight crew members to make the best decision for safe operations.
After United 173 on approach to Portland OR, training was changed to address behavioral management challenges such as poor crew coordination, loss of situational awareness, and judgment errors frequently observed in aviation accidents. Applying this concept is SMS in an undocumented format, accepting that human behaviours, or human factors, play a role in safety.
After Air Ontario 1363 departing Dryden ON, many significant changes were made to the Canadian Aviation Regulations. These included new procedures for re-fuelling and de-icing, as well as many new regulations intended to improve the general safety of all future flights in Canada. Applying this concept is SMS in an undocumented format, in that proactive measures are implemented for continuous safety improvements.
After the KLM 4805 accident departing Los Rodeos Airport, changes were made to international airline regulations and to aircraft. Aviation authorities around the world introduced requirements for standard phrases and a greater emphasis on English as a common working language.
Cockpit procedures were also changed. Hierarchical relations among crew members were played down, and more emphasis was placed on team decision-making by mutual agreement, part of what has become known in the industry as crew resource management. Applying this concept is SMS in an undocumented format, where an enterprise accepts that not only knowledge, but also comprehension of data, is vital to safety.
Remember rules or comprehend safety.
After Uruguayan Air Force 571 in the Andes Mountains there were no major safety improvements implemented. However, this too is applying the concept of SMS, where the risk level, based on data, is accepted or rejected. In this case the risk level for this type of accident happening again was accepted, and no major changes to safety were implemented. As knowledge and comprehension were gained, human factors later became a safety component which had been overlooked in 1972.

SMS means that aviation safety has no end. SMS means that the current safety comprehension level may be different in a few years, and that other latent hazards will be discovered. SMS is continuous or continual improvement, where every day is a new challenge to ensure complete safety for the traveling public.



CatalinaNJB

Wednesday, July 26, 2017

AC759 CLEARED TO LAND 28R

SMS Does Not Make Aviation Safer
On July 7, 2017 Air Canada 759 lined up its approach for landing on Taxiway Charlie at SFO. This scenario of parameters was set up for the worst accident in the history of aviation. Based on this incident, does the argument hold water that a Safety Management System makes flying safer? Aviation safety is determined by several factors, where one factor in making flying safer is how well an enterprise applies SMS as an additional layer of safety in support of their safety processes. An enterprise that supports bureaucratic processes, or processes that are designed to support the organization, runs check-box syndrome processes. These processes have only one goal, which is to control, but not manage, the operational Safety Management System. In public opinion, the blame-factor may be assigned to the Safety Management System requirement itself. However, the Safety Management System itself is a fail-free system telling a story of safety performance. Since it is a parallel system to the operational safety systems, SMS collects samples of data to be applied to processes as an additional layer of safety. SMS is a system which regularly checks in with the operations for a snapshot.

What is SMS?
SMS is the “ugly duckling” in safety that is to blame when things go wrong. When expectations, which are only opinions, are developed as guidance material under an SMS and applied as prescriptive regulations, safety operating processes are set up for failure. With this approach an SMS is not given the authority, accountability or opportunity to function within a just culture, where there is trust, learning, accountability and information sharing. When expectations are applied as prescriptive regulatory requirements, the first task of SMS becomes to ensure that the check-boxes are correctly checked and completed. In a bureaucratic organization operating in compliance with the check-box syndrome, any reference to operational safety is determined by the status of its check-boxes. SMS is not counting checked boxes; it is the hard work of making operational processes safer today than they were yesterday.

What SMS Is Not
SMS is not a magic wand of miracles ensuring accidents never happen again, and SMS is not a system where prescriptive expectations are applied as regulations. SMS is not a one-fit-all model, and SMS is not a model where everything is acceptable. SMS is not emotion- or opinion-based, and SMS is not a system where processes must conform to the SMS design. SMS is not a system of perfect people or a system within a perfect virtual world. SMS is not a trial and error system, and SMS is not a system with an end or a beginning. SMS is not rolling the dice for an answer; it is dropping the marbles to see where they scatter.
There are a lot of things that SMS is not, but all of these things that SMS is not are what SMS has become.

SMS Has Become A Conglomerate Of Opinions
SMS has become a conglomerate of opinions by virtue of the good intent to make flying safer. However, good intent and opinion have turned out to be the “killer-bee” of aviation safety. SMS has become so complex that very few can explain why certain processes or corrective actions are applied. Often these changes are made because the regulator is macro-managing portions, or all, of an enterprise. When a regulatory finding is given to a full-scale emergency response plan test because the test discovered deficiencies, the SMS did not fail but was successful by the discovery of faulty processes. In an attempt to establish the utopia of safety, SMS has become a system where everything is defined as a safety issue, to the degree where safety itself has become virtual facts. Operational size and complexity are forgotten, and Safety Critical Areas and Safety Critical Functions have no meaning.

Safety Critical Areas And Safety Critical Functions
An enterprise is failing their SMS unless their SMS includes Safety Critical Areas and Safety Critical Functions. Otherwise, operating with an SMS for the purpose of improving safety is to support the red tape of a bureaucratic enterprise where processes are designed to support their own design and not the operations.

Defining Safety Critical Areas and Safety Critical Functions is to place weight on areas of operations and on functions within these areas. Not all areas of aviation are safety critical. In an organization where there are no safety critical areas or functions, the decision-making process is simple, but without accountability. A Safety Critical Area could be night approaches, with a Safety Critical Function being approaches to SFO or YCB at midnight during the month of July. An approach to SFO may be a safety critical function, while an approach to YCB in the High Arctic may not be. On the other hand, an approach into YCB on a January day at noon might be a safety critical function. When all areas of aviation are assigned the same key, or same weight, for safety critical areas and functions, it becomes impossible to target areas for learning and training purposes. Safety then becomes wishful thinking of an aviation utopia where accidents will never happen again. Only when it is understood that an accident could happen in the future are the SMS tools ready to be applied at the operational level.
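
One way to picture this weighting, with hypothetical airports, contexts and weights, is a simple lookup where the same function carries a different safety-critical weight depending on context:

```python
# A sketch of weighting functions by operational context. The contexts
# and weights are hypothetical illustrations of the idea that the same
# function is not equally safety critical everywhere and always.

# (area, function, context) -> weight on a 0..1 scale
weights = {
    ("night approaches", "SFO", "July, midnight"): 0.9,   # safety critical
    ("night approaches", "YCB", "July, midnight"): 0.2,   # 24-hour daylight
    ("night approaches", "YCB", "January, noon"): 0.8,    # polar darkness
}

def is_safety_critical(area: str, function: str, context: str,
                       threshold: float = 0.5) -> bool:
    """Flag a function as safety critical when its weight clears a threshold."""
    return weights.get((area, function, context), 0.0) >= threshold

print(is_safety_critical("night approaches", "YCB", "July, midnight"))  # False
print(is_safety_critical("night approaches", "YCB", "January, noon"))   # True
```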

A vital question to ask for continuous safety is: does Transport Canada accept anything less from an enterprise than that all areas are safety critical areas and all functions are safety critical functions? If they expect all areas and functions of aviation to be equally safety critical, it becomes a conflicting task for the operator to operate with an effective SMS. In a bureaucratic organization it is preferred that everything and everyone are equal.

An enterprise operating an SMS without safety critical areas and functions is spinning its wheels. SMS is the NextGen of aviation safety, where processes, or “how we normally do things”, are analyzed for variables and for the risk level at which these variables affect operations.

Does One Incident Qualify As System Failure?
Yes, it does when criteria are met, based on guidance supported by Transport Canada.
This practice has been established by awarding enterprises system failure findings for one single failure to meet an expectation. System failures are any findings under CARs 107.02. On this basis, the AC 759 approach, which most likely did not meet the expectation, would qualify as an organizational system failure. However, concluding that one irregularity is a system failure is a rush to judgement, unless that one irregularity is a function of time and comprehension. Often TC inspectors rush to judgement when issuing a system finding for one non-compliance with one expectation. Transport Canada inspectors may have good intentions when applying expectations as system failures, but they have departed from the concept of SMS, and further departed from the basics of their initial SMS training when SMS first became a regulatory requirement.

Transport Canada SMS Survey
In a survey published in the JDA Journal, April 13, 2017, the vast majority of Transport Canada inspectors view themselves as having better knowledge of airline operations than the operators themselves have, and view TC inspectors as better qualified than the operator to fix safety problems before they become accidents or incidents. Further, this survey identified correctly that SMS transfers the responsibility for setting acceptable risk levels and monitoring safety performance to the operators themselves. This is a vital and valid point for improving safety in aviation, since the system under SMS is operationally based, and not bureaucratically based for Transport Canada to accept the risk. SMS is the NextGen of aviation safety, where the regulator is removed from operations. Another point in this survey is that 81 percent of inspectors surveyed predicted a major aviation accident soon. Nobody can predict a major aviation accident, not even a TC inspector. This survey reveals the bias of inspectors towards Canadian aviation operators and the inspectors’ utopian view that they know best. Transport Canada inspectors have yet to show any data corroborating SMS-failure statements over the last ten years. This survey was published as “A Learning Lesson For FAA”.

SMS Did Not Fail
SMS did not fail AC 759 on the approach to 28R at SFO. It was the operational practices, or how the flight is normally done, that failed AC 759. Whether there were none, two or five aircraft on Taxiway Charlie is relevant to the potential catastrophic outcome, but irrelevant to the SMS processes. Since the quality of a process is unknown until the final output is known, it becomes vital to safety that the progress of operational safety processes allows for ongoing risk assessment and decision making by the flight crew. Quality Assurance is a result of variations in quality output by the same process.

Does SMS Make Flying Safer?
Yes, SMS makes flying safer, and the SFO incident does not make flying any less safe. Safety management systems help companies identify safety risks before they become bigger problems. If one company ignores its own SMS tools and the help SMS offers, that one company is less safe than if it had elected to apply its tools and predict the safety risk level of identified hazards. Flying does not become less safe, or suffer a reduction in safety, without SMS, but an organization without SMS lacks the opportunity for continuous safety improvements by applying SMS concepts and principles. Over time, an enterprise that is “listening to their SMS” gains ground in safety and becomes safer in operations than the non-SMS, or “ignore-SMS”, organizations.

Additional Layer and Parallel Approach
SMS is an additional layer of safety and a parallel approach to operational processes. SMS does not make it safe or unsafe to fly, but SMS provides data, which is processed into information and then applied as knowledge and comprehension for safer operational practices. It is when this data has reached the level of comprehension that it can be integrated into policy, the design of processes and the improvement of safety. In an SMS world it is still the flight crew, maintenance personnel, ground crew and the enterprise’s management that make the difference between continuous safety improvement and continuous decline. However, SMS is an invaluable tool that some are overlooking while still expecting miracles from it. In addition, a contributing cause factor to the SFO incident is that airports are outdated by their 1903 design, without adapting to size, complexity and traffic volume.

AC759 Cleared To Land RWY 28R
Aviation safety must always be viewed from the public’s point of view, which is an expectation of a pleasant experience and no incidents between boarding the airplane and deplaning at destination. For some passengers it might be comforting to know that Air Canada 759 did not end in disaster, while for others learning about this incident might be a horrifying experience, to the point where they will never travel on an airplane again. Either way, the public expects quality performance from aircraft and flight crew. The flight crew of AC 759 on short final realized that something was wrong, but expected someone else, in this case the Tower Controller, to make a decision for them about what to do. This principle of avoiding safety actions is outside the parameters of an effective SMS.

When flying regularly, technical skills improve while technical knowledge levels are reduced to the degree of their application. The consequence is that the performance factor declines unknowingly until it reaches the time limit of comprehension and the level of unacceptable performance risk. A flight crew that is not able to comprehend the options available when there are lights on the runway they are cleared to land on is an enterprise systemic failure of performance management, not individual errors. The flight crew made an inquiry to the Tower Controller as to why there were lights showing on a runway where they had been cleared to land, without first initiating safety actions and deviating away from the hazard.

CatalinaNJB

Thursday, July 13, 2017

SMS Information In A Suitable Medium

What is a suitable medium, and who should it be suitable for? The unknown variable of this question is who decides whether the SMS information is in a suitable medium or not. Since the regulatory requirement is for SMS information to be made available in a suitable medium, the regulation is written with ambiguity to allow for differences in organizational size and complexity, but it is also written for the regulator to decide, based on their own opinion, what a suitable medium is. Airports are the smallest enterprises required to operate with an SMS; there could be a couple of employees, where one is the Airport Manager and SMS Manager, and the other is the Accountable Executive. A suitable medium may be different for an airport than for a large airline with thousands of employees.

At one time, gold and cow hide may have been a suitable medium.
A suitable medium may mean something totally different from person to person, from organization to organization, or from time to time. Some years ago an electronic manual would not have been a suitable medium to maintain SMS information, and today paper format manuals may be obsolete. For SMS information and documentation to be established in a suitable medium, it must be suitable to the user group. The medium does not need to be suitable to the regulator or to the Accountable Executive, but it must be suitable to the users who are feeding data into the SMS system.

A medium, or system, that is not user-friendly for the submission of data is a safety system which in the long run will become an ineffective, reactive system. Such a system would not operate within the concept of SMS, which is a system of a just culture with proactive safety initiatives and accountability. When a system is developed for the purpose of supporting the bureaucracy of an airport or airline, it becomes ineffective as a supporting tool, while a system designed in support of the processes becomes an effective safety management tool. An SMS system is fail-free since it is another layer of safety supporting the operational safety processes.

A bureaucratic enterprise of SMS tools is best recognized by its attempt to adapt processes by enforcing the design upon operations, rather than changing the SMS design to adapt to operational safety processes. In other words, when a process is in non-compliance with the designed process, the failure may lie with the design and not with the process itself.

An attempt to improve safety by enforcing design processes onto operational processes may cause unintended effects. If the operational performance of an aircraft is in non-compliance with the design performance, it does not improve safety to enforce the design performance. Most enterprises are not strictly bureaucratic or adhocracy systems, but are flexible enough to accommodate both a bureaucratic system with prescriptive policies and an adhocracy system with the flexibility for safe operations within the size and complexity of the operations.

Establishing the SMS system in a suitable medium is more than just checking the check-box, where an individual has drawn a line in the sand and determined what medium is suitable as a one-fit-all medium. A suitable medium may include more than one medium, and include both paper and electronic formats in addition to smart-phone apps. When multiple mediums are applied as suitable, a bureaucratic enterprise may have difficulties adapting to and analyzing data from these different sources, while an adhocracy may have invented additional processes to capture all data for analysis. An effective enterprise adapts its processes for continuous safety improvements.

CatalinaNJB

Wednesday, June 28, 2017

The Fork In The Road Test

All roads lead to Rome, and there are many different ways of reaching the same goal or objective. Finding the root cause takes a road trip defining the turns and forks in the road. If travelling by air, the course may take a detour by relying on the old ADF, or be more efficient following a GPS course. There are several root cause analysis techniques; they all serve a purpose in improving safety, and one root cause model may be as effective, or ineffective, as another. All root cause analysis models are designed to establish at what time or location in the failed process a different approach could have produced a different outcome. The 5-why and fishbone root cause analyses are widely accepted within the aviation industry and assumed to establish the correct root cause. A risk assessment of substitute and residual risks is normally conducted after the root cause analysis, to identify whether there are other or unexpected hazards introduced by the proposed corrective action plan (CAP) in the form of a new risk control strategy. As a compliance criterion, the enterprise monitors the CAP with a follow-up, as an assessment of the effectiveness of the safety improvement.

Without knowledge one fork in the road is as good as another.
Monitoring the effect of a corrective action is a standard procedure for follow-up of CAP implementation. Monitoring and follow-up may be dependent on seasonal differences, or on timeframes for collecting enough data to establish effectiveness. If an enterprise has lost control of their safety processes and decided to implement corrective strategies, The Fork In The Road Test is a tool to identify whether steps in the evaluation process are taking shortcuts and jumping to the conclusion that the new strategy is the correct strategy. Such a shortcut is an attempt to break a wall in the maze to make it to the end of the tunnel without following the process path.

The Fork In The Road Test is to backtrack the process to establish where in the maze the wall failed, and to establish the time and location in the future where the missing link of a CAP is. This does not imply that an incident or accident can be predicted, but it implies that knowledge is vital to predicting the hazards affecting the process.

When building and operating an ice runway there are many considerations affecting the design of the runway. Since the runway thaws every year, it must be rebuilt the next season at exactly the same location to be validated as the same runway and to apply the same instrument approach procedures. In addition to the runway itself, ice movement over time offsets the precision approach to a point where it becomes unusable. When applying The Fork In The Road Test to the runway, the time begins when the ice melted, and the location begins where the ice threshold was located.

It becomes simple to see The Fork In The Road if an aircraft landed on the ice in the spring when most of the ice had melted. The timeline is backtracked to establish that a change in direction, or taking a different turn at the fork in the road at that time, would have made a difference. The Fork In The Road is not the time and location when the flight crew had to make a decision to change the flight path, but the time and location at which the aircraft could have been expected to complete the flight without an incident. When the ice has melted, the aircraft is doomed at the time of departure.
The Fork In The Road does not always take a straight path.
On the other hand, let’s assume that the accident happened in the middle of the winter, with a foot of ice and minus 45 degrees temperature. At this point it becomes a scientific task to establish where the fork in the road was. The task becomes to identify whether there were special variations that caused the change of course or an incident, or whether there were normal variations that were overlooked. E.g. melting ice would be a normal variation, blowing snow would be a normal variation, darkness would be a normal variation and ice-ridges would be a normal variation.
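
In SPC terms, this is the distinction between common cause (normal) and special cause variation. A minimal sketch with hypothetical ice-thickness readings and simple 3-sigma control limits could look like this:

```python
# A sketch of separating normal (common cause) from special cause
# variation with simple 3-sigma control limits. Data are hypothetical
# daily ice-thickness readings in centimetres.
import statistics

readings = [31.0, 30.5, 31.2, 30.8, 31.1, 30.9, 24.0]  # last point suspect

mean = statistics.mean(readings[:-1])
sigma = statistics.stdev(readings[:-1])
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma

for day, value in enumerate(readings, start=1):
    tag = "normal variation" if lcl <= value <= ucl else "special cause"
    print(f"day {day}: {value:5.1f} cm  ({tag})")
```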

Let’s for a minute assume that the airplane hit an ice-ridge. This establishes the Fork In The Road at a time prior to the aircraft departing from civilization. The Fork In The Road is the one trigger that would, without doubt, have made a difference for safe operations. In this scenario, it would have made a difference if the normal variation had been identified and the runway inspected and assessed as safe for operations. The Fork In The Road Test is not applicable to events after departure, since the departure itself locked the aircraft into a path where at some point in time the flight crew would have to make a reversal decision or an incident would happen. The Fork In The Road Test would have predicted a hazard of normal variations, but since the test was not applied, the hazard was not identified.
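
A sketch of the test itself, under the assumption that each process step can be flagged as still allowing mitigation or not, is to walk the process backwards to the last step where a different choice could still have changed the outcome:

```python
# A sketch of the Fork In The Road idea: walk a failed process backwards
# to the last step where a different choice could still have changed the
# outcome. Steps and flags below are hypothetical.

steps = [
    ("plan flight",          {"mitigation_possible": True}),
    ("inspect runway",       {"mitigation_possible": True}),
    ("depart civilization",  {"mitigation_possible": False}),  # departure locks the path
    ("cruise to ice runway", {"mitigation_possible": False}),
    ("land on ice",          {"mitigation_possible": False}),
]

def fork_in_the_road(process):
    """Return the last step at which mitigation was still possible."""
    for name, flags in reversed(process):
        if flags["mitigation_possible"]:
            return name
    return None

# In the ice-ridge scenario the fork is before departure: the runway
# inspection, not the flight crew's in-flight decisions.
print(fork_in_the_road(steps))  # -> "inspect runway"
```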

At the world’s most dangerous airport, a siren sounds about 45 minutes prior to an arrival, and before the aircraft departs for that destination. The Fork In The Road has been identified and the corrective action implemented at the time and location where it effectively makes a difference, and aviation has become safer. The Fork In The Road is in the planning and the decision to complete a pleasant flight.

CatalinaNJB

Saturday, June 17, 2017

SMS Communication

A small operator communicates differently with their personnel than a mega-enterprise would. An airport with 2 or 3 people, being an Airport Manager, SMS Manager and Accountable Executive, may communicate verbally without much documentation, while a larger airport may use multiple levels of communication processes. Both operators must meet the same requirement of the expectation that communication processes (written, meetings, electronic, etc.) are commensurate with the size and complexity of the organization. When applying this expectation without ambiguity, or applying the expectation with fairness to the expectation itself, both operators are expected to apply the exact same processes in communication. The simplest avenue when assessing for regulatory compliance is to apply the more complex communication processes to both operators. With this approach, the smaller airport’s SMS becomes a bureaucratic, unprofessional and ineffective tool for safe operations.

Small airport communication has also changed with the times.
At first glance it looks great that a small airport is expected to operate with different communication processes than a large airport, but when analyzing all available options there is very little tolerance, or none, for applying this expectation in any other way than as a prescriptive regulatory requirement. It is only when the operation is understood that the expectation can be applied with ambiguity, and with unfairness to the expectation itself, for an effective communication process for airport operators of any size and complexity.

When there is a finding at a small or large airport that communication did not meet the regulatory requirement through this expectation, in that information had been forgotten, misplaced or incorrectly interpreted, the operator is required to identify the policies, processes, procedures and practices involved that allowed this non-compliance to occur. This in itself, that an operator allowed a non-conformance to occur, is a statement of bias in the finding, implying that the operator had an option not to let the non-compliance occur. If this option had been available at the time, the operator would have taken different steps. The reason the non-compliance occurred is that the option to make a change was not available at the time when it occurred. Not all systems within the SMS were functioning properly, and often it is the systems of human factors, organizational factors, supervision factors or environmental factors. Reviewing a finding in 20/20 hindsight is a simple task, and pointing out what could have been done differently becomes the task of applying the most complex process. However, when the operation is in the moment, the options at that location and point in time are limited to snapshots of information, knowledge and comprehension of the events.

Human factors in communication are today integrated in automation and not visible.
Since there is an assumption in the root cause analysis requirements that the non-compliance was allowed, the question for the finding is no longer what happened, but why the operator allowed this event to escalate beyond regulatory requirements. The difference between a “what” and a “why” question is that “what” is data and “why” is someone’s opinion. When the finding is issued during an audit by the regulatory oversight team, the answer to the “why” question becomes the opinion of only one person on that team. When the opinion of that person becomes the determining factor of a root cause analysis, it has become impossible to analyze the event for a factual root cause. It could also be that the answer to the “why” question becomes a compromise, or an average, of the views of all inspectors, in which case the answer is no longer relevant to the finding, but to a process where all opinions are considered. On the other hand, when the “what” question becomes the determining factor, each link of what happened must be supported by data and documented events. If the “what” cannot be answered first, before any other questions are answered, it has become an impossible task to assign a root cause to the finding. When the “what” question is answered, then a change to the “what” may be assigned and implemented.

This does not imply that the “why” question is not to be asked, but it becomes a factor of how the “why” is asked, and whether the determining factor for the “why” question is an agreement between several people in a group to assign an average of indifferences, or whether the “why” question is answered against the “what” question. When applying the 5-why process, the answer, or root cause, is established by the answer to the first “why”, since the rest of the answers must be locked in to the first. The more effective root cause analyses are the “fish-bone”, the “5-why matrix”, or the “fork-in-the-road” test.
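
To illustrate why the first “why” locks in the chain, here is a sketch of a 5-why chain for a hypothetical finding; each answer becomes the subject of the next “why”, so a different first answer would send every later “why” down a different branch.

```python
# A sketch of a 5-why chain for a hypothetical finding. Each answer
# becomes the subject of the next "why", so the first answer locks in
# the direction of the whole chain.

finding = "runway inspection was not completed"
chain = [finding]
answers = [
    "the inspection checklist was not assigned",
    "the duty roster had no inspection role",
    "the roster template predates the SMS manual",
    "manual changes are not propagated to templates",
    "there is no document-control process for templates",
]

for answer in answers:
    print(f"Why {len(chain)}: why did '{chain[-1]}' happen? -> {answer}")
    chain.append(answer)

print("Candidate root cause:", chain[-1])
```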

When the requirement of the second expectation, as stated in the root cause analysis document, is that the non-compliance was allowed, it changes the first expectation within an SMS element, of different communication based on the size and complexity of the airport, into a prescriptive regulatory requirement. The prescriptive requirement then becomes the common denominator for the event that was allowed to occur, and must be applied to the most complex communication process. The simplest way to look at this is that when the “allowed to” is allowed to be applied to an event, it is assumed that human variations do not exist and that the system operates in an undisputed, perfect virtual environment.

CatalinaNJB