Thursday, August 10, 2017

SMS And Captain’s Authority

There are several accident reports of Captains making a single decision that led to a fatal accident. The first officer or other flight crew members may have attempted to communicate with the Captain, but without success. Investigations often assume that if another flight crew member had interfered with the Captain’s duties, the accident would have been avoided. From an office desk with 20/20 hindsight, these accidents could have been averted, but at the time and location of the event the Captain and first officer were doing nothing other than what they were trained for.

Training is more than the official training where check-boxes are filled in. Training also includes normal operations or organizational expectations of priorities and unwritten rules. Air Florida 90 departing Washington National Airport VA, United 173 on approach to Portland OR, Air Ontario 1363 departing Dryden ON, Uruguayan Air Force 571 in the Andes Mountains and KLM 4805 departing Los Rodeos Airport are all examples of a Captain’s decision as the final link in an accident. When a Captain is about to make a fatal decision, a lower ranking flight crew member may view it as a responsibility under a Safety Management System program to make safety decisions and interfere with the Captain’s duties, or physically take control of the aircraft.

Major accidents have generated great safety improvements.
The Captain of an aircraft is the person acting as pilot-in-command, with responsibility and authority for the operation and safety of the aircraft during flight time. Flight time is the time from the moment an aircraft first moves under its own power for the purpose of taking off until the moment it comes to rest at the end of the flight.

A Safety Management System does not override this regulatory requirement. The purpose of the Safety Management System is to operate with an additional layer of safety and to improve safety through continuous or continual improvement. Continuous improvement makes changes to the current processes, while continual improvement is achieved by identifying process capability and changing the capability of operations or processes to produce a more desirable outcome. The beauty of an SMS is that it contains a process for ensuring that personnel are trained and competent to perform their duties and that they are accountable for safety. The Captain must always be trained to be competent to make final decisions and perform duties as the final authority. This authority cannot be removed from the Captain. Accountability within an SMS world means that a person, without supervision, complies with regulatory requirements, standards, policies, recommendations, job descriptions, expectations or intent of job performance, and that personnel are actively and independently involved. From accountability comes a Just Culture, which is an organizational culture where there is Trust, Learning, Accountability and Information Sharing.

When an Enterprise expects a lower ranking crew member to interfere with the Captain’s duties, based on this person’s opinion, the Enterprise has trained neither the Captains nor the other flight crew members to perform their duties. The Captain’s duties include the authority for the operation and safety of the aircraft, which includes analyzing any information available for decision making. The Captain is the ultimate authority for the safe operation of an aircraft, and interfering with this authority is a regulatory non-compliance. Any air operator should have a training program in place where the lower ranking flight crew members have an opportunity to volunteer safety information to the Captain at any time during flight time, without the authority to take operational control of the aircraft. When an Enterprise widely accepts that a lower ranking officer has the authority to interfere with the Captain’s duties, there is no opportunity for safety improvements, since the Enterprise is relying on the non-captain to make decisions.

Major Accidents Generate Safety Improvements

After the Air Florida 90 accident departing Washington National Airport VA, airlines began enacting policies to ensure that at least one more seasoned crew member was on board planes at all times. They also began reappraising the traditional unwritten rule that the captain could not be questioned. From that point onward, first officers were encouraged to speak up if they believed a captain was making a mistake. Applying this concept is SMS in an undocumented format, where the Captain has access to information from flight crew members to make the best decision for safe operations.
After the United 173 accident on approach to Portland OR, training addressed behavioral management challenges such as poor crew coordination, loss of situational awareness, and judgment errors frequently observed in aviation accidents. Applying this concept is SMS in an undocumented format and accepts that human behaviours or human factors play a role in safety.
After the Air Ontario 1363 accident departing Dryden ON, many significant changes were made to the Canadian Aviation Regulations. These included new procedures regarding re-fuelling and de-icing as well as many new regulations intended to improve the general safety of all future flights in Canada. Applying this concept is SMS in an undocumented format in that proactive measures are implemented for continuous safety improvements.
After the KLM 4805 accident departing Los Rodeos Airport, changes were made to international airline regulations and to aircraft. Aviation authorities around the world introduced requirements for standard phrases and a greater emphasis on English as a common working language.
Cockpit procedures were also changed. Hierarchical relations among crew members were played down. More emphasis was placed on team decision-making by mutual agreement, part of what has become known in the industry as crew resource management. Applying this concept is SMS in an undocumented format where an Enterprise accepts that not only knowledge, but also comprehension of data is vital to safety.
Remember rules or comprehend safety.
After the Uruguayan Air Force 571 accident in the Andes Mountains there were no major safety improvements implemented. However, this too applies the concept of SMS, where the risk level, based on data, is accepted or rejected. In this case the risk level for this type of accident to happen again was accepted and no major changes to safety were implemented. As knowledge and comprehension were gained, human factors later became a safety component which had been overlooked in 1972.

SMS means that aviation safety has no end. SMS means that the current level of safety comprehension may be different in a few years and that other latent hazards will be discovered. SMS is continuous or continual improvement, where every day is a new challenge to ensure complete safety for the traveling public.


Wednesday, July 26, 2017


SMS Does Not Make Aviation Safer
On July 7, 2017 Air Canada 759 lined up its approach for landing on Taxiway Charlie at SFO. The parameters of this scenario were set up for the worst accident in the history of aviation. Based on this incident, does the argument hold water that a Safety Management System makes flying safer? Aviation safety is determined by several factors, one of which is how well an enterprise applies SMS as an additional layer of safety in support of its safety processes. In an enterprise that supports bureaucratic processes, or processes designed to support the organization, those processes are check-box syndrome processes. These processes have only one goal, which is to control, but not manage, the operational Safety Management System. In public opinion, the blame may be assigned to the Safety Management System requirement itself. However, the Safety Management System itself is a fail-free system that tells a story of safety performance. Since it is a parallel system to the operational safety systems, SMS collects samples of data to be applied to processes as an additional layer of safety. SMS is a system which regularly checks in with operations for a snapshot.

What is SMS?
SMS is the “ugly duckling” in safety that is to blame when things go wrong. When expectations, which are only opinions, are developed as guidance material under an SMS and applied as prescriptive regulations, safety operating processes are set up for failure. With this approach an SMS is not given the authority, accountability or opportunity to function within a just culture, where there is trust, learning, accountability and information sharing. When expectations are applied as prescriptive regulatory requirements, the first task of SMS becomes ensuring that the check-boxes are correctly checked and completed. In a bureaucratic organization operating in compliance with the check-box syndrome, any reference to operational safety is determined by the status of its check-boxes. SMS is not about counting the checked boxes; it is the hard work of making operational processes safer today than they were yesterday.

What SMS Is Not
SMS is not the magic wand of miracles for accidents never to happen again, and SMS is not a system where prescriptive expectations are applied as regulations. SMS is not a one-size-fits-all model and SMS is not a model where everything is acceptable. SMS is not based on emotions or opinions, and SMS is not where processes must conform to SMS design. SMS is not a system of perfect people or a system within a perfect virtual world. SMS is not a trial-and-error system and SMS is not a system with an end or beginning. SMS is not to roll the dice for an answer, but it’s to drop the marbles to see where they scatter.
There are a lot of things that SMS is not, but all of the things that SMS is not are what SMS has become.

SMS Has Become A Conglomerate Of Opinions
SMS has become a conglomerate of opinions by virtue of good intent to make flying safer. However, good intent and opinion have turned out to be the “killer-bee” of aviation safety. SMS has become so complex that very few can explain why certain processes or corrective actions are applied. Often these changes are made because the regulator is macro managing portions, or all, of an enterprise. When a regulatory finding is given to an emergency response plan full-scale test because the test discovered deficiencies, the SMS did not fail but succeeded by discovering faulty processes. In an attempt to establish the utopia of safety, SMS has become a system where everything is defined as a safety issue, to the degree where safety itself has become virtual facts. Operational size and complexity are forgotten, and Safety Critical Areas and Safety Critical Functions have no meaning.

Safety Critical Areas And Safety Critical Functions
An enterprise is failing its SMS unless that SMS includes Safety Critical Areas and Safety Critical Functions. Operating an SMS for the purpose of improving safety in any other way only supports the red tape of a bureaucratic enterprise, where the processes are designed to support their design and not their operations.

Defining Safety Critical Areas and Safety Critical Functions is to place weight on areas of operations and on functions within these areas. Not all areas of aviation are safety critical. In an organization where there are no safety critical areas or functions, the decision making process is simple, but without accountability. A Safety Critical Area could be night approaches, with a Safety Critical Function being approaches to SFO or YCB at midnight during the month of July. An approach to SFO may be a safety critical function, while an approach to YCB in the High Arctic may not be safety critical. On the other hand, an approach into YCB on a January day at noon might be a safety critical function. When all areas of aviation are assigned the same key, or the same weight, as safety critical areas and functions, it becomes impossible to target areas for learning and training purposes. Safety has then become wishful thinking of a utopia of aviation where accidents will never happen again. Only when it is understood that an accident could happen in the future are the SMS tools ready to be applied at the operational level.
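
The weighting idea can be made concrete with a small sketch. It is illustrative only: the weights and the rank_for_training helper are hypothetical and do not come from any regulatory guidance. It shows that distinct weights give a usable priority order for learning and training, while identical weights give none.

```python
from typing import Dict, List


def rank_for_training(weights: Dict[str, float]) -> List[str]:
    """Return functions ordered from highest to lowest weight.

    Returns an empty list when every weight is identical, because a
    uniform weighting gives no basis for choosing where to focus.
    """
    if len(set(weights.values())) <= 1:
        return []
    return sorted(weights, key=weights.get, reverse=True)


# Hypothetical weights, invented for illustration only.
weighted = {
    "SFO approach, midnight, July": 0.9,
    "YCB approach, noon, January": 0.7,
    "YCB approach, midnight, July": 0.2,
}
# The same functions with everything treated as equally safety critical.
uniform = {name: 1.0 for name in weighted}

print(rank_for_training(weighted))
print(rank_for_training(uniform))
```

With the weighted input the SFO midnight approach comes first; with the uniform input the helper has nothing to rank.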

A vital question to ask for continuous safety is: Does Transport Canada accept anything less from an Enterprise than that all areas are safety critical areas and all functions are safety critical functions? If they expect that all areas and functions of aviation must be equally safety critical, it becomes a conflicting task for the operator to operate with an effective SMS. In a bureaucratic organization it is preferred that everything and everyone are equal.

An enterprise operating an SMS without safety critical areas and functions is spinning its wheels. SMS is the NextGen of aviation safety, where processes, or “how we normally do things”, are analyzed for variables and for the risk level at which these variables affect operations.

Does One Incident Qualify As System Failure?
Yes, it does when criteria are met, based on guidance supported by Transport Canada.
This practice has been established by awarding Enterprises system failure findings for a single failure to meet an expectation. System failures are any findings under CARs 107.02. On this basis the one incident would qualify as an organizational system failure, since the AC 759 approach most likely did not meet the expectation. However, concluding that one irregularity is a system failure is a rush to judgement unless that one irregularity is a function of time and comprehension. Often TC inspectors rush to judgement when issuing a system finding for one non-compliance with one expectation. Transport Canada inspectors may have good intentions when applying expectations as system failures, but they have departed from the concept of SMS and further departed from the basics of their initial SMS training when SMS first became a regulatory requirement.

Transport Canada SMS Survey
In a survey published in the JDA Journal, April 13, 2017, the vast majority of Transport Canada Inspectors view themselves as having better knowledge of airline operations than the operators themselves have, and as better qualified than the operator at fixing safety problems before they become accidents or incidents. Further, this survey correctly identified that SMS transfers responsibility for setting acceptable risk levels and monitoring safety performance to the operators themselves. This is a vital and valid point for improving safety in aviation, since the system under SMS is operationally based and not bureaucratically based with Transport Canada accepting the risk. SMS is the NextGen of aviation safety where the regulator is removed from operations. Another point in this survey is that 81 percent of inspectors surveyed predicted a major aviation accident soon. Nobody can predict a major aviation accident, not even a TC inspector. This survey reveals the bias of inspectors towards Canadian aviation operators and the inspectors’ utopian view that they know best. Transport Canada Inspectors have yet to show any data corroborating SMS-failure statements over the last ten years. This survey was published as “A Learning Lesson For FAA”.

SMS Did Not Fail
SMS did not fail AC 759 on the approach to 28R at SFO. It was the operational practices, or how the flight normally is done, that failed AC 759. Whether there were zero, two or five aircraft on Taxiway Charlie is relevant to the potentially catastrophic outcome, but irrelevant to the SMS processes. Since the quality of processes is unknown until the final output is known, it becomes vital to safety that the progress of operational safety processes allows for ongoing risk assessment and decision making by flight crew. Quality Assurance is a result of variations in quality output by the same process.

Does SMS Make Flying Safer?
Yes, SMS makes flying safer and the SFO incident does not make flying any less safe. Safety management systems help companies identify safety risks before they become bigger problems. If one company ignores the SMS tools available to it and the help SMS offers, that company is less safe than if it had elected to apply its tools and predict the safety risk level of identified hazards. Flying doesn’t become less safe, or have a reduction in safety, without SMS, but an organization without SMS lacks the opportunity for continuous safety improvements by applying SMS concepts and principles. Over time an enterprise that is “listening to their SMS” gains ground in safety and becomes safer in operations than non-SMS or “ignore-SMS” organizations.

Additional Layer and Parallel Approach
SMS is an additional layer of safety and a parallel approach to operational processes. SMS does not make it safe or unsafe to fly, but SMS provides data, which is processed into information and then applied as knowledge and comprehension for safer operational practices. It is when this data has reached the level of comprehension that it can be integrated into policy, the design of processes and the improvement of safety. In an SMS world it is still the flight crew, maintenance personnel, ground crew and the enterprise’s management that make the difference between continuous safety improvement and continuous decline. However, SMS is an invaluable tool that some are overlooking while still expecting miracles from it. In addition, a contributing cause factor to the SFO incident is that airports are outdated in their 1903 design and have not adapted to size, complexity and traffic volume.

AC759 Cleared To Land RWY 28R
Aviation Safety must always be viewed from the public’s point of view, which is an expectation of a pleasant experience and no incidents between boarding the airplane and deplaning at destination. For some passengers it might be comforting to know that Air Canada 759 did not end in a disaster, while for others learning about this incident might be so horrifying that they will never travel on an airplane again. Either way, the public is expecting quality performance of aircraft and flight crew. The flight crew of AC 759 on short final realized that something was wrong, but expected someone else, in this case the Tower Controller, to decide for them what to do. This principle of avoiding safety actions is outside the parameters of an effective SMS system.

When flying regularly, technical skills improve while technical knowledge is reduced to the degree to which it is applied. The consequence is that performance unknowingly declines until it reaches the time limit of comprehension and an unacceptable level of performance risk. A flight crew that is not able to comprehend the options available when there are lights on the runway they are cleared to land on represents an enterprise’s systemic failure of performance management, not individual error. The flight crew made an inquiry to the Tower Controller as to why there were lights showing on a runway where they had been cleared to land, without first initiating safety actions and deviating away from the hazard.


Thursday, July 13, 2017

SMS Information In A Suitable Medium

What is a suitable medium and who should it be suitable for? The unknown variable in this question is who is to decide if the SMS is in a suitable medium or not. The regulatory requirement is that SMS information be made available in a suitable medium. The regulation is written with ambiguity to allow for differences due to organizational size and complexity, but it’s also written for the regulator to decide, based on their own opinion, what a suitable medium is. Airports are the smallest enterprises required to operate with an SMS; there could be only a couple of employees, where one is the Airport Manager and SMS Manager, and the other is the Accountable Executive. A suitable medium may be different for an airport than for a large airline with thousands of employees.

At one time, gold and cow hide may have been a suitable medium.
A suitable medium may mean something totally different from person to person, from organization to organization or from time to time. Some years ago, an electronic manual would not have been a suitable medium to maintain SMS information, and today paper format manuals may be obsolete. For SMS information and documentation to be established in a suitable medium, it must be suitable to the user group. The medium does not need to be suitable to the regulators or to the Accountable Executive, but must be suitable to the users who are feeding data into the SMS system.

A medium, or system, that is not user-friendly for the submission of data is a safety system which in the long run will become an ineffective reactive system. This system would not operate within the concept of SMS, which is a system of a just culture with proactive safety initiatives and accountability. When a system is developed for the purpose of supporting the bureaucracy of an airport or airline, it becomes ineffective as a supporting tool, while a system designed in support of the processes becomes an effective safety management tool. An SMS system is fail-free since it’s another layer of safety supporting the operational safety processes.

A bureaucratic enterprise is best recognized by its attempt to adapt processes by enforcing the design upon operations, rather than changing the SMS design to adapt to operational safety processes. In other words, when a process is in non-compliance with the designed process, the failure may be with the design and not with the process itself.

An attempt to improve safety by enforcing operational design processes onto operational processes may cause unintended effects. If the operational performance of an aircraft is in non-compliance with the design performance, enforcing the design performance does not improve safety. Most enterprises are not strictly bureaucratic or adhocracy systems, but are flexible enough to accommodate both a bureaucratic system with prescriptive policies and an adhocracy system with flexibility for safe operations within the size and complexity of the operations.

Establishing the SMS system in a suitable medium becomes more than just checking the check-box that an individual has drawn the line in the sand and determined what medium is suitable as a one-size-fits-all medium. A suitable medium may include more than one medium and include both paper format and electronic format in addition to smart-phone apps. When applying multiple mediums as suitable, a bureaucratic enterprise may have difficulties adapting and analyzing data from these different sources, while an adhocracy may have invented additional processes to capture all data for analysis. An effective enterprise adapts to processes for continuous safety improvements.


Wednesday, June 28, 2017

The Fork In The Road Test

All roads lead to Rome and there are many different ways of reaching the same goal or objective. Finding the root cause takes a road trip defining the turns and forks in the road. If travelling by air the course may take a detour by relying on the old ADF, or be more effective following a GPS course. There are several root cause analysis techniques; they all serve the purpose of improving safety, and one root cause model may be as effective, or ineffective, as another. All root cause analysis models are designed to establish at what time or location in the failed process a different approach could have produced a different outcome. The 5-Why and fish bone root cause analyses are widely accepted within the aviation industry and assumed to have established the correct root cause. A risk assessment of substitute and residual risks is normally conducted after the root cause analysis to identify whether implementing the proposed corrective action plan, in the form of a new risk control strategy, introduces other or unexpected hazards. As a compliance criterion, the enterprise monitors the CAP with a follow-up as an assessment of the effectiveness of the safety improvement.

Without knowledge one fork in the road is as good as another.
Monitoring the effect of corrective action is a standard procedure for follow-up of CAP implementation. Monitoring and follow-up may be dependent on seasonal differences or on timeframes for collecting enough data to establish the effectiveness. If an enterprise has lost control of its safety processes and decided to implement corrective strategies, The Fork In The Road Test is a tool to identify if steps in the evaluation process are taking shortcuts and jumping to the conclusion that the new strategy is the correct strategy. This shortcut is an attempt to break the wall in a maze to make it to the end of the tunnel without following the process path.

The Fork In The Road Test is to backtrack the process to establish where in the maze the wall failed, and to establish the time and location in the future where the missing link of a CAP is. This does not imply that an incident or accident can be predicted, but it implies that knowledge is vital to predict the hazards affecting the process.

When building and operating out of an ice-runway there are many considerations affecting the design of the runway. Since the runway thaws out every year, it must be rebuilt the next season and located at exactly the same location to be validated as the same runway and to apply the same instrument approach procedures. In addition to the runway itself, ice movement over time offsets the precision approach to a point where it becomes unusable. When applying The Fork In The Road Test to the runway, the time begins when the ice melted and the location begins where the ice threshold was located.

It becomes simple to see The Fork In The Road if an aircraft landed on the ice in the spring when most of the ice had melted. The time can be backtracked to establish that a change in direction, or taking a different turn at the fork in the road at that time, would have made a difference. The Fork In The Road is not the time and location when the flight crew had to make a decision to change flight path, but the time and location at which the aircraft could have been expected to complete the flight without an incident. When the ice had melted, the aircraft was doomed at the time of departure.

The Fork In The Road does not always take a straight path.
On the other hand, let’s assume that the accident happened in the middle of the winter with a foot of ice and minus 45 degrees temperature. At this point it becomes a scientific task to establish where the fork in the road was. The task becomes to identify if there were special variations that caused the change of course or an incident, or if there were normal variations that were overlooked. E.g. melting ice would be a normal variation, blowing snow would be a normal variation, darkness would be a normal variation and ice-ridges would be a normal variation.
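
One conventional way to separate normal from special variation is a control chart: limits are computed from a baseline of ordinary observations, and a new observation outside those limits is treated as special. The sketch below assumes that approach, which the text above does not name. The ice thickness readings, the three-sigma limits and the function names are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def control_limits(baseline: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper limits: baseline mean plus or minus k standard deviations."""
    centre = mean(baseline)
    spread = pstdev(baseline)
    return centre - k * spread, centre + k * spread


def classify(value: float, limits: Tuple[float, float]) -> str:
    """Label a reading 'normal' inside the limits, 'special' outside them."""
    lower, upper = limits
    return "normal" if lower <= value <= upper else "special"


# Hypothetical mid-winter ice thickness readings in inches (invented data).
baseline = [12.0, 11.5, 12.5, 12.2, 11.8, 12.1, 11.9, 12.0]
limits = control_limits(baseline)

print(classify(11.7, limits))  # a reading close to the baseline
print(classify(6.0, limits))   # a reading far below the baseline
```

A reading labelled normal still matters for the test: a normal variation that nobody inspected for is exactly the overlooked case described above.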

Let’s for a minute assume that the airplane hit an ice-ridge. This establishes the Fork In The Road at a time prior to the aircraft’s departure from civilization. The Fork In The Road is the one trigger that would, without doubt, have made a difference for safe operations. In this scenario, it would have made a difference if the normal variation had been identified and the runway inspected and assessed as safe for operations. The Fork In The Road Test is not applicable to events after departure, since the departure itself locked the aircraft into a path where at some point in time the flight crew would have to make a reversal decision or an incident would happen. The Fork In The Road Test would have predicted a hazard of normal variations, but since the Fork In The Road Test was not applied the hazard was not identified.

At the world’s most dangerous airport a siren sounds about 45 minutes prior to arrival and before the aircraft departs for this destination. The Fork In The Road has been identified and the corrective action implemented at the time and location where it effectively makes a difference and aviation has become safer. The Fork In The Road is in the planning and decision to complete a pleasant flight.


Saturday, June 17, 2017

SMS Communication

A small operator communicates differently with its personnel than a mega-enterprise would. An airport with 2 or 3 people, being an Airport Manager, SMS Manager and Accountable Executive, may communicate verbally without much documentation, while a larger airport may use multiple levels of communication processes. Both operators must meet the same requirement of the expectation that communication processes (written, meetings, electronic, etc.) are commensurate with the size and complexity of the organization. When applying this expectation without ambiguity, or applying the expectation with fairness to the expectation itself, both operators are expected to apply the exact same processes in communication. The simplest avenue when assessing for regulatory compliance is to apply the more complex communication processes to both operators. When applying this approach, the smaller airport’s SMS becomes a bureaucratic, unprofessional and ineffective tool for safe operations.

Small airport communication has also changed with the times.
At first glance it looks great that a small airport is expected to operate with different communication processes than a large airport, but when analyzing all available options, there is very little tolerance, or none, to apply this expectation in any other way than as a prescriptive regulatory requirement. It is only when the operation is understood that the expectation can be applied with ambiguity, and with unfairness to the expectation itself, for an effective communication process for airport operators of any size and complexity.

When there is a finding at a small or large airport that the communication did not meet the regulatory requirement through this expectation, in that the information had been forgotten, misplaced or incorrectly interpreted, the operator is required to identify the policies, processes, procedures and practices involved that allowed this non-compliance to occur. The statement that an operator allowed a non-conformance to occur is in itself a statement of bias in the finding, implying that the operator had an option not to let this non-compliance occur. If this option had been available at that time, the operator would have taken different steps. The reason the non-compliance occurred is that the option to make a change was not available at the time when it occurred. Not all systems within the SMS were functioning properly, and often it is the systems of human factors, organizational factors, supervision factors or environmental factors. Reviewing a finding in 20/20 hindsight is a simple task, and pointing out what could have been done differently becomes the task of applying the most complex process. However, when the operation is in the moment, the options at that location and point in time are limited to snapshots only of information, knowledge and comprehension of the events.

Human factors in communication is today integrated in automation and not visible.
Since there is an assumption in the root cause analysis requirements that the non-compliance was allowed, the question to the finding is no longer what happened, but why the operator allowed this event to escalate beyond regulatory requirements. The difference between a “what” and a “why” question is that “what” is data and “why” is someone’s opinion. When the finding is issued during an audit by the regulatory oversight team, the answer to the “why” question becomes the opinion of one person only of that team. When the opinion of that person becomes the determining factor of a root cause analysis, it has become impossible to analyze the event for a factual root cause. It could also be that the answer to the “why” question becomes a compromise, or an average of the views of all inspectors, in which case the answer is no longer relevant to the finding, but to a process where all opinions are considered. On the other hand, when the “what” question becomes the determining factor, each link to what happened must be supported by data and documented events. If the events of the “what” cannot be answered first, before any other questions are answered, it has become an impossible task to assign a root cause to the finding. When the “what” question is answered, then a change to the “what” may be assigned and implemented.

This does not imply that the “why” question is not to be asked, but it becomes a factor of how the “why” is asked, and if the determining factor to the “why” question is an agreement between several people in a group to assign an average of indifferences, or if the “why” question is answered to the “what” question. When applying the 5-why process, the answer, or root cause, is established by the answer to the first “why”, since the rest of the answers must be locked in to the first. The more effective root cause analyses are the “fish-bone”, the “5-why matrix”, or the “fork-in-the-road” test.

When the second expectation, as stated in the root cause analysis document, assumes that the non-compliance was allowed, it changes the first expectation within an SMS element, of different communication based on size and complexity of the airport, into a prescriptive regulatory requirement. The prescriptive requirement then becomes the common denominator for the event that was allowed to occur and must be applied to the most complex communication process. The simplest way to look at this is that when the “allowed to” is allowed to be applied to an event, it is assumed that human variations do not exist and that the system is operating in an undisputed perfect virtual environment.


Friday, June 2, 2017

No Data, No History, No Event

Root cause analysis is to find the single cause of why an unplanned event happened, or a link in the process where a different decision would have made a different outcome. This does not necessarily imply that a different outcome would have avoided the unplanned event, but it may have happened at a different time or location and with a different outcome. The expectation of a different outcome is that the unplanned event would not happen.

When analyzing for the root cause, the 5-Why process is often applied. Unless there is an unbiased process applied to the answers of the 5-Why process, the desired answer could be established prior to initiating the process and the other answers backtracked from this desired answer. The fact is that most 5-Why processes only allow for one option for the root cause. Since the organization is determined to establish a root cause, the root cause could be established without applying the 5-Why process. This is the “checkbox” syndrome of establishing a root cause by applying the approved root cause analysis. The assumption is that as long as the paperwork looks good it must be the right root cause and operations must be safe, correct? No, this is not correct. An incorrect root cause is more unsafe than a known but ineffective root cause, since the new and incorrect root cause has not been tested and the outcome is unknown. Assuming that the new root cause is effective is to assume that opinions are facts.

Find the roots that feed life into the process.
A root cause analysis must include data from prior documented events. If there is no data, no history or no documented event, a root cause analysis cannot be based on past experiences. A one-time event is not a trend, and applying a root cause analysis to one event defeats the purpose for the safe operation of an airport or aircraft. If there is no data, there is no trend and there are no prior events to compare the analysis to. The key to success is to establish data and trends to determine the root cause and make changes to the processes to reduce or eliminate another unscheduled event or failure. E.g. should a runway edge light fail and there is no data of prior failures, the short term fix is to replace the lightbulb. This might not be what the regulators want to see, but the fact is there is no data to justify a root cause, and, in addition, there is no data to justify that the burnt-out light is not an acceptable risk. Over time an airport may track the burnt-out lights (which is data) and over a period of 3-7 years establish a pattern of malfunctioning lights. With this information the airport may establish the root cause and change the lights at a reasonable time before the bulbs are expected to burn out. It’s as simple as that.
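The idea of tracking burnt-out lights as data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mean_service_days`, the threshold `MIN_EVENTS` and the dates in `log` are hypothetical and do not come from any airport's records. The sketch refuses to report a pattern until enough failures have been documented, in line with the point that a one-time event is not a trend.

```python
from datetime import date
from statistics import mean

# Hypothetical threshold: how many documented failures count as a trend.
MIN_EVENTS = 3


def mean_service_days(records):
    """Mean days in service for runway edge lights.

    records is a list of (installed, failed) date pairs. Returns None
    when there are too few documented failures to call it a trend.
    """
    if len(records) < MIN_EVENTS:
        return None
    return mean((failed - installed).days for installed, failed in records)


# Made-up example data, for illustration only.
log = [
    (date(2014, 1, 1), date(2014, 12, 27)),
    (date(2015, 1, 1), date(2016, 1, 16)),
    (date(2016, 2, 1), date(2017, 2, 5)),
]

single_event = mean_service_days(log[:1])  # one failure: no trend yet
pattern = mean_service_days(log)           # three failures: a mean is reported
```

With one documented failure the function returns nothing to act on, so the short term fix remains replacing the bulb. Once several failures are on record, the mean service time gives the airport a number to plan replacements around.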

Another option is to apply best-practices or continuous safety improvement by collecting data from the light manufacturer on how many hours or cycles a runway edge light is expected to last. If this was done, the time needed to establish a process for changing these lights before they burn out could be reduced from 7 years to 6 months, and the airport's safety goal to minimize burnt-out lights achieved in a short time. By applying the data supplied by the manufacturer, a 5-Why analysis may not even be necessary to establish the root cause.
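As a rough sketch of this best-practices route, a replacement interval could be derived from a manufacturer's rated life. The function name `replacement_interval_days`, the margin parameter and the numbers used below are assumptions made for illustration only; they are not manufacturer data.

```python
def replacement_interval_days(rated_hours, hours_lit_per_day, margin_percent=80):
    """Days between scheduled replacements of a runway edge light.

    rated_hours:       expected life in hours (assumed manufacturer figure)
    hours_lit_per_day: average hours per day the light is switched on
    margin_percent:    share of the rated life to use before replacing
    """
    if rated_hours <= 0 or hours_lit_per_day <= 0:
        raise ValueError("rated_hours and hours_lit_per_day must be positive")
    if not 0 < margin_percent <= 100:
        raise ValueError("margin_percent must be in the range 1-100")
    # Integer arithmetic keeps the result in whole days, rounded down.
    return (rated_hours * margin_percent) // (100 * hours_lit_per_day)


# Hypothetical figures: 1000 rated hours, lit 10 hours a day, replace at 80%.
interval = replacement_interval_days(1000, 10)
```

The margin is a design choice: replacing at a share of the rated life trades some bulb life for a lower chance of a light burning out in service.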

When there is only one option of questions for a root cause, the question must be answered first.
Let’s assume that an airport took the best-practices route and established a lifetime for runway edge lights. However, the lights still burned out earlier than expected and became a frustration to airport management and an inconvenience for their customers. The next step is to collect data for a root cause analysis. In the process of deciding what approach to take to collect data, the 5-Why Matrix was applied. The result from this matrix was to mount wildlife cameras at the airport to see if there was any wildlife connection to the burnt-out lights. Over time it was discovered that coyotes came and chewed the power cable and that the lights therefore burned out about 2 days later. This data could now be applied to a root cause analysis, or the location of the fork in the road, and the process of transferring power could be improved. In other words, by burying the cable underground and covering it up to the light, the long term corrective action extended the intervals between light replacements.

Without data, there is no event, only opinions of events. Applying a straight 5-Why does not necessarily establish the correct root cause, since the answer is locked in after the first question is answered. For the 5-Why process to be more effective the application of a matrix moves the process out-of-the-box for a nonbiased result.


Friday, May 19, 2017

Communicating or Transmitting SMS

There is an expectation that for an SMS program to conform to regulatory compliance the enterprise must have in place a process by which safety authorities, responsibilities and accountabilities are transmitted to all personnel. If this process is not in place, a non-compliance System Finding under Canadian Aviation Regulation (CARs) 107.02 may be issued to the operator. CARs 107.02 is a system compliance regulation, or a design regulation for a regulatory compliant Safety Management System. When the design of the SMS is regulatory compliant, then the processes executing the SMS design must also conform to regulatory compliance. In other words, the two compliance components are the design component and the operations component. For operators in Canada the design requirements are found in CARs 107.02, for both airlines and airports. The operations requirements are found in CARs 705.151 and CARs 302.500 respectively.

The manufacturing of this chain complies with the requirement to produce a chain.
When job descriptions are transmitted to personnel, in accordance with this expectation, the message may or may not reach the intended personnel. Transmitting is a one-way communication and it does not specifically direct the communication to the intended recipient. If the communication only reaches a recipient who is in a non-management position, this information may be overwhelming since it does not conform to the expectation of the person’s job description. Or, if the information transmitted reaches senior management only, their response may be incorrect for their job performance expectations. This expectation that “Safety Authorities, Responsibilities And Accountabilities Are Transmitted To All Personnel” may be compliant with the expectation itself and also compliant with the regulatory requirements under operations. However, by following the “letter of the sentence” only, there are other SMS required tasks that are missed and not being performed to acceptable levels. Since the interpretation of information came into conflict with the position of the intended audience, there is a failure of the system.

The process in this example is functioning as expected, but the response to the communication was in conflict with the job position established in the organization chart. The effect caused by the lack of response was not just that the information was transmitted to incorrect positions and the job not done, but also that, by not performing as the expectation intended, other parts of the SMS were crumbling and the system itself did not function.

Destroying a process could crumble a system.
It doesn’t matter how strong and well maintained 99 links in a chain are when there is one link that breaks. When the link is broken, there is a broken process somewhere that must be identified. Repairing the process by replacing the old chain with a new chain may not necessarily work well, since this does not consider the process. It could be that this link in the chain was being ground now and then by a grinding tool required for the process to function. Replacing the old with a new is an assumption that there is a manufacturing flaw without analyzing the operational processes. Then the next time it happens everybody is just as surprised as the 10 previous times. Often, the next step is to change chain manufacturer, or fire the person who authorized the supplier.

By not conforming to the intent of this expectation that “Safety Authorities, Responsibilities And Accountabilities Are Transmitted To All Personnel”, the system itself may fail and everyone is as surprised as the first time when it failed.