Saturday, June 17, 2017

SMS Communication

A small operator communicates differently with its personnel than a mega-enterprise would. An airport with two or three people, such as an Airport Manager, an SMS Manager and an Accountable Executive, may communicate verbally without much documentation, while a larger airport may use multiple levels of communication processes. Both operators must meet the same expectation: that communication processes (written, meetings, electronic, etc.) are commensurate with the size and complexity of the organization. When this expectation is applied without ambiguity, or with fairness to the expectation itself, both operators are expected to apply the exact same communication processes. The simplest avenue when assessing for regulatory compliance is to apply the more complex communication processes to both operators. With this approach, the smaller airport’s SMS becomes a bureaucratic, unprofessional and ineffective tool for safe operations.

Small airport communication has also changed with the times.
At first glance it looks great that a small airport is expected to operate with different communication processes than a large airport, but when all available options are analyzed, there is little or no tolerance for applying this expectation in any other way than as a prescriptive regulatory requirement. It is only when the operation is understood that the expectation can be applied with ambiguity, and with unfairness to the expectation itself, to produce an effective communication process for airport operators of any size and complexity.

When there is a finding at a small or large airport that the communication did not meet the regulatory requirement through this expectation, in that the information had been forgotten, misplaced or incorrectly interpreted, the operator is required to identify the policies, processes, procedures and practices involved that allowed this non-compliance to occur. The statement that an operator allowed a non-conformance to occur is in itself a bias in the finding, implying that the operator had an option not to let this non-compliance occur. If this option had been available at that time, the operator would have taken different steps. The reason the non-compliance occurred is that the option to make a change was not available at the time when it occurred. Not all systems within the SMS were functioning properly, and often it is the systems of human factors, organizational factors, supervision factors or environmental factors. Reviewing a finding in 20/20 hindsight is a simple task, and pointing out what could have been done differently becomes the task of applying the most complex process. However, when the operation is in the moment, the options at that location and point in time are limited to snapshots only of information, knowledge and comprehension of the events.

Human factors in communication is today integrated in automation and not visible.
Since there is an assumption in the root cause analysis requirements that the non-compliance was allowed, the question to the finding is no longer what happened, but why the operator allowed this event to escalate beyond regulatory requirements. The difference between a “what” and a “why” question is that “what” is data and “why” is someone’s opinion. When the finding is issued during an audit by the regulatory oversight team, the answer to the “why” question becomes the opinion of only one person on that team. When the opinion of that person becomes the determining factor of a root cause analysis, it has become impossible to analyze the event for a factual root cause. It could also be that the answer to the “why” question becomes a compromise, or an average of the views of all inspectors, in which case the answer is no longer relevant to the finding, but to a process where all opinions are considered. On the other hand, when the “what” question becomes the determining factor, each link to what happened must be supported by data and documented events. If the events of the “what” cannot be answered first, before any other questions are answered, it has become an impossible task to assign a root cause to the finding. When the “what” question is answered, then a change to the “what” may be assigned and implemented.

This does not imply that the “why” question is not to be asked, but it becomes a factor of how the “why” is asked, and if the determining factor to the “why” question is an agreement between several people in a group to assign an average of indifferences, or if the “why” question is answered to the “what” question. When applying the 5-why process, the answer, or root cause, is established by the answer to the first “why”, since the rest of the answers must be locked in to the first. The more effective root cause analyses are the “fish-bone”, the “5-why matrix”, or the “fork-in-the-road” test.

When the second expectation, as stated in the root cause analysis document, assumes that the non-compliance was allowed, it changes the first expectation within an SMS element, that of different communication based on the size and complexity of the airport, into a prescriptive regulatory requirement. The prescriptive requirement then becomes the common denominator for the event that was allowed to occur and must be applied to the most complex communication process. The simplest way to look at this is that when the “allowed to” is allowed to be applied to an event, it is assumed that human variations do not exist and that the system is operating in an undisputed perfect virtual environment.

CatalinaNJB

Friday, June 2, 2017

No Data, No History, No Event

Root cause analysis is to find the single cause of why an unplanned event happened, or a link in the process where a different decision would have made a different outcome. This does not necessarily imply that a different outcome would have avoided the unplanned event, but it may have happened at a different time or location and with a different outcome. The expectation of a different outcome is that the unplanned event would not happen.

When analyzing for the root cause, the 5-Why process is often applied. Unless an unbiased process is applied to the answers of the 5-Why process, the desired answer could be established prior to initiating the process and the answers backtracked from this desired answer. The fact is that most 5-Why processes only allow for one option for the root cause. Since the organization is determined to establish a root cause, the root cause could be established without applying the 5-Why process. This is the “checkbox” syndrome of establishing a root cause by applying the approved root cause analysis. The assumption is that as long as the paperwork looks good it must be the right root cause and operations must be safe, correct? No, this is not correct. An incorrect root cause is more unsafe than a known but non-effective root cause, since the new and incorrect root cause has not been tested and the outcome is unknown. Assuming that the new root cause is effective is to assume that opinions are facts.

Find the roots that feed life into the process.
A root cause analysis must include data from prior documented events. If there is no data, no history or no documented event, a root cause analysis cannot be based on past experiences. A one-time event is not a trend, and applying a root cause analysis to one event defeats the purpose for the safe operation of an airport or aircraft. If there is no data, there is no trend and there are no prior events to compare the analysis to. The key to success is to establish data and trends to determine the root cause and make changes to the processes to reduce or eliminate another unscheduled event or failure. For example, should a runway edge light fail and there is no data of prior failures, the short-term fix is to replace the lightbulb. This might not be what the regulators want to see, but the fact is there is no data to justify a root cause, and, in addition, there is no data to justify that the burnt-out light is not an acceptable risk. Over time an airport may track the burnt-out lights (which is data) and over a period of 3-7 years establish a pattern of malfunctioning lights. With this information the airport may establish the root cause and change the lights at a reasonable time before the bulbs are expected to burn out. It’s as simple as that.

Another option is to apply best practices or continuous safety improvement by collecting data from the light manufacturer on how many hours or cycles a runway edge light is expected to last. If this was done, the time needed to establish a process for changing these lights before they burn out could be reduced from 7 years to 6 months, and the safety goal of minimizing burnt-out lights could be achieved in a short time. By applying the data supplied by the manufacturer, a 5-Why analysis may not even be necessary to establish the root cause.
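As a rough illustration of the manufacturer-data approach, the sketch below schedules a replacement before the expected burn-out. The function name, the rated hours, the daily operating hours and the safety margin are all hypothetical values chosen for the example, not figures from any manufacturer or regulation.

```python
from datetime import date, timedelta


def replacement_date(installed: date, rated_hours: float,
                     hours_per_day: float, margin: float = 0.9) -> date:
    """Return the date on which to replace a lamp before its expected burn-out.

    rated_hours   -- expected lamp life from the manufacturer (assumed input)
    hours_per_day -- average hours the light is on each day (assumed input)
    margin        -- fraction of rated life to use before replacing (assumption)
    """
    if hours_per_day <= 0:
        raise ValueError("hours_per_day must be positive")
    usable_hours = rated_hours * margin
    # Whole days of operation that fit inside the usable lamp life.
    days = int(usable_hours // hours_per_day)
    return installed + timedelta(days=days)


# Hypothetical lamp: 1000 rated hours, on 10 hours a day, replaced at half life.
print(replacement_date(date(2017, 1, 1), 1000, 10, margin=0.5))
```

Once the airport has its own failure data, the margin could be adjusted so that the schedule reflects what actually happens on the airfield rather than the manufacturer's estimate alone.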

When there is only one option of questions for a root cause, the question must be answered first.
Let’s assume that an airport took the best-practices route and established a lifetime for runway edge lights. However, the lights still burned out earlier than expected and became a frustration to airport management and an inconvenience for their customers. The next step is to collect data for a root cause analysis. In deciding what approach to take to collect data, the 5-Why Matrix was applied. The result from this matrix was to mount wildlife cameras at the airport to see if there was any wildlife connection to the burnt-out lights. Over time it was discovered that coyotes came and chewed the power cable and that the lights therefore burned out about 2 days later. This data could now be applied to a root cause analysis, or the location of the fork in the road, and the process of transferring power could be improved. In other words, by burying the cable underground and covering it up to the light, the long-term corrective action extended the intervals between light replacements.

Without data, there is no event, only opinions of events. Applying a straight 5-Why does not necessarily establish the correct root cause, since the answer is locked in after the first question is answered. For the 5-Why process to be more effective the application of a matrix moves the process out-of-the-box for a nonbiased result.


CatalinaNJB

Friday, May 19, 2017

Communicating or Transmitting SMS

There is an expectation that for an SMS program to conform to regulatory compliance the enterprise must have in place a process by which safety authorities, responsibilities and accountabilities are transmitted to all personnel. If this process is not in place, a non-compliance System Finding under Canadian Aviation Regulations (CARs) 107.02 may be issued to the operator. CARs 107.02 is a system compliance regulation, or a design regulation, for a regulatory compliant Safety Management System. When the design of the SMS is regulatory compliant, then the processes executing the SMS design must also conform to regulatory compliance. In other words, these two compliance components are the design component and the operations component. For operators in Canada the design requirements are found in CARs 107.02, for both airlines and airports. The operations requirements are found in CARs 705.151 and CARs 302.500 respectively.

The manufacturing of this chain complies with the requirement to produce a chain.
When job descriptions are transmitted to personnel, in accordance with this expectation, the message may or may not reach the intended personnel. Transmitting is a one-way communication and it does not specifically direct the communication to the intended recipient. If the communication only reaches a recipient who is in a non-management position, this information may be overwhelming since it does not conform to the expectation of the person’s job description. Or, if the information transmitted reaches senior management only, their response may be incorrect for their job performance expectations. This expectation that “Safety Authorities, Responsibilities And Accountabilities Are Transmitted To All Personnel” may be compliant with the expectation itself and also compliant with the regulatory requirements under operations. However, by following the “letter of the sentence” only, other SMS required tasks are missed and not performed to acceptable levels. Since the interpretation of information came into conflict with the position of the intended audience, there is a failure of the system.

The process in this example is functioning as expected, but the response to communication was in conflict with the job position established in the organization chart. The effect caused by the lack of response was not just that the information was transmitted to incorrect positions and the job not done, but also that by not performing as the expectation intended, other parts of the SMS were crumbling and the system itself did not function.

Destroying a process could crumble a system.
It doesn’t matter how strong and well maintained 99 links in a chain are when there is one link that breaks. When the link is broken, there is a broken process somewhere that must be identified. Repairing the process by replacing the old chain with a new chain may not necessarily work well, since this does not consider the process. It could be that this link in the chain was being ground down now and then by a grinding tool required for the process to function. Replacing the old with a new is an assumption that there is a manufacturing flaw, made without analyzing the operational processes. Then the next time it happens everybody is just as surprised as the 10 previous times. Often, the next step is to change chain manufacturer, or fire the person who authorized the supplier.

By not conforming to the intent of this expectation that “Safety Authorities, Responsibilities And Accountabilities Are Transmitted To All Personnel”, the system itself may fail and everyone is as surprised as the first time when it failed.


CatalinaNJB

Friday, May 5, 2017

Risk Management Differently

This is a blog with no relevance to any opinions, facts, research or science, but a trivial blog written for continuous improvement in safety by thinking beyond the horizons and outside the box. For continuous safety improvement to be effective, thinking outside the box is vital for collecting unbiased data and then bringing this data back into the box to be analyzed for safety improvements. We don’t manage risks; we lead personnel, manage equipment and validate operational design for improved performance above the bar of acceptable risk level.

Improvement begins outside the box.
Risk level analysis is traditionally established by applying likelihood, severity and exposure. In a risk level analysis, the exposure must always equal 1 for the hazard to become a risk to aviation safety. Without exposure, there is no risk. Birds are a hazard to aviation safety. However, birds that are 100 miles away from the flight path are not a risk to aviation, but are still classified as a hazard to aviation. Traditionally these risk levels are color coded, where green is acceptable, yellow is acceptable with mitigation and red is not acceptable. There is often little or no scientific data behind these risk levels except for aircraft performance. Human factors, organizational factors, supervision factors and environmental factors are not included in these risk assessments. Human factors may affect the risk level differently from one day to another. Human factors, or the interactions between software, hardware, environment, crew and other humans, are vital to aviation safety.
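The role of exposure can be shown with a small sketch. It assumes, purely for illustration, that the risk level is the product of likelihood, severity and exposure on integer scales; the function name and the scales are hypothetical and not taken from any regulation or published risk matrix.

```python
def risk_score(likelihood: int, severity: int, exposed: bool) -> int:
    """Return a simple risk score where exposure acts as an on/off gate.

    Assumption: score = likelihood * severity * exposure, with exposure
    equal to 1 when the operation is exposed to the hazard and 0 otherwise.
    """
    exposure = 1 if exposed else 0
    return likelihood * severity * exposure


# Birds 100 miles from the flight path: still a hazard, but no risk.
print(risk_score(likelihood=3, severity=4, exposed=False))  # 0
# The same birds on the approach path: the hazard is now a risk.
print(risk_score(likelihood=3, severity=4, exposed=True))   # 12
```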

There are two elements to human performance: 1) technical knowledge and 2) technical skills. Knowledge is the theory of operations, while skills are the operations themselves. At the initial licensing of a pilot, the candidate must first pass a knowledge test, and then a practical flight test. Without passing at an acceptable risk level, a pilot license cannot be issued. Once the pilot is employed, this concept of refreshing both technical knowledge and technical skills becomes a concept of operational performance.

Normally a person’s retention of learning decreases with time when learning is not applied to operations. Much of the theoretical learning is not applied daily in the job, but occasionally with the use of a checklist. The highest percentage loss occurs in the first days and weeks after the learning is completed and somewhat levels off after that. Since the learning is being applied in their skills performance by flying an aircraft daily, there is additional learning occurring on the job, and their performance level in technical skills is improving in the days and weeks after the learning.

One enterprise was expecting their pilots to retain a 100% knowledge level one year after the training and would initiate the refresher course with the knowledge test and expect all candidates to be as proficient in knowledge as they were 365 days ago. Since pilots only applied part of their knowledge regularly in the day-to-day job and learning was not encouraged, most of what was learned had been forgotten in 365 days. Since their jobs were dependent on passing the knowledge test, the candidates would do their own personal refresher course in the last 2-3 weeks prior to the official refresher course. When the test was taken, all candidates passed and the enterprise could proudly check off the box that their pilots had retained 100% knowledge in 365 days.

When assessing risk levels differently an enterprise would assess performance based on a pilot’s retention of knowledge and skills. Let’s assume the learning retention loss of knowledge is 20% per day for the first 84 days and from then on, the retention loss is 2% per day to 365 days. At the end of a year the total knowledge retention is 20%, or in other words, if the pilot took the test without studying after 365 days, it would be expected that the test result would be 20% of last result.

The pilots’ technical skills retention is not reduced after learning; their performance is getting better since they are applying their skills in their day-to-day job and additionally being exposed to known and unknown hazards regularly. At the end of 365 days the pilots’ skills retention level is 180% of what it was after the previous flight test.

When applying this data as a combined retention level factor of knowledge and skills, the pilots are performing at their 100% level after 365 days. After 5 years in the same job they are performing above their 100% initial level.
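The combined figure above can be checked with a few lines of arithmetic. This sketch assumes the combined retention level factor is the simple average of the knowledge and skills retention percentages, which is consistent with the 20% and 180% figures giving 100%; the function name is made up for the example.

```python
def combined_performance(knowledge_pct: float, skills_pct: float) -> float:
    """Return the combined retention level as the average of both factors.

    Assumption: knowledge and skills are weighted equally.
    """
    return (knowledge_pct + skills_pct) / 2


# Knowledge retention of 20% and skills retention of 180% after 365 days.
print(combined_performance(20, 180))  # 100.0
```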

Performance factor most critical days are days 60-80.
The traditional risk level model is based on aircraft performance, and pilots are expected to perform at their 100% performance level in both technical knowledge and skills. In addition, the traditional risk level matrix only applies recommendations to accepted or rejected risk. A different risk matrix is to apply an action to the colors, which are based on likelihood and severity. These actions are to communicate (green), monitor (yellow), pause (blue), suspend (orange) or cease (red) operations. Risk levels orange and red are applicable to aircraft performance, where pilot qualifications do not impact aircraft performance limitations. When overlaying the knowledge, skills and performance factor graphs onto the risk matrix, the lowest level of performance represents knowledge, the highest represents skills, and the middle is the pilots’ performance level. A performance level should be above the monitoring (yellow) level for quality assurance of flight operations.


CatalinaNJB

Thursday, April 20, 2017

The SMS Manager


The person managing the operation of the SMS fulfils the required job functions and responsibilities to meet regulatory requirements. An effective Safety Management System is led by a person who is technically qualified and understands the interaction of all systems. An airline or airport is to ensure that the person who occupies the position of SMS Manager is qualified to lead and manage for regulatory compliance. The regulations for an SMS manager are written with ambiguity, that is, written to be open to more than one interpretation. This is how a performance-based regulation is written, to allow for the application of size and complexity to conform to regulatory compliance in operation. Since there are no established standards, the enterprise must first establish their standards, or expectations, and then the qualification requirements for that position before the effective date of their SMS.
Ambiguity is in the design.
If the qualifications for an SMS manager are not established, or an unqualified person is placed in the position of SMS manager, there is no certainty within the enterprise that their SMS is a businesslike approach to safety with the SMS as an additional layer of safety.

One of the items an SMS manager is expected to lead and manage is establishing and maintaining a reporting system to ensure the timely collection of information related to hazards, incidents and accidents that may adversely affect safety. An SMS manager with technical knowledge and intelligence of the specifics of operations may know and understand the effect of unsafe conditions, but also needs to be qualified to develop a reporting system to ensure timely collection of information. It is the task of the enterprise to ensure that the person in the SMS manager position has the skills required to develop and maintain such a system. Without these skills in an SMS manager, an enterprise may slowly drift away from the regulatory performance requirements.

Another skill required is to identify hazards and carry out risk management analyses of those hazards. A safety risk management of hazards is to apply likelihood and severity of a hazard as it applies to the operations of an airport or airline. If there is no exposure to the hazard there is no risk involved. In my many years of analyzing Safety Management Systems I have heard the opinion from regulators that a pilot is exposed to an engine failure for each takeoff and the reasoning for this is that an engine failure could happen, and that the pre-take off briefing includes the actions in the event of an engine failure. This is true, that an engine failure is a hazard, but it is not true that a pilot is exposed to an engine failure at each takeoff. The exposure determines the risk and if not exposed to the hazard there is no risk. The preparation for an engine failure is a corrective action plan to action the risk if exposed.

As the root cause analyst, the SMS manager is defining time and location of the fork in the road.
Other skills required by an SMS manager are skills to investigate, analyze and identify the root cause or probable cause of all hazards, incidents and accidents, to monitor and analyze trends in hazards, incidents and accidents, to monitor and evaluate the results of corrective actions with respect to hazards, incidents and accidents, to monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on the holder of an airline or airport certificate; and determine the adequacy of the training required to comply with regulatory requirements.
When the person managing the operation of the SMS fulfils the required job functions and responsibilities established by the policies, standards and job performance expectations the enterprise has established a foundation for an effective SMS with tailored job functions to one specific enterprise and not intended for duplication.


CatalinaNJB

Sunday, April 9, 2017

The Value of Safety

The last blog touched the value of safety and ROI on safety. There are several safety articles written about the return on investment of a Safety Management System with a return between 100 % and 600 %. All these ROIs are based on future predictions of a reduction in major accidents, operational incidents and hazards by applying the SMS tool. When applying an estimate of lack of future losses, the ROI does not represent the true value of safety, but a virtual value of safety. Virtual cash or virtual ROI is not an actual return based on facts or data, but an opinion and projection of a planned SMS. The value of safety is not the lack of accidents or incidents, but the total revenue generated by operations. SMS is a businesslike approach to safety and the value of safety should be applied in that manner.

Process Applications Are Limited To Technical Capability.
An investment in an airline or airport is the total cash invested in the operations. The return on this investment is based on several factors which in the end produce a profit or loss. A safety management system is neutral in producing profit or loss since it’s a system that does not produce or consume events and occurrences. A functional SMS is the financial comptroller of safety and a quality assurance program. In business, a comptroller is a management-level position responsible for supervising the quality of accounting and financial reporting of an organization. As a businesslike approach to safety, SMS is responsible for supervising the quality of safety. A financial commitment or investment affects all aspects of the organization. An investment in an aircraft or new runway affects other areas such as maintenance, customer service and training. Depending on how this single investment is promoted, marketed and managed, it may increase the overall ROI of the organization, or may incur a major loss. An aircraft or runway in itself is profit or loss neutral. It is the management of operations that generates a profit or loss. SMS is in this same manner accident or incident neutral, but affects outcomes based on how the SMS tool is applied. It is the application of SMS as a tool to manage and lead operations which generates the profit, losses, incidents or accidents.

Return on Investment of SMS is not the savings from a reduction of accidents or incidents, but the return of cash revenue generated by in-control processes and organization-based safety investment decisions. When purchasing an aircraft, the operator is basing their judgement on what safety nets the manufacturer has implemented. When building a new runway, the airport is basing their judgement on safety nets applied by the construction company. When customers decide to purchase a ticket, or an airline decides to operate out of a specific airport, their decision to purchase is based on what safety nets and assurance, or process controls of these safety nets, the airline or airport has in place. Safety management, or leadership in process management, is the overarching tool in decision making and therefore the only profit generator in an organization.

Comfort on an airplane is important, but if there is an apparent lack of safety then other carriers are chosen. This is the same with an airport; if the runway is marginally short for operations then the airlines choose other, less convenient airports for operations. Safety is therefore the only profit generator, and when applied in a businesslike approach to safety the ROI is the cash returned in operations, and not the absence of accidents.

ROI projections may apply the cost of accidents, but this is not the true ROI. The true ROI is the SMS decisions that went into the process of purchasing a new aircraft or extending a runway, which contributed to the ROI and are the ROI of safety. As an ROI projection the value of safety may be applied as $1.00 per second of time spent on task as the investment, and the actual $1.00 per second spent on task as revenue. Since both airplanes and runways are ROI neutral, it’s the Safety Management System decisions that produced the ROI, or the profit or loss result. There is no single operation within aviation that does not assess for safety and the impact safety has on profit. Not as an impact of a reduction in incidents or accidents, but on the customer confidence level in operations.

SPCforExcel   Out of Control Tests.
Without SMS there is zero confidence level of operational safety. Operators without an SMS may believe that they have a 100% confidence level of safety. However, when mathematically calculated their confidence level of safety is 0% since there is zero data to justify their statement. With an SMS in place the operational confidence level of safety is at least 95% even if wishful thinking is for safety to be 100%. The other unaccountable 5% of confidence levels are so remote that times between intervals of one occurrence is imaginary, theoretical, virtual, or fictional.

A Safety Management System is the Constitution of an organization and the tool for operations within a just culture and accountability, where the ROI is the fraction of out-of-control tests. Processes within the Safety Management System are analyzed in a Statistical Process Control (SPC) system with multiple tests for out-of-control processes. Each one of these tests is assigned a weight and applied to the ROI. Without data the value of safety and ROI is just an assessment of opinions.
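To make the idea of weighted out-of-control tests a little more concrete, here is a minimal sketch. It implements only one common control chart test (a point beyond the 3-sigma limits) and a weighted fraction of triggered tests. The weights, the sample data, the function names and the use of the plain standard deviation of a baseline for the control limits are all assumptions made for illustration, not a description of any particular SPC package.

```python
from statistics import mean, pstdev


def beyond_three_sigma(baseline, samples):
    """Return True if any sample falls outside mean +/- 3 sigma of the baseline.

    Simplification: sigma is the population standard deviation of the baseline.
    """
    center = mean(baseline)
    sigma = pstdev(baseline)
    upper = center + 3 * sigma
    lower = center - 3 * sigma
    return any(x > upper or x < lower for x in samples)


def weighted_out_of_control_fraction(results):
    """Return the weighted fraction of tests that signalled out of control.

    results -- list of (weight, triggered) pairs, one per test (hypothetical).
    """
    total = sum(weight for weight, _ in results)
    if total == 0:
        raise ValueError("total weight must be greater than zero")
    triggered = sum(weight for weight, hit in results if hit)
    return triggered / total


# Hypothetical weekly counts of a monitored event, then two new weeks.
baseline = [10, 12, 11, 9, 10, 11, 10, 9]
test_1 = beyond_three_sigma(baseline, [10, 14])
# One triggered test weighted 2, plus two untriggered tests weighted 1 each.
print(weighted_out_of_control_fraction([(2, test_1), (1, False), (1, False)]))
```

The point of the sketch is only that each test is a yes-or-no question answered by data, so the resulting fraction is a measurement and not an opinion.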




CatalinaNJB

Thursday, March 23, 2017

A Qualified Person Runs The SMS

When an airport operator or an air operator appoints a qualified person as the Accountable Executive, the options are wide open to appoint anyone in the organization whom they see as qualified to be responsible for operations or activities authorized under the certificate and to be accountable on behalf of the operator for meeting the requirements of the regulations. To qualify as the AE, a person must have demonstrated control of the financial and human resources that are necessary for the activities and operations authorized under the certificate. This is a broad description of qualifications, but it becomes limited by the organizational structure of authority.

The AE is a position without performance requirements.
The appointment of an airport AE compared to an air operator AE is slightly different, since an airport certificate is issued to a land-surveyed area, while an air operator certificate is issued to an individual or a corporate body. An AE for an airport is responsible to the land-surveyed area, while the AE for an air operator is responsible to the board of directors. However, as operators both AEs are responsible on behalf of the certificate holder for meeting the requirements of the regulations, one of which is the Safety Management System regulations.

An Accountable Executive requirement could also be a matter of identifying a person who leads the necessary cultural change of Just Culture and Quality Assurance Culture and how services are provided with safety assurance to the general public. Without an SMS there is no safety assurance.

Run the SMS as a businesslike approach to safety.
Aviation Safety Management System is the NextGen of aviation safety, where a cultural change is inevitable for an SMS leader to be successful. Culture change does not happen overnight, but over a lengthy period of time. For a Just Culture to develop, each individual in an organization must be accepting of these changes. The Just Culture and Quality Assurance Cultures are developed within an organization by personnel consuming data, applying learning to data for processing into information, engaging their information in operational processes with an output of knowledge, and by assessing this output and comprehending the systems involved in a change of culture. This change in culture is the Return on Investment (ROI).

SMS is a businesslike approach to safety, where ROI is vital to success. When a certificate holder is applying this businesslike approach to safety and appoints a qualified person as the Accountable Executive, the requirement of demonstrating control over human and financial resources is incidental to the ROI. When this concept is applied, the NextGen of Accountable Executive Leaders in Aviation SMS is born.



CatalinaNJB