Friday, May 19, 2017

Communicating or Transmitting SMS

For an SMS program to conform to regulatory requirements, the enterprise is expected to have in place a process by which safety authorities, responsibilities and accountabilities are transmitted to all personnel. If this process is not in place, a non-compliance System Finding under Canadian Aviation Regulations (CARs) 107.02 may be issued to the operator. CARs 107.02 is a system compliance regulation, or a design regulation, for a regulatory compliant Safety Management System. When the design of the SMS is regulatory compliant, the processes executing the SMS design must also conform to regulatory compliance. In other words, there are two compliance components: the design component and the operations component. For operators in Canada the design requirements are found in CARs 107.02, for both airlines and airports. The operations requirements are found in CARs 705.151 and CARs 302.500 respectively.

The manufacturing of this chain complies with the requirement to produce a chain.
When job descriptions are transmitted to personnel in accordance with this expectation, the message may or may not reach the intended personnel. Transmitting is one-way communication and does not specifically direct the communication to the intended recipient. If the communication only reaches a recipient in a non-management position, the information may be overwhelming since it does not conform to the expectations of that person’s job description. Or, if the information reaches senior management only, their response may be incorrect for their job performance expectations. The expectation that “Safety Authorities, Responsibilities And Accountabilities Are Transmitted To All Personnel” may be met as written, and the process may also comply with the regulatory requirements under operations. However, by following the “letter of the sentence” only, other SMS required tasks are missed and are not performed to acceptable levels. Since the interpretation of the information came into conflict with the position of the intended audience, there is a failure of the system.

The process in this example is functioning as expected, but the response to the communication was in conflict with the job position established in the organization chart. The effect of the lack of response was not just that the information was transmitted to incorrect positions and the job was not done, but also that, by not performing as the expectation intended, other parts of the SMS were crumbling and the system itself did not function.

Destroying a process could crumble a system.
It doesn’t matter how strong and well maintained 99 links in a chain are when there is one link that breaks. When a link is broken, there is a broken process somewhere that must be identified. Repairing the process by replacing the old chain with a new chain may not necessarily work well, since this does not consider the process. It could be that this link in the chain was being ground down now and then by a grinding tool required for the process to function. Replacing the old with the new assumes that there is a manufacturing flaw, without analyzing the operational processes. Then the next time it happens everybody is just as surprised as the 10 previous times. Often, the next step is to change chain manufacturer, or to fire the person who authorized the supplier.

By not conforming to the intent of this expectation that “Safety Authorities, Responsibilities And Accountabilities Are Transmitted To All Personnel”, the system itself may fail, and everyone is as surprised as the first time it failed.


Friday, May 5, 2017

Risk Management Differently

This is a blog with no relevance to any opinions, facts, research or science, but a trivial blog written for continuous improvement in safety by thinking beyond the horizons and outside the box. For continuous safety improvement to be effective, thinking outside the box is vital for collecting unbiased data and then bringing this data back into the box to be analyzed for safety improvements. We don’t manage risks; we lead personnel, manage equipment and validate operational design for improved performance above the bar of the acceptable risk level.

Improvements begin outside the box.
Risk level analysis is traditionally established by applying likelihood, severity and exposure. In a risk level analysis, the exposure is always equal to 1 for the hazard to become a risk to aviation safety. Without exposure, there is no risk. Birds are a hazard to aviation safety. However, birds that are 100 miles away from the flight path are not a risk to aviation, although they are still classified as a hazard to aviation. Traditionally these risk levels are color coded, where green is acceptable, yellow is acceptable with mitigation and red is not acceptable. There is often little or no scientific data behind these risk levels except for aircraft performance. Human factors, organizational factors, supervision factors and environmental factors are not included in these risk assessments. Human factors may affect the risk level differently from one day to another. Human factors, or the interactions between software, hardware, environment, crew and other people, are vital to aviation safety.
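The role of exposure can be sketched in a few lines of code. This is only an illustration: the text does not say how likelihood, severity and exposure are combined, so the multiplication, the function name risk_score and the example ratings of 3 and 4 are all assumptions.

```python
def risk_score(likelihood: int, severity: int, exposed: bool) -> int:
    """Hypothetical score: likelihood x severity, gated by exposure (0 or 1)."""
    exposure = 1 if exposed else 0  # without exposure there is no risk
    return likelihood * severity * exposure


# Birds near the flight path: the hazard becomes a risk.
near = risk_score(likelihood=3, severity=4, exposed=True)

# Birds 100 miles away: still a hazard, but with no exposure the risk is zero.
far = risk_score(likelihood=3, severity=4, exposed=False)

print(near, far)
```

The hazard ratings are identical in both calls; only the exposure changes the outcome.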

There are two elements to human performance: 1) technical knowledge and 2) technical skills. Knowledge is the theory of operations, while skills are the operations themselves. At the initial licensing of a pilot, the candidate must first pass a knowledge test, and then a practical flight test. Without passing at an acceptable risk level, a pilot license cannot be issued. Once the pilot is employed, this concept of refreshing both technical knowledge and technical skills becomes a concept of operational performance.

Normally a person’s retention of learning decreases with time when the learning is not applied to operations. Much of the theoretical learning is not applied daily in the job, but only occasionally with the use of a checklist. The highest percentage loss occurs in the first days and weeks after the learning is completed and levels off somewhat after that. Since the learning is applied in pilots’ skills performance by flying an aircraft daily, additional learning occurs on the job and their level of technical skills performance improves in the days and weeks after the learning.

One enterprise expected its pilots to retain a 100% knowledge level one year after the training. It would begin the refresher course with the knowledge test and expect all candidates to be as proficient in knowledge as they were 365 days earlier. Since pilots only applied part of their knowledge regularly in the day to day job and learning was not encouraged, most of what was learned had been forgotten in 365 days. Since their jobs were dependent on passing the knowledge test, the candidates would do their own personal refresher course in the last 2-3 weeks prior to the official refresher course. When the test was taken all candidates passed, and the enterprise could proudly check off the box that its pilots had retained 100% knowledge in 365 days.

When assessing risk levels differently, an enterprise would assess performance based on a pilot’s retention of knowledge and skills. Let’s assume the learning retention loss of knowledge is 20% per day for the first 84 days and, from then on, 2% per day to 365 days. At the end of a year the total knowledge retention is 20%; in other words, if the pilot took the test without studying after 365 days, the expected result would be 20% of the last result.

Technical skills retention for pilots is not reduced after learning; their performance gets better since they apply their skills in their day to day job and are additionally exposed to known and unknown hazards regularly. At the end of 365 days the pilot’s skills retention level is 180% of what it was after the previous flight test.

When applying this data as a combined retention level factor of knowledge and skills, the pilots are performing at their 100% level after 365 days. After 5 years in the same job they are performing above their 100% initial level.
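The end-of-year arithmetic can be checked with a short sketch. It uses only the two end-of-year values given above (20% knowledge retention and 180% skills retention) and assumes the combined factor is a simple average of the two, which the text does not state explicitly. The function name combined_performance is hypothetical.

```python
def combined_performance(knowledge_pct: float, skills_pct: float) -> float:
    """Assumed combination: the simple average of the two retention levels."""
    return (knowledge_pct + skills_pct) / 2


# End-of-year values stated in the text.
knowledge_after_year = 20.0   # percent of the last knowledge test result
skills_after_year = 180.0     # percent of the previous flight test level

result = combined_performance(knowledge_after_year, skills_after_year)
print(result)  # 100.0
```

Under this assumption the loss in knowledge and the gain in skills offset each other exactly at day 365.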

The most critical days for the performance factor are days 60-80.
The traditional risk level model is based on aircraft performance, and pilots are expected to perform at their 100% performance level in both technical knowledge and skills. In addition, the traditional risk level matrix only applies a recommendation to accept or reject a risk. A different risk matrix applies an action to each color, where the colors are based on likelihood and severity. These actions are to communicate (green), monitor (yellow), pause (blue), suspend (orange) or cease (red) operations. Risk levels orange and red are applicable to aircraft performance, where pilot qualifications do not impact aircraft performance limitations. When overlaying the knowledge, skills and performance factor graphs onto the risk matrix, the lowest level of performance represents knowledge, the highest represents skills and the middle is the pilots’ performance level. A performance level should be above the monitoring (yellow) level for quality assurance of flight operations.
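A minimal sketch of such an action-based matrix follows. The five colors and their actions come from the paragraph above; everything else is an assumption made for illustration: the 1 to 5 rating scales, the score thresholds, the ordering of the colors from green up to red, and the names RiskLevel, ACTIONS, risk_level and action_for.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    # Assumed ordering, from lowest to highest risk.
    GREEN = 1
    YELLOW = 2
    BLUE = 3
    ORANGE = 4
    RED = 5


# Color-to-action mapping as listed in the text.
ACTIONS = {
    RiskLevel.GREEN: "communicate",
    RiskLevel.YELLOW: "monitor",
    RiskLevel.BLUE: "pause",
    RiskLevel.ORANGE: "suspend",
    RiskLevel.RED: "cease",
}


def risk_level(likelihood: int, severity: int) -> RiskLevel:
    """Hypothetical banding of a 5x5 matrix; the thresholds are assumptions."""
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("likelihood and severity must be between 1 and 5")
    score = likelihood * severity  # ranges from 1 to 25
    if score <= 4:
        return RiskLevel.GREEN
    if score <= 9:
        return RiskLevel.YELLOW
    if score <= 14:
        return RiskLevel.BLUE
    if score <= 19:
        return RiskLevel.ORANGE
    return RiskLevel.RED


def action_for(likelihood: int, severity: int) -> str:
    """Return the operational action attached to the resulting color."""
    return ACTIONS[risk_level(likelihood, severity)]


print(action_for(1, 1))  # communicate
print(action_for(3, 4))  # pause
print(action_for(5, 5))  # cease
```

The point of the design is that every cell of the matrix returns something to do, rather than only an accept or reject decision.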