THE AOA PROBLEM
WHAT WE CAN DO ABOUT IT
By Captain Shem Malmquist
AN FSI COMMENTARY
The following is based only on an analysis of the FAA airworthiness directive (AD) issued in the wake of the accident and its implications; it is not intended as speculation about the accident itself. Other factors unknown to me may be involved.
In the wake of the October 29 crash in Indonesia of a brand-new Boeing 737 MAX 8, which took the lives of all 189 people on board, the FAA has issued Emergency Airworthiness Directive (AD) 2018-23-51. The 737 is the most widely flown aircraft in the world, and this tragedy opens an important conversation between regulators, operators and pilots.
Lion Air, an experienced 737 operator, was the launch carrier for the 737 MAX 8 last year and for the MAX 9 in March. While it will take a long time to analyze the Lion Air 610 accident, the AD indicates that the current system architecture has created vulnerabilities.
Anyone who flies modern jet aircraft, as I do, also knows that in some ways this conversation applies to every plane and every pilot. Attempts to assign blame to anyone at any point in this investigation sidestep a much more important issue, one that is essential to the future of ever more automated cockpits.
The FAA says its AD was “prompted by analysis performed by the manufacturer showing that if an erroneously high single angle of attack (AOA) sensor input is received by the flight control system, there is a potential for repeated nose-down trim commands of the horizontal stabilizer. This condition, if not addressed, could cause a flight crew to have difficulty controlling the airplane, and lead to excessive nose-down attitude, significant altitude loss, and possible impact with terrain.”
As described in the Seattle Times, “The system called MCAS, for Maneuvering Characteristics Augmentation System, is activated when a sensor on the side of the fuselage indicates a dangerously high angle of attack (AOA), the angle between the air flow and the wing.
“If the plane is in an abnormally steep turn that puts high stress on the air frame, or when its speeds fall so low it’s about to stall, MCAS will kick in and swivel the horizontal tail to push the nose of the airplane down in an effort to avert the danger”.
While the Seattle Times article incorrectly implies that the system responds directly to speed or to “high stress on the air frame,” its description otherwise appears to be essentially correct: the system acts on the sensed angle of attack. Low airspeed and a higher load factor (which can occur in a steep turn or a pull-up from a dive) are among the reasons the angle of attack can approach the stall angle.
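The link between airspeed, load factor and angle of attack can be made explicit with the standard lift equation. This is textbook aerodynamics, not anything specific to MCAS, and assumes steady flight below the stall. Here $L$ is lift, $n$ the load factor, $W$ weight, $\rho$ air density, $V$ true airspeed, $S$ wing area, $C_L$ the lift coefficient, $C_{L_\alpha}$ the lift-curve slope and $\alpha_0$ the zero-lift angle of attack:

```latex
L = nW = \tfrac{1}{2}\rho V^{2} S C_L
\quad\Longrightarrow\quad
C_L = \frac{2nW}{\rho V^{2} S},
\qquad
n = \frac{1}{\cos\phi} \;\; \text{(level turn at bank angle } \phi\text{)},
\qquad
C_L \approx C_{L_\alpha}\,(\alpha - \alpha_0) \;\; \text{(below the stall)}
```

Halving $V$ at constant $n$ quadruples the required $C_L$, and a level 60° turn ($n = 2$) doubles it. In either case $\alpha$ must rise toward the critical angle, which is why both low speed and high load factor are genuine routes to a high AOA.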
Airspeed indicators and altimeters have comparator systems that cross-check the independent sources for spurious indications and alert the pilot to a mismatch. There is no equivalent for angle of attack: pilots have no way to quickly determine whether they are being misled by a faulty AOA sensor.
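A comparator of the kind described for airspeed and altitude is simple in principle. The sketch below is a minimal illustration, assuming two AOA channels sampled together; the threshold, the persistence count and the class name are hypothetical values chosen for the example, not taken from any certified system.

```python
from dataclasses import dataclass, field

# Hypothetical tuning values for illustration only; not from any certified system.
MISCOMPARE_THRESHOLD_DEG = 5.0  # tolerated left/right vane disagreement
PERSISTENCE_SAMPLES = 10        # consecutive disagreeing samples before alerting


@dataclass
class AoaMiscompareMonitor:
    """Raises an AOA mismatch alert when two vanes disagree persistently."""
    threshold_deg: float = MISCOMPARE_THRESHOLD_DEG
    persistence: int = PERSISTENCE_SAMPLES
    _count: int = field(default=0, init=False, repr=False)

    def update(self, left_deg: float, right_deg: float) -> bool:
        """Feed one pair of samples; return True while the alert is active."""
        if abs(left_deg - right_deg) > self.threshold_deg:
            self._count += 1
        else:
            self._count = 0  # any agreeing sample resets the filter, so brief transients do not alert
        return self._count >= self.persistence


if __name__ == "__main__":
    monitor = AoaMiscompareMonitor()
    # Right vane healthy at 5 deg; left vane reading 20 deg too high.
    for step in range(12):
        alert = monitor.update(left_deg=25.0, right_deg=5.0)
        print(step, "AOA MISMATCH" if alert else "ok")
```

A two-channel comparator can tell the crew that the vanes disagree, but not which one is wrong; that limitation is one argument for using a third AOA source where the airplane has one.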
As with erroneous airspeed or altitude readings, the loss of the sensor leads to the loss of secondary systems and/or can trigger other warnings. Even on the most advanced state-of-the-art aircraft, there is no direct feedback to the pilots when the AOA sensor itself has failed. Pilots must quickly infer a faulty AOA sensor from other faults or indications.
Underlying this problem is the fact that a software system does not “fail” the way a mechanical system does. It can be incorrectly coded or incorrectly designed, but it does not “fail” like a turbine blade that rips apart in flight. Generally, what we see is that the software was coded correctly to the requirements given to the programmers; the problem lies in the requirements and specifications themselves. If a scenario was not considered in the requirements, it is unlikely to find its way into the final code.
The AD describes an emergency scenario in which a sensor reads an erroneously high AOA and the software reacts as its designers intended. The software responds to the erroneous indication in a manner similar to the way a human might react. However, all the pilot sees is the final result. How the computer came to take an action is opaque. This makes it very difficult to cross-check the computer’s process model (its decision-making process).
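The point that software can be correct to its requirements and still act on bad data can be illustrated with a deliberately simplified, hypothetical requirement. The code below is not Boeing’s MCAS logic: the threshold, the trim increment, the activation conditions and the function names are all invented for illustration. It implements its requirement faithfully and would pass tests written against that requirement, yet with a vane reading high it commands nose-down trim on every activation, because the requirement never asks whether the sensor could be wrong.

```python
# Hypothetical requirement, invented for illustration (NOT Boeing's MCAS logic):
#   "When sensed AOA exceeds AOA_LIMIT_DEG with flaps up and the autopilot
#    disengaged, command TRIM_STEP_UNITS of nose-down stabilizer trim."
AOA_LIMIT_DEG = 14.0    # hypothetical activation threshold
TRIM_STEP_UNITS = 2.5   # hypothetical nose-down increment per activation


def trim_command(sensed_aoa_deg: float, flaps_up: bool, autopilot_on: bool) -> float:
    """Implement the requirement exactly: return nose-down trim units to apply."""
    if flaps_up and not autopilot_on and sensed_aoa_deg > AOA_LIMIT_DEG:
        return TRIM_STEP_UNITS
    return 0.0  # note: nothing here asks whether the sensor itself is valid


def total_trim(sensed_aoa_deg: float, activations: int) -> float:
    """Accumulate trim over repeated activations with a constant sensor reading."""
    return sum(trim_command(sensed_aoa_deg, flaps_up=True, autopilot_on=False)
               for _ in range(activations))


if __name__ == "__main__":
    # Healthy vane reads the true AOA of about 2 deg; a faulty vane reads 22 deg.
    print("healthy vane:", total_trim(2.0, activations=4))
    print("faulty vane: ", total_trim(22.0, activations=4))
```

From the cockpit, only the accumulated trim is visible; the `sensed_aoa_deg` value that drove each decision is not.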
As Boeing and the FAA AD explain, the bad AOA sensor leads to several problems. The erroneously high indication of AOA first leads to an autopilot disconnect. The system then works to prevent a stall by adding nose-down trim. So how does this affect the process model (mental model) for the pilots?
It is standard on Boeing aircraft that stabilizer trim can be stopped by moving the control column in the direction opposite to the trim. Aircraft designers assume that no pilot would intentionally trim the aircraft nose up while also pushing forward on the controls to pitch the aircraft down, or vice versa.
However, in the case of the B-737 MAX 8 and 9, there are reports that reversing the control column (pulling back) will not stop the stabilizer from trimming nose-down in the scenario described in the AD. Others have discussed the rationale behind this design decision, but suffice it to say that this would be different from what a pilot would expect based on previous experience on other Boeing 737 models. The erroneous AOA could trigger both an erroneous stall warning and a pitch down (due to MCAS trimming the horizontal stabilizer).
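The gap between the pilot’s expectation and the reported behavior can be written as a tiny decision function. This is a hypothetical two-rule model, not the actual 737 column-cutout wiring; `mcas_ignores_column` is an invented flag standing in for the reports mentioned above. The same physical input (pulling back against nose-down trim) produces opposite outcomes depending on a mode the pilot cannot see.

```python
NOSE_DOWN, NOSE_UP = "nose_down", "nose_up"


def column_stops_trim(trim_dir: str, column_dir: str,
                      mcas_active: bool, mcas_ignores_column: bool) -> bool:
    """Return True if the pilot's column input halts stabilizer trim motion.

    Hypothetical simplification, not actual 737 logic:
      - column input opposing the trim direction normally stops the trim
        (the behavior pilots expect from earlier 737 models);
      - `mcas_ignores_column` stands in for the reported MAX behavior in
        which pulling back does not stop MCAS nose-down trim.
    """
    if trim_dir == column_dir:
        return False  # column agrees with the trim, so nothing opposes it
    if mcas_active and mcas_ignores_column:
        return False  # reported behavior: opposing the trim has no effect
    return True       # conventional expectation: opposing column stops trim


if __name__ == "__main__":
    # Nose-down trim, pilot pulling back, trim not commanded by MCAS:
    print(column_stops_trim(NOSE_DOWN, NOSE_UP, mcas_active=False, mcas_ignores_column=True))  # True
    # Same inputs in the AD scenario, with MCAS commanding the trim:
    print(column_stops_trim(NOSE_DOWN, NOSE_UP, mcas_active=True, mcas_ignores_column=True))   # False
```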
This gets a lot more complicated when you consider how the FAA defines a stall condition for a transport category airplane (adapted from Title 14 CFR 25.201):
Full stall condition – any one, or combination, of the following:
– A nose-down pitch that cannot be readily arrested, which may be accompanied by an uncommanded rolling motion
– Buffeting of a magnitude and severity that is a strong and effective deterrent to further increase in angle of attack
– The pitch control reaches the aft stop for 2 sec and no further increase in pitch attitude occurs when the control is held full aft, which can lead to an excessive descent rate
– Activation of a stall identification device (e.g., stick pusher)
As can be seen, the condition described in the AD would present cues matching at least two of the criteria. The first is the “nose-down pitch that cannot be readily arrested,” because the pilots were not aware that the system was deliberately trimming nose-down in response to the erroneous sensor. The second is what the crew would likely perceive as the “activation of a stall identification device”: in this case the stick shaker, driven by the same erroneous sensor. The software system could effectively mislead the pilot about what is actually going on.
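Because the adapted definition is an “any one, or combination” test, a single matching cue is enough to satisfy it. The sketch below encodes the four criteria as booleans and feeds in the cues from the AD scenario. The class, field and function names are illustrative; following the argument above, the stick shaker is entered as the cue the crew perceives as the stall-identification criterion, which reflects the cockpit picture rather than a regulatory classification.

```python
from dataclasses import dataclass


@dataclass
class PerceivedCues:
    """What the crew perceives; field names are illustrative, not regulatory terms."""
    unarrested_nose_down: bool
    deterrent_buffet: bool
    aft_stop_no_pitch_increase: bool
    stall_id_device_active: bool


def matched_stall_criteria(c: PerceivedCues) -> list[str]:
    """Return the adapted 14 CFR 25.201 criteria that the cues appear to meet."""
    checks = {
        "nose-down pitch that cannot be readily arrested": c.unarrested_nose_down,
        "buffeting that deters further AOA increase": c.deterrent_buffet,
        "aft stop held 2 s with no pitch increase": c.aft_stop_no_pitch_increase,
        "stall identification device activated": c.stall_id_device_active,
    }
    return [name for name, present in checks.items() if present]


def looks_like_full_stall(c: PerceivedCues) -> bool:
    # "Any one, or combination, of the following": one match is sufficient.
    return len(matched_stall_criteria(c)) >= 1


if __name__ == "__main__":
    # AD scenario: nose-down trim and stick shaker, both from the same
    # erroneous AOA vane; no buffet, column not held at the aft stop.
    ad_case = PerceivedCues(unarrested_nose_down=True, deterrent_buffet=False,
                            aft_stop_no_pitch_increase=False, stall_id_device_active=True)
    print(matched_stall_criteria(ad_case))
    print(looks_like_full_stall(ad_case))
```

The logic has no input for “is the sensor behind these cues valid?”, which is exactly the information the crew lacks.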
The AD also implies that the trim cutout switches (guarded switches that disconnect electrical power from the trim system) may not stop the runaway, stating:
“If relaxing the column causes the trim to move, set stabilizer trim switches to CUTOUT. If runaway continues, hold the stabilizer trim wheel against rotation and trim the airplane manually.”
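Read as logic, the quoted AD text is a two-step conditional, and the existence of the second step is what suggests that CUTOUT alone might not end the runaway. The sketch models only the two quoted sentences; the function and parameter names are hypothetical, and it is not a substitute for the AD’s full procedure.

```python
def ad_runaway_actions(trim_moves_when_column_relaxed: bool,
                       runaway_continues_after_cutout: bool) -> list[str]:
    """Return, in order, the actions from the two quoted AD sentences.

    Hypothetical encoding for illustration only; the AD itself is the
    authoritative procedure.
    """
    actions: list[str] = []
    if trim_moves_when_column_relaxed:
        actions.append("Set stabilizer trim switches to CUTOUT")
        # The AD anticipates that the runaway may persist after CUTOUT.
        if runaway_continues_after_cutout:
            actions.append("Hold the stabilizer trim wheel against rotation")
            actions.append("Trim the airplane manually")
    return actions


if __name__ == "__main__":
    for step in ad_runaway_actions(True, True):
        print("-", step)
```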
Pilots are often our own worst enemy, with some contending that the situation should have been obvious because the aircraft attitude was nominal and the airspeed normal. Such Monday-morning quarterbacking suggests hindsight bias. The pilot placed in the middle of this situation does not have the benefit of knowing the outcome: they see the aircraft pitching down while receiving a stall warning. There has been considerable emphasis on stall recovery since the Air France 447 accident, and pilots are now trained that a stall in a transport airplane is not always apparent and does not always provide the cues pilots might expect from previous experience. Simulators cannot fully replicate a real stall in a transport airplane, so the training emphasizes respecting the stall warning system.
Of course, this creates a new quandary. Consider a crew who believe, incorrectly, that they are in a stall analogous to the Air France 447 accident, with a nominal nose attitude but a very high actual AOA. They might try to recover by pushing over. In other words, the system is tricking the pilots into believing they might be in a non-existent deep stall. Absent any flight deck indication that the information they are relying on is wrong, it would be difficult to pass judgment on a pilot who is following their training.
Perhaps we need to consider adding a flight display alert that prominently annunciates an AOA failure in the form of an AOA mismatch alert. This approach would parallel existing alerts for airspeed or altitude indication failures, and it would be fairly straightforward to accomplish: most transport airplanes have at least two, and sometimes three, AOA vanes and sensor systems. A system such as the one outlined by Ossmann and Joos (2017) would be one possible solution:
An advanced fault detection and diagnosis (FDD) system to monitor the triplex redundant angle of attack measurement of a commercial large transport aircraft has been presented. The FDD system incorporates signal- and model-based fault detection algorithms. Fault isolation is achieved by an individual monitoring of the three angle of attack sensors.
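The FDD system quoted above combines signal- and model-based methods. The sketch below shows only the simplest signal-based ingredient of triplex monitoring, a mid-value select with outlier flagging; it is a substitute technique chosen for illustration, not Ossmann and Joos’s algorithm. The tolerance, channel names and function name are hypothetical.

```python
from statistics import median

# Hypothetical tolerance; a real system would add persistence filtering and
# model-based checks (e.g. AOA estimated from other air data and inertial sensors).
ISOLATION_TOLERANCE_DEG = 3.0


def vote_aoa(readings: dict[str, float],
             tol: float = ISOLATION_TOLERANCE_DEG) -> tuple[float, list[str]]:
    """Mid-value select over three AOA channels with simple fault isolation.

    Returns (voted_aoa, suspect_channels). A channel is flagged when it
    deviates from the median of all three by more than `tol`.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three AOA channels")
    voted = median(readings.values())
    suspects = [name for name, value in readings.items() if abs(value - voted) > tol]
    return voted, suspects


if __name__ == "__main__":
    voted, suspects = vote_aoa({"L": 24.9, "R": 5.1, "STBY": 4.8})
    print(f"voted AOA = {voted:.1f} deg, suspect channels = {suspects}")
```

With three sources, a single bad vane is both outvoted and identified, which a two-channel comparator cannot do. Two channels failing in the same direction would still defeat this simple vote, which is where model-based checks earn their place.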
An alert would be valuable in any case. This is especially true when we consider other AOA failure events, such as Qantas 72, in which erroneous AOA data led the Airbus A330’s protection systems to command abrupt nose-down pitch maneuvers (https://en.wikipedia.org/wiki/Qantas_Flight_72).
Such an alerting system would give the pilots the information they need to disconnect flight computers or take other action as appropriate. It should be combined with ensuring that pilots understand the full functionality of the system, so they can recognize everything a particular sensor failure might affect.
Every flight depends on pilots to “fix” problems that designers did not anticipate, whether in aircraft design, procedures or the overall system design. Give the pilot the information and skills to do that, including an indication that the system is receiving an erroneous input from its sensors.
How can we prevent future problems like this? A systems approach to analysis would be a good start. The needs must be identified up front, before the software requirements are written. Implementing Systems-Theoretic Accident Model and Processes (STAMP) would likely be the best solution we have at present. The majority of current risk analysis methods (FTA, Bow-Tie, FMEA, FMECA, PRA, HFACS, ARP 4761, MIL-STD-882, etc.) are simply not up to the task of finding complex system interaction problems such as the one described here. Nor are those methods well suited to identifying problems in systems that rely on humans and software. STAMP (see http://psas.scripts.mit.edu/home/) can provide a way forward.
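STPA, the hazard analysis technique built on STAMP, examines each control action against four standard types of unsafe control action. The sketch below only enumerates that worksheet. The control action, the two contexts and the function name are hypothetical examples chosen to match this article’s scenario, and deciding which cells are hazardous remains engineering judgment that code cannot supply.

```python
from itertools import product

# The four standard STPA categories of unsafe control action (UCA).
UCA_TYPES = (
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)

# Hypothetical control action and contexts chosen to match this article's
# scenario; a real analysis derives them from the modeled control structure.
CONTROL_ACTIONS = ("automatic nose-down stabilizer trim",)
CONTEXTS = (
    "AOA data valid, actual AOA high",
    "AOA data erroneously high, actual AOA normal",
)


def uca_worksheet(actions=CONTROL_ACTIONS, contexts=CONTEXTS, types=UCA_TYPES):
    """Enumerate every (action, context, UCA type) cell an analyst must assess.

    The 'hazardous' field is left as None: judging each cell is the
    engineering work that STPA structures, not something code can decide.
    """
    return [
        {"action": a, "context": c, "uca_type": t, "hazardous": None}
        for a, c, t in product(actions, contexts, types)
    ]


if __name__ == "__main__":
    for row in uca_worksheet():
        print(f"{row['context']:<46} | {row['uca_type']}")
```

The value lies in the enumeration itself: placing the erroneous-data context beside the valid one forces the question of whether nose-down trim on bad AOA data is hazardous, a case that requirements written around the nominal scenario may never raise.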
Knowledge can keep you alive.
Captain Shem Malmquist is a veteran 777 captain and accident investigator. He is coauthor of Angle of Attack: Air France 447 and The Future of Aviation Safety and teaches an online high altitude flying course with Beyond Risk Management and Flight Safety Information.
Copyright © Shem Malmquist 2018.