MARS CLIMATE ORBITER INVESTIGATION

NASA has released the first report into the Mars Climate Orbiter (MCO) mission failure. The Mishap Investigation Board (MIB) met at the Jet Propulsion Laboratory (JPL) from 18 to 22 October 1999. Through individual brainstorming and group discussion, the MIB determined the root cause and contributing causes, and made observations and recommendations for future NASA missions. The focus was primarily on recommendations to successfully complete the Mars Polar Lander (MPL) mission.

THE ROOT CAUSE

A computer generated file called "SM_FORCES" contained thruster performance data measured in the English units of pound-force seconds (lbf-s) instead of the required metric units of Newton-seconds (N-s). The Mars Surveyor Operations Project (MSOP) Project Software Interface Specification (SIS) clearly defines the format and units of all computer software files for the project. Ground based software uses the "Small Forces" file to generate an Angular Momentum Desaturation (AMD) file, which is used as input to the navigation software algorithms. Because one pound-force is about 4.45 Newtons, the error in units caused the trajectory models to underestimate the effect of the thruster firings by a factor of 4.45.
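To make the factor of 4.45 concrete, here is a minimal Python sketch of the arithmetic. The impulse value and the function name are hypothetical, chosen only for illustration; they are not mission data and not part of the MSOP software. The conversion constant is the standard definition of the pound-force in Newtons.

```python
# 1 pound-force = 4.4482216152605 newtons (standard definition).
LBF_TO_N = 4.4482216152605


def lbf_s_to_n_s(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_TO_N


# Hypothetical thruster impulse, for illustration only (not mission data).
reported = 10.0  # number written to the file, actually in lbf-s

assumed_n_s = reported               # navigation software reads it as N-s
actual_n_s = lbf_s_to_n_s(reported)  # the real impulse expressed in N-s

underestimate_factor = actual_n_s / assumed_n_s
print(f"assumed {assumed_n_s:.2f} N-s, actual {actual_n_s:.2f} N-s")
print(f"underestimated by a factor of {underestimate_factor:.2f}")
```

Whatever number is written to the file, reading it as N-s when it is really lbf-s makes every thruster firing look about 4.45 times weaker than it was.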

CONTRIBUTING CAUSE NO 1
Modelling Of Spacecraft Velocity Changes

Angular momentum management keeps the spacecraft’s flywheels within their linear range by firing its thrusters. When such an AMD event occurs, data is sent from the spacecraft to the ground and processed by the "Small Forces" software. The thruster performance calculations are done both by the spacecraft itself and by the ground software, for comparison. The mismodelling of the data occurred only in the ground software!

For the first 4 months of the flight, the AMD files were not being used due to multiple file format errors and incorrect attitude specifications! The navigation team was calculating the trajectory perturbations themselves using timing information from E-mails received from the contractors. Once the AMD files were fixed in April 1999, the team almost immediately noticed that the trajectory perturbations were being underestimated. But nothing was done. My paid profession is in writing and testing computer software. It is standard practice to test the interactions between modules when a bug has been found in one of them. So when problems were found in the AMD file, the other related modules like the "SM_FORCES" file should have been tested too. Apparently not!
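As a rough sketch of the kind of interface test meant here, the Python below checks that a producer module emits the unit the specification requires before a consumer uses the data. The record type, function names and unit labels are all hypothetical; nothing here is taken from the actual MSOP ground software.

```python
from dataclasses import dataclass

LBF_TO_N = 4.4482216152605  # newtons per pound-force


@dataclass(frozen=True)
class ImpulseRecord:
    """Hypothetical record: a thruster impulse plus the unit it was written in."""
    value: float
    unit: str  # "N-s" or "lbf-s"


def to_newton_seconds(record: ImpulseRecord) -> float:
    """Return the impulse in N-s, refusing any unit the interface does not know."""
    if record.unit == "N-s":
        return record.value
    if record.unit == "lbf-s":
        return record.value * LBF_TO_N
    raise ValueError(f"unexpected unit: {record.unit!r}")


def check_interface(producer_output: ImpulseRecord, required_unit: str = "N-s") -> bool:
    """Interface test: does the producer emit the unit the specification requires?"""
    return producer_output.unit == required_unit


# A producer that writes English units fails the check before its data is used.
assert check_interface(ImpulseRecord(10.0, "N-s"))
assert not check_interface(ImpulseRecord(10.0, "lbf-s"))
```

The design point is simply that the unit travels with the number, so a mismatch between two modules shows up as a failed test rather than as a quietly wrong trajectory.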

CONTRIBUTING CAUSE NO 2
Knowledge Of Spacecraft Characteristics

They essentially had monkeys controlling the spacecraft. A new navigation team took over the controls moments before the launch and did not participate in any of the development or testing of the ground software. They did not take part in the Preliminary Design Review or the Critical Design Review process. Upon taking over the controls, they were not given important information on the control or AMD of the spacecraft. Why on Earth would you want to remove the experienced people from the project and replace them with monkeys? Could a higher power want this mission to fail? The MIB has recommended that the navigation team for the MPL be given additional training and extra support personnel. So much for NASA's new motto of "Better, Faster, Cheaper".

CONTRIBUTING CAUSE NO 3
Trajectory Correction Manoeuvre (TCM-5)

The TCM-5 is a contingency manoeuvre used to raise the spacecraft to a safer orbiting altitude. This manoeuvre could have been performed shortly before the fatal Mars Orbit Insertion (MOI) burn to attain a safer approach. This emergency plan was in place, but the planning, commitment criteria and tests for this procedure were never completed! There was no defined set of Go-No Go criteria for this manoeuvre. Because of the questions and anomalies surrounding the trajectory data, a TCM-5 manoeuvre was apparently discussed just before the MOI but was never followed through. The decision not to perform a TCM-5 was made without consulting any science personnel, and it proved to be the final decision that killed the spacecraft.

The MIB has made recommendations to get the MPL operations team fully prepared and trained to perform the TCM-5 manoeuvres.  This recommendation indicates that emergency training for the MPL has also not been achieved adequately.

CONTRIBUTING CAUSE NO 4
Systems Engineering Process

The MIB has noted that the systems engineering process did not transition adequately from development to operations. This resulted in inadequate contingency planning, communications and understanding of critical operations. The MIB also observed that there were numerous opportunities to identify the problem in the "Small Forces" file throughout the mission. The observation was made, but nobody acted on it!

CONTRIBUTING CAUSE NO 5
Communications Among Project Elements

There is evidence in both the MCO and MPL projects of immense communication problems. Key problem areas were identified between development and operations teams, operations and navigation teams, and project management and technical teams. Apparently the navigation team was totally unaware that the spacecraft transmitted the AMD data to Earth, and did not realise this until late in the mission. They did not follow NASA's Incident, Surprise, Anomaly (ISA) reporting procedure when conflicts arose, choosing instead to rely on E-mail to solve problems! Why did these people consistently go against official NASA procedures throughout the course of this mission? Failing to follow procedure contributed to the problem "slipping through the cracks." More like giant canyons!

There is now trouble in the ranks of the MPL team resulting from the loss of MCO. Defensive mechanisms have arisen, causing further communication problems that require urgent attention.

CONTRIBUTING CAUSE NO 6
Operations Navigation Team Staffing

The MIB has found that the staffing of the navigation team was less than adequate. The navigation team was not focused on the MCO mission because MSOP was trying to run all three missions (MGS, MCO and MPL) at once. It's also worth noting that the MSOP does not have a mission assurance manager or personnel whose job it would have been to detect and act on all these problems.

CONTRIBUTING CAUSE NO 7
Training Of Personnel

The navigation team had not received sufficient training on the spacecraft’s design and operations. The MIB recommended that the MPL navigation team undergo "proper training" in that spacecraft’s design and operations. As opposed to "Better, Faster, Cheaper" training?

CONTRIBUTING CAUSE NO 8
Verification And Validation Process

As I said earlier, sufficient end-to-end testing of the "Small Forces" modules was not completed, nor was testing of the rest of the ground software or other related interfaces. MPL observation number three in the MIB report indicates that final end-to-end verification and validation of the Entry-Descent-Landing procedures had not been completed at the time of this review because acceptance testing was not yet complete for the ground software. So the MPL is on its way to Mars and the relevant software is not even finished yet. In the software industry, last-minute problems are always found in the acceptance-testing phase of the software life cycle. These problems can result in significant rewrites of code, delaying the delivery date of the software. In my opinion, the software should have been completed before launch. There is no recommendation in the report to ensure software is completed before launch, so perhaps it is NASA's standard policy to leave it to the last minute.

SUMMARY

The MIB has made many recommendations for the MPL mission that must be thoroughly carried out to ensure NASA doesn't lose yet another spacecraft. One recommendation not in this report is the sacking or termination of contract for individuals or companies responsible for losing this spacecraft. Perhaps this is because nobody knows who's actually running the show. Let's hope and pray that the MPL doesn't have to suffer the same fate as the MCO.


News just in: At the time of writing, the MPL was supposed to have landed, but it has not returned a signal to indicate a successful landing. However, there are more windows of opportunity in the next few days for the MPL to transmit the signal. Let's hope NASA hasn't stuffed up again.
