Losing Hubble – Saving Hubble

And I think it’s gonna be a long, long time
‘Til touchdown brings me ’round again to find
I’m not the man they think I am at home
Oh, no, no, no, I’m a rocket man
Rocket man, burning out his fuse up here alone
– Elton John and Bernie Taupin, “Rocket Man”

The Hubble Space Telescope’s (HST’s) payload computer entered safe mode on Sunday, June 13, 2021. Hubble’s scientific instruments went dead silent, and NASA’s repeated attempts to coax the payload computer out of safe mode and back to operational status failed. NASA retired the only spacecraft capable of reaching and repairing Hubble, the Space Transportation System (colloquially called the Space Shuttle), in 2011, more than a decade ago.

Although there was some talk of retasking existing spacecraft such as the SpaceX Crew Dragon for an HST repair mission, that talk was blue-sky fantasy. The Crew Dragon lacks the airlock needed for extra-vehicular activity (EVA). Only the Space Shuttle was large enough and strong enough to transport a crew and to support the EVAs needed to grab and fix the HST. The Shuttle had already made five servicing missions to the HST, but now it’s gone. With no Shuttle, we could have lost Hubble for good.

But we didn’t. Not this time, anyway.

After the HST payload computer halted and entered safe mode on June 13, the NASA engineering team started running tests and analyzing test data. Results suggested that the payload computer’s memory module was a possible root cause of the problem because memory errors – an inability to read back from memory what had been written – were one of the observed failure symptoms.

The HST payload computer is a NASA Standard Spacecraft Computer-1 (NSSC-1) system built in the 1980s. This payload computer, located within Hubble’s Science Instrument Command and Data Handling (SI C&DH) unit, controls and coordinates data streams from the HST’s various scientific instruments and monitors their condition.

The Hubble Space Telescope, launched in 1990, has provided humanity with a front-row seat to the cosmos for more than three decades.
Image Credit: NASA

The original NSSC-1 computer was developed by the NASA Goddard Space Flight Center and Westinghouse Electric in the early 1970s for multiple planned missions. The earliest version of this computer incorporated 1700 (that’s one thousand, seven hundred) DTL small-scale, flat-pack ICs from Fairchild Semiconductor (and not some newfangled microprocessor chip). For those of you who are Millennials, DTL (short for “Diode Transistor Logic”) was a logic family introduced in the early 1960s, first by Signetics and then by Fairchild Semiconductor.

Lockheed Space Systems incorporated the NSSC-1 into the HST’s design after winning the contract to build the spacecraft in 1977. The HST finally launched 13 years later, in 1990. Long before the HST launched, the NSSC-1 processor was redesigned to fit into some very early MSI TTL gate arrays developed by Harris Semiconductor, and then later replaced with functionally equivalent but faster gate arrays from TRW. Each gate array IC incorporated approximately 130 gates of logic.

(Through a series of acquisitions and mergers, Harris Semiconductor is now part of Renesas Electronics. Fairchild Semiconductor has been bought, spun back out, and then purchased again. It’s now part of ON Semiconductor, which was spun out of Motorola Semiconductor. If you can keep track of these corporate changes without Wikipedia to aid you, then your memory is far better than mine.)

The NSSC-1 computer also used 18-bit magnetic-core or plated-wire memory for its main memory, not semiconductor memory. This was old, old technology, but microprocessors and semiconductor memory were just not ready for space-based missions at the time. They were far too new and untried for NASA’s requirements.

The HST’s original SI C&DH unit suffered a processor failure in 2008, so NASA switched control of the scientific mission to the backup payload computer. This failure occurred just 17 days before the scheduled launch of Shuttle mission STS-125, which was originally planned to fly to the HST in 2008 for its fifth on-orbit servicing. NASA delayed STS-125 until May 2009, which allowed enough time to ready an existing backup SI C&DH unit with two new payload computers. STS-125 swapped in the replacement SI C&DH unit and restored the HST’s operational redundancy, which would prove most fortuitous more than twelve years later, in 2021.

Here’s how Lead Flight Director Tony Ceccacci described the STS-125 repair mission:

“It’s more like brain surgery than construction. On station spacewalks, you’re installing large pieces of equipment – trusses, modules, etc. – and putting it together like an erector set. You can’t do that with Hubble. Hubble spacewalks are comparable to standing at an operating table, doing very dexterous work.”

STS-125 was a complex repair and upgrade mission that required five EVAs. Each EVA was planned to last about 6.5 hours, but most of the STS-125 EVAs ran between seven and eight hours because of the difficult and intricate work. STS-125 was the fifth and final Hubble servicing mission. Its goal was to extend Hubble’s useful life to 2014 and beyond. NASA retired the Shuttle fleet in 2011, two years after STS-125, and the remaining Shuttles are now museum pieces. No more Shuttle missions are possible. It’s now 2021, and Hubble continues to operate and to deliver priceless scientific data. STS-125 was very successful indeed.

STS-125 upgraded the payload computer’s main memory from magnetic cores to CMOS RAM when it replaced the failed payload computer. Both redundant payload computers in the SI C&DH unit can access any of four independent, 64-Kword CMOS memory modules. However, only one CMOS memory module is operational at a time because the NSSC-1 computer’s memory space is limited to 64 Kwords. The other three CMOS memory modules serve as redundant backups. When the memory errors recurred during tests using the other memory modules, NASA engineers concluded that the problem was likely not a memory failure at all but some other type of hardware failure, because simultaneous failure of all four memory modules was highly unlikely.
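The engineers’ reasoning about the memory modules rests on two small pieces of arithmetic, sketched in Python below as an illustration rather than NASA’s actual analysis. First, 64 Kwords is 2^16 = 65,536 words, so a single 64-Kword module fills the NSSC-1’s entire memory space. Second, if module faults were independent, the chance of all four failing at once would be the single-module probability raised to the fourth power. The per-module fault probability `p` is a hypothetical number chosen only to show the scaling, and the helper names are invented for this example.

```python
# Illustrative arithmetic only; not NASA's analysis.

ADDRESS_SPACE_WORDS = 2 ** 16   # NSSC-1 memory space: 64 Kwords = 65,536 words
MODULE_SIZE_WORDS = 64 * 1024   # each CMOS memory module holds 64 Kwords


def modules_addressable(address_space: int, module_size: int) -> int:
    """Number of whole modules that fit in the address space at one time."""
    return address_space // module_size


def prob_all_faulty(p_single: float, n_modules: int) -> float:
    """Probability that n modules are all faulty, ASSUMING independent faults."""
    return p_single ** n_modules


if __name__ == "__main__":
    # One module fills the whole 64-Kword space, so the other three are spares.
    print(modules_addressable(ADDRESS_SPACE_WORDS, MODULE_SIZE_WORDS))  # 1

    p = 0.01  # hypothetical per-module fault probability, for illustration only
    for n in range(1, 5):
        print(n, prob_all_faulty(p, n))
```

The load-bearing assumption is independence. A fault shared by all four modules, such as a problem in a common supply, defeats the p**4 argument, which is exactly why identical errors on every module point toward a common-mode cause.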

By now, Hubble’s scientific mission had been silent for more than a week, and the NASA engineers still had no root-cause suspect.

NASA engineers performed additional tests on June 23 and 24. These tests switched on the backup payload computer for the first time since it was installed in the HST during the STS-125 repair mission in 2009. After more than a decade of hibernation in space, the backup payload computer fired up successfully. However, the test results showed that numerous hardware combinations, using pieces from both the primary and backup payload computers, all experienced the same memory error with any of the four memory modules.

NASA’s official list of Hubble payload computer components includes:

A Central Processing Module (CPM), which processes the commands that coordinate and control the science instruments
A Standard Interface (STINT), which bridges communications between the computer’s CPM and other components
A communications bus, which contains lines that pass signals and data between hardware
One active memory module, which stores operational commands to the instruments. There are three additional modules that serve as backups.

As an old systems engineer with a lot of troubleshooting experience, I immediately saw that the power supply was missing from this list. A janky power supply causes all sorts of failures with funny symptoms, including read/write memory problems.
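The logic behind that hunch is classic swap-test elimination: if every combination of swapped parts fails the same way, the fault most likely lives in something that was never swapped. The Python sketch below models it with set intersection. The test matrix is hypothetical and simplified (NASA’s exact configurations aren’t listed here), and the labels “PCU-A” and “CU/SDF-A”, standing for the side-A Power Control Unit and Command Unit/Science Data Formatter that stayed in the path, are assumptions for illustration.

```python
from itertools import product

# Hypothetical, simplified test matrix; NASA's actual configurations differ.
CPMS = ["CPM-primary", "CPM-backup"]
STINTS = ["STINT-primary", "STINT-backup"]
MEMORIES = ["memory-1", "memory-2", "memory-3", "memory-4"]
# Assumed: these side-A units stayed in the path during every test.
NEVER_SWAPPED = ["CU/SDF-A", "PCU-A"]


def build_configs() -> list[set[str]]:
    """Every combination of swappable parts, plus the parts that never changed."""
    return [set(combo) | set(NEVER_SWAPPED)
            for combo in product(CPMS, STINTS, MEMORIES)]


def common_suspects(failing_configs: list[set[str]]) -> set[str]:
    """Parts present in every failing configuration (single-fault assumption)."""
    suspects = set(failing_configs[0])
    for config in failing_configs[1:]:
        suspects &= config
    return suspects


if __name__ == "__main__":
    configs = build_configs()  # in this model, all 16 combinations failed
    print(len(configs))                      # 16
    print(sorted(common_suspects(configs)))  # ['CU/SDF-A', 'PCU-A']
```

Intersection narrows the field but can’t rank what remains. In this model it leaves both never-swapped units, and separating them would take a different kind of test, such as switching to the redundant side.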

The first commandment of systems troubleshooting, often quoted by my friend Dave Jones in his EEVblog YouTube videos, is:

“Thou shalt test supply voltages.”

Apparently, that was not possible from the available HST telemetry. Even worse, Fluke doesn’t sell test leads long enough to reach the HST’s 340-mile orbit from my period-correct, mid-1980s Fluke 77 DMM. (Yes, I checked. All of Fluke’s test leads seem to measure 1.5 meters, about 340 miles too short.)

The Science Instrument Command and Data Handling (SI C&DH) unit consists of two independent computers – each one capable of processing science data and sending it to Earth – with four redundant memory modules and two power units. STS-125 installed this replacement SI C&DH unit in 2009.
Image Credit: NASA

By June 30, the NASA Hubble blog reported that the “source of the computer problem lies in the Science Instrument Command and Data Handling (SI C&DH) unit… A few hardware pieces on the SI C&DH could be the culprit(s).” The blog continued:

“The team is currently scrutinizing the Command Unit/Science Data Formatter (CU/SDF), which sends and formats commands and data. They are also looking at a power regulator within the Power Control Unit, which is designed to ensure a steady voltage supply to the payload computer’s hardware.”

After more analysis and testing, and a month after the payload computer went into safe mode, the NASA blog reported on July 14:

“A series of multi-day tests, which included attempts to restart and reconfigure the computer and the backup computer, were not successful, but the information gathered from those activities has led the Hubble team to determine that the possible cause of the problem is in the Power Control Unit (PCU).

“The PCU also resides on the SI C&DH unit. It ensures a steady voltage supply to the payload computer’s hardware. The PCU contains a power regulator that provides a constant five volts of electricity to the payload computer and its memory. A secondary protection circuit senses the voltage levels leaving the power regulator. If the voltage falls below or exceeds allowable levels, this secondary circuit tells the payload computer that it should cease operations. The team’s analysis suggests that either the voltage level from the regulator is outside of acceptable levels (thereby tripping the secondary protection circuit), or the secondary protection circuit has degraded over time and is stuck in this inhibit state.”
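The secondary protection circuit NASA describes behaves like a window comparator: it asserts an inhibit whenever the regulator’s output falls outside an allowed band. The Python model below is a sketch of that behavior, not of the actual Hubble circuit. The ±5% limits around 5 V are hypothetical (the blog doesn’t give the real thresholds), and the `stuck_inhibit` flag is an illustrative way to represent a degraded circuit latched in its inhibit state.

```python
from dataclasses import dataclass

NOMINAL_V = 5.0
# Hypothetical window limits (+/-5% of 5 V); the real PCU thresholds are not given.
LOW_LIMIT_V = 4.75
HIGH_LIMIT_V = 5.25


@dataclass
class ProtectionCircuit:
    low_v: float = LOW_LIMIT_V
    high_v: float = HIGH_LIMIT_V
    stuck_inhibit: bool = False  # models a degraded circuit latched in inhibit

    def inhibit(self, measured_v: float) -> bool:
        """True means the payload computer is told to cease operations."""
        if self.stuck_inhibit:
            return True
        return not (self.low_v <= measured_v <= self.high_v)


if __name__ == "__main__":
    healthy = ProtectionCircuit()
    degraded = ProtectionCircuit(stuck_inhibit=True)

    print(healthy.inhibit(5.00))   # False: voltage in range, computer runs
    print(healthy.inhibit(4.60))   # True: hypothesis 1, regulator out of range
    print(degraded.inhibit(5.00))  # True: hypothesis 2, circuit stuck in inhibit
```

Both hypotheses produce the same symptom, an inhibited payload computer, so the two are hard to tell apart from the ground. Switching to the backup PCU replaces the whole power path, which plausibly explains why it was the fix under either hypothesis.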

A dead power supply is a dreadful thing to ponder when your system is in low Earth orbit, 340 miles up. Many companies that repair dead power supplies market themselves on LinkedIn, but none are likely to make on-site repair visits to the HST.

If only Hubble’s payload computer included a redundant power supply in addition to the dual-redundant processor and the quad-redundant memory.

Oh, of course there’s a redundant power supply for the payload computer, in the SI C&DH unit.

A month after the failure, the July 14 NASA blog mentioned: “Because no ground commands were able to reset the PCU, the Hubble team will be switching over to the backup side of the SI C&DH unit that contains the backup PCU.”

Then:

On July 16, the NASA blog reported that engineers had successfully activated the backup PCU and the backup Command Unit/Science Data Formatter (CU/SDF) in the Hubble’s SI C&DH unit.
On July 17, the NASA blog reported that Hubble was now back online, ready to resume its science mission.
On July 19, the science mission resumed. Hubble once again started adding to its library of more than 1.5 million incredible images and observations.

After a month of testing, NASA engineers decided to switch over to the redundant power unit, and they brought Hubble’s scientific instruments back online within five days. NASA had cautiously and successfully restored Hubble by remote control, and the HST’s scientific spigot was wide open once more. It was NASA’s only repair alternative. Thanks to the redundancy designed into a positively ancient spacecraft by Lockheed engineers more than 40 years ago, along with more than fifteen years of periodic maintenance and repairs (1993 to 2009) made by skilled Shuttle crews, NASA’s engineers had saved Hubble.

Notes:

For a terrific 2-hour video discussion about the Hubble Space Telescope’s development, presented by some of the Lockheed engineers who worked on the project for many years, click here. This webinar was presented in April 2021 by the Silicon Valley Technology History Committee and the IEEE Life Members Affinity Group of the IEEE Santa Clara Valley Section.
For NASA’s complete blog history of the near-miraculous engineering repair of Hubble, click here.
If you’d like more detail about the NSSC-1 computer, read this.
You may also recall that the Hubble Space Telescope’s main flight and housekeeping computer was replaced with a computer based on an Intel 80486 DX2 microprocessor in December 1999, during STS-103. Intel’s latest CEO, Pat Gelsinger, was the chief architect of the 80486 project and was heavily involved in the clock-doubling DX2 version as well.

8 thoughts on “Losing Hubble – Saving Hubble”

jackganssle says:

September 13, 2021 at 10:59 am

Great piece, Steve. Hubble has been a fantastic mission. How many other science programs have their output hanging in art museums? We’ll miss it when it is finally kaput, but let’s hope the James Webb telescope fills in for it (alas, only in the infrared).

1. Steven Leibson says:
  
  September 16, 2021 at 8:08 am
  
  Thanks Jack! Great to hear from you. I agree, Hubble’s images are stunning, after the first service mission took up some eyeglasses to fix the mirror’s myopia.
  
beekay says:

September 13, 2021 at 12:04 pm

Fantastic description of events! Thanks to all who keep the data flowing from space!

Question: Now that the system is definitely no longer redundant, what plans, if any, are there to resolve this now “SPOF” or Single Point Of Failure on the Hubble Telescope?

1. Steven Leibson says:
  
  September 16, 2021 at 8:10 am
  
  beekay, the planned fix is the Webb telescope, which is a replacement and not a fix. We have no practical way of going up and fixing Hubble now that the US Space Shuttles are retired and decommissioned.
  
gene plichota says:

September 13, 2021 at 5:40 pm

Longevity was probably a far second to getting all functions up at the launch... a tribute to success, that later has come to now

1. Steven Leibson says:
  
  September 16, 2021 at 8:12 am
  
  gene plichota, without the five service missions and the dedicated work of Hubble’s ground team, we would never have been able to do the kind of science and get the kind of stunning images that we did. Hubble was launched with a flawed main mirror. Initial images were blurrier than expected. The first service mission took up supplementary lenses to fix the myopic mirror.
  
cowduo says:

September 14, 2021 at 5:26 pm

First rule for debugging space electronics bugs: check the grounds. – Kim Rubin

1. Steven Leibson says:
  
  September 16, 2021 at 8:13 am
  
  Very droll, Kim Rubin.
  
