System Management – Not Sexy, But Critical

Your boss begins to drone on about the system maintenance checklist during your weekly meeting: power initialization and sequencing, reset management, voltage and current monitoring, system clocking, data logging, remote communications, diagnostics and prognostics, SRAM field-programmable gate array (FPGA) management, errors and alarms, thermal management, ID and authentication, and microcontroller (MCU) boot loading. Just before you nod off into a perfect sleep, you hear the words “critical,” “operational” and “job in jeopardy”…

The above is a collection of seemingly unrelated tasks with the goal of ensuring the proper operation of the system. All part of system management, these tasks focus on maximizing system uptime, identifying and communicating alert conditions, and logging data and alarm conditions. Sexy? No. Critical? Yes.

System management continues to gain importance in the design of all electronic systems. Smaller process geometries lead to more multi-voltage devices, and those devices are more susceptible to voltage and temperature fluctuations. Boards must also be able to initiate corrective action when fault conditions occur. Driven by the need to increase system uptime and reliability, many systems are adding diagnostics and prognostics, not only to help debug systems that have failed but also to identify potential failures before they arise. In markets driven by standards, reliability and uptime are key metrics by which OEMs can differentiate themselves.

Current Implementations

Often, today’s system management implementations require a large number of discrete components (sometimes numbering in the hundreds), occupy large amounts of board space, and are inflexible to change. Although standards are rapidly being developed and adopted (ATCA, MicroTCA and IPMI), most of today’s implementations are proprietary, having evolved over time within an organization. These solutions are a collection of fixed-function chips and discrete components that must work in concert to create a cohesive solution: CPLD, real-time clock, power sequencer, temperature monitor, fan controller, nonvolatile memory, clock generators and configuration memory.

In addition to consuming board space, the large number of components adds both direct (unit, assembly and inventory costs) and indirect (design time, procurement and discontinuation) costs. Increased component count also increases the costs and risks associated with device and system failure. Furthermore, these hardware-implemented discrete solutions often require component changes and/or board respins even for incremental design changes, requiring requalification and making it impossible to create platform solutions.

Despite these costs and risks, today’s typical implementation can usually keep the system running. Another limitation appears, however, when historical performance or failures need to be tracked without adding components. If failures didn’t happen, this wouldn’t be an issue, but who hasn’t received a failed board back from field testing? (Yes, it is okay to admit that boards sometimes fail.) The error report typically lacks detail and explanation, such as “it stopped,” leaving the designer to put two and two together on his or her own. A “black box” for the board would therefore save valuable time and effort when trying to identify failure modes and “design weak spots.”
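
To make the “black box” idea concrete, here is a minimal sketch in Python of a fixed-size event recorder that overwrites its oldest entry once it is full, much as a small nonvolatile log on a board might. It is an illustration only, not a description of any particular product; the names BlackBox and Event, the field layout and the default capacity are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One log entry. The field layout is hypothetical."""
    uptime_s: float   # seconds since power-up when the event was seen
    source: str       # which monitor raised it, e.g. "3V3_RAIL"
    message: str      # short human-readable description
    value: float      # measured value at the time of the event


class BlackBox:
    """Fixed-size event log: once full, the oldest entry is overwritten."""

    def __init__(self, capacity: int = 64) -> None:
        # A deque with maxlen drops the oldest item on overflow.
        self._events = deque(maxlen=capacity)

    def record(self, uptime_s: float, source: str,
               message: str, value: float) -> None:
        self._events.append(Event(uptime_s, source, message, value))

    def dump(self) -> list:
        """Return the retained events, oldest first."""
        return list(self._events)


if __name__ == "__main__":
    box = BlackBox(capacity=3)
    box.record(12.0, "3V3_RAIL", "undervoltage warning", 3.05)
    box.record(12.4, "FAN1", "tach below minimum", 900.0)
    box.record(13.1, "TEMP_CPU", "over-temperature", 96.5)
    box.record(13.2, "SYSTEM", "shutdown initiated", 0.0)
    # Only the three most recent events survive.
    for event in box.dump():
        print(event)
```

With even this much history, a returned board reports the sequence of events that led up to the failure instead of just “it stopped.”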

The Changing Landscape

System management isn’t the stuff childhood dreams are made of. No one “enjoys” system management. No one wants to monitor and manually adjust temperatures when the system is overheating. No one wants to spend the day monitoring systems for error conditions and personally sending alerts when an error occurs – talk about shooting the messenger. However, it is critical that it happens and it happens correctly and in a timely fashion.

So, designers and system engineers continue to search for ways to improve the way systems are managed. Maintaining the proper environmental operating conditions is a key component of system management. For example, today’s intelligent systems not only monitor and manage thermal conditions, but they will also distribute system traffic to better balance the system and maximize performance. Innovative designers are looking to analyze system trends during operation. By analyzing how a particular parameter varies over the life of the board, it is possible to predict failures before they occur, thereby increasing system uptime.
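
As a rough illustration of this kind of trend analysis, the sketch below fits a straight line to logged readings of a slowly rising parameter (say, board temperature at a fixed load) and projects how long it will take to reach an upper limit. The function names, the units and the choice of a simple least-squares line are assumptions made for the example; a real prognostic scheme may need a very different model.

```python
def fit_line(times, values):
    """Least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def hours_until_upper_limit(times, values, limit):
    """Project the time from the last sample until `limit` is reached.

    Returns 0.0 if the fitted trend is already at or above the limit,
    and None if the trend is flat or falling (no crossing predicted).
    The result is in the same time unit as `times` (hours here).
    """
    slope, intercept = fit_line(times, values)
    latest = slope * times[-1] + intercept  # fitted value at last sample
    if latest >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - latest) / slope


if __name__ == "__main__":
    # Hypothetical log: temperature in degrees C, sampled every 100 hours.
    hours = [0.0, 100.0, 200.0, 300.0]
    temps = [60.0, 61.0, 62.0, 63.0]
    remaining = hours_until_upper_limit(hours, temps, limit=70.0)
    print("projected hours until limit:", remaining)
```

A projection like this is what lets the system raise an alert, or schedule service, while the board is still operating within limits.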

Designers have long dreamed — often during those interminable weekly or monthly management meetings — of a design workaround that would ease the system management burden. Could a flexible solution that significantly reduces the number of external components required to maintain optimal system conditions really exist? Would this dramatically reduce board space, component count and costs and increase system reliability enough to make a difference?

As it happens, a flexible, single-chip, mixed-signal FPGA can perform a host of system management tasks and offers enough integration to replace hundreds of discrete components at less than 50 percent of the cost and space while maintaining system reliability. Just one of these platforms can integrate and perform all system management functions, removing “the pain” and burden currently associated with system management.

Sexy? Yes.
