
Trying to Keep Big Things in Little Packages

ThreadX Finds a Comfortable Home on FPGAs

Embedded has always been something of a mixed blessing on FPGAs. Certainly FPGAs feature prominently in many embedded systems, but they rarely take the central computing stage. Why? One word: performance.

There are two ways to implement a processor on an FPGA. The most prevalent is to use a soft core like Nios (Altera), MicroBlaze (Xilinx), ARM (Actel, Altera), Coldfire (Altera), or Mico32 (Lattice; open source). The alternative is to use one of the built-in PowerPC processors on the high-end Virtex devices from Xilinx.

The soft core approach is often more appealing for FPGA vendors because they don’t have to dedicate silicon to a processor that may or may not be used, and they don’t have to offer two versions of their chip – one with and one without. In fact, Altera started to go down the dedicated processor route and changed their minds pretty quickly. You may hear the actual reasons debated (sales? an unholy mix of software and hardware engineering in the user community?). Bottom line, they decided against it – and consider themselves very successful with their Nios offering.

What’s the downside to the soft cores? It’s that one word again: performance. Clearly, you’re not going to be able to make FPGA gates, connected together in the form of a processor, operate as quickly as a hand-crafted processor. Or even one slapped together out of standard cells. The soft cores do very well where they’re not blazing the trail on speed.

The dedicated processors are certainly faster, and yet, here again, if there’s one thing that holds them back, it’s – you guessed it – performance. They’re faster than the soft cores, but again, at 300-500 MHz, they don’t provide nearly the speed that a dedicated processor chip can provide. So it’s almost a double-fault: they’re less flexible than a soft core but not fast enough to compete with standard chips. They’re stuck in this tweener land, neither fish nor fowl.

In fact, with the Virtex 4 family from Xilinx, designers often chose the high-end devices for such things as serial signaling; the processor came along for the ride even though it wasn’t needed. With Virtex 5, the high-end features were split out – in particular serial communications – so that they were available without a processor.

So, with this as context: some pieces of the FPGA processor business do better than others, but, relative to the overall embedded market, that business is not huge. FPGAs are much more commonly used as accelerators that receive handoffs from other off-chip processors when critical algorithms can’t cut the mustard in software.

At first blush, then, it is certainly surprising to hear John Carbone of Express Logic say that they get some of the best performance of their ThreadX operating system on FPGAs. Until you look a level deeper. Many of the soft core designs are simpler functions that don’t need or want to compete with heavy duty computing. In such a design, using a full-up operating system like Linux, with its roots in telecom, or a real-time OS like VxWorks, with its military heritage, can be a real waste of horsepower, chewing up processor cycles and swallowing memory. And that’s being kind. So it stands to reason that a smaller OS might play well here. And that’s where ThreadX is positioned.

This is even truer for small, more cost-sensitive boxes like printers, cameras, and small routers. If FPGAs are going to be used here at all, it will be the small cheap ones, and, on those, soft cores are the only option. A small, cheap device means not too much memory, so again, we need that reduced-footprint OS. And that means an OS with fewer services than the big guys provide.

The specific set of services offered by ThreadX was admittedly arrived at somewhat by successive approximation. As John tells the story, it is the descendant of the Nucleus OS, which had too few services, and which was succeeded by the Nucleus+ OS, which was too rich, giving rise to ThreadX (by the same guy, William Lamie, but now in a different company, Express Logic), which was just right. Call it the Goldilocks OS.

One of the tradeoffs for using ThreadX is that it supports only one process, although it allows multi-threading. Once you start managing threads, guaranteeing real-time performance can get trickier. Threads can be swapped in and out by the OS; just because a thread has a high priority doesn’t mean it stays in place until done: it can still be pre-empted by any thread with a higher priority. Typically, the only way to stop this from happening is to block all pre-emption.

ThreadX has a unique middle way: a pre-emption threshold. If a thread of a high priority is executing and the OS wants to swap in another thread to give it a chance to execute, the new thread must have a priority exceeding a certain threshold (which can be set for each thread) before the pre-emption can take place. In this manner, the highest-priority tasks can be guaranteed deterministic access without completely stopping pre-emption, and time-critical operations can complete without being swapped out. If there are multiple important threads, they can be given the same high priority and then be time-shared.
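The threshold rule is easy to model in a few lines. What follows is only a sketch of the decision, not ThreadX code: the class, function, and thread names are made up for illustration. It does assume the ThreadX numbering convention, in which a lower number means a more urgent thread (0 is the highest priority), and it treats a threshold equal to the thread’s own priority as ordinary pre-emption.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int           # lower number = more urgent (0 is the highest)
    preempt_threshold: int  # only priorities numerically below this may pre-empt


def can_preempt(running: Thread, candidate: Thread) -> bool:
    """Return True if a newly ready candidate may displace the running thread."""
    return candidate.priority < running.preempt_threshold


# Hypothetical threads: the control loop runs at priority 10 but raises its
# threshold to 5, so only threads more urgent than 5 can interrupt it.
control = Thread("control_loop", priority=10, preempt_threshold=5)
plain = Thread("control_loop", priority=10, preempt_threshold=10)  # threshold disabled
logger = Thread("logger", priority=8, preempt_threshold=8)
fault = Thread("fault_handler", priority=2, preempt_threshold=2)

decisions = {t.name: can_preempt(control, t) for t in (logger, fault)}
print(decisions)                   # {'logger': False, 'fault_handler': True}
print(can_preempt(plain, logger))  # True: without a threshold, priority 8 beats 10
```

The logger is more urgent than the control loop on paper, yet it has to wait; the fault handler still gets in. That is the middle way: pre-emption is narrowed rather than switched off.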

Another way of providing both higher performance and greater access to processing by threads is to use multiple processors. ThreadX supports an SMP (symmetric multi-processing) configuration, again, with a single multi-threaded process. Threads can be pinned to cores so that, for example, a critical time-sensitive compute-intensive function can be given exclusive access to one core while the rest of the threads share the remaining core or cores.
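To make the pinning idea concrete, here is a toy dispatcher, again a sketch under stated assumptions rather than anything from the ThreadX SMP scheduler. Each ready thread carries a set of cores it is allowed to run on (the `allowed_cores` field, the `dispatch` function, and the thread names are all hypothetical), and the most urgent threads are placed first.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class ReadyThread(NamedTuple):
    name: str
    priority: int                  # lower number = more urgent
    allowed_cores: FrozenSet[int]  # cores this thread may run on


def dispatch(ready: List[ReadyThread], num_cores: int) -> Dict[int, Optional[str]]:
    """Greedy sketch: most urgent thread first, onto its first free allowed core."""
    running: Dict[int, Optional[str]] = {core: None for core in range(num_cores)}
    for thread in sorted(ready, key=lambda t: t.priority):
        for core in range(num_cores):
            if core in thread.allowed_cores and running[core] is None:
                running[core] = thread.name
                break
    return running


# Core 0 is reserved for the time-critical filter; everything else shares core 1.
dsp_filter = ReadyThread("dsp_filter", 1, frozenset({0}))
ui = ReadyThread("ui", 5, frozenset({1}))
network = ReadyThread("network", 6, frozenset({1}))

busy = dispatch([network, ui, dsp_filter], num_cores=2)
idle = dispatch([network, ui], num_cores=2)
print(busy)  # {0: 'dsp_filter', 1: 'ui'}
print(idle)  # {0: None, 1: 'ui'}
```

Note the second result: when the filter has nothing to do, core 0 sits idle rather than picking up other work. That is the price of exclusive access, and a real scheduler would also need a smarter placement strategy than this greedy pass.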

SMP is generally the simplest way to manage a multicore system, but it requires that each core look identical – same memory (or one shared memory), same everything. The OS has to be able to assign computation to cores without worrying about which cores do what; they should all act alike. In applications where this doesn’t make sense, an AMP (asymmetric multi-processing) setup can be used. But a single OS can’t manage all that: with AMP, each core has its own OS and operates more or less autonomously.

Which doesn’t sound very useful if you’re trying to get these things to act like they’re all on the same team and collaborating in the furtherance of some common good. This is where a messaging system is needed so that the cores can talk to each other. You can roll your own such setup, but that’s a fair bit of infrastructure to have to build – especially when you no longer have to. The MCAPI (Multicore Communications API) standard, approved last spring, provides a messaging paradigm for just this kind of situation. Polycore Software has provided the first (and, to date, only) implementation of the MCAPI standard with their PolyMessenger and PolyGenerator products, which now support ThreadX. It works anywhere ThreadX works, meaning it works on processor cores in FPGAs.
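The shape of such a messaging layer can be sketched with nothing more than queues. The code below is not the MCAPI API and not PolyMessenger; it is a minimal model in which two Python threads stand in for two autonomous cores, and every name in it (`create_endpoint`, `send`, the `(node, port)` addressing) is an assumption made for illustration.

```python
import queue
import threading
from typing import Dict, Tuple

Address = Tuple[int, int]  # hypothetical (node, port) pair naming an endpoint
endpoints: Dict[Address, queue.Queue] = {}


def create_endpoint(node: int, port: int) -> queue.Queue:
    """Register an inbox that other nodes can send to."""
    inbox: queue.Queue = queue.Queue()
    endpoints[(node, port)] = inbox
    return inbox


def send(dest: Address, payload) -> None:
    """Deliver a message to the destination endpoint's inbox."""
    endpoints[dest].put(payload)


def worker_core(inbox: queue.Queue, reply_to: Address) -> None:
    """Stand-in for a second core: sum each list it receives, reply with the total."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown signal
            break
        send(reply_to, sum(message))


main_inbox = create_endpoint(node=0, port=1)
worker_inbox = create_endpoint(node=1, port=1)

worker = threading.Thread(target=worker_core, args=(worker_inbox, (0, 1)))
worker.start()

send((1, 1), [1, 2, 3])             # hand work to the other "core"
result = main_inbox.get(timeout=5)  # wait for its reply
send((1, 1), None)
worker.join()
print(result)  # 6
```

Neither side touches the other’s data directly; they only exchange messages through named endpoints. On real AMP hardware the queue would be replaced by whatever transport the cores share, which is exactly the infrastructure a standard API spares you from building.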

So despite its simplicity, ThreadX supports many FPGA soft cores all the way from simple single-threaded designs up to AMP multicore – for a one-process design. Given the small-footprint nature of the RTOS and the fact that soft cores are easy to instantiate multiple times into a multicore fabric in an FPGA, it makes sense that Express Logic would see FPGAs as an important part of their business. The space ThreadX targets overlaps nicely with the space where soft cores play well.

Links:

ThreadX

Polycore

MCAPI

ARM (Actel)

ARM (Altera)

Coldfire (Altera)

Mico32

Microblaze

Nios
