
Trying to Keep Big Things in Little Packages

ThreadX Finds a Comfortable Home on FPGAs

Embedded has always been something of a mixed blessing on FPGAs. Certainly FPGAs feature prominently in many embedded systems, but they rarely take the central computing stage. Why? One word: performance.

There are two ways to implement a processor on an FPGA. The most prevalent is to use a soft core like Nios (Altera), MicroBlaze (Xilinx), ARM (Actel, Altera), Coldfire (Altera), or Mico32 (Lattice; open). The alternative is to use one of the built-in PowerPC processors on the high-end Virtex devices from Xilinx.

The soft core approach is often more appealing for FPGA vendors because they don’t have to dedicate silicon to a processor that may or may not be used, and they don’t have to offer two versions of their chip – one with and one without. In fact, Altera started to go down the dedicated processor route and changed their minds pretty quickly. You may hear the actual reasons debated (sales? an unholy mix of software and hardware engineering in the user community?). Bottom line, they decided against it – and consider themselves very successful with their Nios offering.

What’s the downside to the soft cores? It’s that one word again: performance. Clearly, you’re not going to be able to make FPGA gates, connected together in the form of a processor, operate as quickly as a hand-crafted processor. Or even one slapped together out of standard cells. The soft cores do very well where they’re not blazing the trail on speed.

The dedicated processors are certainly faster, and yet, here again, if there’s one thing that holds them back, it’s – you guessed it – performance. They’re faster than the soft cores, but again, at 300-500 MHz, they don’t provide nearly the speed that a dedicated processor chip can provide. So it’s almost a double-fault: they’re less flexible than a soft core but not fast enough to compete with standard chips. They’re stuck in this tweener land, neither fish nor fowl.

In fact, with the Virtex 4 family from Xilinx, designers often chose the high-end devices for such things as serial signaling; the processor came along for the ride even though it wasn’t needed. With Virtex 5, the high-end features were split out – in particular serial communications – so that they were available without a processor.

So with this as context, some pieces of the FPGA processor business do better than others, but, relative to the overall embedded market, none of it is huge. FPGAs are much more commonly used as accelerators that receive handoffs from off-chip processors when critical algorithms can’t cut the mustard in software.

At first blush, then, it is certainly surprising to hear John Carbone of Express Logic say that they get some of the best performance of their ThreadX operating system on FPGAs. Until you look a level deeper. Many of the soft core designs are simpler functions that don’t need or want to compete with heavy duty computing. In such a design, using a full-up operating system like Linux, with its roots in telecom, or a real-time OS like VxWorks, with its military heritage, can be a real waste of horsepower, chewing up processor cycles and swallowing memory. And that’s being kind. So it stands to reason that a smaller OS might play well here. And that’s where ThreadX is positioned.

This is even truer for small, more cost-sensitive boxes like printers, cameras, and small routers. If FPGAs are going to be used here at all, it will be the small cheap ones, and, on those, soft cores are the only option. A small, cheap device means not too much memory, so again, we need that reduced-footprint OS. And that means an OS with fewer services than the big guys provide.

The specific set of services offered by ThreadX was admittedly arrived at somewhat by successive approximation. As John tells the story, it is the descendant of the Nucleus OS, which had too few services, and which was succeeded by the Nucleus+ OS, which was too rich, giving rise to ThreadX (by the same guy, William Lamie, but now in a different company, Express Logic), which was just right. Call it the Goldilocks OS.

One of the tradeoffs for using ThreadX is that it supports only one process, although it allows multi-threading. Once you start managing threads, guaranteeing real-time performance can get trickier. Threads can be swapped in and out by the OS; just because a thread has a high priority doesn’t mean it stays in place until done: it can still be pre-empted by a higher-priority thread. Typically, the only way to stop this from happening is to block all pre-emption.

ThreadX has a unique middle way: a pre-emption threshold. If a thread of a high priority is executing and the OS wants to swap in another thread to give it a chance to execute, the new thread must have a priority exceeding a certain threshold (which can be defined) before the pre-emption can take place. In this manner, the highest-priority tasks can be guaranteed deterministic access without completely stopping pre-emption, and time-critical operations can complete without being swapped out. If there are multiple important threads, they can be given the same high priority and then be time-shared.
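To make the threshold rule concrete, here is a minimal sketch of the decision it describes. It is a model of the rule, not ThreadX code: the Thread class, the can_preempt function, and the thread names are hypothetical. It assumes the ThreadX numbering convention, in which a lower number means a more urgent priority, so "exceeding the threshold" means having a numerically smaller priority value than the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    name: str
    priority: int           # assumption: lower number = more urgent
    preempt_threshold: int  # equal to priority means no threshold in effect


def can_preempt(running: Thread, candidate: Thread) -> bool:
    """Return True if `candidate` may take the processor from `running`."""
    # The candidate is compared against the running thread's threshold,
    # not against its priority.
    return candidate.priority < running.preempt_threshold


control = Thread("control_loop", priority=20, preempt_threshold=15)
peer = Thread("peer_worker", priority=17, preempt_threshold=17)
alarm = Thread("alarm_handler", priority=5, preempt_threshold=5)

print(can_preempt(control, peer))   # False: outranks control, but not its threshold
print(can_preempt(control, alarm))  # True: more urgent than the threshold
```

With an ordinary priority scheduler, peer_worker would have pre-empted control_loop. Here only threads more urgent than the threshold, such as alarm_handler, get through, which is the middle ground between full pre-emption and none.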

Another way of providing both higher performance and greater access to processing by threads is to use multiple processors. ThreadX supports an SMP (symmetric multi-processing) configuration, again, with a single multi-threaded process. Threads can be pinned to cores so that, for example, a critical time-sensitive compute-intensive function can be given exclusive access to one core while the rest of the threads share the remaining core or cores.
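The pinning idea can be sketched with a toy placement function. This is an illustration under stated assumptions, not the ThreadX SMP scheduler: the Thread tuple, the place function, and the thread names are hypothetical, lower numbers are taken to mean more urgent priorities, and placement is a simple greedy pass over the ready threads.

```python
from typing import NamedTuple


class Thread(NamedTuple):
    name: str
    priority: int             # assumption: lower number = more urgent
    allowed_cores: frozenset  # cores this thread may run on


def place(ready, num_cores):
    """Greedy toy placement: map core index -> thread name for one instant."""
    placement = {}
    for thread in sorted(ready, key=lambda t: t.priority):
        for core in range(num_cores):
            if core in thread.allowed_cores and core not in placement:
                placement[core] = thread.name
                break  # thread placed; move on to the next one
    return placement


ready = [
    Thread("filter", 5, frozenset({0})),   # pinned: core 0 only
    Thread("network", 8, frozenset({1})),  # everyone else is kept off core 0
    Thread("ui", 10, frozenset({1})),
]
print(place(ready, num_cores=2))  # {0: 'filter', 1: 'network'}
```

Because every other thread is excluded from core 0, that core sits idle whenever the filter thread is not ready. That idle time is the price of giving the time-critical function exclusive access.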

SMP is generally the simplest way to manage a multicore system, but it requires that each core look identical – same memory (or one shared memory), same everything. The OS has to be able to assign computation to cores without worrying about which cores do what; they should all act alike. In applications where this doesn’t make sense, an AMP (asymmetric multi-processing) setup can be used. But a single OS can’t manage all that: with AMP, each core has its own OS and operates more or less autonomously.

Which doesn’t sound very useful if you’re trying to get these things to act like they’re all on the same team and collaborating in the furtherance of some common good. This is where a messaging system is needed so that the cores can talk to each other. You can roll your own such setup, but that’s a fair bit of infrastructure to have to build – especially when you no longer have to. The MCAPI (Multicore Communications API) standard, approved last spring, provides a messaging paradigm for just this kind of situation. Polycore Software has provided the first (and, to date, only) implementation of the MCAPI standard with their PolyMessenger and PolyGenerator products, which now support ThreadX. It works anywhere ThreadX works, meaning it works on processor cores in FPGAs.
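For a feel of what such a messaging layer does, here is a minimal in-memory sketch in which each core owns endpoints addressed by a (node, port) pair and messages are queued per endpoint. It is loosely modeled on the endpoint idea only: the Fabric class and its methods are hypothetical and are not the MCAPI API or any Polycore product interface, and a real AMP system would move the bytes through shared memory or an interconnect rather than a Python dictionary.

```python
from collections import deque


class Fabric:
    """Toy stand-in for the transport that connects independent cores."""

    def __init__(self):
        self._queues = {}  # (node, port) -> queue of pending messages

    def create_endpoint(self, node, port):
        key = (node, port)
        if key in self._queues:
            raise ValueError(f"endpoint {key} already exists")
        self._queues[key] = deque()
        return key

    def send(self, dest, payload):
        # Copy the payload so sender and receiver share no mutable state,
        # mirroring cores that do not share a heap.
        self._queues[dest].append(bytes(payload))

    def recv(self, endpoint):
        """Return the oldest pending message, or None if the queue is empty."""
        queue = self._queues[endpoint]
        return queue.popleft() if queue else None


fabric = Fabric()
sensor_out = fabric.create_endpoint(node=0, port=1)  # owned by core 0
control_in = fabric.create_endpoint(node=1, port=1)  # owned by core 1

fabric.send(control_in, b"temp=71")  # core 0 posts a reading
print(fabric.recv(control_in))       # core 1 picks it up: b'temp=71'
```

The point of a standard is that the application on each core only ever sees calls like send and recv against an endpoint; which OS runs on the other core, and how the bytes get there, stays hidden behind the messaging layer.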

So despite its simplicity, ThreadX supports many FPGA soft cores all the way from simple single-threaded designs up to AMP multicore – for a one-process design. Given the small-footprint nature of the RTOS and the fact that soft cores are easy to instantiate multiple times into a multicore fabric in an FPGA, it makes sense that Express Logic would see FPGAs as an important part of their business. The RTOS’s sweet spot overlaps nicely with the space where soft cores play well.








