
Trying to Keep Big Things in Little Packages

ThreadX Finds a Comfortable Home on FPGAs

Embedded has always been something of a mixed blessing on FPGAs. Certainly FPGAs feature prominently in many embedded systems, but they rarely take the central computing stage. Why? One word: performance.

There are two ways to implement a processor on an FPGA. The most prevalent is to use a soft core like Nios (Altera), MicroBlaze (Xilinx), ARM (Actel, Altera), Coldfire (Altera), or Mico32 (Lattice; open). The alternative is to use one of the built-in PowerPC processors on the high-end Virtex devices from Xilinx.

The soft core approach is often more appealing for FPGA vendors because they don't have to dedicate silicon to a processor that may or may not be used, and they don't have to offer two versions of each chip, one with a processor and one without. In fact, Altera started to go down the dedicated-processor route and changed their minds pretty quickly. The actual reasons are open to debate (sales? an unholy mix of software and hardware engineering in the user community?), but the bottom line is that they decided against it, and they consider themselves very successful with their Nios offering.

What’s the downside to the soft cores? It’s that one word again: performance. Clearly, you’re not going to be able to make FPGA gates, connected together in the form of a processor, operate as quickly as a hand-crafted processor. Or even one slapped together out of standard cells. The soft cores do very well where they’re not blazing the trail on speed.

The dedicated processors are certainly faster, and yet, here again, if there’s one thing that holds them back, it’s – you guessed it – performance. They’re faster than the soft cores, but again, at 300-500 MHz, they don’t provide nearly the speed that a dedicated processor chip can provide. So it’s almost a double-fault: they’re less flexible than a soft core but not fast enough to compete with standard chips. They’re stuck in this tweener land, neither fish nor fowl.

In fact, with the Virtex 4 family from Xilinx, designers often chose the high-end devices for such things as serial signaling; the processor came along for the ride even though it wasn’t needed. With Virtex 5, the high-end features were split out – in particular serial communications – so that they were available without a processor.

So, in this context, some pieces of this business do better than others, but relative to the overall embedded market, processors on FPGAs are not a huge segment. FPGAs are much more commonly used as accelerators, taking handoffs from off-chip processors when critical algorithms can't cut the mustard in software.

At first blush, then, it is certainly surprising to hear John Carbone of Express Logic say that they get some of the best performance of their ThreadX operating system on FPGAs. Until you look a level deeper. Many of the soft core designs are simpler functions that don’t need or want to compete with heavy duty computing. In such a design, using a full-up operating system like Linux, with its roots in telecom, or a real-time OS like VxWorks, with its military heritage, can be a real waste of horsepower, chewing up processor cycles and swallowing memory. And that’s being kind. So it stands to reason that a smaller OS might play well here. And that’s where ThreadX is positioned.

This is even truer for small, more cost-sensitive boxes like printers, cameras, and small routers. If FPGAs are going to be used here at all, it will be the small cheap ones, and, on those, soft cores are the only option. A small, cheap device means not too much memory, so again, we need that reduced-footprint OS. And that means an OS with fewer services than the big guys provide.

The specific set of services offered by ThreadX was admittedly arrived at somewhat by successive approximation. As John tells the story, ThreadX descends from the Nucleus OS, which had too few services. Nucleus was succeeded by Nucleus+, which had too many. That gave rise to ThreadX, written by the same developer, William Lamie, but now at a different company, Express Logic, and its set of services was just right. Call it the Goldilocks OS.

One of the tradeoffs of using ThreadX is that it supports only one process, although it allows multi-threading. Once you start managing threads, guaranteeing real-time performance gets trickier. The OS can swap threads in and out; just because a thread has a high priority doesn't mean it stays in place until done, since it can still be pre-empted by a thread of even higher priority. Typically, the only way to stop this from happening is to block all pre-emption.

ThreadX has a unique middle way: a pre-emption threshold. Each thread can be given a threshold at or above its own priority. If such a thread is executing and the OS wants to swap in another thread, the newcomer must have a priority higher than the running thread's threshold, not merely higher than its priority, before the pre-emption can take place. In this manner, the highest-priority tasks can be guaranteed deterministic access without completely stopping pre-emption, and time-critical operations can complete without being swapped out. If there are multiple important threads, they can be given the same high priority and then be time-shared.
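
To make the rule concrete, here is a minimal Python model of the decision the scheduler has to make. It is a sketch under stated assumptions, not ThreadX code: the `Thread` class and the two `can_preempt_*` functions are hypothetical, and the model borrows the ThreadX convention that a lower number means a higher priority (0 is the highest). Setting a thread's threshold equal to its own priority reduces the rule to ordinary priority pre-emption.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """Hypothetical thread record for illustration; not ThreadX's TX_THREAD."""
    name: str
    priority: int           # lower number = higher priority (0 is highest)
    preempt_threshold: int  # must be <= priority; equal to priority disables the feature

    def __post_init__(self) -> None:
        if self.preempt_threshold > self.priority:
            raise ValueError("preempt_threshold must be <= priority "
                             "(at or above the thread's own priority)")


def can_preempt_plain(running: Thread, ready: Thread) -> bool:
    # Ordinary priority scheduling: any strictly higher-priority thread wins.
    return ready.priority < running.priority


def can_preempt_with_threshold(running: Thread, ready: Thread) -> bool:
    # With a threshold, the ready thread must beat the running thread's
    # threshold, not merely its priority.
    return ready.priority < running.preempt_threshold


# A time-critical thread at priority 10 that shields itself up to priority 5.
critical = Thread("critical", priority=10, preempt_threshold=5)
medium = Thread("medium", priority=7, preempt_threshold=7)
urgent = Thread("urgent", priority=2, preempt_threshold=2)

for candidate in (medium, urgent):
    print(candidate.name,
          "plain:", can_preempt_plain(critical, candidate),
          "threshold:", can_preempt_with_threshold(critical, candidate))
```

Under plain priority scheduling, `medium` would interrupt `critical`; with the threshold in place, only `urgent`, whose priority beats 5, gets through.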

Another way of providing both higher performance and greater access to processing by threads is to use multiple processors. ThreadX supports an SMP (symmetric multi-processing) configuration, again, with a single multi-threaded process. Threads can be pinned to cores so that, for example, a critical time-sensitive compute-intensive function can be given exclusive access to one core while the rest of the threads share the remaining core or cores.
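
A rough sketch of how pinning can interact with SMP scheduling: in each scheduling slot, every core gets the highest-priority ready thread it is allowed to run. The `Task` class, the `schedule` function, and the `dedicated_cores` set are hypothetical illustrations, not ThreadX SMP calls, and lower numbers again mean higher priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    priority: int                      # lower number = higher priority
    pinned_core: Optional[int] = None  # None means the task may run on any core


def schedule(ready: List[Task], num_cores: int,
             dedicated_cores: Set[int]) -> Dict[int, Optional[str]]:
    """Choose one ready task per core for the next time slot.

    Pinned tasks run only on their own core; cores in dedicated_cores are
    never handed to unpinned tasks, so a pinned task there has the core to
    itself. Tasks that find no free core wait for the next slot.
    """
    assignment: Dict[int, Optional[str]] = {core: None for core in range(num_cores)}
    for task in sorted(ready, key=lambda t: t.priority):
        if task.pinned_core is not None:
            if task.pinned_core not in assignment:
                raise ValueError(f"{task.name} pinned to nonexistent core {task.pinned_core}")
            if assignment[task.pinned_core] is None:
                assignment[task.pinned_core] = task.name
            continue
        # Unpinned task: take the first free core that is not reserved.
        for core in range(num_cores):
            if core not in dedicated_cores and assignment[core] is None:
                assignment[core] = task.name
                break
    return assignment


ready = [
    Task("dsp_filter", priority=1, pinned_core=0),  # time-critical, owns core 0
    Task("ui", priority=5),
    Task("logger", priority=9),
    Task("net", priority=3),
]
print(schedule(ready, num_cores=3, dedicated_cores={0}))
```

Here `logger` waits for a free shared core, and if `dsp_filter` blocked, core 0 would sit idle rather than run `ui` or `net`. That idle time is the price of guaranteed exclusive access; a real system might weigh it differently.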

SMP is generally the simplest way to manage a multicore system, but it requires that each core look identical: same memory (or one shared memory), same everything. The OS has to be able to assign computation to cores without worrying about which cores do what; they should all act alike. In applications where this doesn't make sense, an AMP (asymmetric multi-processing) setup can be used. But a single OS can't manage all that: with AMP, each core has its own OS and operates more or less autonomously.

Which doesn’t sound very useful if you’re trying to get these things to act like they’re all on the same team and collaborating in the furtherance of some common good. This is where a messaging system is needed so that the cores can talk to each other. You can roll your own such setup, but that’s a fair bit of infrastructure to have to build – especially when you no longer have to. The MCAPI (Multicore Communications API) standard, approved last spring, provides a messaging paradigm for just this kind of situation. Polycore Software has provided the first (and, to date, only) implementation of the MCAPI standard with their PolyMessenger and PolyGenerator products, which now support ThreadX. It works anywhere ThreadX works, meaning it works on processor cores in FPGAs.

So despite its simplicity, ThreadX supports many FPGA soft cores in configurations ranging from simple single-threaded designs up to AMP multicore, all within a one-process design. Given the small footprint of the RTOS and the fact that soft cores are easy to instantiate multiple times into a multicore fabric in an FPGA, it makes sense that Express Logic would see FPGAs as an important part of their business. ThreadX fits well in exactly the space where soft cores play well.
