Parallel Cores and C

Did you hear the story about the fabless semiconductor company that began shipping a family of 10 different devices within months of being founded?

Or the story about the company that developed and shipped a USB 2.0 High Speed device endpoint, supporting multiple audio channels with DSP audio enhancement, in not much more than a couple of months?

Or the company that claims to offer a field programmable device that can be designed in days, programmed in seconds, has a sleep mode of less than 500μW and a selling price of under $5?

No? Then you haven’t been exposed to the XMOS team. XMOS is aiming its programmable devices directly at the consumer market. Its argument is that FPGAs are getting too big and that designing them is getting increasingly complicated, following the ASIC design curve. Below the FPGAs is a huge gap, which XMOS aims to fill.

So, what do they have that will fill the gap? For a while the company was using the phrase “software defined silicon.” That seems to have fallen out of fashion; instead they are talking about chips that can be used “to build complete systems combining interface, DSP and control functions entirely in software.” At first glance this seems counter-intuitive: conventionally, when you need performance today, you tend to move from implementation in software to implementation in silicon. But XMOS says that anything that can be described in C, such as a TCP/IP stack or a USB interface, can be implemented in an XMOS device.

Last week Kenton Williston put the case, strongly, for software innovation as a way of getting the improvements in field programmability that were previously a fallout from Moore’s Law. And, since FPGA Journal functions just like a well-oiled machine, here is a company that uses software in just that way. One option that Kenton put forward was to abandon an FPGA fabric and instead use clusters of processor cells, surrounded by IO, and gave some examples of special-purpose devices that have followed this approach. XMOS has deliberately gone for a general-purpose approach, and, although it is aiming initially at the consumer sector, there are many other applications where the approach would be valuable.

The starting point for XMOS is the XCore, a unit that includes an event-driven processor that runs up to eight threads and provides 400 MIPS, 64 Kbytes SRAM, one-time programmable memory, JTAG, IO ports and, a vital part of the approach, 32 “channel ends.” These channel ends connect to a switch that connects to other XCores through XMOS Links, simple serial interconnect. The other cores can be on the same die or on other die.

There are two flavours of core, G and L. The G family was launched in late 2007 and is available in 2-core (XS1-G2) and 4-core (XS1-G4) versions. Just introduced is the L family, with single- and dual-core versions in 65nm silicon, compared to the G family’s 90nm. This gives much lower power consumption, and the opportunity has been taken to add Active Energy Conservation, so the device will drop automatically from active (15-200 mW) to standby (15 mW) to sleep (500 μW) mode. In active mode, the system can down-rate the system clock speed to match the application requirement, and the system clock (which is rated at 400 MHz) can be altered in software. (There was a strong hint that the 400 MHz was a conservative measure and that over-clocking was feasible.)

Thread assignment is very flexible and depends on what functions the core is carrying out. If it is running a single application that can be decomposed into several sub-applications, then each can have its own thread. If there are several applications, each can have one or more threads. Each thread is dormant until an event triggers it. If all threads are dormant, then the L family core can move into standby or sleep mode, rebooting in 3 ms.

If only a few of the threads are assigned, then each thread potentially gets a larger share of the available 400 MIPS processing throughput. If all 8 threads are assigned, the potential maximum is 50 MIPS/thread, but if only two are running, each has the potential to run at 200 MIPS.

But thread assignment doesn’t stop with the single core. As we have seen, XMOS provides two- and four-core devices and, using links, can add more and more devices: this means that threads can be spread over multiple cores and the application throughput can be dramatically increased.

Cores are programmed in C, C++ or XMOS’s own XC. This is a C based language that includes features for concurrency and parallel execution. All functions for a system are defined in software, including many functions that traditionally have been implemented in hardware; for example, interfaces and I/O controllers. Conversely, if the developer wants to use an operating system, this can be assigned its own thread or threads.

Thread assignment is undertaken after compilation by a linker and mapper, which optimises assignments according to the resources available, although manual assignment is an option.

XMOS has thought things through and provides a solid infrastructure. There are simple boards in development kits for the two families of chip, at $99.00 for the first board and $69.00 for all subsequent boards. The kit for the XS1-L1, for example, has on the board a single core XS1-L1, a USB connection, 16 user-configurable LEDs, four push-button switches, and a speaker for use with a software-driven 1-bit DAC. Moreover, it comes with five XS1-L1 devices. A step further up is the XDK, which looks like a black iPod on steroids and comes with a range of connection options and a built-in codec. And there are reference design kits, including an LED tile kit and an implementation of the Ethernet Audio Visual Bridging (AVB) standards, with four SDKs preloaded with all the software needed and an Ethernet switch.

Development tools are in Linux, Windows and Mac flavours and come with an Eclipse-based IDE, a set of compilers (XC, C and C++), an XMOS extension of the GDB to cope with multi-thread and multi-core, a cycle-based simulator, and a bundle of board and manufacturing utilities. And these are all free to download. You can develop systems and run them on the simulator without having to buy any hardware.

Documentation is comprehensive, covering product descriptions, tutorials, reference manuals, and even board schematics; it looks clean and reads well. The web site is extensive, and there is a forum/user area with a range of programmes, application source code, tips and tricks, and LOL Cats.

So back to our starting claims. XMOS is shipping devices to a start-up fabless semiconductor company that is putting on its own branding and shipping as its own products. Unfortunately, as you might expect, the company does not want to be named.

Again, the audio DSP company doesn’t want to be named, but clearly the third company is XMOS. They have been shipping silicon for nine months now and claim to be shipping in volume and to be close to day-to-day profitability. Their road map includes further variants of the basic devices – with more memory, for example. And of course they can pack more cores onto a single die to create fast and powerful products.

As you may have guessed, I am enthusiastic about XMOS and the team. There are two reasons for this. The first is that they are making parallel computing simple and straightforward, building on CTO and founder David May’s more than thirty years of involvement in the problem. Secondly, they appear to be doing a lot of things right, with free tools, cheap silicon and a very open approach: open source software, large amounts of information on the web site, even a public price list.

Over the years we have seen first ASIC and then FPGA design tools rise from schematics to higher levels of abstraction. XMOS starts with a high level language, but it doesn’t need to go through translation to RTL and all the refinements needed for FPGAs. It is true there are cheaper FPGAs and there are lower-power ones. But the very fast development time and flexibility make the XMOS approach interesting. And, for the first time in many years, you can get started in system development for minimal outlay.

Evaluating XMOS can be done in a few days – it has to be worth looking at.
