Multicore Momentum

A couple of years ago, a small, ragtag conference took place in Santa Clara just before the relative behemoth, the Embedded Systems Conference. “Ragtag” might be a bit unfair; it seemed that way only when compared to the much larger and better-funded conferences, so perhaps “scrappy” is a better characterization. This was the first edition of the Multicore Expo, and at the time, many in multicore seemed to be grasping for relevance. The participants were all sure multicore was the future, but there was no swagger in the strut. Would multicore go mainstream this year? Next year?

The 2008 edition just finished, and one step into the presentation room provided a very different feel. According to organizer Markus Levy, this year’s conference was about double the size of last year’s. And the atmosphere was much more confident; this is the real deal. They had to add chairs to some of the conference rooms to accommodate the number of people in attendance.

The Multicore Expo, as the name suggests, focuses exclusively on multicore issues. In particular, embedded multicore is the central issue; there’s not much interest in discussing how to get Microsoft Excel to run faster on your quad-core desktop box. While multicore has historically been used in embedded applications more than elsewhere, those have typically been very specialized applications using very specialized processors, with a small cadre of very specialized programmers who knew how to get performance out of some very complex programming models. However, now that you pretty much can’t buy a sophisticated microprocessor with only a single core anymore, everyone is being forced into multicore.

Issues surrounding multicore can be divided roughly into two categories. The obvious one deals with writing new programs for multicore platforms, including architecture, implementation, and validation. The other category, which remains no matter how well the first is solved, covers the big objection in the sales process: how to handle legacy code. It’s one thing to write a new program according to some new paradigm. It’s a completely different problem to take tens of thousands of lines or more of existing code and make it work on a new multicore platform.

Virtualization was one approach that was frequently discussed. While it is more visible in its server-farm guise, it has gained currency as a way of managing applications running on multicore platforms. Without delving into the nuances of hypervisors and paravirtualization, virtualization for embedded is basically the addition of a layer of management software between the operating system and the processors that “virtualizes” the processors in the eyes of the operating system (and, by extension, the applications running on the operating system). One or more OSes can run over the virtualization layer, and the virtualization system allocates the processors to the OSes, even on a fractional basis. More or less, it operates like virtualization in the server farm. But for embedded applications, there are some special considerations.
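To make “fractional basis” a little more concrete, here is a minimal Python sketch of one way a virtualization layer might place guest OSes on physical cores: each guest asks for some share of a core, and the shares are packed first-fit, largest request first. The guest names and the first-fit-decreasing policy are assumptions for illustration; real embedded hypervisors use their own allocation policies.

```python
from fractions import Fraction


def place_guests(num_cores, demands):
    """Pack guest OSes onto physical cores by fractional share.

    demands maps a guest name to its requested share of one core, given as
    anything Fraction() accepts (e.g. "1/2"). Returns a dict mapping each core
    index to a list of (guest, share) pairs. Raises ValueError if a guest
    cannot fit on any single core.
    """
    requests = {guest: Fraction(share) for guest, share in demands.items()}
    free = [Fraction(1)] * num_cores          # unused share left on each core
    placement = {core: [] for core in range(num_cores)}

    # First-fit decreasing: place the largest requests first.
    for guest, share in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        for core in range(num_cores):
            if share <= free[core]:
                free[core] -= share
                placement[core].append((guest, share))
                break
        else:
            raise ValueError(f"no core has {share} of capacity left for {guest}")
    return placement


# Hypothetical guests: an RTOS on a full core, two other OSes sharing a second one.
print(place_guests(2, {"rtos": "1", "linux": "1/2", "monitor": "1/4"}))
```

Exact fractions are used so that shares like 1/2 + 1/4 add up without floating-point drift when checking whether a core is full.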

One key requirement is the need for real-time operation. An RTOS may run alongside other non-real-time OSes. The virtualization layer is responsible for scheduling the OSes, and it must be able to schedule in a manner that meets the deadlines and other timing requirements of the RTOS. Another is the ability to optimize performance; yeah, everyone cares about speed, but performance can be mission-critical in an embedded design. This means that the virtualization layer must be carefully constructed so as not to slow things down too much.
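A share of a core is not enough by itself for an RTOS; what matters is when that time arrives. The sketch below assumes a hypothetical fixed cyclic schedule (a table of time slots, in integer ticks, that repeats every frame) and checks whether the RTOS gets at least `budget` ticks in every window of `period` ticks. The table format and function names are illustrative assumptions, not any particular vendor’s scheduler.

```python
def expand(table, frame):
    """Turn (start, end, guest) slots into a per-tick owner list for one frame."""
    owner = [None] * frame
    for start, end, guest in table:
        for tick in range(start, end):
            owner[tick] = guest
    return owner


def worst_case_share(table, frame, guest, window):
    """Fewest ticks `guest` receives in any `window` consecutive ticks.

    The table repeats every `frame` ticks, so checking windows that start at
    each tick of one frame covers every possible window.
    """
    owner = expand(table, frame)
    return min(
        sum(1 for t in range(s, s + window) if owner[t % frame] == guest)
        for s in range(frame)
    )


def meets_budget(table, frame, guest, budget, period):
    """True if `guest` gets at least `budget` ticks in every `period`-tick window."""
    return worst_case_share(table, frame, guest, period) >= budget


# Two hypothetical 10-tick frames; each gives the RTOS 4 ticks (40%).
lumpy = [(0, 4, "rtos"), (4, 10, "linux")]
split = [(0, 2, "rtos"), (2, 5, "linux"), (5, 7, "rtos"), (7, 10, "linux")]
print(meets_budget(lumpy, 10, "rtos", budget=2, period=5))
print(meets_budget(split, 10, "rtos", budget=2, period=5))
```

Both tables hand the RTOS the same 40% of the core, but the lumpy one fails a “2 ticks in every 5” requirement because a window starting at tick 4 sees no RTOS time at all, while the split one passes. The scheduling, not just the allocation, has to respect the RTOS.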

Another challenge of configuring an OS or virtualization in the embedded world is the fact that every platform looks more or less different. VirtualLogix, which sells an embedded virtualization solution, demonstrated a solution to this: a tool that boots Linux on the system (regardless of what OS actually runs during execution) and uses Linux’s ability to interrogate hardware to take an inventory of what’s in the system. This automates the construction of a hardware resource file, which can then be further tailored. Resources can be shared or dedicated; if a resource is dedicated, the virtualization layer provides direct access to the hardware for higher performance. I/O can also be virtualized to allow processes to “pretend” to communicate with each other via Ethernet, a UART, or any of a number of standard I/O devices.
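The inventory step might feed something like the following minimal sketch, which assumes a made-up resource-map format: a device wanted by exactly one guest is marked dedicated (a candidate for direct access), and a device wanted by several guests is marked shared (mediated by the virtualization layer). The helper `discover_net_interfaces` is hypothetical and only lists network interfaces from Linux’s sysfs; neither function reflects VirtualLogix’s actual tool or file format.

```python
import os


def discover_net_interfaces(sysfs="/sys/class/net"):
    """List network interfaces the running Linux kernel reports (empty elsewhere)."""
    if not os.path.isdir(sysfs):
        return []
    return sorted(os.listdir(sysfs))


def build_resource_map(inventory, requests):
    """Classify each discovered device as dedicated, shared, or unused.

    inventory: device names found on the board.
    requests: dict mapping device name -> list of guest OSes that want it.
    """
    unknown = set(requests) - set(inventory)
    if unknown:
        raise ValueError(f"requested devices not found on board: {sorted(unknown)}")

    resource_map = {}
    for device in inventory:
        guests = list(requests.get(device, []))
        if not guests:
            mode = "unused"
        elif len(guests) == 1:
            mode = "dedicated"  # single owner: may get direct hardware access
        else:
            mode = "shared"     # several owners: virtualization layer mediates
        resource_map[device] = {"mode": mode, "guests": guests}
    return resource_map


# Hypothetical board inventory and guest requests.
board = ["eth0", "eth1", "uart0", "timer0"]
wants = {"eth0": ["linux"], "uart0": ["linux", "rtos"], "timer0": ["rtos"]}
for device, entry in build_resource_map(board, wants).items():
    print(device, entry["mode"], entry["guests"])
```

Rejecting requests for devices the inventory did not find catches a common porting mistake early: a configuration written for one board silently referring to hardware that another board lacks.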

New programming models were often invoked as important both for writing new programs and for adapting old ones. There’s plenty of agreement that existing languages don’t provide a good parallel programming paradigm, but solutions are in short supply, and more than one speaker referred to the need for universities not only to come up with new models, but also to start turning out trained engineers who can come into the industry using parallelism intuitively. Teaching us old dogs new tricks is apparently not viewed as a promising way forward.

An alternative approach was broached by RapidMind, which provides a way of optimizing and parallelizing serial programs at run time. The idea is that C++ programs can be modified to replace standard data types with RapidMind data types and to replace key portions of code with what the company calls “programs,” small algorithms that look more or less like functions or objects, except that they operate on arrays. None of these changes takes the program outside the bounds of C++.

Operating on arrays provides data parallelism, while the program itself is also exploited for algorithmic parallelism. The goal is that once a program has been modified (and legacy code is a key target, even if only the performance-critical portions are tweaked), it can run on any multicore platform without recompiling, with the run-time layer providing the necessary adaptation and optimization. Parallelism is exploited automatically, without requiring explicit parallelization by the programmer; the program can be written as a single-threaded serial program, which is how many legacy programs are written. It is also possible to “freeze” the configuration for one platform, eliminating the run-time element at the cost of portability without recompilation, although no figures on the resulting performance or footprint gains were available.
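The programming style described above (write serial code for one element, and let a runtime spread it across arrays and cores) can be imitated in plain Python. This is not RapidMind’s API, which was C++; `run_program` is a hypothetical name. Also note that CPython threads won’t actually speed up CPU-bound work because of the global interpreter lock, so the sketch shows only the structure: the kernel is serial, and the decision of how to split the work is made at run time from the machine’s core count.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_program(kernel, *arrays, workers=None):
    """Apply a serial, per-element `kernel` across equal-length arrays.

    The caller writes `kernel` for a single element; this wrapper decides at
    run time how many chunks to split the arrays into and runs them on a pool.
    """
    if not arrays:
        raise ValueError("need at least one input array")
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("input arrays must have equal length")

    workers = workers or os.cpu_count() or 1   # adapt to the machine at run time
    chunk = max(1, -(-n // workers))           # ceiling division

    def work(start):
        stop = min(start + chunk, n)
        return [kernel(*(a[i] for a in arrays)) for i in range(start, stop)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so output lines up with input.
        return [x for part in pool.map(work, range(0, n, chunk)) for x in part]


# A SAXPY-style kernel written as plain serial code for one element.
print(run_program(lambda x, y: 2.0 * x + y, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The same call works unchanged whether the machine has two cores or sixteen; only the chunk count differs, which is the portability property the paragraph describes, in miniature.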

Another sign of the growing infrastructure for multicore was the announcement of benchmark availability from EEMBC, the Embedded Microprocessor Benchmark Consortium. (You can stop looking for the second E in the full name; it’s not there. I already checked.) They have created a suite of benchmark “work items” that can be snapped together into a flow. Some of the pieces consist of a function that acts on data bound to it; others act as what they call “filters,” operating on whatever data is provided to them. A GUI lets you assemble a series of these benchmarks into an overall test. You can select various systems and execution models for testing and then compare the results. The results can reveal surprising system weaknesses related to subtler effects like over-subscription and cache coherency.
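A toy version of the work-item/filter idea helps show how the pieces snap together: a `WorkItem` carries its own bound data, a `Filter` processes whatever the previous stage hands it, and `run_flow` chains them and records per-stage timing. The class names and the flow runner are assumptions for illustration, not EEMBC’s actual benchmark code or GUI.

```python
import time


class WorkItem:
    """A benchmark piece whose function acts on data bound to it up front."""

    def __init__(self, name, fn, data):
        self.name, self.fn, self.data = name, fn, data

    def run(self, upstream):
        return self.fn(self.data)        # ignores upstream; uses its own data


class Filter:
    """A benchmark piece that operates on whatever data it is handed."""

    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, upstream):
        return self.fn(upstream)


def run_flow(stages):
    """Run stages in order, passing each output downstream; time every stage."""
    data, timings = None, []
    for stage in stages:
        start = time.perf_counter()
        data = stage.run(data)
        timings.append((stage.name, time.perf_counter() - start))
    return data, timings


# A hypothetical three-stage flow: generate, transform, reduce.
flow = [
    WorkItem("generate", lambda n: list(range(n)), 100_000),
    Filter("square", lambda xs: [x * x for x in xs]),
    Filter("sum", sum),
]
result, timings = run_flow(flow)
for name, seconds in timings:
    print(f"{name:>8}: {seconds * 1e3:.2f} ms")
```

This runner is strictly serial; the value of the real suite lies in running such flows under different systems and execution models and comparing the timings, which is where effects like over-subscription show up.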

As a holdover from the Multicore Expo’s first couple of years as a somewhat boutique conference, Markus tends to run things himself, announcing all of the plenary speakers personally and bantering with them at the podium. If the conference continues to grow at this pace, that personal touch may have to ease up a bit next year, as one person’s ability to be everywhere at once reaches its limits. Unless, that is, he can create a multicore implementation of himself so that he can announce numerous sessions in parallel.
