Exposing your Multicores

I am writing while a carpenter is putting up some shelves. When he arrived, in addition to bringing in a huge pile of fresh timber, he also brought in an array of tools. I suspect he is a tool junkie, but he does have, for example, a wonderful array of screwdrivers of different sizes, shapes and lengths. He chooses the right screw for the task and then matches the screwdriver to the screw.

We now have the opportunity to do the same thing with processing cores. Instead of forcing a standard architecture to carry out a range of functions defined through software, we can now pick and choose the processor architectures that are close to optimal for a specific function and then develop the software to carry it out. This is not new in embedded systems. We have had a choice of processors for many years, including RISC and VLIW architectures, graphics processors and mainstream controllers, and we have quietly got on with building systems that use several of these at the same time. Even a desktop PC will have a handful of processors on the motherboard and on add-in cards.

But then Intel discovered that to continue chasing process improvements down the Moore’s Law curve would have other, less happy, consequences, particularly in energy consumption and heat dissipation. By putting two (or more) processor cores on a chip, they hoped to get increased overall system performance without burning large amounts of power. At the same time, marketing groups across the industry swung into action to promote multiple cores, to stress how difficult they would be to use, and to explain how one particular company held the golden key.

A more positive effort was the founding of the Multicore Association and with it the Multicore Expo. From the start, the Multicore Association has attracted support from a broad range of companies for its work on defining standards for multicore system development. For the Association, multicore systems can be homogeneous or heterogeneous and can be on a single chip, on a single board, or even on multiple boards. They have identified key areas where standards are needed and then worked to produce them.

The first area attacked was the development of a Multicore Communications API (MCAPI), which defines both an Application Programming Interface and the semantics of communication. The physical issues of communication (what sort of wires and which ones to wiggle) are not addressed. MCAPI is now in its second version.

A Resource Management API (MRAPI), which is concerned with how an application or applications share physical resources, is in its first version. Work is also underway on Programming Practices, Tools, Infrastructure, and Virtualization.

The Association is the sponsor of the Multicore Expo, a conference and exhibition co-located with ESC Silicon Valley, this year in the first week in May. A three-day event, it will have over fifty conference sessions on a wide range of multicore-related topics, as well as an exhibition. If you have just a cursory interest at the moment, there are also free sessions in the Multicore Expo Theatre. But the real meat of the Expo is the Wednesday afternoon executive session and the parallel threads of technical papers on Tuesday, Wednesday morning and Thursday. (Well, you would expect parallel threads at a multicore event, wouldn’t you?)

The technical papers cover the entire waterfront of the issues involved in specifying, implementing, and debugging multicore systems. I am going to cherry-pick some papers from the different threads that interest me: that isn’t to say that the other papers lack points of interest, but we all have different tastes. One entire thread, which I may even sit through at the event, is on parallel techniques. A significant and real problem is how to migrate applications from a single core to multiple cores. Even when the application was originally written to run as threads, moving those threads from one core to more than one is not a simple task. Several papers in this conference thread look at migration through successful case studies. Freescale’s Rob Oshana is going to describe two migration projects, one for a video application and the other for networking, and promises to pass on the lessons that were learned in moving them to parallel applications. This will be followed by Alexander Mintz of Zircon Computing and Chris Fournier of AMD looking at techniques that can migrate single-threaded legacy code to multiple threads. They claim that the techniques allow scaling to multiple cores and that near-linear performance gains are possible as the number of threads available to the code increases. Critical Blue has been developing tools that analyse legacy C code to assess how well it can be parallelized. David Stewart, Critical Blue’s CEO, is teaming up with Frank Furth of TI to discuss the options for migrating sequential digital signal processing code to a multicore DSP.
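Claims of near-linear scaling are easier to weigh with Amdahl’s law in mind: the speedup from adding cores is capped by whatever fraction of the code stays sequential. The sketch below is my own illustration, not anything taken from the papers; the function name and the example fractions are assumptions chosen for the arithmetic.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup predicted by Amdahl's law.

    parallel_fraction is the share of run time that can be spread
    across cores (an illustrative input, not a measured figure).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Compare how three hypothetical workloads scale as cores are added.
    for fraction in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):.2f}x"
            for n in (2, 4, 8, 16)
        )
        print(f"{fraction:.0%} parallel -> {row}")
```

On this model, code that is 90% parallel tops out at about 4.7 times faster on eight cores, while code that is 99% parallel reaches about 7.5 times. Near-linear gains therefore need the sequential part to be very small, which is one reason the choice of application matters so much.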

In a Tuesday thread, Paul Stravers of Vector Fabrics will be addressing the same issues. Vector has developed tools for parallelizing legacy code, and Paul will use the complex code of the Google VP8 codec, some 85,000 lines of C, to demonstrate how the tools can analyse and partition code and then target it to a range of different platforms, including homogeneous multiprocessors, both with and without an additional Graphics Processing Unit (GPU), and to a range of heterogeneous multicore architectures. Another paper comes from Samsung India: Javed Absar and Deepak Shekhar will demonstrate how they have overcome migration problems for the real-time applications of automatic speech recognition and face detection.

Since one of the obstacles to the wider take-up of multicore processing has been concerns about legacy code, it is heartening that we can now see real progress in migration. Even if video, face detection, speech recognition, and signal processing are among applications that are more responsive to parallelization than some others, they are responding to the use of tools rather than requiring significant hours of human analysis and partitioning.

Another major issue with multicore projects is debugging. Generally this topic has lagged behind the implementation. While it can be argued that the difficulty of implementing parallel systems has been talked up by interested parties, there is no denying that debugging multicore systems is very hard. Debugging even a straightforward sequential system can present significant challenges, and, depending on the nature of the multicore implementation, the difficulties of debugging can easily increase exponentially with the number of cores. With a single core, there are now tools that allow you to capture the state of the system when an error occurs and to roll back the system to discover what the error trigger may have been. With multiple cores, the task of discovering the state of all the cores when an error occurs in any one of them is not a trivial exercise: recreating that same state ranges from being complex to being impossible.
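To put a rough number on how quickly the problem grows, consider a deliberately simplified model (my assumption, not a description of any real system): each core runs one thread made up of a fixed number of atomic steps, and the steps of different threads may interleave in any order. The number of distinct orderings a debugger might need to reproduce is then a multinomial coefficient.

```python
from math import factorial


def interleavings(threads: int, steps_per_thread: int) -> int:
    """Count the distinct orderings of steps across independent threads.

    Each thread keeps its own steps in order, so the count is
    (threads * steps)! / (steps!) ** threads.
    """
    total_steps = threads * steps_per_thread
    return factorial(total_steps) // factorial(steps_per_thread) ** threads


if __name__ == "__main__":
    # Three steps per thread is tiny, yet the count climbs steeply.
    for threads in (1, 2, 3, 4):
        print(f"{threads} thread(s): {interleavings(threads, 3)} orderings")
```

With only three steps per thread, one thread has a single ordering, two threads have 20, three have 1,680 and four have 369,600. Real programs have far more steps and far more constraints, so treat this as a feel for the growth rather than a prediction.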

We are seeing the emergence of new techniques to solve this problem, but many of these are still not heavily used in the field. Simon Davidmann’s Imperas has been working on the use of software virtual prototypes for some years now. Within the Open Virtual Platforms (OVP) initiative, there are already software models of a wide range of processors that are instruction-accurate and will run operating systems and applications. Simon will demonstrate what he is calling 3-D debug as a route to developing heterogeneous multicore systems. Another new approach comes from Roni Simonian of Ariadne, who will be introducing Maze, described as ‘a randomized yet controlled deterministic environment’ for finding and reproducing bugs in concurrent programs.
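The general idea of a randomized yet reproducible environment can be shown with a toy. This is not Maze, whose internals are not described here; it is a minimal sketch that assumes tasks are written as Python generators and that a scheduler seeded with a known value picks which task runs next. All the names (`increment`, `run`, `find_failing_seed`) are hypothetical.

```python
import random


def increment(shared, times):
    """A task that performs a non-atomic read-modify-write.

    Each yield marks a point where the scheduler may switch tasks.
    """
    for _ in range(times):
        value = shared["counter"]       # read
        yield
        shared["counter"] = value + 1   # write, possibly using a stale value
        yield


def run(seed, tasks=2, times=3):
    """Run the tasks under a schedule chosen entirely by the seed."""
    rng = random.Random(seed)
    shared = {"counter": 0}
    runnable = [(task_id, increment(shared, times)) for task_id in range(tasks)]
    trace = []  # order in which tasks were given a step
    while runnable:
        slot = rng.randrange(len(runnable))
        task_id, task = runnable[slot]
        trace.append(task_id)
        try:
            next(task)
        except StopIteration:
            runnable.pop(slot)
    return shared["counter"], trace


def find_failing_seed(tasks=2, times=3, limit=1000):
    """Search for a seed whose schedule loses an update, or return None."""
    expected = tasks * times
    for seed in range(limit):
        counter, _ = run(seed, tasks, times)
        if counter != expected:
            return seed
    return None


if __name__ == "__main__":
    seed = find_failing_seed()
    counter, trace = run(seed)
    print(f"seed {seed}: counter={counter}, schedule={trace}")
    # The same seed replays the same schedule, so the bug recurs on demand.
    assert run(seed) == (counter, trace)
```

The randomness finds an interleaving that loses an update, and the seed makes that interleaving repeatable, which is exactly what is so hard to arrange on real hardware.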

The Multicore Association is addressing debugging, and Aaron Spear of VMware will be discussing the ideas behind the work on the Common Trace Format (CTF), a standard that will help in analyzing trace data from a range of different sources. Trace data will also figure in the presentations by Brian Finkel, of Wind River, and Vikas Varshney, of TI. They will both be presenting work that their companies have carried out in developing tools for debugging the end hardware.

Other threads in the conference will cover software design and real-time and multicore frameworks. There will also be some overview sessions and panel discussions, including some chaired by Jim Turley, editor of this parish, on high-level aspects of multicore developments.

We don’t usually bang on about forthcoming events, but I feel that multicore is at that awkward stage, the equivalent of a teenager with acne, large feet and a voracious appetite, yet who shows occasional flashes of turning into a civilised being. It may be that this year’s Multicore Expo is not yet the coming-of-age party for multicore, but it is certainly a significant birthday. Within five years, the more mature of us will be looking back and trying to remember what all the fuss was about: multicore will have become a standard approach to developing systems. This year is your chance to get on top of the subject, without suffering all the problems of being a pioneer. See you in San Jose?
