A few weeks ago we took a look at the new MCAPI standard that provides low-level, low-overhead message-passing capabilities for multicore and tightly-coupled multi-processor systems. But of course, standards are no good unless someone implements them, so here we take a look at the first commercially-available implementation of the MCAPI standard, built by Polycore. As MCAPI committee chair, Polycore’s Sven Brehmer has been well-positioned and motivated to bring to market a realization of the standards work.
The standard itself is simply an API; it specifies no implementation details. An MCAPI service consists of libraries and run-time services that implement the semantics of the API, and this is what the latest release of Polycore’s Poly Messenger product provides. But since, as an application programmer, you are exposed only to the API and are thus, by design, insulated from the details of how it works, there’s frankly not much to say beyond what’s already been said about MCAPI.
Except for one detail: exactly how message-passing occurs depends on the topology of the system. And in fact, there are actually two topologies: there’s a logical communication topology and a physical topology. Ultimately they have to be resolved together, but the distinction allows you to create a logical design that is somewhat independent of the underlying physical implementation, reducing the work required to port things around. In fact, this allows a full messaging system to be implemented on a single-core desktop development machine in advance of availability of the intended multicore platform.
So let’s break this into two steps: first, the specification of the logical topology, and then the realization of that topology onto a specific platform. The first step is accomplished by a separate tool called Poly Generator. This tool generates the data structures, constants, initialization routines, link drivers, etc. that Poly Messenger will need in actually carrying out the mandates of the API on the system. These are provided through automatically generated C header and code files (that is, .h and .c files) that can then be compiled into the overall code. The system is completely configured at compile time. In fact, even though the MCAPI standard allows for real-time route discovery when channels are initialized, Polycore’s implementation assumes a static topology and so determines the best route at compile time, simplifying and speeding up the run-time behavior of Poly Messenger.
Poly Generator and Poly Messenger pre-date MCAPI, so there are legacy concepts that map to MCAPI concepts. In the following description, I try to clarify those relationships. In some cases, the exact semantics may differ slightly due to prior incarnations of the tools, but any such differences are resolved in favor of the MCAPI semantics when an MCAPI implementation is created, ensuring full compliance.
The logical view
At the highest levels, the logical topology consists of a set of interconnected nodes. Nodes within Poly Generator correspond to nodes in MCAPI and are essentially loci of computation. They may send, receive, or simply pass along messages. Nodes may be interconnected by links of varying sorts. The topology may include subnets; if messages need to be transmitted between subnets, a “gateway” node is inferred and created by Poly Generator.
One distinction to make very clear here is that a node is not the same as a core. A node is part of the logical topology; a core is part of the physical topology. You map nodes to cores, but in fact, nodes can be “virtual”, and you can map several nodes to a single core. But more on that later. For now, you simply create the critical nodes for the system and let Poly Generator create any additional gateway nodes.
There is another similar concept, that of the “repository”; this roughly equates to the MCAPI concept of an endpoint. A repository is where messages are stored when received. It’s tempting to equate a node with a repository, but some nodes may only pass messages without actually being an endpoint, and therefore will not have repositories. Each repository is named; this symbolic name is then accessible when using the MCAPI APIs to establish message destinations.
Each node has a set of properties that can be defined. One global property that applies to all nodes is the size of the pre-allocated buffers used to store messages. This doesn’t mean that all messages have to be this size, but it does establish the maximum. Bigger buffers allow bigger payloads, but they also waste memory for those messages with smaller payloads. Deciding on the payload size itself involves a tradeoff. When lots of data has to be transferred, the complete data chunk is often broken into multiple messages. Because each message has a header, which is overhead, the more data you can assign to each message, the smaller the percentage of overhead incurred. However, messages can be prioritized. A high-priority message will not break into an ongoing lower-priority message transmission, but will be next in line to gain access to the receiving node. So the larger the payload, the longer a higher-priority message will have to wait to get through. Thus the largest payload that makes sense will vary by system and application and, once chosen, will set the buffer size.
It’s helpful, before looking at the node properties, to look at what happens when a message is sent and received using the Polycore setup. First, an application program will assemble some data that needs to be transferred to another repository. The application is responsible for marshalling the data, and it will do so by acquiring memory from the heap and building the message. Once complete, the message can be sent (broken up into multiple messages if necessary), and this is done (in the general case) by allocating one of a fixed number of pre-allocated buffers for the payload; the message payload is copied into the buffer, and then the header and a pointer to the buffer are placed in a send queue. You can specify multiple send queues, arranged by priority.
If a blocking send call is used, then the application waits until the message is sent before proceeding; if a non-blocking call is used, then the application continues on its merry way once the message is queued up.
Messages are then sent by Poly Messenger in priority order – that is, higher priority queues are emptied first, although as mentioned, high-priority messages won’t break into any lower-priority message already in progress. On the receive side, a buffer is allocated for the payload of an incoming message, and then the header and a pointer to the payload buffer are placed on an incoming queue.
Note that, if the memory containing the message is shared between two communicating nodes, it’s pretty wasteful to copy the message; it’s more efficient if you can send the message “by reference,” meaning that only the pointer to the message is communicated. This is also referred to as “zero-copy” operation, and while the MCAPI standard has no formal support for zero-copy communication yet, you can do it simply by passing the pointer as the payload. The “payload” is still copied, but because it’s just a small pointer, it can be copied much more quickly. Since there’s no API support, there are some logistics that you have to make sure the application itself takes care of, like ensuring that the sending side doesn’t trash the message before the receiving side is finished with it.
Given how this works, it means that you need to specify, for each node, the depth of its send queues, by priority; the depth of the receive queue (received messages are implicitly stored by priority); the number of data buffers for payloads; and the name associated with the repository. In addition, each node has a set of other nodes to which it’s connected by links, either directly or via a subnet. Each of those links is declared, along with the characteristics of the link.
This appears to be one area where the physical and logical topologies intersect to some extent. A link might be defined as a TCP/IP link with an IP address if the node to which it’s connected is on a different computer. Alternatively, two cores in different chips on the same board might be interconnected via Serial RapidIO. Or two cores in the same chip may be linked by some chip-specific mechanism. Shared memory is yet another means of linking. This link type is specified as part of the logical topology to ensure that the right drivers are provided, so changes to the physical topology may require logical changes if link types change.
All of this definition of the topology is done using XML. While there might be a lot of XML for a complex design, most of it is similar, so cut-and-paste dominates the editing work. In addition, there are files defining the details of the dispatcher and link drivers. Off-the-shelf drivers are currently available for TCP/IP, shared memory, and Windows pipes (the latter typically for demonstration). For other drivers, templates that you can customize are provided in C. Running Poly Generator using these files as input creates a pair of files (a .h file and a .c file), along with requisite dispatcher and link driver code, for each node.
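To give a flavor of what such a definition might look like, here is a purely hypothetical sketch: every element and attribute name below is invented to illustrate the kind of information being captured (nodes, named repositories, queue depths, buffer counts, and typed links), and the actual Poly Generator schema differs.

```xml
<!-- Hypothetical sketch only: element and attribute names are invented
     to illustrate the kind of information Poly Generator consumes. -->
<topology>
  <node name="producer">
    <repository name="producer_inbox" buffers="8" recv_depth="8"/>
    <send_queue priority="0" depth="4"/>
    <send_queue priority="1" depth="8"/>
  </node>
  <node name="consumer">
    <repository name="consumer_inbox" buffers="16" recv_depth="16"/>
    <send_queue priority="0" depth="4"/>
  </node>
  <link type="shared_memory" from="producer" to="consumer"/>
</topology>
```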
Assigning to cores
Now comes the point at which the nodes can be assigned to cores. In the static configuration common to many embedded applications, programs are compiled into cores and stay there; there is no real-time scheduling of applications. A large application may be split up into multiple sub-programs, each of which will execute on a core. Once an application is partitioned into its constituent sub-programs, each of those sub-programs is compiled and linked for its core. This same process is used to assign nodes to cores: those files associated with any nodes destined for a particular core are included in the build for that core. That, along with the Poly Messenger libraries, allows resolution of the MCAPI calls embedded in the application.
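In build terms, the node-to-core mapping amounts to something like the following make fragment. All file and library names here are invented for illustration; the point is simply that each core's link pulls in only the generated files for the nodes assigned to it.

```make
# Hypothetical sketch: file and library names are invented. Each core's
# build includes only the Poly Generator output for the nodes mapped to
# that core, plus the application code and the messaging library.
core0_OBJS = app_core0.o node_producer.o node_gateway.o
core1_OBJS = app_core1.o node_consumer.o

core0.elf: $(core0_OBJS)
	$(CC) -o $@ $^ -lpolymessenger

core1.elf: $(core1_OBJS)
	$(CC) -o $@ $^ -lpolymessenger
```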
Of course, a system like this can get far more complicated than might be suggested by this simplistic picture if, for example, you have a heterogeneous system involving multiple OSes in multiple boxes. This will bring other practical considerations into play, but, conceptually, the simple model still applies.
There are a couple of possible use cases for generating a communication topology. In an ad hoc approach, an application writer would create a Poly Generator definition for his or her application and use it once for that application. Alternatively, a system engineer could create a messaging configuration that could then be made broadly available to application writers simply to be compiled into their apps.
Given a realization of the MCAPI API, now comes the tough part – watching adoption. The shift to multicore has been slow due to numerous roadblocks, real and perceived. The transition has been accelerating, however, and the availability of a low-overhead messaging system removes one more barrier to adoption.