
Avoiding the Failure to Communicate

Bill left via the front yard, striding purposefully down the street, while the other two scurried out the back door to the alley. John went left, Nathan went right; he was holding what little jewelry they had managed to grab. Bill had no idea whether the others had gotten anything; he hoped that they would have something to show for their efforts, despite being interrupted early. He rounded the corner just as two squad cars screamed past. One stopped in front of the house, the other headed into the alley – right towards John. Nathan had almost reached the other end of the alley, but he freaked out, dumped the jewelry, and started to run. The cop grabbed John and radioed his partner about Nathan; Nathan ended up in the other car. John didn’t know that Nathan had dropped the jewelry, and Bill didn’t even know that John and Nathan had been caught until the cops came to get him at home. During all the questioning, they never saw each other. And none of them knew what the others knew, or what the others were saying. What had started as a team operation had disintegrated into three sorry-looking dudes with no idea what to do next.

As application programs grow beyond the scope of a single processor, they can be split into separate processes running on the same or different processors. Once that happens, the processes need some way to talk to each other so they can keep their stories straight. This goes by the pretty self-explanatory name of inter-process communication, or IPC. Sounds simple enough, but, in practice, how it's done depends on the degree of what is referred to as – assuming the FCC doesn’t shut us down for saying so – system coupling: tight or loose.

Such a characterization is actually an oversimplification. Processes may co-exist on a single CPU, and they may be moved from one CPU to another by the OS. If the processes don’t share a CPU, then messages have to get from one CPU to the other. Within a single chip, multiple processor cores can talk to each other using channels built into the chip architecture. On a board, multiple chips can talk to each other over busses or point-to-point serial connections. Boards within a chassis can communicate via the backplane. Once you leave the realm of the box, you have to connect machines by wires, and the local network can do that. Once you leave the building, you may have entered “the cloud”. No one really knows what happens in that cloud… maybe best not to know. It’s probably like watching sausage being made. Anyway, somewhere along the way from intra-processor connections to the cloud, we made the transition from tightly-coupled to loosely-coupled systems.

Clusters are mesh-connected computers, and, while they fit somewhere in the middle, they’re considered tightly coupled. They were the focus when Ericsson started what would become the Transparent Inter-Process Communication (TIPC) effort, which is now an open-source project. It entered the open-source world in 2004 and is still very much a going concern. The purpose of such a protocol is to allow processes to communicate in a way that keeps the details of the physical connection hidden from the application – hence “transparent”. Node addresses are independent of network addresses like IP or MAC addresses, so that if the network is reconfigured, the process node addresses don’t have to change. Communication can be reliable or unreliable, connected or connectionless. Direct messaging is used, meaning that a message is sent straight to the receiver rather than to some intermediate drop-off point, outside the scope of the application, that the receiver has to check. Queues are part of the process doing the communicating and are managed by the process, not the OS, so the operating system doesn’t need to coordinate processes with their message queues.
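To make the addressing idea concrete, here is a minimal sketch (not from the article) of sending a message with the Linux TIPC socket API. The service type and instance numbers are arbitrary values chosen for illustration, and the sketch assumes some process somewhere in the cluster has bound that service name; the point is that the sender names a logical service rather than an IP or MAC address, and TIPC delivers the message to whichever node currently provides it.

```c
/* Minimal TIPC sender sketch: location-transparent addressing.
 * The {type, instance} values are illustrative only. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

int main(void)
{
    struct sockaddr_tipc server = {0};
    char msg[] = "hello";
    int sd;

    /* Address a service, not a node: TIPC resolves {type, instance}
     * to whichever node has published that service name. */
    server.family = AF_TIPC;
    server.addrtype = TIPC_ADDR_NAME;
    server.addr.name.name.type = 18888;   /* illustrative service type */
    server.addr.name.name.instance = 17;  /* illustrative instance     */
    server.addr.name.domain = 0;          /* 0 = look anywhere         */

    /* SOCK_RDM gives reliable, connectionless datagrams; SOCK_DGRAM,
     * SOCK_SEQPACKET, and SOCK_STREAM cover the other combinations
     * of reliable/unreliable and connected/connectionless. */
    sd = socket(AF_TIPC, SOCK_RDM, 0);
    if (sd < 0) {
        perror("socket");
        return 1;
    }

    if (sendto(sd, msg, sizeof(msg), 0,
               (struct sockaddr *)&server, sizeof(server)) < 0)
        perror("sendto");

    return 0;
}
```

If the service later moves to a different board or chassis, this sender code doesn’t change – only TIPC’s internal name-to-node resolution does.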

This effort has been driven primarily by the needs of communications systems. Complex processes like call setup, billing, and system maintenance happen across computers and, potentially, over long distances. These systems also require so-called high availability, meaning the ability to swap out cards without shutting the power down, plus redundancy that lets one board “fail over” to another if it goes south. And the reality is, the need isn’t limited to clustered computers, since such networks can span the globe.

Enea, a Swedish company with a proprietary embedded OS called OSE, took things a step further with LINX to try to drive greater transparency. LINX is derived from OSE’s internal IPC messaging protocol, and it differs from TIPC conceptually in that it scales further, from both a network and a processor standpoint. First of all, TIPC is generally designed around clusters; the TIPC project is working to extend that reach, but so far it’s pretty much a one-hop game. LINX is independent of the physical network topology and can operate anywhere, from within a single CPU (where messages don’t have to be copied between processes, since both access the same memory) to across the cloud.

Management is also different in that TIPC nodes must maintain a map of all other nodes in the system, and if one of those nodes changes, all nodes must be updated. This can become cumbersome as the size of the network grows. LINX requires only that a node know about nodes that it cares about. This simplifies the updating process and makes it more scalable.

LINX is able to operate with a small footprint, which is important because it allows smaller CPUs to act as nodes; LINX is small enough to run on DSPs (a goal TIPC is also working towards). Enea also claims substantial performance advantages, particularly in throughput: on some of the TIPC benchmarks, LINX throughput was as much as 90% higher. Latency is similar to TIPC’s for small packets, but LINX appears to have an advantage once messages exceed the MTU of the link, which Enea attributes to more efficient fragmentation.

One challenge in building a distributed system whose nodes intercommunicate is upgrading. If upgrading one node means upgrading all nodes, then the barrier to keeping up to date can be extremely high. Enea has now addressed this with the just-released 2.0 version of LINX for Linux. This edition adds protocol and feature negotiation: as connections come up, nodes declare their feature and protocol capabilities and agree on a common set for use during the life of the connection. This permits upgrades of portions of the network while still letting all nodes play nicely with each other.
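The article doesn’t describe the wire format, but the general idea of connection-time feature negotiation can be sketched generically: each side advertises what it supports when the link comes up, and both ends then operate on the intersection for the life of the connection. The flag names below are invented for illustration and do not reflect the actual LINX protocol.

```c
/* Generic sketch of connection-time feature negotiation; the feature
 * flags are hypothetical and are NOT the LINX protocol. Each node
 * advertises a capability mask, and both ends use the common subset. */
#include <stdint.h>
#include <stdio.h>

#define FEAT_FRAGMENTATION  (1u << 0)  /* hypothetical feature flags */
#define FEAT_MULTICORE_OPT  (1u << 1)
#define FEAT_NEW_HEADER_FMT (1u << 2)

/* Called when a link comes up: keep only features both sides support. */
static uint32_t negotiate(uint32_t ours, uint32_t theirs)
{
    return ours & theirs;
}

int main(void)
{
    uint32_t node_a = FEAT_FRAGMENTATION | FEAT_MULTICORE_OPT | FEAT_NEW_HEADER_FMT;
    uint32_t node_b = FEAT_FRAGMENTATION | FEAT_MULTICORE_OPT;  /* older release */

    uint32_t agreed = negotiate(node_a, node_b);

    /* The upgraded node simply refrains from using features its peer
     * didn't advertise, so mixed-version networks keep working. */
    printf("agreed feature mask: 0x%x\n", agreed);
    return 0;
}
```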

There’s a soupçon of tension over whether LINX is a competitor to TIPC or complementary to it – the latter is the official position. Right now, LINX will interoperate only with other LINX nodes, but it’s pretty clear that if both LINX and TIPC gain popularity, they will need to be able to talk to each other.

While LINX is intrinsic to Enea’s proprietary OSE OS, the company made it available to the open-source world with its 2006 announcement of LINX for Linux, and its goal is to turn it into an open-source project in much the same way that TIPC is. Enea has tried to mitigate the obligations its customers will face from the GPL licensing that comes with Linux by splitting the licensing model: the lower layers that attach to Linux are governed by GPL rules, while the upper layers play by the much less restrictive BSD rules.

Standards bodies have so far viewed this area as too immature to address. Enea’s Mike Christofferson says they’ve approached a couple of standards bodies but didn’t get any bites. He says they look forward to participating in the standardization process, but they don’t realistically see it happening for some years.
