Networks on a Chip

A while ago we took a broad look at the status of IP, with an eye to ease of use, reuse, and interconnectability. And, in fact, it’s a complicated matter. More so when you get to the issue of interconnecting various kinds of IP. There are two fundamental challenges as our SoCs get more and more mind-bogglingly complex. One is how you hook all this stuff up, and the other is how you hook on to the hook-up.

Here’s the basic problem. You’ve got a million things going on in the chip, much of it happening at the same time. (Well, except for the part where everyone is trying to get to the memory.) With standard busses, you get one owner at a time. So now you need more busses bridged together to have a chance of doing more than one thing at a time, and you might transition from there to crossbar switches, and you hook all your pieces to these interconnect chunks and get it all to talk nicely.

That’s fine as far as it goes. In fact, Sonics operates in this space, with a variety of interconnect schemes from simple to complex that can be used to build a complex structure. But there’s one more piece that, even at this level, needs to be addressed: plugging in the IP. There are many bus standards, and getting IP to interface with all of them correctly is a challenge. Having each piece of IP cross-tested with the other IP in the system is even harder.

Some effort has been made to address this by standardizing either the metadata format for IP or the actual socket interface for plug-and-play IP. These are the IP-XACT and OCP-IP standards. But IP-XACT, which specifies only how you “inform” tools about the IP, without actually standardizing the interface, is only just now starting to take hold. Whether OCP-IP, the actual plug-and-play socket standard, will get traction is another matter.

This is where an adapter approach can work: use translators or wrappers (variously called things like “agents” or “network interface units” – NIUs) for a wide variety of interfaces. It’s like plugging into power in a foreign country: first you plug in the adapter, and then you plug into the grid. By having the right set of adapters, the IP itself becomes “agnostic” to whatever it’s plugging into.

So all of this helps, but there is still more to deal with. These busses and crossbars use lots of signals. And one answer that has been proffered is to move to a network-on-chip (NoC) methodology. The NoC concept gets away from the central bus approach and borrows from the communication networks you are probably using to read this right now. A network is placed on the chip, with various endpoints. At each endpoint is an NIU that takes whatever the interface is on the particular piece of IP and adapts it to the network.

So far, it’s like the bus/crossbar solution above. In addition, however, the NIU packetizes (and unpacketizes) the data being sent from (and to) the IP. Now that we have data in packets, the data can be shipped around the chip on far fewer wires than a bus would require. Which is great, but, honestly, taking a big-picture view, it sounds a bit scary at first blush. And some of the academic proposals for NoCs were indeed apparently rather scary.
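To make the packetizing step concrete, here is a minimal sketch of what an NIU might do with a single write transaction. Everything specific in it is an assumption made for illustration: the 16-bit link width, the 32-bit address and 64-bit data fields, the one-flit header, and all the names. None of it reflects any vendor's actual packet format. The point is only that 96 parallel address and data signals can travel as seven narrow transfers on 16 wires.

```python
from dataclasses import dataclass
from typing import List

FLIT_BITS = 16                      # assumed link width (hypothetical)
FLIT_MASK = (1 << FLIT_BITS) - 1


@dataclass
class Transaction:
    dest: int      # hypothetical endpoint ID, fits in one flit
    address: int   # assumed 32-bit address
    data: int      # assumed 64-bit write data


def packetize(txn: Transaction) -> List[int]:
    """Split a wide transaction into a header flit plus payload flits."""
    flits = [txn.dest & FLIT_MASK]                 # header: destination only
    for shift in (16, 0):                          # address, high half first
        flits.append((txn.address >> shift) & FLIT_MASK)
    for shift in (48, 32, 16, 0):                  # data, high half first
        flits.append((txn.data >> shift) & FLIT_MASK)
    return flits


def unpacketize(flits: List[int]) -> Transaction:
    """Rebuild the transaction at the receiving NIU."""
    dest = flits[0]
    address = (flits[1] << FLIT_BITS) | flits[2]
    data = 0
    for flit in flits[3:7]:
        data = (data << FLIT_BITS) | flit
    return Transaction(dest, address, data)


txn = Transaction(dest=3, address=0x80001000, data=0x1122334455667788)
flits = packetize(txn)
print(len(flits), [hex(f) for f in flits])
```

The trade is visible right in the sketch: far fewer wires, but more transfers per transaction, which is where the latency question comes from.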

Is this really a good idea?

If you think about the cabinet full of equipment your company requires to manage its network, that really doesn’t sound like an appealing thing to be designing onto your chip. It’s great to make your IP easier to plug in; it’s great to get away with fewer wires. But if one end of your chip is repeatedly hitting “Send/Receive” until it finally gets the data it needs from the other end, delayed because of some other point-to-point connection hogging the bandwidth to stream some porn video, the value of a system like this seems rather diminished.

Sonics actually bills itself as offering NoC capabilities, but, at the same time, you can find arguments by them on the web against “true” NoCs for reasons like this, albeit steering rather clearer of the gutter. And, in response, this is where the “true” NoC vendors, Arteris and Silistix, are very careful to distinguish their offerings from what’s been bandied about in academia. The difference, they say, is their focus on making sure that the implementations are efficient from both a performance and an area standpoint. Which means accommodating a wide variety of configurations and network architectures to handle a range of requirements.

This is because, even when done efficiently, there is a cost to all of this, largely in latency. It takes some time – perhaps only a cycle, but a cycle nonetheless – to take parallel data, serialize it some way, do some very simple encapsulation to create a packet, set up a path, and send it on its way. But where that price pays off in reduced die area without penalizing overall system performance, the NoC providers argue that it pays dividends and that the key is to get the architecture right for the specific chip.

Complex SoCs will have heterogeneous combinations of interconnect involving point-to-point connections, star, mesh, and hierarchical combinations thereof. Which means that, for each chip, the network must be carefully planned out early in the design process. Says Arteris’s Charlie Janac, “the network is the first thing to be architected and the last to be implemented.”

Then there are the routers and switches needed to get packets to their destinations. Again, images of cumbersome store-and-forward systems come to mind. But, in fact, these are implemented as cut-through, meaning a node doesn’t have to wait until the entire packet has been received to start sending it on the next hop: as soon as the first data arrives, it can start on to the next node while the rest of the packet arrives.
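A rough way to see why cut-through matters is the usual first-order latency model, sketched here. It is a textbook approximation, not either vendor's numbers: it assumes one flit per cycle per link, a fixed per-router delay, and no contention, and the function names and example figures are made up for illustration.

```python
def store_and_forward_cycles(hops: int, flits: int, router_delay: int = 1) -> int:
    """Each hop waits for the whole packet to arrive before forwarding it."""
    return hops * (flits + router_delay)


def cut_through_cycles(hops: int, flits: int, router_delay: int = 1) -> int:
    """The header moves on immediately; the body streams along behind it."""
    return hops * router_delay + flits


# Hypothetical example: a 7-flit packet crossing 4 hops.
saf = store_and_forward_cycles(hops=4, flits=7)
ct = cut_through_cycles(hops=4, flits=7)
print(saf, ct)
```

In this model the store-and-forward cost grows with the product of hops and packet length, while the cut-through cost grows only with their sum, so the gap widens as packets get longer or paths get deeper.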

Yet more time can be saved since the routers can even be configured as combinational (“wormhole routing”), making the connection almost seem more like a traditional POTS circuit-switched arrangement than a packet-switched one. This can speed signals from one place to another, and, given that packetization can happen in a single cycle, you should be able to attain pretty good performance. In practice, you can select the latency for various transactions as you design the network.

Silistix allows designers to take things one step further in those complicated designs where it makes sense: one of their unique technologies is the ability to create asynchronous paths – “clockless” or “self-timed” nets. This is obviously in contrast to the fully synchronous nature of busses and crossbars, but is also distinct from the “globally asynchronous locally synchronous” – GALS – nature of Arteris’s structure and Silistix’s other options, which are all clocked but cross clock domain boundaries.

These act something like the high-speed serial signals you see between chips, except they don’t use the expensive 8b/10b style of encoding and clock recovery; they have a different, simpler proprietary technique. Silistix is known for this particular capability, and it’s clear they’re at pains to reinforce the fact that this asynchronous approach is merely an option to be exercised when appropriate; it’s not how they do all their NoCs. In fact it’s not how they do most of their NoCs.

The connections can be point-to-point, eliminating handshakes and collisions. The protocol doesn’t have a retry method, which would be expensive, although optional quality-of-service (QoS) features could cause packet pre-emption. Rather than requiring an acknowledge mechanism for every packet just to handle this occasional situation, it is instead handled as an error. And solid error handling is important for robust operation.

As to how much area is chewed up by all of this, well, that depends. Arteris claims that for a simple chip, you might end up with a couple percent more area. But on a complex chip, you’ll actually save a few percent net [pun intended] because of the reduction in signals and simplified layout.

Exactly how all of this is implemented is proprietary stuff. Both Arteris and Silistix use a layered approach to the protocols, but they handle the details themselves. All three companies provide design tools to help build out the architecture and then create the RTL implementing the network. Ideally, it’s a straightforward process, but I get the sense that these guys are involved in many of the designs to ensure that they go smoothly.

Silistix claims to have a design flow advantage because they have actually come up with a language for describing the requirements of the network – which things have to hook up to which and with what latency, etc. From that, their tools synthesize a network. And they do so with access to specific library files for the targeted technology so that the resulting network is much more likely to meet the timing requirements of the design, dramatically tightening the closure window. Which is critical: since the architecture is done early and implemented late, finding out at the last minute that the estimates behind your architectural decisions weren’t accurate, and that you now need to redo the architecture, would, to put it mildly, suck.

What’s wrong with this picture?

So let’s step back a second and review. We can architect a network that will actually save area on a complicated chip and will still meet the performance needs of the chip. Instead of having to manually custom wrap each piece of IP we’re going to use to match whatever bus it’s going to connect to on a given chip, we can simply snap it into its agent/NIU. A host of other benefits are said to accrue: potentially lower power, easier verification, easier instrumentation and debug… So… why isn’t everyone using these?

The answers vary; this is clearly a question these guys have to deal with (apparently it’s a question the Directors occasionally ask). A non-NoC vendor would argue that complexity and latency kill the practicality of the concept. The NoC players attribute the slow start to the soft stuff – things like: people got a bad taste from the over-the-top academic proposals… these are small companies attempting to influence the absolute core of the chip, and risk is perceived to be high… the pain isn’t high enough to overcome the risk, or at least the inertia of how things are done today… even that the simplicity of this puts an architect’s job at risk, meaning that he or she isn’t likely to select this option.

Whatever the reasons, traction could be better. All companies claim solid successes, so there are takers. Will they be able to transform the way the world designs chips? The jury is still out on that.