Go big or go home. That could be the company motto for Tilera, a Boston-based startup that makes the most gonzo processor you’ve probably ever seen.
Tilera’s new GX-100 chip contains one hundred – count ’em – identical microprocessors, each connected to the others though a massive terabit network. And these processors aren’t just wimpy little cores, either. They’re big 64-bit RISC machines, each with its own L1 and L2 caches, TLB, and 64-bit instruction set. Roughly speaking, each Tilera processor is about equivalent to a good PowerPC, MIPS, or Intel Xeon. In other words, a serious processor. In a big crowd of serious processors.
What would you do with this much horsepower in a single package? If you have to ask, this chip probably isn’t for you. Step aside, sonny, and let the real engineers do their work. But if your business card says Cisco, AT&T, Google, or Nokia, you probably have some good ideas about where this chip might fit. It’s aimed at high-end networking and telecommunications gear, products that have to massage a lot of data in record time.
Ushering in the Tile Era
As the company name suggests, Tilera’s big chip is made of “tiles,” or repeating arrays of processor, cache, and interconnect. The chip is homogeneous, in the sense that all 100 tiles are identical. It’s organized as a 10×10 array with each processor tile connecting directly to its neighbor to the north, south, east, and west. Processors along the edges of the chip connect to peripheral I/O (more on this later).
Each processor tile is happy to operate all on its own. In fact, they often do. After all, each one is a self-contained island with all the execution resources it needs, including two levels of local cache. If one processor needs to communicate with another, it takes only one clock cycle per “hop.” Naturally, you’d want to talk to your neighboring processors as much as possible, but even a worst-case connection needs only ten hops to get from one side of the chip to the other. That may sound like a lot, but even normal system buses on standard processor chips need 10 cycles for a bus transaction. Tilera’s chip can do dozens of these transactions all at once, from any processor to any processor.
The religiously regular nature of the GX-100 chip lends itself to task partitioning. If your task requires, say, six processors, you’ll probably want them next to each other in a 2×3 block. This shortens the communication paths among processors and avoids cutting out oddly shaped “holes” that might orphan some tiles in the 10×10 grid. There’s no requirement that cooperating processor tiles be contiguous; it’s just a good idea. With 100 tiles to work with, you can add or remove processing power in 1% increments.
Adjoining processor tiles can also ignore each other. In other words, the six tiles that handle your network protocols might have nothing to do with the adjacent ten tiles running the operating system. Or the neighboring four tiles handling the user interface. And so on. You can even duplicate tasks, so that two separate 20-tile blocks are both running Apache server software, with both being independent and unaware of each other.
How you actually partition your tasks is mostly up to you. Tilera doesn’t enforce any kind of rigor either in its hardware or its software. If you’re running a multiprocessing operating system (a few of which have been ported to Tilera), it will see the chip as 100 separate processor cores and will launch and kill threads as it sees fit. SMP Linux, for example, will move tasks around as they spawn and die, like a large-scale game of Life.
Under the Hood
If you’ve been following Tilera, you may be familiar with its tile-based approach to multiprocessing. But you’ll still be surprised at the processors themselves. The company has completely ditched the MIPS architecture it used in its previous chips (shipping since 2007) and designed its own 64-bit architecture from scratch. That means all new software tools, and it means existing Tilera code won’t run on the new chips.
Tilera doesn’t see this as much of a problem. For one, there aren’t that many existing Tilera customers around, so there isn’t much existing Tilera code to port. Second, what code there is was probably written in C, and Tilera’s new C compiler will handle the recompilation, no problem. It’s hard to imagine anyone hand-tweaking assembly code for such a beast, but, if you had, you’ll need to rewrite it for the new instruction set.
Paradoxically, Tilera touts the massive GX-100 chip as a “green” power-saving alternative to competing processors. Odd as it sounds, they may have a point. Even with 100 processors all running at 1.2 GHz or so, the chip dissipates about 55 watts. That’s not terrible, and a whole lot less than 100 (or even two) Intel Xeon chips would draw. The company claims the GX-100 is also more power-efficient than anything Cavium, Freescale, or RMI produces. At 55 watts, the GX-100 will need a good strong fan, but it won’t require exotic liquid cooling or its own power station.
So what do you hook this thing up to, apart from several stout power leads? Just about anything you want to, as long as it’s communications-related. The periphery of the GX-100 chip is peppered with every networking controller known to man, including XAUI (eight of them), Interlaken (two 10-lane interfaces), Gigabit Ethernet (32 of those), PCI Express (two 8-lanes and a 4-lane), DDR3 (four separate controllers), two independent crypto accelerators, plus the usual assortment of UARTs, USB, JTAG, I2C, SPI, and so on. What, no RS-232?
For the adventurous but less fiscally independent engineer, Tilera is also planning scaled-back versions of the GX-100 that have just 16, 36, or 64 cores. The number of I/O controllers diminishes with the reduction in tile count (mostly because there’s no room on the die or in the package), but otherwise these chips are identical to their big brother. They’re still awesome, just less awesome.
For programmers, tackling a Tilera chip must be a daunting task. It’s an entirely new instruction set and processor architecture, combined with the vagaries of interprocessor communication, shared caches, and load balancing. It doesn’t have to be complex, but it could very rapidly become so. Dabbling with the 16-tile version may be the way to start. The Tilera chips are nothing if not scalable, so learning one means you’ve cracked them all. Or you could just try overclocking that 8051 for a few more years.