Tilera Gets Its Gonzo On

Go big or go home. That could be the company motto for Tilera, a Boston-based startup that makes the most gonzo processor you’ve probably ever seen.

Tilera’s new GX-100 chip contains one hundred – count ’em – identical microprocessors, each connected to the others through a massive terabit network. And these processors aren’t just wimpy little cores, either. They’re big 64-bit RISC machines, each with its own L1 and L2 caches, TLB, and 64-bit instruction set. Roughly speaking, each Tilera processor is about equivalent to a good PowerPC, MIPS, or Intel Xeon. In other words, a serious processor. In a big crowd of serious processors.

What would you do with this much horsepower in a single package? If you have to ask, this chip probably isn’t for you. Step aside, sonny, and let the real engineers do their work. But if your business card says Cisco, AT&T, Google, or Nokia, you probably have some good ideas about where this chip might fit. It’s aimed at high-end networking and telecommunications gear, products that have to massage a lot of data in record time.

Ushering in the Tile Era

As the company name suggests, Tilera’s big chip is made of “tiles,” or repeating arrays of processor, cache, and interconnect. The chip is homogeneous, in the sense that all 100 tiles are identical. It’s organized as a 10×10 array with each processor tile connecting directly to its neighbors to the north, south, east, and west. Processors along the edges of the chip connect to peripheral I/O (more on this later).

Each processor tile is happy to operate all on its own. In fact, they often do. After all, each one is a self-contained island with all the execution resources it needs, including two levels of local cache. If one processor needs to communicate with another, it takes only one clock cycle per “hop.” Naturally, you’d want to talk to your neighboring processors as much as possible, but even a trip from one side of the chip to the other needs only about ten hops (a full corner-to-corner diagonal on a 10×10 grid is closer to 18). That may sound like a lot, but even normal system buses on standard processor chips need 10 cycles for a bus transaction. Tilera’s chip can do dozens of these transactions all at once, from any processor to any processor.
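
For a feel of the hop arithmetic, here is a minimal sketch. It assumes messages travel only along rows and columns of the 10×10 grid, one tile per hop; Tilera's actual routing scheme isn't described here, so the distance model and the `hops` helper are illustrative assumptions.

```python
GRID = 10  # the GX-100 is described as a 10x10 array of tiles

def hops(src, dst):
    """Hops between two tiles given as (row, col).

    Assumption: traffic moves only north/south/east/west,
    one tile per hop, so the count is the Manhattan distance.
    """
    (r1, c1), (r2, c2) = src, dst
    return abs(r1 - r2) + abs(c1 - c2)

neighbor = hops((4, 4), (4, 5))                         # adjacent tiles
edge_to_edge = hops((0, 0), (0, GRID - 1))              # straight across one row
corner_to_corner = hops((0, 0), (GRID - 1, GRID - 1))   # the long diagonal
print(neighbor, edge_to_edge, corner_to_corner)
```

Under that model, and at one clock cycle per hop, a neighbor is one cycle away, a straight run across the chip is nine, and the far corner is eighteen.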

The religiously regular nature of the GX-100 chip lends itself to task partitioning. If your task requires, say, six processors, you’ll probably want them next to each other in a 2×3 block. This shortens the communication paths among processors and avoids cutting out oddly shaped “holes” that might orphan some tiles in the 10×10 grid. There’s no requirement that cooperating processor tiles be contiguous; it’s just a good idea. With 100 tiles to work with, you can add or remove processing power in 1% increments.
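
To make the “keep it rectangular” advice concrete, here is a toy first-fit placer. It is purely a sketch: `place_block` is a hypothetical helper, not part of any Tilera tool, and real placement would likely weigh I/O proximity and other factors.

```python
GRID = 10  # 10x10 tiles, as described above

def place_block(used, rows, cols):
    """Find the first free rows x cols rectangle, mark its tiles
    as used, and return them; return None if nothing fits.

    Hypothetical helper for illustration only.
    """
    for r in range(GRID - rows + 1):
        for c in range(GRID - cols + 1):
            tiles = [(r + i, c + j) for i in range(rows) for j in range(cols)]
            if not any(t in used for t in tiles):
                used.update(tiles)
                return tiles
    return None

used = set()
net = place_block(used, 2, 3)   # six tiles for a networking task
osb = place_block(used, 2, 5)   # ten tiles for an operating system
print(net[0], osb[0], f"{len(used)}% of the chip in use")
```

Because there are exactly 100 tiles, the number of tiles in use doubles as a percentage of the chip, which is the 1% granularity mentioned above.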

Adjoining processor tiles can also ignore each other. In other words, the six tiles that handle your network protocols might have nothing to do with the adjacent ten tiles running the operating system. Or the neighboring four tiles handling the user interface. And so on. You can even duplicate tasks, so that two separate 20-tile blocks are both running Apache server software, with both being independent and unaware of each other.

How you actually partition your tasks is mostly up to you. Tilera doesn’t enforce any kind of rigor either in its hardware or its software. If you’re running a multiprocessing operating system (a few of which have been ported to Tilera), it will see the chip as 100 separate processor cores and will launch and kill threads as it sees fit. SMP Linux, for example, will move tasks around as they spawn and die, like a large-scale game of Life.

Under the Hood

If you’ve been following Tilera, you may be familiar with its tile-based approach to multiprocessing. But you’ll still be surprised at the processors themselves. The company has completely ditched the MIPS architecture it used in its previous chips (shipping since 2007) and designed its own 64-bit architecture from scratch. That means all new software tools, and it means existing Tilera code won’t run on the new chips.

Tilera doesn’t see this as much of a problem. First, there aren’t that many existing Tilera customers around, so there isn’t much existing Tilera code to port. Second, what code there is was probably written in C, and Tilera’s new C compiler will handle the recompilation, no problem. It’s hard to imagine anyone hand-tweaking assembly code for such a beast, but, if you have, you’ll need to rewrite it for the new instruction set.

Paradoxically, Tilera touts the massive GX-100 chip as a “green” power-saving alternative to competing processors. Odd as it sounds, they may have a point. Even with 100 processors all running at 1.2 GHz or so, the chip dissipates about 55 watts. That’s not terrible, and a whole lot less than 100 (or even two) Intel Xeon chips would draw. The company claims the GX-100 is also more power-efficient than anything Cavium, Freescale, or RMI produces. At 55 watts, the GX-100 will need a good strong fan, but it won’t require exotic liquid cooling or its own power station.
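
A back-of-the-envelope check on that power figure, using only the numbers quoted above. Note the assumption: the 55 watts covers the whole chip, I/O and interconnect included, so the per-tile result is an average share rather than a measured core power.

```python
chip_watts = 55   # approximate whole-chip dissipation quoted above
tiles = 100       # processor tiles on the GX-100

# Average share of the power budget per tile (assumption: an even split,
# with I/O and interconnect power folded into each tile's share).
watts_per_tile = chip_watts / tiles
print(f"about {watts_per_tile * 1000:.0f} mW per tile on average")
```

That works out to a bit over half a watt per 64-bit processor, which is the basis for the “green” pitch.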

So what do you hook this thing up to, apart from several stout power leads? Just about anything you want to, as long as it’s communications-related. The periphery of the GX-100 chip is peppered with every networking controller known to man, including XAUI (eight of them), Interlaken (two 10-lane interfaces), Gigabit Ethernet (32 of those), PCI Express (two 8-lanes and a 4-lane), DDR3 (four separate controllers), two independent crypto accelerators, plus the usual assortment of UARTs, USB, JTAG, I2C, SPI, and so on. What, no RS-232?

For the adventurous but less fiscally independent engineer, Tilera is also planning scaled-back versions of the GX-100 that have just 16, 36, or 64 cores. The number of I/O controllers diminishes with the reduction in tile count (mostly because there’s no room on the die or in the package), but otherwise these chips are identical to their big brother. They’re still awesome, just less awesome. 

For programmers, tackling a Tilera chip must be a daunting task. It’s an entirely new instruction set and processor architecture, combined with the vagaries of interprocessor communication, shared caches, and load balancing. It doesn’t have to be complex, but it could very rapidly become so. Dabbling with the 16-tile version may be the way to start. The Tilera chips are nothing if not scalable, so learning one means you’ve cracked them all. Or you could just try overclocking that 8051 for a few more years.
