
Tilera Gets Its Gonzo On

Go big or go home. That could be the company motto for Tilera, a Boston-based startup that makes the most gonzo processor you’ve probably ever seen.

Tilera’s new GX-100 chip contains one hundred – count ’em – identical microprocessors, each connected to the others through a massive terabit network. And these processors aren’t just wimpy little cores, either. They’re big 64-bit RISC machines, each with its own L1 and L2 caches and TLB, all running the same 64-bit instruction set. Roughly speaking, each Tilera processor is comparable to a good PowerPC, MIPS, or Intel Xeon. In other words, a serious processor. In a big crowd of serious processors.

What would you do with this much horsepower in a single package? If you have to ask, this chip probably isn’t for you. Step aside, sonny, and let the real engineers do their work. But if your business card says Cisco, AT&T, Google, or Nokia, you probably have some good ideas about where this chip might fit. It’s aimed at high-end networking and telecommunications gear, products that have to massage a lot of data in record time.

Ushering in the Tile Era

As the company name suggests, Tilera’s big chip is made of “tiles,” repeating blocks that each combine a processor, cache, and interconnect. The chip is homogeneous, in the sense that all 100 tiles are identical. It’s organized as a 10×10 array with each processor tile connecting directly to its neighbor to the north, south, east, and west. Processors along the edges of the chip connect to peripheral I/O (more on this later).
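To make the geometry concrete, here is a minimal model of the 10×10 array. It assumes tiles are addressed by (row, column), a convention invented for this sketch rather than Tilera’s own numbering. `neighbors` returns only the north/south/east/west links that actually exist, and `edge_tiles` picks out the perimeter tiles, which per the article are the ones next to peripheral I/O.

```python
N = 10  # the GX-100's 10 x 10 tile array

def neighbors(row, col, n=N):
    """Mesh neighbours (north, south, east, west) of tile (row, col) on an n x n grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col + 1), (row, col - 1)]
    # Drop links that would fall off the edge of the chip.
    return [(r, c) for r, c in candidates if 0 <= r < n and 0 <= c < n]

def edge_tiles(n=N):
    """Tiles on the perimeter of the grid, i.e. the ones adjacent to peripheral I/O."""
    return [(r, c) for r in range(n) for c in range(n)
            if r in (0, n - 1) or c in (0, n - 1)]

if __name__ == "__main__":
    print(neighbors(0, 0))        # a corner tile has only two mesh neighbours
    print(len(neighbors(5, 5)))   # an interior tile has four
    print(len(edge_tiles()))      # 36 of the 100 tiles sit on the edge
```

Only 36 of the 100 tiles touch the perimeter, so most tiles reach I/O indirectly, by hopping across their neighbors.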

Each processor tile is happy to operate all on its own. In fact, they often do. After all, each one is a self-contained island with all the execution resources it needs, including two levels of local cache. If one processor needs to communicate with another, it takes only one clock cycle per “hop” between neighboring tiles. Naturally, you’d want to talk to your neighboring processors as much as possible, but even crossing from one edge of the chip to the opposite edge takes only nine hops, and the worst case, corner to corner, is 18. That may sound like a lot, but even normal system buses on standard processor chips need around 10 cycles for a bus transaction, and Tilera’s chip can do dozens of these transactions all at once, from any processor to any processor.
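The hop arithmetic is easy to check. With links only to north, south, east, and west neighbors, the shortest path between two tiles is their Manhattan distance. The sketch below assumes messages take a shortest path and that each hop costs the single cycle the article quotes. It ignores contention and injection overhead, and it makes no claim about Tilera’s actual routing algorithm.

```python
from itertools import product

def hops(a, b):
    """Shortest path between tiles a=(row, col) and b on a mesh with only
    north/south/east/west links: the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def worst_case_hops(n):
    """Corner-to-corner distance on an n x n mesh."""
    return hops((0, 0), (n - 1, n - 1))

def mean_hops(n):
    """Average distance over all (source, destination) tile pairs, by brute force."""
    tiles = list(product(range(n), repeat=2))
    total = sum(hops(a, b) for a in tiles for b in tiles)
    return total / len(tiles) ** 2

if __name__ == "__main__":
    print(hops((4, 0), (4, 9)))   # straight across one row: 9 hops
    print(worst_case_hops(10))    # corner to corner: 18 hops
    print(mean_hops(10))          # about 6.6 hops for uniformly random traffic
```

The same function covers the smaller 16-, 36-, and 64-tile parts, whose corner-to-corner paths are 6, 10, and 14 hops respectively.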

The religiously regular nature of the GX-100 chip lends itself to task partitioning. If your task requires, say, six processors, you’ll probably want them next to each other in a 2×3 block. This shortens the communication paths among processors and avoids cutting out oddly shaped “holes” that might orphan some tiles in the 10×10 grid. There’s no requirement that cooperating processor tiles be contiguous; it’s just a good idea. With 100 tiles to work with, you can add or remove processing power in 1% increments.
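Nothing in the article says Tilera ships a tile allocator, so the following is purely hypothetical: `TileGrid`, `allocate`, and `release` are names invented for this sketch. It shows a simple first-fit policy that carves rectangular blocks out of the 10×10 grid, scanning row by row, which is one way to keep cooperating tiles compact and avoid leaving oddly shaped holes.

```python
class TileGrid:
    """Hypothetical bookkeeping for which task owns each tile of a rows x cols grid."""

    def __init__(self, rows=10, cols=10):
        self.rows, self.cols = rows, cols
        self.owner = [[None] * cols for _ in range(rows)]

    def _is_free(self, r, c, h, w):
        return all(self.owner[r + i][c + j] is None
                   for i in range(h) for j in range(w))

    def allocate(self, task, h, w):
        """First fit: return the top-left (row, col) of the first free h x w block, or None."""
        for r in range(self.rows - h + 1):
            for c in range(self.cols - w + 1):
                if self._is_free(r, c, h, w):
                    for i in range(h):
                        for j in range(w):
                            self.owner[r + i][c + j] = task
                    return (r, c)
        return None  # no contiguous block of that shape is available

    def release(self, task):
        """Return every tile owned by task to the free pool."""
        for row in self.owner:
            for j, t in enumerate(row):
                if t == task:
                    row[j] = None

    def free_count(self):
        return sum(t is None for row in self.owner for t in row)

if __name__ == "__main__":
    grid = TileGrid()
    print(grid.allocate("network", 2, 3))  # the six-tile 2x3 block from the text
    print(grid.allocate("os", 2, 5))
    print(grid.allocate("ui", 2, 2))
    print(grid.free_count())               # 100 - 6 - 10 - 4 = 80 tiles left
```

First fit is the simplest policy to illustrate. A real system might prefer best fit, or might place blocks near the edge tiles that own the I/O a task needs.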

Adjoining processor tiles can also ignore each other. In other words, the six tiles that handle your network protocols might have nothing to do with the adjacent ten tiles running the operating system. Or the neighboring four tiles handling the user interface. And so on. You can even duplicate tasks: two separate 20-tile blocks could both run Apache server software, each independent of and unaware of the other.

How you actually partition your tasks is mostly up to you. Tilera doesn’t enforce any kind of rigor either in its hardware or its software. If you’re running a multiprocessing operating system (a few of which have been ported to Tilera), it will see the chip as 100 separate processor cores and will launch and kill threads as it sees fit. SMP Linux, for example, will move tasks around as they spawn and die, like a large-scale game of Life.
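If you’d rather not let the SMP Linux scheduler roam freely, the standard Linux CPU-affinity mechanism can confine a process to a chosen set of cores. Python exposes it as `os.sched_setaffinity`, a wrapper around the `sched_setaffinity(2)` system call, and `taskset` does the same from the shell. The tile-to-CPU mapping here is an assumption: the sketch supposes that Tilera’s Linux port numbers tiles row-major as CPUs 0 through 99, which the article does not state. `block_cpus` is a hypothetical helper.

```python
import os

COLS = 10  # assumption: CPUs numbered row-major, ten tiles per row

def block_cpus(row, col, height, width, cols=COLS):
    """CPU ids for a height x width block of tiles whose top-left tile is (row, col)."""
    return {(row + i) * cols + (col + j) for i in range(height) for j in range(width)}

def pin_current_process(cpus):
    """Restrict the calling process (pid 0) to the given CPU ids. Linux only."""
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    wanted = block_cpus(0, 0, 2, 3)      # the 2x3 block in the top-left corner
    available = os.sched_getaffinity(0)
    # On an ordinary PC most of these ids won't exist, so pin to the overlap,
    # falling back to a single available CPU if there is none.
    usable = (wanted & available) or {min(available)}
    print(pin_current_process(usable))
```

Threads spawned by a pinned process inherit its affinity mask, so pinning the parent is usually enough to keep a whole task group inside its block.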

Under the Hood

If you’ve been following Tilera, you may be familiar with its tile-based approach to multiprocessing. But you’ll still be surprised at the processors themselves. The company has completely ditched the MIPS architecture it used in its previous chips (shipping since 2007) and designed its own 64-bit architecture from scratch. That means all new software tools, and it means existing Tilera code won’t run on the new chips.

Tilera doesn’t see this as much of a problem. For one, there aren’t that many existing Tilera customers around, so there isn’t much existing Tilera code to port. Second, what code there is was probably written in C, and Tilera’s new C compiler will handle the recompilation, no problem. It’s hard to imagine anyone hand-tweaking assembly code for such a beast, but if you have, you’ll need to rewrite it for the new instruction set.

Paradoxically, Tilera touts the massive GX-100 chip as a “green” power-saving alternative to competing processors. Odd as it sounds, they may have a point. Even with 100 processors all running at 1.2 GHz or so, the chip dissipates about 55 watts. That’s not terrible, and a whole lot less than 100 (or even two) Intel Xeon chips would draw. The company claims the GX-100 is also more power-efficient than anything Cavium, Freescale, or RMI produces. At 55 watts, the GX-100 will need a good strong fan, but it won’t require exotic liquid cooling or its own power station.
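A quick back-of-envelope calculation shows why the “green” claim isn’t absurd. The sketch uses the article’s approximate figures (100 cores, roughly 1.2 GHz, about 55 W). Core-GHz per watt is a crude proxy that ignores how much work each core does per cycle. Because the 55 W covers the whole chip, I/O included, the per-core figure is an upper bound on what each core itself draws.

```python
def watts_per_core(total_watts, cores):
    """Whole-chip power divided evenly across cores (an upper bound per core)."""
    return total_watts / cores

def core_ghz_per_watt(cores, clock_ghz, total_watts):
    """Aggregate clock rate (cores x GHz) per watt: a rough throughput-per-watt proxy."""
    return cores * clock_ghz / total_watts

GX100 = dict(cores=100, clock_ghz=1.2, total_watts=55.0)  # approximate figures from the article

if __name__ == "__main__":
    print(f"{watts_per_core(GX100['total_watts'], GX100['cores']):.2f} W per core")  # 0.55
    print(f"{core_ghz_per_watt(**GX100):.2f} core-GHz per watt")                    # 2.18
```

Plugging in a competing part’s core count, clock, and power gives a like-for-like number, though only a rough one, since it says nothing about per-core performance.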

So what do you hook this thing up to, apart from several stout power leads? Just about anything you want to, as long as it’s communications-related. The periphery of the GX-100 chip is peppered with every I/O controller known to man, including XAUI (eight of them), Interlaken (two 10-lane interfaces), Gigabit Ethernet (32 of those), PCI Express (two 8-lane ports and a 4-lane port), DDR3 (four separate controllers), two independent crypto accelerators, plus the usual assortment of UARTs, USB, JTAG, I2C, SPI, and so on. What, no RS-232?
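To put the Ethernet ports in perspective, the sketch adds up their nominal line rates. XAUI is the attachment interface for 10 Gigabit Ethernet, so each XAUI port is counted at 10 Gb/s, and each Gigabit Ethernet port at 1 Gb/s. Interlaken and PCI Express are left out because the article gives no lane rates for them. The total is a paper figure that assumes every port runs at full rate simultaneously, and the article doesn’t say whether that configuration is possible.

```python
# Nominal per-port data rates in Gb/s (per direction, since Ethernet is full duplex).
# Only the Ethernet-class ports are counted; the article gives no Interlaken/PCIe lane rates.
PORTS = {
    "XAUI (10 Gigabit Ethernet)": (8, 10.0),
    "Gigabit Ethernet": (32, 1.0),
}

def aggregate_gbps(ports):
    """Sum of count x per-port rate: an on-paper upper bound, not a tested configuration."""
    return sum(count * rate for count, rate in ports.values())

if __name__ == "__main__":
    for name, (count, rate) in PORTS.items():
        print(f"{name}: {count} x {rate:g} Gb/s = {count * rate:g} Gb/s")
    print(f"total: {aggregate_gbps(PORTS):g} Gb/s per direction")  # 112
```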

For the adventurous but less fiscally independent engineer, Tilera is also planning scaled-back versions of the GX-100 that have just 16, 36, or 64 cores. The number of I/O controllers diminishes with the reduction in tile count (mostly because there’s no room on the die or in the package), but otherwise these chips are identical to their big brother. They’re still awesome, just less awesome. 

For programmers, tackling a Tilera chip must be a daunting task. It’s an entirely new instruction set and processor architecture, combined with the vagaries of interprocessor communication, shared caches, and load balancing. It doesn’t have to be complex, but it could very rapidly become so. Dabbling with the 16-tile version may be the way to start. The Tilera chips are nothing if not scalable, so learning one means you’ve cracked them all. Or you could just try overclocking that 8051 for a few more years.
