feature article
Subscribe Now

Your Basic $99 Supercomputer

Adapteva’s Epiphany-IV Chip Rocks Floating-Point Math

What do you get when you combine a floating-point processor with a mesh network? The Adapteva Epiphany-IV microprocessor, apparently. This Boston-based startup composed of four refugees from Analog Devices has developed a brand new high-performance processor that should help smartphones and other mobile devices get even smarter.

Epiphany-IV comes in chip form or as licensed IP. You can also get an evaluation board for $99 (described below), which the company proudly boasts is the world’s most power-efficient supercomputer. Big words from such a little company. But from tiny acorns do mighty oak trees grow, as a certain British competitor can attest.

The Epiphany architecture combines simple floating-point units connected by a point-to-point mesh network. The idea is that each CPU core works on a small part of a larger data set, exchanging data with its neighbors as it goes. Like a silicon hive mind, Epiphany relies on tiny amounts of work, multiplied.

As any fifth-grader can tell you, floating-point math (i.e., fractions) is a lot harder than basic integer math. Intel’s x86 family, for instance, is notable for its miserable floating-point performance. But Intel can’t change its Paleolithic FPU architecture without breaking software compatibility. Adapteva can.

Epiphany is so focused on floating-point operations that it can’t even do basic integer multiplication or division. (It does do integer ops; just not very complex ones.) Epiphany’s dual-issue CPU dispatches one integer instruction and one FP instruction every cycle. Since memory loads and stores are considered integer operations, the bulk of the integer half of the pipeline will likely be kept busy transferring data for the floating-point half.

The CPU doesn’t have any of the aggressive hardware we often see in microprocessors today. It doesn’t reorder instructions, for example, nor does it have any kind of branch prediction. In true RISC fashion, Epiphany relies on Adapteva’s compiler to figure all this stuff out. The hardware does whatever it’s told.

According to the EEMBC CoreMark benchmark, Epiphany’s integer performance is… lackluster. It’s not even half as fast as an ARM Cortex-A9, for example (1.3 CoreMarks/MHz versus 2.9 for the ARM core). But integer arithmetic isn’t really Epiphany’s forte. It’s intended for floating-point work, and on that it excels.

Each CPU can spit out one floating-point multiply-accumulate operation (MAC) every cycle, pretty much like a DSP. And there are either 16 or 64 of these CPU cores on each Epiphany chip. Taken altogether, Epiphany is a pretty speedy little FP coprocessor. Current samples of Epiphany-IV run at 600 to 700 MHz in 28nm silicon.

Programmers of a certain age may remember the Intel i860, a roughly similar device that briefly adorned the Intel product catalog in the late 1980s. Like Epiphany, the i860 had a “naked” RISC pipeline that was streamlined and efficient, but that also leaned heavily on its compiler to know what to do. This approach yields smaller and more power-efficient hardware than most CPUs today, but it also lays several traps for unwary programmers. Where power efficiency is the goal, stripped-down hardware is a fair tradeoff. Where ease of programming is important, it’s a quagmire. Given that Adapteva has no installed software base to protect, and its own in-house compiler to play with, it can manage the complexity. If Epiphany becomes more popular, third-party developers will have to port their code carefully.

As an example of Epiphany’s less-is-faster approach, the chip has no cache, only SRAM. Specifically, each of the 16 or 64 CPU cores has its own 32KB block of SRAM that it can access directly. That memory can also be accessed by the 15 or 63 other CPUs on the same chip, via the on-chip mesh network. But they have to ask for it. In other words, there’s no cache snooping among all those SRAM blocks, so no power is wasted keeping them all coherent. That also means no cache tags, no lookup logic, and so on. SRAM is simple, predictable, and easy to manufacture, whereas caches add complexity, size, and power. Caches might be easier for programmers to manage, and they effectively multiply the size of the memory, but they’re not as simple SRAMs.

The on-chip mesh network also sacrifices transparency on the altar of simplicity. While each CPU can connect to its immediate neighbor to the north, south, east, or west, it can’t transfer data any farther than that without multiple hops. Like a miniature version of Internet routers, each CPU in Epiphany must forward packets not intended for itself. Every hop adds a clock cycle, so there’s an incentive for programmers to design software with physical, as well as logical, layouts in mind. It’s not often that you see the physical structure of a program affecting its performance.

As we’ve said before on these pages, designing a microprocessor is the easy part. Getting software support for it is hard. The company has taken the unusual step of making its lack of software someone else’s problem. To help develop Epiphany code, Adapteva has launched a Kickstarter program to raise $750,000 for the company. A $99 “pledge” gets you an Epiphany evaluation board, while a check for $499 entitles the donor to a “limited edition” evaluation board with a low serial number. Wooo.

If floating-point is your thing, and small die size and minimal power consumption are your metrics for success, Adapteva’s Epiphany design may be just the ticket. It’s far smaller than an ARM processor in terms of transistor count and silicon area (proving, once again, that ARM is not the tiny CPU that many imagine it to be), but there are drawbacks to that austere minimalism. Other processor designers don’t add transistors gratuitously; all that extra silicon is there for a reason. And in most cases, it’s to make the CPU easier to program and less likely to crash. Most CPUs have caches, MMUs, bus interfaces, peripheral I/O, and other productive uses for their transistors. Epiphany has none of those features, and as long as its users know that going in, they’ll likely be able to brag they’ve got the smallest, fastest, cheapest, and most obscure supercomputer on the block. 

Leave a Reply

featured blogs
Jul 17, 2018
In the first installment, I wrote about why I had to visit Japan in 1983, and the semiconductor stuff I did there. Today, it's all the other stuff. Japanese Food When I went on this first trip to Japan, Japanese food was not common in the US (and had been non-existent in...
Jul 16, 2018
Each instance of an Achronix Speedcore eFPGA in your ASIC or SoC design must be configured after the system powers up because Speedcore eFPGAs employ nonvolatile SRAM technology to store the eFPGA'€™s configuration bits. Each Speedcore instance contains its own FPGA configu...
Jul 12, 2018
A single failure of a machine due to heat can bring down an entire assembly line to halt. At the printed circuit board level, we designers need to provide the most robust solutions to keep the wheels...