
Your Basic $99 Supercomputer

Adapteva’s Epiphany-IV Chip Rocks Floating-Point Math

What do you get when you combine a floating-point processor with a mesh network? The Adapteva Epiphany-IV microprocessor, apparently. This Boston-based startup composed of four refugees from Analog Devices has developed a brand new high-performance processor that should help smartphones and other mobile devices get even smarter.

Epiphany-IV comes in chip form or as licensed IP. You can also get an evaluation board for $99 (described below), which the company proudly boasts is the world’s most power-efficient supercomputer. Big words from such a little company. But from tiny acorns do mighty oak trees grow, as a certain British competitor can attest.

The Epiphany architecture combines simple floating-point units connected by a point-to-point mesh network. The idea is that each CPU core works on a small part of a larger data set, exchanging data with its neighbors as it goes. Like a silicon hive mind, Epiphany relies on tiny amounts of work, multiplied.

As any fifth-grader can tell you, floating-point math (i.e., fractions) is a lot harder than basic integer math. Intel’s x86 family, for instance, is notable for its miserable floating-point performance. But Intel can’t change its Paleolithic FPU architecture without breaking software compatibility. Adapteva can.

Epiphany is so focused on floating-point operations that it omits hardware for integer multiplication and division. (It still executes integer operations, just not complex ones.) Epiphany’s dual-issue CPU dispatches one integer instruction and one FP instruction every cycle. Since memory loads and stores count as integer operations, the integer half of the pipeline will likely spend most of its time transferring data for the floating-point half.
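That division of labor is easy to see in an inner loop like a dot product: every multiply-accumulate on the FP side needs fresh operands loaded on the integer side. Here is a minimal sketch in ordinary portable C — not Epiphany-specific code; scheduling the two halves to issue together is the compiler’s job:

```c
#include <stddef.h>

/* Dot product: each iteration pairs memory traffic (loads of a[i] and
 * b[i], handled by the integer pipe) with one multiply-accumulate
 * (handled by the FP pipe). On a dual-issue core like Epiphany, the
 * compiler can schedule the two to issue in the same cycle. */
float dot(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];   /* one FP MAC per iteration */
    return acc;
}
```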

The CPU doesn’t have any of the aggressive hardware we often see in microprocessors today. It doesn’t reorder instructions, for example, nor does it have any kind of branch prediction. In true RISC fashion, Epiphany relies on Adapteva’s compiler to figure all this stuff out. The hardware does whatever it’s told.

According to the EEMBC CoreMark benchmark, Epiphany’s integer performance is… lackluster. It’s not even half as fast as an ARM Cortex-A9, for example (1.3 CoreMarks/MHz versus 2.9 for the ARM core). But integer arithmetic isn’t really Epiphany’s forte. It’s intended for floating-point work, and on that it excels.

Each CPU can spit out one floating-point multiply-accumulate operation (MAC) every cycle, pretty much like a DSP. And there are either 16 or 64 of these CPU cores on each Epiphany chip. Taken altogether, Epiphany is a pretty speedy little FP coprocessor. Current samples of Epiphany-IV run at 600 to 700 MHz in 28nm silicon.
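Those numbers imply serious peak throughput. Counting a MAC as two floating-point operations (one multiply, one add), a 64-core part at 700 MHz tops out near 90 GFLOPS. The arithmetic is back-of-the-envelope, not a vendor spec:

```c
/* Peak floating-point throughput, counting one MAC as two FLOPs
 * (multiply + add). Illustrative arithmetic only; real workloads
 * won't sustain this. */
double peak_gflops(int cores, double clock_mhz)
{
    return cores * clock_mhz * 2.0 / 1000.0;  /* MFLOPS -> GFLOPS */
}
/* 64 cores at 700 MHz works out to about 89.6 GFLOPS;
 * the 16-core part at 600 MHz, about 19.2 GFLOPS. */
```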

Programmers of a certain age may remember the Intel i860, a roughly similar device that briefly adorned the Intel product catalog in the late 1980s. Like Epiphany, the i860 had a “naked” RISC pipeline that was streamlined and efficient, but that also leaned heavily on its compiler to know what to do. This approach yields smaller and more power-efficient hardware than most CPUs today, but it also lays several traps for unwary programmers. Where power efficiency is the goal, stripped-down hardware is a fair tradeoff. Where ease of programming is important, it’s a quagmire. Given that Adapteva has no installed software base to protect, and its own in-house compiler to play with, it can manage the complexity. If Epiphany becomes more popular, third-party developers will have to port their code carefully.

As an example of Epiphany’s less-is-faster approach, the chip has no cache, only SRAM. Specifically, each of the 16 or 64 CPU cores has its own 32KB block of SRAM that it can access directly. That memory can also be accessed by the 15 or 63 other CPUs on the same chip, via the on-chip mesh network. But they have to ask for it. In other words, there’s no cache snooping among all those SRAM blocks, so no power is wasted keeping them coherent. That also means no cache tags, no lookup logic, and so on. SRAM is simple, predictable, and easy to manufacture, whereas caches add complexity, size, and power. Caches might be easier for programmers to manage, and they effectively multiply the size of the memory, but they’re not as simple as SRAMs.
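The programming consequence is that sharing happens by explicit copying, never behind the programmer’s back. A toy C model of the idea — the buffer layout and the `remote_read` helper are illustrative assumptions, not Adapteva’s actual SDK API:

```c
#include <string.h>

#define NCORES    16
#define SRAM_SIZE (32 * 1024)   /* 32KB of local SRAM per core */

/* Toy model: each core's SRAM is just a byte array. No hardware keeps
 * these blocks coherent; a core that wants another core's data must
 * copy it across the mesh explicitly. */
static unsigned char sram[NCORES][SRAM_SIZE];

/* Hypothetical helper: pull 'len' bytes from a remote core's SRAM into
 * our own. On the real chip this turns into mesh-network traffic. */
void remote_read(int src_core, size_t src_off,
                 int dst_core, size_t dst_off, size_t len)
{
    memcpy(&sram[dst_core][dst_off], &sram[src_core][src_off], len);
}
```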

The on-chip mesh network also sacrifices transparency on the altar of simplicity. While each CPU can connect to its immediate neighbor to the north, south, east, or west, it can’t transfer data any farther than that without multiple hops. Like a miniature version of Internet routers, each CPU in Epiphany must forward packets not intended for itself. Every hop adds a clock cycle, so there’s an incentive for programmers to design software with physical, as well as logical, layouts in mind. It’s not often that you see the physical structure of a program affecting its performance.
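With only nearest-neighbor links, the cost of a transfer is simply the Manhattan distance between the two cores. A quick sketch, assuming one cycle per hop on the 8×8 grid of the 64-core part (and ignoring contention):

```c
#include <stdlib.h>  /* abs */

/* Cycles to move a packet between cores at mesh coordinates (x1,y1)
 * and (x2,y2), assuming one cycle per nearest-neighbor hop. Each hop
 * moves one step north, south, east, or west, so the total is the
 * Manhattan distance. */
int hop_cycles(int x1, int y1, int x2, int y2)
{
    return abs(x1 - x2) + abs(y1 - y2);
}
/* Corner to corner on an 8x8 array costs 14 cycles, versus a single
 * cycle for an immediate neighbor -- which is why data placement
 * matters to performance. */
```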

As we’ve said before on these pages, designing a microprocessor is the easy part. Getting software support for it is hard. The company has taken the unusual step of making its lack of software someone else’s problem. To help develop Epiphany code, Adapteva has launched a Kickstarter program to raise $750,000 for the company. A $99 “pledge” gets you an Epiphany evaluation board, while a check for $499 entitles the donor to a “limited edition” evaluation board with a low serial number. Wooo.

If floating-point is your thing, and small die size and minimal power consumption are your metrics for success, Adapteva’s Epiphany design may be just the ticket. It’s far smaller than an ARM processor in terms of transistor count and silicon area (proving, once again, that ARM is not the tiny CPU that many imagine it to be), but there are drawbacks to that austere minimalism. Other processor designers don’t add transistors gratuitously; all that extra silicon is there for a reason. And in most cases, it’s to make the CPU easier to program and less likely to crash. Most CPUs have caches, MMUs, bus interfaces, peripheral I/O, and other productive uses for their transistors. Epiphany has none of those features, and as long as its users know that going in, they’ll likely be able to brag they’ve got the smallest, fastest, cheapest, and most obscure supercomputer on the block. 
