
Your Basic $99 Supercomputer

Adapteva’s Epiphany-IV Chip Rocks Floating-Point Math

What do you get when you combine a floating-point processor with a mesh network? The Adapteva Epiphany-IV microprocessor, apparently. This Boston-based startup, composed of four refugees from Analog Devices, has developed a brand-new high-performance processor that should help smartphones and other mobile devices get even smarter.

Epiphany-IV comes in chip form or as licensed IP. You can also get an evaluation board for $99 (described below), which the company proudly boasts is the world’s most power-efficient supercomputer. Big words from such a little company. But from tiny acorns do mighty oak trees grow, as a certain British competitor can attest.

The Epiphany architecture combines simple floating-point units connected by a point-to-point mesh network. The idea is that each CPU core works on a small part of a larger data set, exchanging data with its neighbors as it goes. Like a silicon hive mind, Epiphany relies on tiny amounts of work, multiplied.

As any fifth-grader can tell you, floating-point math (i.e., fractions) is a lot harder than basic integer math. Intel’s x86 family, for instance, is notable for its miserable floating-point performance. But Intel can’t change its Paleolithic FPU architecture without breaking software compatibility. Adapteva can.

Epiphany is so focused on floating-point operations that it can’t even do basic integer multiplication or division. (It does do integer ops; just not very complex ones.) Epiphany’s dual-issue CPU dispatches one integer instruction and one FP instruction every cycle. Since memory loads and stores are considered integer operations, the bulk of the integer half of the pipeline will likely be kept busy transferring data for the floating-point half.
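
To see why the integer side might end up as the busier half, here is a toy issue model in Python. It is only a sketch under loud assumptions: the pipeline never stalls, each load fetches exactly one operand, and the hypothetical dot-product loop needs two loads per multiply-accumulate. None of those numbers come from Adapteva.

```python
def min_issue_cycles(int_ops: int, fp_ops: int) -> int:
    """Lower bound on issue cycles for a dual-issue core that can start
    at most one integer-side and one FP instruction per cycle.

    Loads and stores count as integer-side instructions, as in the article.
    Assumption: no stalls, no dependencies, perfect scheduling.
    """
    if int_ops < 0 or fp_ops < 0:
        raise ValueError("instruction counts must be non-negative")
    return max(int_ops, fp_ops)


# Hypothetical inner loop of a dot product, per element:
# two loads (integer side) feed one multiply-accumulate (FP side).
LOADS_PER_ELEMENT = 2   # assumption: one operand per load
MACS_PER_ELEMENT = 1
n = 1000

cycles = min_issue_cycles(LOADS_PER_ELEMENT * n, MACS_PER_ELEMENT * n)
fp_utilization = (MACS_PER_ELEMENT * n) / cycles
print(f"{cycles} cycles, FP slot busy {fp_utilization:.0%} of the time")
```

In this made-up loop the FP slot sits idle half the time because the loads set the pace, which is exactly the kind of balance a compiler for a dual-issue machine has to worry about.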

The CPU doesn’t have any of the aggressive hardware we often see in microprocessors today. It doesn’t reorder instructions, for example, nor does it have any kind of branch prediction. In true RISC fashion, Epiphany relies on Adapteva’s compiler to figure all this stuff out. The hardware does whatever it’s told.

According to the EEMBC CoreMark benchmark, Epiphany’s integer performance is… lackluster. It’s not even half as fast as an ARM Cortex-A9, for example (1.3 CoreMarks/MHz versus 2.9 for the ARM core). But integer arithmetic isn’t really Epiphany’s forte. It’s intended for floating-point work, and on that it excels.

Each CPU can spit out one floating-point multiply-accumulate operation (MAC) every cycle, pretty much like a DSP. And there are either 16 or 64 of these CPU cores on each Epiphany chip. Taken all together, Epiphany is a pretty speedy little FP coprocessor. Current samples of Epiphany-IV run at 600 to 700 MHz in 28nm silicon.
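
For a rough sense of scale, the figures above can be turned into a theoretical peak. The sketch below assumes the common convention of counting one MAC as two floating-point operations (a multiply plus an add) and assumes every core issues a MAC on every cycle, which real code rarely manages. The function name is ours, not Adapteva's.

```python
def peak_gflops(cores: int, clock_mhz: float, flops_per_mac: int = 2) -> float:
    """Theoretical peak in GFLOPS, assuming one MAC per core per cycle.

    flops_per_mac=2 is a counting convention (multiply + add),
    not a figure taken from the article.
    """
    return cores * clock_mhz * 1e6 * flops_per_mac / 1e9


for cores in (16, 64):
    for mhz in (600, 700):
        print(f"{cores} cores @ {mhz} MHz: {peak_gflops(cores, mhz):.1f} GFLOPS peak")
```

Under those assumptions, a 64-core part at 700 MHz tops out at roughly 89.6 GFLOPS, and a 16-core part at 600 MHz at roughly 19.2 GFLOPS. Treat these as ceilings, not benchmarks.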

Programmers of a certain age may remember the Intel i860, a roughly similar device that briefly adorned the Intel product catalog in the late 1980s. Like Epiphany, the i860 had a “naked” RISC pipeline that was streamlined and efficient, but that also leaned heavily on its compiler to know what to do. This approach yields smaller and more power-efficient hardware than most CPUs today, but it also lays several traps for unwary programmers. Where power efficiency is the goal, stripped-down hardware is a fair tradeoff. Where ease of programming is important, it’s a quagmire. Given that Adapteva has no installed software base to protect, and its own in-house compiler to play with, it can manage the complexity. If Epiphany becomes more popular, third-party developers will have to port their code carefully.

As an example of Epiphany’s less-is-faster approach, the chip has no cache, only SRAM. Specifically, each of the 16 or 64 CPU cores has its own 32KB block of SRAM that it can access directly. That memory can also be accessed by the 15 or 63 other CPUs on the same chip, via the on-chip mesh network. But they have to ask for it. In other words, there’s no cache snooping among all those SRAM blocks, so no power is wasted keeping them all coherent. That also means no cache tags, no lookup logic, and so on. SRAM is simple, predictable, and easy to manufacture, whereas caches add complexity, size, and power. Caches might be easier for programmers to manage, and they effectively multiply the size of the memory, but they’re not as simple as SRAMs.
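
With no cache to hide behind, the programmer has to budget that 32KB by hand. The following sketch adds up the on-chip SRAM and works out the largest square tile of 32-bit floats that fits in one core's block when a kernel needs several buffers at once. It assumes, optimistically, that the whole 32KB is available for data; the three-buffer matrix-multiply case is a hypothetical example, not something Adapteva describes.

```python
import math

SRAM_BYTES_PER_CORE = 32 * 1024  # 32KB per core, per the article


def total_sram_kb(cores: int) -> int:
    """Total on-chip SRAM in KB across all cores."""
    return cores * SRAM_BYTES_PER_CORE // 1024


def largest_square_tile(buffers: int, bytes_per_element: int = 4,
                        budget: int = SRAM_BYTES_PER_CORE) -> int:
    """Largest n such that `buffers` n-by-n tiles fit in `budget` bytes.

    Assumption: the full budget holds data, with nothing reserved
    for code, stack, or anything else.
    """
    if buffers <= 0 or bytes_per_element <= 0:
        raise ValueError("buffers and bytes_per_element must be positive")
    max_elements_per_tile = budget // (buffers * bytes_per_element)
    return math.isqrt(max_elements_per_tile)


print(total_sram_kb(16), "KB on a 16-core chip")
print(total_sram_kb(64), "KB on a 64-core chip")
# Hypothetical blocked matrix multiply: tiles of A, B, and C resident together.
print("largest float32 tile side:", largest_square_tile(buffers=3))
```

That comes to 512KB across 16 cores and 2MB across 64, but any single core sees only its own 32KB without going out over the mesh.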

The on-chip mesh network also sacrifices transparency on the altar of simplicity. While each CPU can connect to its immediate neighbor to the north, south, east, or west, it can’t transfer data any farther than that without multiple hops. Like a miniature version of Internet routers, each CPU in Epiphany must forward packets not intended for itself. Every hop adds a clock cycle, so there’s an incentive for programmers to design software with physical, as well as logical, layouts in mind. It’s not often that you see the physical structure of a program affecting its performance.
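
The cost of distance is easy to estimate. The sketch below assumes the cores sit in a square grid (4x4 or 8x8), are numbered row by row, and that packets take a shortest path, so the hop count is simply the Manhattan distance. The numbering scheme and function names are illustrative assumptions, not Adapteva's actual addressing.

```python
def core_coords(core_id: int, width: int) -> tuple[int, int]:
    """Map a core id to (row, col), assuming row-major numbering
    on a width-by-width grid (a hypothetical layout)."""
    if not 0 <= core_id < width * width:
        raise ValueError(f"core_id {core_id} is outside a {width}x{width} grid")
    return divmod(core_id, width)


def mesh_hops(src: int, dst: int, width: int) -> int:
    """Hops between two cores when each hop moves one step
    north, south, east, or west along a shortest path."""
    src_row, src_col = core_coords(src, width)
    dst_row, dst_col = core_coords(dst, width)
    return abs(src_row - dst_row) + abs(src_col - dst_col)


# At one clock cycle per hop, the hop count is also the added latency in cycles.
print("neighbors on an 8x8 grid:", mesh_hops(0, 1, 8), "hop")
print("corner to corner, 4x4:", mesh_hops(0, 15, 4), "hops")
print("corner to corner, 8x8:", mesh_hops(0, 63, 8), "hops")
```

On a 64-core grid laid out this way, two tasks that talk constantly are 14 times better off as neighbors than in opposite corners, which is why placement matters.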

As we’ve said before on these pages, designing a microprocessor is the easy part. Getting software support for it is hard. Adapteva has taken the unusual step of making its lack of software someone else’s problem. To help develop Epiphany code, the company has launched a Kickstarter campaign to raise $750,000. A $99 “pledge” gets you an Epiphany evaluation board, while a check for $499 entitles the donor to a “limited edition” evaluation board with a low serial number. Wooo.

If floating-point is your thing, and small die size and minimal power consumption are your metrics for success, Adapteva’s Epiphany design may be just the ticket. It’s far smaller than an ARM processor in terms of transistor count and silicon area (proving, once again, that ARM is not the tiny CPU that many imagine it to be), but there are drawbacks to that austere minimalism. Other processor designers don’t add transistors gratuitously; all that extra silicon is there for a reason. And in most cases, it’s to make the CPU easier to program and less likely to crash. Most CPUs have caches, MMUs, bus interfaces, peripheral I/O, and other productive uses for their transistors. Epiphany has none of those features, and as long as its users know that going in, they’ll likely be able to brag they’ve got the smallest, fastest, cheapest, and most obscure supercomputer on the block. 
