feature article
Subscribe Now

Achronix Accelerates eFPGA

7nm Speedcore Gen4 Brings Big Improvements

Perhaps when the most important problem is a nail, every solution starts to look like a hammer. With the ramping explosion in AI and machine learning, countless companies are trying to climb on the bandwagon, morphing and melding their existing technologies in an attempt to come up with a differentiated solution that will capture a meaningful share of this mind-boggling emerging opportunity. Everybody from EDA vendors to cloud data centers to GPU companies, FPGA companies, IP companies, and boutique semiconductor startups are spinning stories about how their technology is the key to unlocking the potential of AI.

Most of them will fail.

Achronix, however, looks like a pretty strong contender. This week, the company unveiled their fourth-generation Speedcore eFPGA technology, targeting 7nm CMOS. While this new IP continues the mission of allowing FPGA fabric to be part of any SoC/ASIC design, the latest version has numerous features aimed specifically at accelerating machine learning inferencing. For many with the expertise and resources to do custom chip design, Achronix has a compelling alternative to multi-chip solutions with discrete FPGAs or other custom AI accelerators.

The Speedcore Gen4 embedded FPGA (eFPGA) IP for integration into customers’ SoCs increases performance by 60% over previous generations, reduces power by 50%, and decreases die area by 65%. These are significantly better PPA jumps than one would get simply by moving to the next process node. There are major architectural improvements at the heart of Achronix’s gains. Achronix says they are focusing on bringing “programmable hardware-acceleration capabilities to a broad range of compute, networking, and storage systems for interface protocol bridging/switching, algorithmic acceleration, and packet processing applications.”

Speedcore essentially lets you build an FPGA to your exact specifications. Since modern FPGAs contain LUT fabric, multipliers/DSP blocks, and embedded memories, at the least, and often also include processor cores and other hard blocks, stand-alone FPGAs are always built as a “guess” by the FPGA company about the relative amounts of each of these resources needed for given broad classes of applications. When you select a stand-alone FPGA, it’s always some kind of compromise. You may have to take an FPGA with more multipliers than you need in order to get the amount of LUT fabric you require or the amount of high-speed IO your design needs. There is practically never a situation where the FPGA has exactly the mix of resources required for your application.

eFPGAs like Speedcore allow your to tailor the mix of resources exactly to your anticipated application needs. And, since you’re designing your own SoC anyway, you have the flexibility to merge the FPGA core with any number of other hard resources and IO. None of that changes with this new version of Speedcore, of course. But, with this edition, there are more (and more interesting) options for the types of blocks you can include in your implementation. The capabilities of those new blocks, plus (of course) the Moore’s Law gains from dropping to a 7nm process, make this a powerful new offering for those seeking to accelerate critical applications – particularly if those applications involve AI and machine learning.

Achronix has added what it calls Machine Learning Processor (MLP) blocks to the library of available blocks for building your eFPGA implementation. The company claims the new block delivers 300% higher system performance for artificial intelligence and machine learning (AI/ML) applications. These MLP blocks are aimed at the kind of matrix-multiply operations common in CNN inferencing. Each MLP includes a local cyclical register file that leverages temporal locality for optimal reuse of stored weights or data. The MLPs are tightly coupled with neighboring MLP blocks and larger embedded memory blocks, and they support multiple precision fixed point and floating point formats, including Bfloat16, 16-bit, half-precision floating point, 24-bit floating point, and block floating point (BFP). In many ML applications, reducing the precision of these calculations can yield massive gain in performance and power consumption with very little loss in accuracy. By supporting a wide range of precisions, the MLP allows you to find the optimal compromise between performance and accuracy for your application.

Other architecture changes and improvements include a new 8-1 mux, which allows up to 8-wide muxing with a single level of logic. Also new is an 8-bit ALU with 2x the adder density of the previous generation. The new ALU is aimed at AI/ML applications, where it is frequently used for adders, counters, and comparators. There is also a new 8-bit cascadable bus-maximum function, new high-efficiency dedicated shift registers, and a new 6-input LUT with 2 registers per LUT. Taken together, these should substantially improve throughput and architectural efficiency, provided the Achronix tool chain (and synthesis in particular) can take optimal advantage of the new and changed resources.

Achronix has also added a new independent dedicated bus-routing structure to the architecture, allowing bus-grouped routing separate from the normal bit-wise routing channels. This should minimize congestion as well as improving timing by providing matched-length connections for all bits in a bus. The company says these should be optimal for busses running between memories and MLPs, and they effectively create a giant, run-time-configurable switching network on-chip. Cascadable 4-to-1 bus routing provides 2x performance for busses while saving LUT resources.

Architectural improvements in the new Speedcore allow LUT-based multipliers to be implemented more efficiently, providing the ability to create a 6×6 multiplier, using only 11 LUTs, that operates at 1 GHz. Typical FPGA implementations would require 21 LUTs for the same functional implementation.

The new resources are organized on the chip using non-traditional column adjacency that Achronix says doubles compute operations’ density by providing cascade routing between the new MLP blocks and embedded memory blocks. This dataflow is optimized for AI/ML applications and should also result in significant power savings on those types of operations, because less power will be consumed in data transfers between compute and memory resources.

Achronix uses its Speedcore Builder tool to create custom Speedcore instances to match each user’s requirements. The user can then evaluate the suitability of the generated eFPGA block for their application, and Achronix can supply die size and power information as well. This allows design teams to have a solid understanding of the functional applicability, performance, and power consumption of their eFPGA implementations long before they commit to silicon.

Achronix says Speedcore Gen4 for TSMC 7nm CMOS is available today and will be in production in 1H 2019. The company will then back-port Speedcore Gen4 for TSMC 16nm and 12nm with availability in 2H 2019.

Speedcore Gen4 should bring impressive levels of FPGA and AI/ML acceleration capability to many applications, and it could save dramatically on system cost, power, and complexity compared with solutions that use stand-alone FPGAs. With the expected dramatic growth in the market for AI/ML acceleration, we also expect to see third parties developing commercial specialized accelerator chips based on the Achronix IP. It will be interesting to watch the evolution of this market as design teams size up the various competing alternatives for compute acceleration in this exciting new domain.

Leave a Reply

featured blogs
May 15, 2022
https://youtu.be/ur6dpXrELhg Made at Steve Brown's "moving to San Diego party" (camera Larry Lapides) Monday: no post Tuesday: TechInsights: Foundation for the Future Wednesday: Open... ...
May 12, 2022
Our PCIe 5.0 IP solutions, including digital controllers and PHYs, have passed PCI-SIG 5.0 compliance testing, becoming the first on the 5.0 integrators list. The post Synopsys IP Passes PCIe 5.0 Compliance and Makes Integrators List appeared first on From Silicon To Softwar...
May 12, 2022
By Shelly Stalnaker Every year, the editors of Elektronik in Germany compile a list of the most interesting and innovative… ...
Apr 29, 2022
What do you do if someone starts waving furiously at you, seemingly delighted to see you, but you fear they are being overenthusiastic?...

featured video

Increasing Semiconductor Predictability in an Unpredictable World

Sponsored by Synopsys

SLM presents significant value-driven opportunities for assessing the reliability and resilience of silicon devices, from data gathered during design, manufacture, test, and in-field. Silicon data driven analytics provide new actionable insights to address the challenges posed to large scale silicon designs.

Learn More

featured paper

Build More Cost-Effective and More Efficient 5G Radios with Intel® Agilex™ FPGAs

Sponsored by Intel

Learn how Intel® Agilex™ FPGAs ease development of 5G applications by tackling constantly evolving requirements with flexible, highest performance-per-watt fabric, while providing a path to production and scale with Intel® eASIC™ and full ASIC implementations.

Click to read more

featured chalk talk

Tame the SiC Beast - Unleash the Full Capacity of Silicon Carbide

Sponsored by Mouser Electronics and Microchip

Wide band gap materials such as silicon carbide are revolutionizing the power industry. At the same time, they can also introduce byproducts including overheating, short circuits and over voltage. The question remains: how can we use silicon carbide without those headache-inducing side effects? In this episode of Chalk Talk, Amelia Dalton chats with Rob Weber from Microchip about Microchip’s patented augmented switching technology can make those silicon carbide side effects a thing of the past while reducing our switching losses up to 50% and accelerating our time to market as well.

Click here for more information about the Microsemi / Microchip AgileSwitch® ASDAK+ Augmented Switching™ Dev Kit