feature article

Achronix Accelerates eFPGA

7nm Speedcore Gen4 Brings Big Improvements

Perhaps when the most important problem is a nail, every solution starts to look like a hammer. With the explosion in AI and machine learning, countless companies are trying to climb on the bandwagon, morphing and melding their existing technologies in an attempt to come up with a differentiated solution that will capture a meaningful share of this mind-boggling emerging opportunity. Everybody from EDA vendors to cloud data centers to GPU companies, FPGA companies, IP companies, and boutique semiconductor startups is spinning stories about how their technology is the key to unlocking the potential of AI.

Most of them will fail.

Achronix, however, looks like a pretty strong contender. This week, the company unveiled their fourth-generation Speedcore eFPGA technology, targeting 7nm CMOS. While this new IP continues the mission of allowing FPGA fabric to be part of any SoC/ASIC design, the latest version has numerous features aimed specifically at accelerating machine learning inferencing. For many with the expertise and resources to do custom chip design, Achronix has a compelling alternative to multi-chip solutions with discrete FPGAs or other custom AI accelerators.

The Speedcore Gen4 embedded FPGA (eFPGA) IP for integration into customers’ SoCs increases performance by 60% over previous generations, reduces power by 50%, and decreases die area by 65%. These are significantly better PPA jumps than one would get simply by moving to the next process node. There are major architectural improvements at the heart of Achronix’s gains. Achronix says they are focusing on bringing “programmable hardware-acceleration capabilities to a broad range of compute, networking, and storage systems for interface protocol bridging/switching, algorithmic acceleration, and packet processing applications.”

Speedcore essentially lets you build an FPGA to your exact specifications. Since modern FPGAs contain LUT fabric, multipliers/DSP blocks, and embedded memories, at the least, and often also include processor cores and other hard blocks, stand-alone FPGAs are always built as a “guess” by the FPGA company about the relative amounts of each of these resources needed for given broad classes of applications. When you select a stand-alone FPGA, it’s always some kind of compromise. You may have to take an FPGA with more multipliers than you need in order to get the amount of LUT fabric you require or the amount of high-speed IO your design needs. There is practically never a situation where the FPGA has exactly the mix of resources required for your application.

eFPGAs like Speedcore allow you to tailor the mix of resources exactly to your anticipated application needs. And, since you’re designing your own SoC anyway, you have the flexibility to merge the FPGA core with any number of other hard resources and IO. None of that changes with this new version of Speedcore, of course. But, with this edition, there are more (and more interesting) options for the types of blocks you can include in your implementation. The capabilities of those new blocks, plus (of course) the Moore’s Law gains from dropping to a 7nm process, make this a powerful new offering for those seeking to accelerate critical applications – particularly if those applications involve AI and machine learning.

Achronix has added what it calls Machine Learning Processor (MLP) blocks to the library of available blocks for building your eFPGA implementation. The company claims the new block delivers 300% higher system performance for artificial intelligence and machine learning (AI/ML) applications. These MLP blocks are aimed at the kind of matrix-multiply operations common in CNN inferencing. Each MLP includes a local cyclical register file that leverages temporal locality for optimal reuse of stored weights or data. The MLPs are tightly coupled with neighboring MLP blocks and larger embedded memory blocks, and they support multiple fixed-point and floating-point precisions, including bfloat16, 16-bit half-precision floating point, 24-bit floating point, and block floating point (BFP). In many ML applications, reducing the precision of these calculations can yield massive gains in performance and power consumption with very little loss in accuracy. By supporting a wide range of precisions, the MLP allows you to find the optimal compromise between performance and accuracy for your application.
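Block floating point is worth a closer look, since it underlies much of that precision/accuracy tradeoff: a group of values shares a single exponent, so each value needs only a small integer mantissa. The sketch below is purely illustrative software (the actual MLP number formats are Achronix's and are not specified in detail here); it quantizes an array of weights with a shared per-block exponent and shows that shrinking the mantissa increases quantization error:

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=8, block_size=16):
    """Illustrative block-floating-point quantizer: each block of
    values shares one exponent (set by the block's largest magnitude),
    and every mantissa is rounded to the given bit width."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        max_mag = np.max(np.abs(block))
        if max_mag == 0:
            out[start:start + block_size] = 0
            continue
        # Shared exponent for the block, then a scale that leaves
        # mantissa_bits of integer resolution below it.
        exp = np.floor(np.log2(max_mag))
        scale = 2.0 ** (exp - (mantissa_bits - 1))
        # Round each value to an integer multiple of the shared scale.
        out[start:start + block_size] = np.round(block / scale) * scale
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal(64)          # stand-in for a block of CNN weights
w8 = bfp_quantize(w, mantissa_bits=8)
w4 = bfp_quantize(w, mantissa_bits=4)
err8 = np.max(np.abs(w - w8))
err4 = np.max(np.abs(w - w4))
# Fewer mantissa bits -> cheaper multipliers but larger quantization error.
```

The hardware appeal is that all multiplies within a block become small integer multiplies, with the exponent handled once per block rather than once per value.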

Other architecture changes and improvements include a new 8:1 mux, which allows up to 8-wide muxing with a single level of logic. Also new is an 8-bit ALU with twice the adder density of the previous generation. The new ALU is aimed at AI/ML applications, where adders, counters, and comparators are in constant demand. There is also a new 8-bit cascadable bus-maximum function, new high-efficiency dedicated shift registers, and a new 6-input LUT with two registers per LUT. Taken together, these should substantially improve throughput and architectural efficiency, provided the Achronix tool chain (and synthesis in particular) can take optimal advantage of the new and changed resources.
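For readers less steeped in FPGA internals, a tiny software model may help fix ideas (purely illustrative; the real Speedcore primitives have routing and timing behavior no software model captures). A 6-input LUT is simply a 64-entry truth table indexed by its six input bits, so it can realize any Boolean function of six inputs; an 8:1 mux uses three select bits to pick one of eight data inputs:

```python
def lut6(truth_table, a, b, c, d, e, f):
    """Model of a 6-input LUT: the 64-bit integer truth_table is
    indexed by the six input bits; any 6-input Boolean function fits."""
    index = (a << 5) | (b << 4) | (c << 3) | (d << 2) | (e << 1) | f
    return (truth_table >> index) & 1

def mux8(sel, inputs):
    """Model of an 8:1 mux: 3 select bits choose one of 8 data inputs.
    Gen4 does this in a single logic level; built from 4:1 muxes it
    would take two cascaded levels."""
    assert 0 <= sel < 8 and len(inputs) == 8
    return inputs[sel]

# Example: a 6-input AND is the truth table with only bit 63 set,
# since only input pattern 111111 (index 63) should return 1.
AND6 = 1 << 63
```

The win from a wider mux is depth: selecting among eight signals in one level instead of two shortens the critical path wherever wide selection logic appears.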

Achronix has also added a new independent dedicated bus-routing structure to the architecture, allowing bus-grouped routing separate from the normal bit-wise routing channels. This should minimize congestion as well as improve timing by providing matched-length connections for all bits in a bus. The company says these should be optimal for busses running between memories and MLPs, and they effectively create a giant, run-time-configurable switching network on-chip. Cascadable 4-to-1 bus routing provides 2x performance for busses while saving LUT resources.

Architectural improvements in the new Speedcore allow LUT-based multipliers to be implemented more efficiently, providing the ability to create a 6×6 multiplier, using only 11 LUTs, that operates at 1 GHz. Typical FPGA implementations would require 21 LUTs for the same functional implementation.
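The 11-LUT and 1 GHz figures are Achronix's claims; what the fabric is actually mapping is the classic shift-and-add structure of an unsigned multiplier, sketched below for the 6×6 case (a software illustration of the logic, not of how the LUTs are partitioned):

```python
def mul6x6(a, b):
    """Shift-and-add 6x6 unsigned multiply: six conditionally shifted
    partial products summed together. The 12-bit result is what an
    FPGA synthesizes into LUTs plus carry chains."""
    assert 0 <= a < 64 and 0 <= b < 64
    result = 0
    for i in range(6):
        if (b >> i) & 1:
            result += a << i   # partial product for bit i of b
    return result
```

Every LUT saved on a small multiplier like this multiplies out across a design that may instantiate hundreds of them, which is why the 11-versus-21-LUT difference matters.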

The new resources are organized on the chip using non-traditional column adjacency that Achronix says doubles the density of compute operations by providing cascade routing between the new MLP blocks and embedded memory blocks. This dataflow is optimized for AI/ML applications and should also result in significant power savings on those types of operations, because less power will be consumed in data transfers between compute and memory resources.

Achronix uses its Speedcore Builder tool to create custom Speedcore instances to match each user’s requirements. The user can then evaluate the suitability of the generated eFPGA block for their application, and Achronix can supply die size and power information as well. This allows design teams to have a solid understanding of the functional applicability, performance, and power consumption of their eFPGA implementations long before they commit to silicon.

Achronix says Speedcore Gen4 for TSMC 7nm CMOS is available today and will be in production in 1H 2019. The company will then back-port Speedcore Gen4 for TSMC 16nm and 12nm with availability in 2H 2019.

Speedcore Gen4 should bring impressive levels of FPGA and AI/ML acceleration capability to many applications, and it could save dramatically on system cost, power, and complexity compared with solutions that use stand-alone FPGAs. With the expected dramatic growth in the market for AI/ML acceleration, we also expect to see third parties developing commercial specialized accelerator chips based on the Achronix IP. It will be interesting to watch the evolution of this market as design teams size up the various competing alternatives for compute acceleration in this exciting new domain.
