
Achronix Accelerates eFPGA

7nm Speedcore Gen4 Brings Big Improvements

Perhaps when the most important problem is a nail, every solution starts to look like a hammer. With the explosion in AI and machine learning, countless companies are trying to climb onto the bandwagon, morphing and melding their existing technologies in an attempt to come up with a differentiated solution that will capture a meaningful share of this mind-boggling emerging opportunity. Everyone from EDA vendors to cloud data centers to GPU companies, FPGA companies, IP companies, and boutique semiconductor startups is spinning a story about how its technology is the key to unlocking the potential of AI.

Most of them will fail.

Achronix, however, looks like a pretty strong contender. This week, the company unveiled its fourth-generation Speedcore eFPGA technology, targeting 7nm CMOS. While this new IP continues the mission of allowing FPGA fabric to be part of any SoC/ASIC design, the latest version has numerous features aimed specifically at accelerating machine learning inferencing. For teams with the expertise and resources to do custom chip design, Achronix offers a compelling alternative to multi-chip solutions built from discrete FPGAs or other custom AI accelerators.

The Speedcore Gen4 embedded FPGA (eFPGA) IP for integration into customers’ SoCs increases performance by 60% over previous generations, reduces power by 50%, and decreases die area by 65%. These are significantly better PPA jumps than one would get simply by moving to the next process node. There are major architectural improvements at the heart of Achronix’s gains. Achronix says they are focusing on bringing “programmable hardware-acceleration capabilities to a broad range of compute, networking, and storage systems for interface protocol bridging/switching, algorithmic acceleration, and packet processing applications.”

Speedcore essentially lets you build an FPGA to your exact specifications. Modern FPGAs contain LUT fabric, multipliers/DSP blocks, and embedded memories at a minimum, and often processor cores and other hard blocks as well, so a stand-alone FPGA is always a "guess" by the FPGA company about the relative amounts of each of these resources needed for broad classes of applications. When you select a stand-alone FPGA, it's always some kind of compromise. You may have to take an FPGA with more multipliers than you need in order to get the amount of LUT fabric or high-speed IO your design requires. There is practically never a situation where the FPGA has exactly the mix of resources your application needs.

eFPGAs like Speedcore allow you to tailor the mix of resources exactly to your anticipated application needs. And, since you’re designing your own SoC anyway, you have the flexibility to merge the FPGA core with any number of other hard resources and IO. None of that changes with this new version of Speedcore, of course. But, with this edition, there are more (and more interesting) options for the types of blocks you can include in your implementation. The capabilities of those new blocks, plus (of course) the Moore’s Law gains from dropping to a 7nm process, make this a powerful new offering for those seeking to accelerate critical applications – particularly if those applications involve AI and machine learning.

Achronix has added what it calls Machine Learning Processor (MLP) blocks to the library of available blocks for building your eFPGA implementation. The company claims the new block delivers 300% higher system performance for artificial intelligence and machine learning (AI/ML) applications. These MLP blocks are aimed at the kind of matrix-multiply operations common in CNN inferencing. Each MLP includes a local cyclical register file that leverages temporal locality for optimal reuse of stored weights or data. The MLPs are tightly coupled with neighboring MLP blocks and larger embedded memory blocks, and they support multiple fixed-point and floating-point precision formats, including bfloat16, 16-bit half-precision floating point, 24-bit floating point, and block floating point (BFP). In many ML applications, reducing the precision of these calculations can yield massive gains in performance and power consumption with very little loss in accuracy. By supporting a wide range of precisions, the MLP allows you to find the optimal compromise between performance and accuracy for your application.
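The performance-versus-accuracy trade-off behind reduced-precision inferencing is easy to demonstrate in software. Below is a minimal sketch (illustrative Python/NumPy, not Achronix code) that quantizes a random matrix multiply from float32 down to int8 with a widened integer accumulator, the same kind of precision reduction the MLP blocks exploit in hardware; the matrices here are random stand-ins for real weights and activations:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in weights
x = rng.standard_normal((64, 32)).astype(np.float32)  # stand-in activations

ref = w @ x  # full-precision reference result

def quantize(a):
    """Symmetric linear quantization of a float array to int8."""
    scale = np.abs(a).max() / 127.0
    return np.clip(np.round(a / scale), -127, 127).astype(np.int8), scale

wq, w_scale = quantize(w)
xq, x_scale = quantize(x)

# Integer matmul with a 32-bit accumulator, then rescale to float
approx = (wq.astype(np.int32) @ xq.astype(np.int32)) * (w_scale * x_scale)

rel_err = np.linalg.norm(approx - ref) / np.linalg.norm(ref)
print(f"relative error: {rel_err:.4f}")  # typically around 1% in this sketch
```

Despite storing every operand in one quarter of the bits, the result stays within roughly a percent of the full-precision answer, which is why narrow multipliers buy so much throughput and power for so little accuracy cost.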

Other architecture changes and improvements include a new 8-1 mux, which allows up to 8-wide muxing with a single level of logic. Also new is an 8-bit ALU with 2x the adder density of the previous generation. The new ALU is aimed at AI/ML applications, where it is frequently used for adders, counters, and comparators. There is also a new 8-bit cascadable bus-maximum function, new high-efficiency dedicated shift registers, and a new 6-input LUT with 2 registers per LUT. Taken together, these should substantially improve throughput and architectural efficiency, provided the Achronix tool chain (and synthesis in particular) can take optimal advantage of the new and changed resources.
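The value of a hard 8-to-1 mux comes down to input counting: an 8-to-1 mux consumes 11 inputs (8 data plus 3 select), which cannot fit in a single 6-input LUT, so plain LUT fabric needs at least two cascaded levels. The behavioral sketch below (illustrative Python, my own decomposition rather than anything from Achronix) checks that the classic two-level version matches the single-level function:

```python
from itertools import product

def mux8(data, sel):
    """Single-level behavioral 8-to-1 mux: what a dedicated block computes."""
    return data[sel]

def mux8_two_level(data, sel):
    """Classic LUT-fabric decomposition: two 4-to-1 muxes (each fits in a
    6-input LUT: 4 data + 2 select bits), then a 2-to-1 mux on the top
    select bit -- two levels of logic instead of one."""
    lo = data[0:4][sel & 0b011]   # first-level mux on data[0..3]
    hi = data[4:8][sel & 0b011]   # first-level mux on data[4..7]
    return hi if sel & 0b100 else lo

# Exhaustive check: both structures agree for every input combination.
for sel in range(8):
    for bits in product([0, 1], repeat=8):
        assert mux8(list(bits), sel) == mux8_two_level(list(bits), sel)
print("8:1 mux verified: one hard level replaces two LUT levels")
```

Collapsing those two LUT levels into one dedicated structure is where the single-level-of-logic claim buys its delay advantage.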

Achronix has also added a new independent, dedicated bus-routing structure to the architecture, allowing bus-grouped routing separate from the normal bit-wise routing channels. This should minimize congestion and improve timing by providing matched-length connections for all bits in a bus. The company says these routes are optimal for busses running between memories and MLPs, and they effectively create a giant, run-time-configurable switching network on-chip. Cascadable 4-to-1 bus routing provides 2x performance for busses while saving LUT resources.

Architectural improvements in the new Speedcore allow LUT-based multipliers to be implemented more efficiently, providing the ability to create a 6×6 multiplier, using only 11 LUTs, that operates at 1 GHz. Typical FPGA implementations would require 21 LUTs for the same functional implementation.
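The function such a multiplier computes is straightforward to model, even though the 11-LUT mapping itself is Achronix's and not shown here. This sketch (illustrative Python, my own behavioral model) implements a 6×6 unsigned multiplier by shift-and-add of partial products and checks it exhaustively; note the full result always fits in 12 bits:

```python
def mult6x6(a, b):
    """Behavioral shift-and-add model of a 6x6 unsigned multiplier.
    (The 11-LUT Speedcore mapping of this function is not shown here.)"""
    assert 0 <= a < 64 and 0 <= b < 64
    acc = 0
    for i in range(6):          # one partial product per bit of b
        if (b >> i) & 1:
            acc += a << i       # add a, shifted into position
    return acc & 0xFFF          # 6x6 product always fits in 12 bits

# Exhaustive check over all 64 x 64 input pairs
assert all(mult6x6(a, b) == a * b for a in range(64) for b in range(64))
print("6x6 multiplier verified over all 4096 input pairs")
```

Each of those partial-product-and-add stages is what the synthesis tool must pack into LUTs, so cutting the mapping from 21 LUTs to 11 roughly halves both area and the logic on the critical path.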

The new resources are organized on the chip using non-traditional column adjacency that Achronix says doubles the density of compute operations by providing cascade routing between the new MLP blocks and embedded memory blocks. This dataflow is optimized for AI/ML applications and should also yield significant power savings on those types of operations, because less power is consumed in data transfers between compute and memory resources.

Achronix uses its Speedcore Builder tool to create custom Speedcore instances to match each user’s requirements. The user can then evaluate the suitability of the generated eFPGA block for their application, and Achronix can supply die size and power information as well. This allows design teams to have a solid understanding of the functional applicability, performance, and power consumption of their eFPGA implementations long before they commit to silicon.

Achronix says Speedcore Gen4 for TSMC 7nm CMOS is available today and will be in production in 1H 2019. The company will then back-port Speedcore Gen4 for TSMC 16nm and 12nm with availability in 2H 2019.

Speedcore Gen4 should bring impressive levels of FPGA and AI/ML acceleration capability to many applications, and it could save dramatically on system cost, power, and complexity compared with solutions that use stand-alone FPGAs. With the expected dramatic growth in the market for AI/ML acceleration, we also expect to see third parties developing commercial specialized accelerator chips based on the Achronix IP. It will be interesting to watch the evolution of this market as design teams size up the various competing alternatives for compute acceleration in this exciting new domain.
