Raising the eFPGA Bar

Achronix Introduces Custom Blocks

The recent explosion of FPGA-based compute acceleration has created an enormous new market opportunity for programmable logic. With demanding new applications such as neural networks on the rise, specialized hardware is required to offload their computation, both in data centers and in edge devices and systems. Traditional FPGA companies like Xilinx and Intel/Altera are making enormous pushes to capture this rapidly emerging market.

Traditionally, FPGAs have been relegated to mostly low-volume and prototyping roles, with ASSPs and custom chips coming along and snatching the high-volume sales as applications rolled into the mainstream. The interesting thing about the compute acceleration market, however, is that hardware programmability is actually required long term in order to swap various acceleration loads. For FPGA companies, that means the risk of being supplanted by later ASSP implementations is much lower. FPGAs could potentially go the distance and hold onto the sockets long term.

At the same time, however, eFPGAs represent a serious challenge to FPGAs in capturing the long, big-money tail of the compute acceleration wave. eFPGAs are IP blocks that can be used by custom chip designers to add FPGA fabric to any custom IC design. That means that custom devices and ASSPs could be created for compute acceleration that include on-chip customizable FPGA fabric without the need for an external FPGA.

Integrating the FPGA fabric with other custom hardware on chip brings some serious advantages over parking an FPGA near an SoC or conventional processor. The connectivity available for moving data between the programmable logic and other hardware is dramatically increased, with greater bandwidth, lower latency, and much lower power consumption. If you are designing a custom SoC or other application-specific chip and plan to park an FPGA next to it to gain the flexibility of programmable logic, you can now put that FPGA fabric inside your chip and save on the BOM, power, board area, and complexity while improving your system performance. If you’re designing a chip anyway, eFPGA seems like a clear win from an integration point of view.

One of the major challenges with FPGAs has always been balancing the amount and type of hardened logic with the LUT fabric. In addition to LUTs, FPGAs typically have hardened/optimized resources such as DSP/arithmetic blocks, various configurations of distributed on-chip RAM, specialized IO blocks, and even processors and peripherals. But, for any given application, finding an FPGA with the particular mix of those hardened resources you need is a challenge, and ultimately most designers end up with a surplus of one or more resources in order to get the required amount of another.
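The surplus problem described above can be made concrete with a small sketch. The device names and resource counts below are invented purely for illustration and do not describe any real FPGA family; the point is only that a design heavy in one resource can force a much larger device than its other needs would justify.

```python
# Hypothetical catalog and design requirements; all numbers are made up
# for illustration and do not correspond to any real device.

def surplus(device, need):
    """Return the unused amount of each resource if the device fits, else None."""
    if any(device[r] < need[r] for r in need):
        return None  # at least one resource is insufficient
    return {r: device[r] - need[r] for r in need}

catalog = {
    "small": {"luts": 50_000, "dsp": 100, "bram": 200},
    "large": {"luts": 200_000, "dsp": 600, "bram": 800},
}

# A DSP-heavy design: modest LUT and RAM needs, but many multipliers.
need = {"luts": 40_000, "dsp": 576, "bram": 150}

fits = {name: surplus(dev, need) for name, dev in catalog.items()}
```

In this made-up case the small device is ruled out by its DSP count alone, and the large device fits only by leaving most of its LUTs and block RAM unused.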

Achronix has just announced “Speedcore Custom Blocks” for its family of eFPGAs, and they appear to solve that problem and more. Speedcore Custom Blocks allow you to build exactly the embedded FPGA you need for your application, with the optimal amount of each type of hardened resource. Furthermore, they allow you to create custom cells inside the FPGA fabric area specific to your application.

Here’s how it works. Achronix will first profile your design to identify functions that would be optimal candidates for custom block implementation. They are looking for regularly repeated functions where a custom implementation would yield significant advantages in area, performance, and power. They then create a new Speedcore instance to try out in the ACE tool suite. You can iterate on this process until you have a block architecture that works best for your design, and then your final FPGA blocks will be created with your new custom blocks included. Additionally, the Achronix ACE design tools will be automatically customized to take advantage of the new hardware when implementing target designs for your FPGA.
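To give a feel for what "looking for regularly repeated functions" might involve, here is a minimal sketch that simply counts how often each function appears in a design and keeps the ones that repeat. It is not Achronix's profiling method, which the announcement does not detail; the operation names, the flat list representation, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter

def rank_candidates(operations, min_count=2):
    """Rank functions by how often they repeat in a (hypothetical) design.

    `operations` is an assumed flat list of function names, one entry per
    instance. Functions appearing fewer than `min_count` times are dropped.
    """
    counts = Counter(operations)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]

# Hypothetical design contents, invented for this example.
design_ops = ["mult16x8"] * 6 + ["string_match"] * 3 + ["uart"]

candidates = rank_candidates(design_ops)
```

A real profiling pass would also have to weigh the area, performance, and power gains of each candidate, which is exactly why the process is iterated in the tool suite before the block architecture is finalized.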

Standard Speedcore blocks that can be tuned include expected functions such as LUTs with ALUs, block RAM, LRAM, and DSP blocks. One example is a die-size reduction for a convolutional neural network: 576 instances of 18×27 multipliers were replaced with instances of 16×8 multipliers for a 50% area savings, and 288 instances of 32×128 RAM were replaced by 144 instances of 48×1024 RAM for a 25% area savings. This yielded an overall die size reduction of 35%, simply by using the flexibility to “right size” the DSP and RAM resources in the FPGA fabric.
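The arithmetic behind those figures can be checked with a short sketch. It rests on one loud assumption that the announcement does not state: that the DSP and RAM blocks together make up the whole area being compared, so the overall saving is an area-weighted average of the two individual savings. Under that assumption, the quoted 50%, 25%, and 35% figures imply how the original area was split. The RAM dimensions are multiplied out as total bits, which gives the same product whichever number is the width.

```python
def implied_dsp_share(dsp_saving, ram_saving, overall_saving):
    """Solve overall = s * dsp_saving + (1 - s) * ram_saving for s.

    Assumes (not stated in the source) that DSP and RAM account for
    all of the area being measured.
    """
    return (overall_saving - ram_saving) / (dsp_saving - ram_saving)

def total_bits(instances, dim_a, dim_b):
    """Total RAM capacity in bits for `instances` blocks of dim_a x dim_b."""
    return instances * dim_a * dim_b

# Figures quoted in the article.
share = implied_dsp_share(0.50, 0.25, 0.35)  # fraction of area that was DSP
old_bits = total_bits(288, 32, 128)          # original RAM configuration
new_bits = total_bits(144, 48, 1024)         # right-sized RAM configuration
```

Under the stated assumption, the DSP blocks would have accounted for about 40% of the original area. The RAM totals come out to 1,179,648 bits before and 7,077,888 bits after, so the right-sized configuration holds six times the capacity in half as many instances.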

Customer-specific blocks can run a huge gamut, including things like packet inspection (PX) blocks, TCAMs, and parallel string search. In each case, rolling the custom blocks into the FPGA fabric yielded significant advantages in power, performance, and area. Optimizing the FPGA fabric in this way brings huge rewards on top of the already-significant reduction in total silicon area from using an eFPGA (versus a stand-alone FPGA) in the first place. Speedcore Custom Blocks clearly have the potential to bring massive advantages to eFPGA solutions for many (if not most) applications.

It is important to point out that the job doesn’t end with just throwing a column or two of specialized hardware into the FPGA fabric. In order to be useful, new blocks need to be easy to design in, and they need to be fully supported by the FPGA design tool suite. Achronix works with the customer to design a specialized GUI for each custom block. This GUI includes validation rules, and it automatically creates the component for use in RTL designs. These blocks are also then visible in the floorplan view and critical path views in the ACE tools. The timing-driven place-and-route takes advantage of the custom blocks, and the timing information for the new blocks is presented in the standard way in the timing reports.

This ability to customize eFPGAs for the application, both by adjusting the mix of standard hard IP and by rolling in new custom function blocks, brings significant new value to the eFPGA approach. For ASSP designers or system design teams whose project already includes creating a custom chip, eFPGA with custom blocks offers a solution that brings much more than simple in-the-field flexibility for a custom chip design. It has the potential to be a serious game changer for high-value compute acceleration applications.
