The recent explosion of FPGA-based compute acceleration has created an enormous new market opportunity for programmable logic. With demanding new applications such as neural networks on the rise, specialized hardware is required to offload those computation loads – both in data centers and in edge devices and systems. Traditional FPGA companies like Xilinx and Intel/Altera are making enormous pushes to capture this rapidly emerging market.
Traditionally, FPGAs have been relegated to mostly low-volume and prototyping roles, with ASSPs and custom chips coming along and snatching the high-volume sales as applications rolled into the mainstream. The interesting thing about the compute acceleration market, however, is that hardware programmability is actually required long term in order to swap various acceleration loads. For FPGA companies, that means the risk of being supplanted by later ASSP implementations is much lower. FPGAs could go the distance and hold onto those sockets long term.
At the same time, however, eFPGAs represent a serious challenge to FPGAs in capturing the long, big-money tail of the compute acceleration wave. eFPGAs are IP blocks that can be used by custom chip designers to add FPGA fabric to any custom IC design. That means that custom devices and ASSPs could be created for compute acceleration that include on-chip customizable FPGA fabric without the need for an external FPGA.
Integrating the FPGA fabric with other custom hardware on chip brings some serious advantages over parking an FPGA near an SoC or conventional processor. The connectivity available for moving data between the programmable logic and other hardware is dramatically increased, with greater bandwidth, lower latency, and much lower power consumption. If you were designing a custom SoC or other application-specific chip and planned to park an FPGA next to it to gain the flexibility of programmable logic, you can now put that FPGA fabric inside your chip and save on the BOM, power, board area, and complexity while improving your system performance. If you’re designing a chip anyway, eFPGA seems like a clear win from an integration point of view.
One of the major challenges with FPGAs has always been balancing the amount and type of hardened logic with the LUT fabric. In addition to LUTs, FPGAs typically have hardened/optimized resources such as DSP/arithmetic blocks, various configurations of distributed on-chip RAM, specialized IO blocks, and even processors and peripherals. But, for any given application, finding an FPGA with the particular mix of those hardened resources you need is a challenge, and ultimately most designers end up with a surplus of one or more resources in order to get the required amount of another.
Achronix has just announced “Speedcore Custom Blocks” for their family of eFPGAs, and they appear to solve that problem and more. Speedcore Custom Blocks allow you to build exactly the embedded FPGA you need for your application, with the optimal amount of each type of hardened resource. Furthermore, they allow you to create custom cells, specific to your application, inside the FPGA fabric area.
Here’s how it works. Achronix will first profile your design to identify functions that would be optimal candidates for custom block implementation. They are looking for regularly repeated functions where a custom implementation would yield significant advantages in area, performance, and power. They then create a new Speedcore instance to try out in the ACE tool suite. You can iterate on this process until you have a block architecture that works best for your design, and then your final FPGA blocks will be created with your new custom blocks included. Additionally, the Achronix ACE design tools will be automatically customized to take advantage of the new hardware when implementing target designs for your FPGA.
Standard Speedcore blocks that can be tuned include expected functions such as LUTs with ALUs, block RAM, LRAM, and DSP blocks. One example is a die-size reduction for a convolutional neural network: 576 instances of 18×27 multipliers were replaced with instances of 16×8 multipliers for a 50% area savings, and 288 instances of 32×128 RAM were replaced by 144 instances of 48×1024 RAM for a 25% area savings. This yielded an overall die size reduction of 35%, simply by using the flexibility to “right size” the DSP and RAM resources in the FPGA fabric.
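As a rough sanity check on those numbers, the overall die-size reduction is just the per-resource savings weighted by how much of the original die each resource occupied. A minimal sketch of that arithmetic (the 50% and 25% savings come from the example above; the area fractions here are illustrative assumptions chosen to reproduce the reported 35%, not published Achronix figures):

```python
# Rough model: overall die-area reduction as a weighted sum of
# per-resource savings. Per-resource savings (50% DSP, 25% RAM) are
# from the CNN example; the area fractions are assumptions.

def overall_reduction(savings_by_resource, area_fraction_by_resource):
    """Weighted sum of per-resource area savings over assumed area fractions."""
    return sum(savings_by_resource[r] * area_fraction_by_resource[r]
               for r in savings_by_resource)

savings = {"dsp": 0.50, "ram": 0.25}    # from the example above
fractions = {"dsp": 0.50, "ram": 0.40}  # assumed shares of original die area

print(f"Overall die-size reduction: {overall_reduction(savings, fractions):.0%}")
# → Overall die-size reduction: 35%
```

Under these assumed fractions, 0.50 × 0.50 + 0.25 × 0.40 = 0.35, consistent with the 35% figure quoted above; with different DSP/RAM area shares the same per-resource savings would yield a different overall number.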
Customer-specific blocks can run a huge gamut, including things like packet inspection (PX) blocks, TCAMs, and parallel string search. In each case, rolling the custom blocks into the FPGA fabric yielded significant advantages in power, performance, and area. Optimizing the FPGA fabric in this way brings huge rewards on top of the already-significant reduction in total silicon area from using an eFPGA (versus a stand-alone FPGA) in the first place. Speedcore Custom Blocks clearly have the potential to bring massive advantages to eFPGA solutions for many (if not most) applications.
It is important to point out that the job doesn’t end with just throwing a column or two of specialized hardware into the FPGA fabric. In order to be useful, new blocks need to be easy to design in, and they need to be fully supported by the FPGA design tool suite. Achronix works with the customer to design a specialized GUI for each custom block. This GUI includes validation rules, and it automatically creates the component for use in RTL designs. These blocks are also then visible in the floorplan view and critical path views in the ACE tools. The timing-driven place-and-route takes advantage of the custom blocks, and the timing information for the new blocks is presented in the standard way in the timing reports.
This ability to customize eFPGAs for the application, both by adjusting the mix of standard hard IP and by rolling in new custom function blocks, brings significant new value to the eFPGA approach. For ASSP designers or system design teams whose project already includes creating a custom chip, eFPGA with custom blocks offers a solution that brings much more than simple in-the-field flexibility for a custom chip design. It has the potential to be a serious game changer for high-value compute acceleration applications.