
Future Proofing Your FinFET Design

Flex Logix Adds LUTs to your 16nm Chip

Building a custom chip on the latest 16nm FinFET processes is a mind-bogglingly challenging, risky, and expensive proposition. Non-recurring engineering costs have skyrocketed in recent years, and the complexity of doing a design on the latest process nodes is sobering for even the most experienced design teams. Schedules have stretched, and fear of the dreaded respin has ramped up to abject terror as the stakes have continued to rise.

But what if you could hedge your bets? What if you could bring the power and flexibility of programmable logic fabric to the critical parts of your custom design – doing things like accelerating computation tasks, adding ways to customize your design for multiple application scenarios, future-proofing your interfaces, and bringing options for adding or altering custom logic in the field – long after the chip design is done?

Then, you wouldn’t have to wait for that interface standard to be locked down before you started your multi-year design project. You wouldn’t have to do multiple tape-outs for different variations of your design. You wouldn’t have to be quite so worried about a minor logic or specification error causing a respin. And, you might be able to add some ultra-high-performance accelerators that would boost key computing tasks while reducing power consumption. How awesome would that be for everyone involved?

Flex Logix (we’ve written about them before) is an IP company that offers FPGA fabric that you can incorporate into any custom design. Their EFLX cores come in 100-LUT and 2,500-LUT chunks, and you can snap them together to form an almost-arbitrarily large MxN array (from 120 LUTs to 122.5K LUTs), sizing the FPGA fabric to fit your application and your available silicon real estate. If you are planning to use the fabric for compute acceleration, you can choose a “DSP”-flavored block that includes MACs, or you can select the “all logic” block to maximize your LUTs.
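As a rough illustration of the array arithmetic, here is a minimal Python sketch that picks the smallest MxN arrangement of 2,500-LUT cores covering a target LUT count. The 7x7 limit is an assumption inferred from the quoted 122.5K maximum (122,500 / 2,500 = 49 cores); the function name and the tie-breaking rule are hypothetical and are not part of any Flex Logix tool.

```python
import math

LUTS_PER_CORE = 2_500  # core size quoted in the article
MAX_SIDE = 7           # assumption: 122.5K / 2.5K = 49 cores, e.g. a 7x7 array


def smallest_array(required_luts, max_side=MAX_SIDE):
    """Return (rows, cols, total_luts) for the smallest array covering required_luts.

    Hypothetical helper: prefers the fewest cores, then the most square shape.
    """
    if required_luts <= 0:
        raise ValueError("required_luts must be positive")
    cores_needed = math.ceil(required_luts / LUTS_PER_CORE)
    best = None
    for rows in range(1, max_side + 1):
        cols = math.ceil(cores_needed / rows)
        if cols > max_side:
            continue  # this row count cannot fit within the assumed limit
        candidate = (rows * cols, abs(rows - cols), rows, cols)
        if best is None or candidate < best:
            best = candidate
    if best is None:
        raise ValueError("requirement exceeds the assumed maximum array size")
    _, _, rows, cols = best
    return rows, cols, rows * cols * LUTS_PER_CORE


if __name__ == "__main__":
    for target in (1_000, 20_000, 122_500):
        print(target, smallest_array(target))
```

A 20,000-LUT target, for example, needs eight cores, which this sketch arranges as a 2x4 array.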

Flex Logix also boasts patented technology that reduces the number of routing resources required to achieve a particular utilization in their LUT fabric. Scaling LUT arrays is normally problematic, because larger arrays typically require an increasingly large percentage of the area to be allocated to interconnect-related resources. With the Flex Logix approach, you can expect high utilization even on large LUT arrays. Flex Logix claims that their technology results in a 45% reduction in the interconnect area required.
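Note that a 45% cut in interconnect area is not the same thing as a 45% cut in total fabric area; the overall saving depends on how much of the fabric is interconnect to begin with. The sketch below works through that arithmetic in Python. The 80/20 split between interconnect and logic is a purely hypothetical figure chosen for illustration and does not come from Flex Logix.

```python
def fabric_area_after_reduction(logic_area, interconnect_area, reduction=0.45):
    """Return total fabric area after shrinking only the interconnect portion.

    `reduction` is the claimed fractional cut in interconnect area (0.45 = 45%).
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return logic_area + interconnect_area * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical baseline: 100 area units, 80 of them interconnect.
    logic, interconnect = 20.0, 80.0
    before = logic + interconnect
    after = fabric_area_after_reduction(logic, interconnect)
    print(f"total area: {before:.1f} -> {after:.1f} units")
    print(f"overall saving: {100.0 * (before - after) / before:.1f}%")
```

Under that assumed split, the total fabric shrinks by roughly 36%, from 100 units to about 64.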

Flex Logix has proven their IP on popular TSMC processes, including TSMC 40 ULP/EF/LP, TSMC 28 HPM/HPC/HPC+, and TSMC 16 FF+/FFC. The 40nm families were rolled out in August 2016, and the current announcement is the debut of the TSMC 16nm technologies. The implementation requires only 5 metal layers for 40nm and 6 metal layers for 28/16nm (remember, this is physical IP, whereas most commercial IP offerings are synthesizable RTL), so adding LUTs to your design shouldn’t add any special process steps or extra masks. 

Of course, large fields of LUTs aren’t going to do much for anybody without a proven tool flow. Flex Logix has partnered with Synopsys to develop support for the popular Synplify FPGA synthesis tool, and they have developed their own place-and-route tool to finalize your implementation. The tool flow should be straightforward, particularly considering that many design teams are already using Synopsys tools for synthesis on other parts of their design.

From our perspective, acceleration is the killer app for this technology. Most teams designing custom SoC devices today are embedding various high-performance applications processors such as 64-bit ARM cores. But in many application domains, heavy-lifting compute tasks can be offloaded into FPGA LUT fabric, sometimes gaining orders of magnitude in performance while drastically cutting power consumption. In other cases, entire algorithms can be executed in the FPGA fabric without powering up the applications processors at all.

Designing an SoC where arbitrary FPGA-based accelerators can be loaded in at runtime brings a whole new level of performance and power efficiency to SoC designs in areas like embedded vision, cryptography, sensor fusion, software-defined radio, and many others. And, in many of these areas, algorithms are still in a high state of flux, so hard-wiring them into hardened accelerators is not a practical option.

And, putting the fabric on the same die with the applications processor gives some significant power and performance advantages when compared with using discrete FPGAs as accelerators. On-chip connections between processor, memory, and FPGA fabric are far faster, much lower latency, lower power, and less expensive than off-chip interfaces. And all those IOs that you don’t have to use connecting your custom chip to an FPGA can be used for other tasks or can result in a smaller package and simpler board design. 

While there have been numerous attempts in the past to put FPGA fabric onto custom chips, Flex Logix seems to have an approach that is getting traction. Perhaps the cost and performance of current semiconductor processes, combined with the extreme performance and power demands of popular applications, are creating a perfect storm in which embedded FPGA fabric makes both technological and financial sense.

In numerous cases, custom SoCs and ASICs are seen on boards with FPGAs parked next to them, with the FPGA adding flexibility, performance, connectivity, or sometimes just fixing a problem with the original chip design. If building that FPGA fabric into the chip in the first place makes the design useful over a wider range of applications, makes it viable in the market longer, or brings a level of performance and power efficiency that would otherwise be unattainable, that would be a very compelling use of some comparatively inexpensive silicon real estate.
