Semi-Programmable

New Architectures Optimize the Mix

It stands to reason.

Some components of system-on-chip design are static. You’re not going back and re-engineering them every two weeks. The multiplier was designed long ago and doesn’t really need to be designed again every time the moon changes phase. Neither does the PCI core, for that matter. They’re both stable and well-debugged. It’s unlikely that you’re ever going to need to modify or reconfigure them.

Why, then, would it make sense to build these common functions out of programmable logic, where they are subject to the performance, area, and power penalties of LUT-based implementations and exposed to the random timing and layout problems that can creep into large FPGA designs with soft macros? Of course, it does not.

The major FPGA vendors figured this out some time ago and began putting the very common functions (like multipliers) on their FPGAs as hard, cell-based-style implementations. For a small silicon investment, stable functions could be accelerated to ASIC performance and power, leaving more LUTs for the logic that needed them. This architectural addition makes good sense and has become an accepted feature of most high-end (and now even some low-end) FPGAs.

Multipliers are all fine and good, but wouldn’t the same reasoning lead one to hard implementations of larger, more complex, stable functions? It turns out that this line of logic leads us down a slippery slope. Every time an FPGA vendor builds another hard macro onto their FPGA, they make life better for the people who use it, but worse for the people who don’t. When Xilinx first introduced Virtex-II Pro, many (maybe the majority) of the early users didn’t take advantage of the PowerPC cores lovingly laid out amidst the LUT fabric. For them, the processors were just so much wasted cost and chip area that could have been put to better use in their application.

The more goodies a vendor packs on, the narrower they make the optimal audience for the device. Xilinx has fought back against this problem with the ASMBL (Advanced Silicon Modular BLock) architecture of their newly announced Virtex-4 series. The new architecture allows designers to choose from four different mixtures of “special” features on their FPGA according to their needs. The PowerPC is included in only one of those mixes. This moves things in the right direction, but choosing the “vegetarian,” “meat-lover’s,” or “4-cheese” pizza won’t let you create exactly the “half-pepperoni, half-mushroom, sauce-on-the-side, mozzarella-and-cheddar” concoction that you sometimes actually crave.

Alternatively, Altera has gone the structured ASIC route with its HardCopy solution. HardCopy allows an FPGA design to be re-spun into a completely mask-programmed implementation for higher performance, lower power, and substantially lower cost. The penalty is a small NRE, a few weeks of cycle time, and loss of re-programmability.

What if you could have your cake and eat it too? What if you could implement any function you wanted in mask-programmable logic, and save the inefficiencies of programmability for the portions of your design that really need it?

Leopard Logic is offering just that with their newly announced “Gladiator” series. While using Gladiator does entail the dreaded “handoff” and “NRE” components that FPGA designers disdain, Leopard Logic has worked to make these feared phases as easy and inexpensive as possible. What Gladiator does offer is both high-performance mask-programmed and LUT-based programmable fabrics on the same chip. For the portions of your design that are stable, unchanging, and fixed, you can leverage the mask-programmed portions of the fabric for performance, power, and area efficiency. For the pieces of your design that vary, you have LUT-based logic that can be configured and re-configured on-the-fly. With a little forethought, you can build your own, super-customized FPGA-like platform with your legacy IP nicely tucked away and working in the mask-customized section, then use the programmable fabric to create variants for multiple products, or to update as standards and protocols change.

Gladiator’s core logic comes in two flavors: HyperBlox MP (metal-programmed with a single via layer) and HyperBlox FP (field-programmable using FPGA-style SRAM structures). Leopard Logic has also gone for full cell-based implementations of Multiply-Accumulate (MAC) blocks and RAM for maximum efficiency in these commonly used functions. The hybrid mixture is what Leopard Logic calls a “CLD,” or “Configurable Logic Device.” CLDs differ from structured ASICs in that they offer FPGA-like programmable logic fabric along with the mask-customized portion. They truly represent a middle ground between ASIC and FPGA.

Gladiator aims at a spot near the high end of the FPGA market, with substantially lower unit costs and improved power and performance. The top end of the Gladiator family, the CLD25000, will boast over 25 million “System Gates” with 256K mask-programmed cells, 16K FPGA cells, 256 each of the 36K DPRAM and 18x18 MAC blocks, and 16 PLL/DLLs. The smallest member of the family, the CLD1600, is rated at about 1.6M system gates with proportionally fewer of each feature.

Leopard Logic claims that the NRE has been brought down to around $50K (USD) for the single-layer mask required for the via-based metal customization, and that the lead time is a snappy four weeks. While this may be a slight inconvenience compared to an FPGA-based implementation, the benefits are substantial, and many design teams may be happy with the trade-off.
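To put that trade-off in numbers, here is a minimal break-even sketch. Only the $50K NRE comes from Leopard Logic's claim; the per-unit prices are made-up placeholders (no unit pricing is quoted here), and the function name is ours.

```python
import math


def break_even_units(nre_usd: float, fpga_unit_cost: float, cld_unit_cost: float) -> int:
    """Return the smallest volume n at which nre + n * cld <= n * fpga.

    All three inputs are caller-supplied estimates; nothing here is vendor data.
    """
    saving_per_unit = fpga_unit_cost - cld_unit_cost
    if saving_per_unit <= 0:
        # With no per-unit saving, the NRE is never recovered.
        raise ValueError("CLD unit cost must be lower than FPGA unit cost")
    return math.ceil(nre_usd / saving_per_unit)


NRE_USD = 50_000  # the "around $50K" figure claimed by Leopard Logic

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_FPGA_PRICE = 400.0
HYPOTHETICAL_CLD_PRICE = 150.0

if __name__ == "__main__":
    units = break_even_units(NRE_USD, HYPOTHETICAL_FPGA_PRICE, HYPOTHETICAL_CLD_PRICE)
    print(f"NRE recovered after {units} units")
```

With these placeholder prices the NRE is paid back after 200 units. The real answer depends entirely on the actual unit-cost gap, and it ignores the four-week lead time and the engineering effort of the handoff.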

The Gladiator design flow is built around a simple-to-use “ToolBlox” cockpit integrating industry-standard tools like those from Synopsys and Mentor Graphics. Leopard Logic has also worked with IP vendors to offer a library of ready-to-use logic blocks for systems designers starting out with Gladiator-based designs.

As a proof-of-concept application, Leopard Logic created a control plane PowerPC bridge for network, storage, and wireless applications. The design used IP from multiple vendors implemented in the mask-programmed fabric, and used the programmable fabric for IP-IP interfaces (where most problems usually occur) as well as the Ethernet MAC bus interface and other newly designed blocks that might require changes. They claim that the resulting solution gives superior performance and price with minimal design effort, risk, and NRE overhead.
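The partitioning rule in that example (stable IP goes into the mask-programmed fabric, anything likely to change stays in the LUTs) can be sketched in a few lines. This is an illustration only, not Leopard Logic's tool flow: the block names loosely follow the bridge design, the cell counts per block are invented, and the capacities assume that the CLD25000's "256K" and "16K" mean multiples of 1024 cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    cells: int        # hypothetical size estimate, in cells of the target fabric
    may_change: bool  # True for new logic or logic tracking an evolving standard


def partition(blocks, mp_capacity, fp_capacity):
    """Split blocks into (mask_programmed, field_programmable) lists.

    Stable blocks go to the mask-programmed fabric; changeable ones stay
    field-programmable. Raises ValueError if either fabric would overflow.
    """
    mp = [b for b in blocks if not b.may_change]
    fp = [b for b in blocks if b.may_change]
    if sum(b.cells for b in mp) > mp_capacity:
        raise ValueError("mask-programmed fabric is over capacity")
    if sum(b.cells for b in fp) > fp_capacity:
        raise ValueError("field-programmable fabric is over capacity")
    return mp, fp


# Capacities taken from the CLD25000 figures, assuming K = 1024.
MP_CELLS = 256 * 1024
FP_CELLS = 16 * 1024

# Invented sizes for blocks resembling the proof-of-concept bridge.
DESIGN = [
    Block("powerpc_bridge_ip", 120_000, may_change=False),
    Block("third_party_ip", 60_000, may_change=False),
    Block("ip_to_ip_interfaces", 6_000, may_change=True),
    Block("ethernet_mac_bus_interface", 4_000, may_change=True),
]

if __name__ == "__main__":
    mp, fp = partition(DESIGN, MP_CELLS, FP_CELLS)
    print("mask-programmed:   ", [b.name for b in mp])
    print("field-programmable:", [b.name for b in fp])
```

A real flow would weigh timing, I/O placement, and how confident the team is in each "stable" label, but the either/or decision per block is the heart of the approach.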

While the popularity of hybrid solutions such as Gladiator remains to be seen, there is clearly a vast, underserved gulf between cell-based ASIC and programmable logic that should become a dynamic and lucrative market in the next few years. The innovation that will eventually fill that gap is probably just beginning.
