feature article

Towards Silicon Convergence

Altera’s CTO Weighs In on the Future

We have often discussed the many ramifications of Moore’s Law in these pages. Of course, chips continue to get exponentially cheaper, faster, more capable, and more efficient. Also of course, the fixed costs of designing a new chip continue to get exponentially higher. Combine these two trends, and it becomes clear that we must be increasingly careful about which chips we choose to make. Any company setting out to design a new chunk of leading-node silicon these days must be quite certain that it is building something that either works across a wide range of diverse applications or solves a critical problem in a single application with enormous production volume. Otherwise, the amortized fixed costs make the project infeasible. 
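To make that economics argument concrete, here is a back-of-the-envelope sketch. The dollar figures are hypothetical, chosen only for illustration: once the non-recurring engineering (NRE) cost is spread over production volume, low-volume projects on leading nodes simply don’t pencil out.

```python
def per_unit_cost(nre_dollars, unit_cost, volume):
    """Total cost per chip: amortized non-recurring engineering (NRE)
    plus the recurring cost to manufacture each unit."""
    return nre_dollars / volume + unit_cost

# Hypothetical leading-node project: $75M NRE, $20 per die to manufacture.
nre = 75_000_000
for volume in (100_000, 1_000_000, 10_000_000):
    print(f"{volume:>10,} units -> ${per_unit_cost(nre, 20.0, volume):,.2f} per chip")
```

At 100,000 units the hypothetical NRE adds $750 to each $20 die; at 10 million units it adds only $7.50, which is why volume (or breadth of applicability) is the gating question.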

We chatted at length with Altera CTO Misha Burich recently about the past and future of FPGA technology and the increasing trend toward silicon convergence. As one of the few companies that meet the criteria above and are designing new products on the newest process nodes (28nm and beyond), Altera spends a lot of time and energy making sure it’s building silicon that people actually need and will use. As CTO, a big part of Burich’s job is to peer into the technological crystal ball and extrapolate current trends into future directions.

Burich segmented the world of logic devices into three categories – general processors, PLDs, and application-specific devices. He expressed the relationships between these as a continuum, with the “flexible” end of the spectrum held down by microprocessors, microcontrollers, and DSPs. The “efficient” devices – ASSPs and ASICs – sit at the other end of the scale. In the middle, Burich explains, are programmable logic devices like FPGAs. Choosing a spot on this continuum, therefore, amounts to choosing a tradeoff between flexibility and computational efficiency. 

Over the past several years, with Moore’s Law driving the price of transistors to near-zero, we’ve seen an increasing trend toward new chips that combine many of these approaches. Just about every device coming off the fab these days is some kind of SoC – with one or more processing engines such as microprocessors, microcontrollers, or DSPs. At tiny geometries like 28nm, a complete processing subsystem – processor, bus, peripherals, and some memory – amounts to a trivial fraction of the total area of a chip. That means it’s almost silly not to put a hardened, optimized, multi-core processing system of some type on just about every chip that goes out the door. 

However, an increasing number of applications require standardized, optimized hardware of other types as well. Functions like H.264 video compression and encryption can technically be accomplished in software, but they are hundreds to thousands of times faster and more power-efficient when executed by optimized, task-specific hardware. That means more design teams are adding a number of these ASSP-like blocks to their latest-generation chips as well.

So – our typical fancy-pants SoC on a leading-edge process node is likely to have both ends of Burich’s spectrum covered – some processing blocks for highest flexibility, and many hardened functions for maximum performance and power efficiency. Many SoC designs will look strikingly similar – a few ARM cores, some sort of AMBA interconnect protocol hardware, a bunch of peripherals, some hardware accelerators and special-purpose blocks, memory, and IO.

The question becomes – why spend $40M-$100M designing a chip that’s almost exactly like the other guy’s? If there’s a chip that does exactly what you want – you should just buy it, of course. But, what if there’s a chip that does almost exactly what you want? Historically, that would be called an ASSP, and you’d plop an FPGA next to it to customize your design with your own “particulars”. As Burich points out – that’s what people have been doing for years. More recently – the FPGA companies have announced hybrid FPGA/SoC devices that may do what you need with the FPGA and the optimized, high-performance processing subsystem already built on one chip.

Interestingly, this “SoC with FPGA fabric” idea is not new. Several ASIC companies (like IBM for example) have offered FPGA fabric to their customers as a standard cell block. A few customers designed the blocks in. Over time, however, they found that their customers didn’t use the FPGA fabric, and they ended up removing it during subsequent design revisions. 

The ASIC-with-FPGA-blocks experience brings up some interesting questions: Is FPGA fabric on a complex SoC useful and practical? What will it be used for? Does the failure of FPGA blocks as standard cells spell doom for the new SoC-FPGA hybrids?

Burich says no. Making practical and efficient use of FPGA fabric is a lot more complex than just throwing some LUTs down on a chip. If your SoC has FPGA fabric on it, you need the ecosystem offered by the FPGA companies to make it sing – tools, IP, and support make all the difference between a bunch of unused LUTs soaking up silicon and coulombs and a practical, flexible fabric that can differentiate and enable your design.

While the obvious use of FPGA fabric in a complex SoC is adding that last bit of differentiation and customization that makes your design different from the pack, Burich points out that there is enormous potential in using programmable logic for compute acceleration in a flexible manner. For years, supercomputer companies have parked big blocks of FPGAs next to conventional processors and offloaded massively parallel compute operations to on-the-fly-designed hardware accelerators in FPGAs. What worked for reconfigurable supercomputers could also work on the chip scale. The challenge is programming. Most software engineers don’t have the time, patience, or skill set to pull out a performance-intensive piece of functionality and custom code a bunch of VHDL or Verilog to create an optimized FPGA-based hardware accelerator.

That’s where languages like OpenCL come in. 

OpenCL is a language designed to allow software to be targeted to GPUs or, more generally, to “heterogeneous computing platforms.” Languages like OpenCL try to solve the problem of expressing parallelism across multiple, heterogeneous processing elements in a standardized way. In theory, some of those processing elements could be custom processors and datapaths implemented in FPGA fabric. Altera has reportedly been working on an OpenCL solution for a while now. Such a solution would facilitate the critical software/hardware partitioning and tradeoffs in complex systems that could make some truly spectacular things possible on a single chip – without breaking the bank on power budget. 
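To give a flavor of the programming model, here is a sketch of OpenCL’s data-parallel idea in plain Python (real OpenCL kernels are written in OpenCL C, shown in the comment). Each “work-item” runs the same kernel body on one element of the data; the platform – GPU, multicore CPU, or, in principle, FPGA fabric – decides how those work-items map onto hardware. The function names here are our own illustration, not Altera’s API.

```python
# In OpenCL C, a vector-add kernel looks roughly like:
#   __kernel void vadd(__global const float *a, __global const float *b,
#                      __global float *c) {
#       int i = get_global_id(0);
#       c[i] = a[i] + b[i];
#   }

def vadd_kernel(global_id, a, b, c):
    """Kernel body: each work-item handles exactly one index."""
    c[global_id] = a[global_id] + b[global_id]

def enqueue(kernel, global_size, *args):
    """Stand-in for an OpenCL runtime launching one work-item per index.
    A GPU runs these work-items in parallel across its cores; an FPGA
    could instead synthesize a custom datapath for the kernel body."""
    for gid in range(global_size):
        kernel(gid, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [0.0] * 4
enqueue(vadd_kernel, 4, a, b, c)
print(c)  # [11.0, 22.0, 33.0, 44.0]
```

The key point for FPGAs is that the kernel describes *what* each work-item computes, not *how* the hardware executes it – which is exactly the abstraction that lets the same source target fixed processors or on-the-fly-designed accelerators.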

Another enabler of spectacular single-device systems, Burich explains, is the current trend toward heterogeneous 3D architectures. While we’re building our ideal “do everything” device here – with processor, memory, custom blocks, FPGA fabric, and more – we run into a problem with monolithic ICs: different process technologies are best for different parts of our system. Processors don’t like to be made with the same process as memory or FPGA fabric, and IOs and analog blocks have different process requirements as well. If we try to make a huge SoC with all of these parts on a single, monolithic die, we have to choose a process that is probably sub-optimal for everything. However, if we combine heterogeneous die on a single silicon interposer, or stack die with through-silicon vias, we gain the ability to build an SoC where each part of the system is made with the best possible process technology for that function.

By stacking all these elements together in a single package, we could dramatically increase the bandwidth and number of connections between subsystems and similarly dramatically reduce the amount of power required to move all those signals between blocks. It’s a win-win scenario. A heterogeneous 3D SoC with a high-performance processing subsystem, FPGA fabric, analog, ample memory, and optimized special-purpose blocks could do amazing things in a very small footprint – with a shockingly small power budget.

Getting to that future vision, however, requires a major rework of today’s semiconductor ecosystem – and the assumptions that go with it. With all these dies being integrated into a single 3D package, standards will need to be established so dies from various suppliers will play nicely together. Then, the big question will be “who is the integrator?” because the integrator will hold enormous economic advantages in taking the final device to market.

Burich’s vision seems on track with trends we are already seeing in both the market and the technology. Of course, Burich sees FPGAs and the companies who make them playing major roles as developers, integrators, and marketers of these future devices. That’s a natural assumption for the visionary CTO-types at any company – to paint themselves and their own industry segment into the picture. If Burich is right about the need for programmable hardware – both for system customization and for hardware/software co-design and compute acceleration – in tomorrow’s SoCs, then the future should look a lot like what he sketched.

We probably won’t have to wait long to find out.

