Shaking Up Embedded Processing

There is no question that programmability is the key enabling feature of just about every electronic product today. If you’re a regular reader of this publication, and you’re designing systems that don’t contain programmable elements, what the heck are you doing?

We are all very familiar with software programmability. From communications infrastructure to digital cameras, we are constantly hearing that software development is the critical path in the schedule, the biggest engineering effort, and the key element of product differentiation. However, the pace of development today is making hardware programmability equally important. Semiconductor companies have enormous catalogs of processor devices – differing mainly in the collection of peripherals surrounding standard (usually ARM architecture) processors.

Often, it is this collection of peripherals that is the complicated constraint in our hardware design. Despite a catalog of bazillions of processors – none of them seems to have exactly the things we need. There is either too much or too little, and the super-special extra thing we need is not available at all. For most of us, the go-to solution has been to park an FPGA next to our processor chip. We use the FPGA to make up for what is missing or not right with our processing subsystem.

There are limitations to the FPGA-plus-processor approach, however. It takes more board space to put the two chips (plus the associated support hardware for each) on our board. It takes more power to run both devices. The number of connections between the two, the bandwidth of those connections, and the power consumed by those connections are constrained by the IO capabilities of each device. What we really want is a device that combines the two – processor plus FPGA. Then, we save board space, we can take advantage of far more available connections between FPGA fabric and processing subsystem, we save power, and we probably save on total system BOM – depending on how much these little gems cost.

Altera apparently agrees with this line of thinking, because they are announcing their new processor-with-FPGA (or FPGA-with-processor) family this week. The new devices – dubbed “SoC FPGAs” – are built on the company’s wildly successful Cyclone and Arria FPGA platforms. The company will make special versions of both the Cyclone V and Arria V 28-nm FPGAs that contain a hard processor system (HPS). This hard processor system is made up of a dual-core ARM Cortex-A9 MPCore processor, the most common peripherals, and a multi-port memory controller. The FPGA fabric will be the same as we would expect from the corresponding Cyclone V or Arria V FPGAs, and the on-chip connection between the two will facilitate far more robust communication paths than we could achieve with separate discrete devices.

It’s hard to imagine why any system designer would not want a device with this combination of capabilities.

This isn’t the first device we’ve seen that combines these ideas, of course. Xilinx (Altera’s arch rival in the FPGA business) announced their version of the processor-plus-FPGA device (called Zynq) a few months ago. But Zynq, also, was far from the first. About ten years ago, Altera actually pioneered the “System on Programmable Chip” idea with a family called “Excalibur.” (I know, sounds tantalizingly presumptuous, doesn’t it?) Like the Newton, the Osborne, the Lisa, and so many other before-their-time devices, Excalibur was a monumental flop. The FPGA technology of the time wasn’t really ready to be parked on the same die as a state-of-the-art-at-the-time RISC core. The processor was optimized for low cost and low power. The FPGA fabric was fast, power hungry, and expensive. The nascent design tools and ecosystem for SoC-like design with FPGAs were weak and cobbled together.

That was a decade ago – when people thought Lou Bega’s “Mambo No 5” was cool. We’re past that now. Move on along.

Altera’s new FPGA SoC line is absolutely NOT Excalibur – in the same way that an iPhone is NOT a Newton and a modern laptop is NOT an Osborne. These new FPGA SoCs are well-designed, proven packages of technology that include a decade’s worth of evolution in processor-fabric integration, hardware/software design tools, and embedded subsystem development – plus five full bus stops on the Moore’s Law express line.

The dual-core ARM Cortex-A9 MPCore processor runs at up to 800MHz and is capable of both symmetric and asymmetric multiprocessing (SMP and AMP). Each core includes 32K of L1 instruction cache and L1 data cache, the NEON media processing engine, a single- or double-precision floating point unit, interval and watchdog timers, and 512KB of shared L2 cache – plus a host of other speed-boosting goodies. The memory interface supports DDR2/3, LPDDR1/2, ECC, NAND flash with DMA and optional ECC, QSPI, and SD/SDIO/MMC with DMA.

Interface peripherals include two 10/100/1000 Ethernet MACs, two USB 2.0 OTG controllers, four I2C controllers, two CAN networks, SPI master and slave, UART, and up to 71 GPIOs and 15 input-only IOs. There are also four general-purpose timers, two watchdog timers, an 8-channel DMA controller, an FPGA manager (more on this in a bit), clock and reset managers, 64KB on-chip RAM, and a boot ROM.

All taken together, that is a formidable ARM-based embedded computing platform, competitive with the most capable stand-alone embedded processor devices. While this is awesome, it’s not what we’re here to see.

The real power of Altera’s SoC FPGAs, of course, is the FPGA part. The FPGA fabric is either Cyclone V or Arria V, with all the capabilities, bells, and whistles that come along with this remarkable generation of 28nm devices. So – the key question is – if you bought a high-end dual-core ARM-based processor chip, and put a Cyclone V or Arria V on your board right next to it, what would you be missing that this device gives you?

There are several obvious things. First, you have one chip instead of two – with the associated savings in board space and board design complexity. Your BOM and manufacturing processes would be simplified. Your inventory and procurement would be easier. Your power consumption would be lower. These are all easy to guess without even fuzzing your eyes.

However, some not-so-obvious advantages come from the way the FPGA part is connected to the processor part (HPS). Between the two sides, Altera claims over 100-Gbps peak bandwidth with integrated data coherency between the processor cores and the FPGA. Try pumping THAT through the IOs connecting your two chips together. Because there are so many more connections available between the FPGA fabric and the processor, and because those connections aren’t eating up all of your external IO pins (as they would be with two discrete devices), you’ll have lots more IO left over for doing the things you really want to be doing with IO. Furthermore, doing the things you’d really like to do with FPGA fabric becomes much easier – like building hardware accelerators for things like DSP datapaths, or bringing in data from a variety of evolving-standard interfaces into your processing environment.
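To put the two-chip comparison in rough numbers, here is a back-of-envelope sketch. Only the 100-Gbps figure comes from Altera's claim above; the per-pin data rate is a purely hypothetical assumption, and the function name is ours. Real board-level links also need clocks, control signals, and protocol overhead that this ignores.

```python
# Hypothetical figures: only the 100-Gbps claim comes from the article.
CLAIMED_ON_CHIP_GBPS = 100        # Altera's claimed peak HPS-to-FPGA bandwidth
ASSUMED_PIN_RATE_MBPS = 800       # assumption: data rate of one external IO pin


def pins_needed(target_gbps: int, pin_rate_mbps: int) -> int:
    """Smallest number of single-ended data pins that reaches target_gbps."""
    target_mbps = target_gbps * 1000
    # Ceiling division with integers avoids floating-point rounding.
    return -(-target_mbps // pin_rate_mbps)


if __name__ == "__main__":
    pins = pins_needed(CLAIMED_ON_CHIP_GBPS, ASSUMED_PIN_RATE_MBPS)
    print(f"{pins} data pins per chip at {ASSUMED_PIN_RATE_MBPS} Mbps each")
```

Under that assumed pin rate, matching the claimed on-chip figure would take well over a hundred data pins on each of the two devices, which is the kind of IO budget the single-chip approach hands back to you.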

Coming from the processor side, there is processor and DMA access to FPGA-based peripherals, and a configurable 32-, 64- or 128-bit AXI (Advanced Microcontroller Bus Architecture [AMBA] Advanced eXtensible Interface) – (yep, that’s a hierarchically-nested acronym.) The FPGA side can access processor-subsystem peripherals, the same AXI bus, the processor subsystem’s SDRAM controller, DRAM and shared memory. There can be up to 6 masters – 4×64 read and 4×64 write data.
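For a feel of what the configurable bridge width means, the arithmetic is simply width times clock. The sketch below assumes a bridge clock of 200 MHz, which is a made-up number for illustration and not something Altera has stated here; the function and constant names are ours as well.

```python
ASSUMED_BRIDGE_CLOCK_MHZ = 200   # assumption for illustration; not from the article


def peak_gbps(width_bits: int, clock_mhz: int) -> float:
    """Peak rate of one data channel that moves width_bits every clock cycle."""
    return width_bits * clock_mhz / 1000  # bits * MHz = Mbps; /1000 -> Gbps


if __name__ == "__main__":
    for width in (32, 64, 128):   # the configurable widths named in the article
        one_way = peak_gbps(width, ASSUMED_BRIDGE_CLOCK_MHZ)
        print(f"{width:>3}-bit: {one_way:.1f} Gbps per direction")
```

AXI carries read data and write data on separate channels, so a bridge can in principle move data in both directions at once. These are peak figures only; sustained throughput depends on burst lengths, arbitration, and whatever sits at the other end of the bus.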

The FPGA and processor parts can be powered up and operated independently. That may be a short sentence, but it’s a big deal on capability. It means that the processor system can operate things like FPGA configuration – which opens up a world of possibilities for in-system upgrades and partial reconfiguration – stretching the capabilities of the programmable fabric even farther.

A Tale of Two Marketings

Astute observers may not be able to spot the important technical and performance differences between Altera SoC FPGAs and Xilinx Zynq. We think we’re pretty astute observers, and we’re not really sure yet either. We obviously have not yet seen, touched, or tried either device. What we all have right now is access to datasheets and white papers. It’s not really productive to try comparing MMIPS (Marketing MIPS), MMFLOPS (Marketing Megaflops), Mgates, MGbps, or any of the other PowerPoint-based performance metrics at this stage of the game. When it comes time for developers to choose one of these two platforms, we’re betting it won’t be because of technical minutiae anyway.

We think it will be for one of two reasons, depending on which group of customers you belong to. If you’re already an FPGA designer, and a fan of Altera or Xilinx, you’ll most likely keep right on using the brand you’re accustomed to. In our surveys, the #1 reason designers choose one of these vendors over the other is consistently “previous design success with this vendor.” That means that once you’ve learned one vendor’s tools – and how to avoid that vendor’s usual “gotchas” – you’re comfortable and hesitant to go through that whole process again with a different vendor for the sake of a few Marketing Microwatts Per Megahertz (MMPHz). In reality, though, it isn’t you that these devices were made for. You are allowed to use them, of course, but the real target audience for processor-plus-FPGA devices is the legions of embedded designers that are NOT fluent in FPGA design.

For those embedded designers new to FPGA design – Altera and Xilinx have taken notably different marketing tacks. Altera bills these devices as SoC FPGAs. That’s a reasonable name. Most designers know what an SoC is, and what an FPGA is. When they hear that name, they can probably guess approximately what the device will do. This is consistent with Altera’s nomenclature since the Excalibur days – “System on Programmable Chip”.

Xilinx’s approach has been to de-FPGA the marketing. This also makes sense. For designers who are used to working with stand-alone embedded processors, FPGA design can be a bit scary. They’ve heard tales of LUTs and HDLs and other frightening beasts lurking in the murky technical wilderness of FPGAs. Xilinx has therefore dubbed their entries “Extensible Processing Platforms.” See? No F-word in there anywhere.

As much as we wish this were not true, the victor in the battle between Xilinx and Altera in this space is much more likely to be determined by marketing than by technology. Both companies are producing very capable devices – that have clear and substantial differentiation compared with traditional stand-alone discrete embedded processors. For the expansive market of embedded designers, the company with the best marketing approach may win – and we don’t have the crystal ball to predict who that will be. Do you?

Altera’s introduction certainly heats up the challenge in the embedded processing world, and offers a set of compelling benefits for embedded designers. We can’t wait to see what you all build with SoC FPGAs.

7 thoughts on “Shaking Up Embedded Processing”

kevin says:

October 11, 2011 at 1:24 pm

What would you design with an Altera SoC FPGA?

kevin says:

October 11, 2011 at 6:21 pm

It’s been pointed out that I omitted a few important things from the article – particularly discussing the advantages of a single-chip solution like this over a 2-chip (processor plus FPGA) solution:

– Monolithic integration of the FPGA and the HPS with a single, multi-port DRAM controller. Having a single controller provides numerous advantages and simplifies the FPGA parts of the design significantly – particularly where both subsystems are sharing contents of memory.

– Closely related to that – If you want to create accelerators in the FPGA fabric, the ARM Accelerator Coherency Port (ACP) manages cache coherency for external (to the processor) accelerators. This means your FPGA-based accelerator can use memory as if it were the only game in town – and not have to be doing constant cache flushes and other defensive design procedures to be sure that the cache isn’t quietly changing memory behind your back. This is a significant advantage in both design complexity and in system performance.

– Virtual Platform (this is a biggie). Altera worked with Synopsys to develop and deploy a full-blown virtual platform supporting these SoC FPGAs. For software development, this offers a huge productivity boost – particularly for software debug. Compared with debugging on the actual hardware – the Virtual Platform gives significant improvements in debug capability – 100% visibility, better stepping and breakpoint control, better retention of history… If you’ve used virtual platforms you understand all these advantages. The cool part here is that a virtual platform for this SoC has already been built for you. That’s an enormous time and energy savings.

cplante says:

October 11, 2011 at 6:30 pm

This is music to my ears: “…the victor…is much more likely to be determined by marketing than by technology.” 🙂

But I’m afraid technology is what will make or break this new category, whatever it ought to be called: cSOC, PSoC, SoC FPGA, PSC, EPP, etc.

And by this, I mean the ‘tools’ technology will either delight or enrage the end user trying to design with pieces of silicon like SmartFusion, Arria V SX/ST, Zynq or even PSoC.

It would be nice if Kevin or Jim (or why not Clive?) could do a piece on the merits and limitations of tools offered by Altera, Cypress, Microsemi/Actel or Xilinx…

What marketing ought to do here is create awareness outside of the typical FPGA market, with embedded designers more familiar with MCUs than FPGAs. And then will we see if the tools are akin to polished iPhones or rough Newtons…

rosinkrans says:

October 11, 2011 at 8:04 pm

Humm I find those devices extremely interesting as I’m using a configuration with a FPGA+CPU. This, however, does not come at no cost. I just went over the advanced product brief:

http://www.altera.com/literature/hb/soc-fpga/aib-01017-soc-fpga-overview.pdf

and it looks like those hard blocks take almost half the available space they have (in terms of available LEs).
They don’t mention anything about some of the I/Os being LVDS compliant. Hope they will be…

ericzepp says:

October 14, 2011 at 9:00 pm

The answer to your question is: What I couldn’t implement on ZYNQ, whatever that might be.

The Altera announcement resembles that of a freshman, once again late for finals, who obtained his term paper off the web. Please refer to an April 2010 article in RTC magazine “ASP’s: New Device Class. ” For someone who seems to imply, more accurately seems to claim, they invented this entire field, Altera’s name is oddly omitted.

kevin says:

October 15, 2011 at 4:46 pm

Ah well Eric, That’s what you get for reading RTC… 🙂

I really do believe Altera invented this device class – as mentioned in the article – way back in June 2000, with Excalibur – about 10 years before the RTC article.

Some publications seem to have a nasty habit of omitting non-advertising companies from editorial. We try very hard not to do that.

Karl Stevens says:

December 26, 2018 at 8:08 am

A full 7 years have passed and neither has gained real acceptance, has it?
“Altera’s introduction certainly heats up the challenge in the embedded processing world, and offers a set of compelling benefits for embedded designers. We can’t wait to see what you all build with SoC FPGAs.”
Why? Because computers are not a natural fit for embedded systems, even though there is need for “programmability”(not the P as in FPGA).
And this other “biggie”:
“– Virtual Platform (this is a biggie). Altera worked with Synopsys to develop and deploy a full-blown virtual platform supporting these SoC FPGAs. For software development, this offers a huge productivity boost – particularly for software debug. Compared with debugging on the actual hardware – the Virtual Platform gives significant improvements in debug capability – 100% visibility, better stepping and breakpoint control, better retention of history… If you’ve used virtual platforms you understand all these advantages. The cool part here is that a virtual platform for this SoC has already been built for you. That’s an enormous time and energy savings.”
Meanwhile a non-FPGA company has developed a real development/debug platform that focuses on defining and connecting functional blocks which is what SoC design is about.
Anything that Synopsys and Altera came up with will certainly focus on synthesis, place and route, and timing analysis, which are all implementation rather than design.
System and Logic design are my hobbies and I have some experience(dating back to 1963).
The Visual Studio CSharp syntax API exposes the AST(AbstractSyntaxTree) and SyntaxWalker that makes it practical to design an FPGA that does not need a CPU at all.

Maybe I will go open source since that is the ridiculous secret to RISCV’s success.

