
C to FPGA

Altera Accelerates Nios II

FPGAs are making big inroads in the embedded systems space as system-on-chip platforms. The recipe is solid – whip together a processor and some peripherals, all connected to an on-chip bus or switch fabric. Add a little off-chip RAM and presto – instant embedded system, ready to be changed at a moment’s notice, even after it’s in your customers’ hands. From 50,000 feet, it looks like the ideal solution for leading-edge embedded development. Of course, like any seemingly ideal solution, it has its limitations. That’s why there’s still competition.

The major limitation of FPGA-based systems-on-chip with soft-core embedded processors is processor performance. Today, fairly sophisticated 32-bit RISC processors are available as configurable IP for FPGAs, consuming only a small percentage of the LUT fabric of a typical device. Altera’s Nios II is perhaps the leading example. Using such a core and an environment like Altera’s SOPC Builder, system designers can stitch together a customized system, complete with peripherals, in a few minutes with a few mouse clicks. The vanilla version of such a system, however, leaves much of the potential advantage of the FPGA untapped.

The reason embedded processor performance in FPGAs is not a bigger issue is that performance-critical functions can be accelerated in hardware. Beyond the programmability of the processor or microcontroller, hardware accelerators can be plumbed in, massively parallelizing critical functions. For example, many FPGAs have large numbers of hard-wired multipliers or multiply-accumulate units that can be connected into parallel datapaths to make short work of math-intensive routines, such as those found in many digital signal processing (DSP) algorithms. The challenge in taking advantage of this capability, however, is that designing these hardware accelerators is typically low-level digital hardware work, requiring an engineer proficient with hardware description languages (HDLs), logic synthesis, and complex timing design. The hardware acceleration portion of the design can quickly become a schedule bottleneck, demanding considerable additional expertise.
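
To make this concrete, consider the kind of routine such accelerators typically target. The fragment below is purely illustrative (not taken from Altera's documentation or tools): a simple fixed-point FIR filter whose inner multiply-accumulate loop maps naturally onto an FPGA's hard multipliers when unrolled into parallel datapaths.

#include <stdint.h>

#define NUM_TAPS 16

/* Illustrative fixed-point FIR filter: each output sample requires
 * NUM_TAPS multiply-accumulate operations.  On a processor these run
 * sequentially; in hardware the inner loop can be unrolled so that
 * several hard multipliers work on one sample in parallel.
 * Note: samples must hold (length + NUM_TAPS - 1) entries.          */
void fir_filter(const int16_t *samples,   /* input sample buffer      */
                const int16_t *coeffs,    /* NUM_TAPS Q15 filter taps */
                int16_t       *output,    /* output sample buffer     */
                int            length)    /* number of output samples */
{
    int n, k;
    for (n = 0; n < length; n++) {
        int32_t acc = 0;
        for (k = 0; k < NUM_TAPS; k++) {
            acc += (int32_t)samples[n + k] * coeffs[k];
        }
        output[n] = (int16_t)(acc >> 15);  /* scale back from Q15 */
    }
}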

This week, Altera announced a new solution that addresses this problem. Aimed specifically at users of their highly capable Nios II soft-core processors in conjunction with Stratix II and Cyclone II FPGAs, Altera’s new Nios II C-to-Hardware Acceleration Compiler (C2H) does exactly what its name implies. It allows C routines to be plucked from the normal Nios II software flow and compiled into high-performance hardware, boosting compute throughput by a significant factor.

While there are numerous products and projects today claiming C-to-hardware compilation, Altera’s entry stands out in several important ways. First, it is not intended to be a general-purpose algorithm-to-hardware compiler. It specifically targets Altera’s FPGA fabric, and it specifically generates its external connectivity through Altera’s Avalon interconnect fabric and I/O. Second, it runs straight from garden-variety ANSI C. It doesn’t rely on special libraries or non-standard C constructs to convert C into a virtual HDL. Finally, it automatically connects the generated hardware into the main program running on a Nios II processor.

These three distinctions have the potential to make all the difference in delivering easy-to-use, practical software acceleration instead of simply providing yet another methodology for custom hardware development. Since Altera’s C2H accepts generic C (with only a few restrictions, such as no floating point and no recursion), programmers can move routines in and out of hardware quickly, experimenting to find the best mix of performance, power consumption, and hardware utilization.
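
As an illustration of what "generic C" means in practice, the hypothetical routine below sticks to integer arithmetic and an iterative structure – avoiding the floating point and recursion the tool does not accept – so the same source can be built for the Nios II or handed to the hardware compiler unchanged. The function itself is a standard bit-by-bit integer square root, chosen only as an example of accelerator-friendly coding style.

#include <stdint.h>

/* Hypothetical accelerator-friendly routine: integer-only and
 * iterative, so it stays within the no-floating-point, no-recursion
 * restrictions and compiles identically for software or hardware.   */
uint32_t isqrt(uint32_t x)
{
    uint32_t result = 0;
    uint32_t bit = 1UL << 30;   /* highest power of four <= 2^31 */

    while (bit > x)
        bit >>= 2;

    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}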

One of the most challenging aspects of bringing a C-to-hardware system to life is narrowing the almost endless list of possibilities. Do you want to unroll your loops and use hundreds of multipliers for super-acceleration? Do you want a slow clock with more chaining, or a fast clock with more pipelining? Altera’s approach of targeting a specific hardware fabric nails down many of those variables. The system can make some well-placed assumptions about the hardware accelerators such as clock frequency, number and type of resources (like multipliers), and available interconnect fabric.
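
To give a feel for one of those choices, the sketch below manually unrolls a multiply-accumulate loop by a factor of four. In hardware, each term in the unrolled body can map to its own multiplier, trading resources for throughput. This is purely illustrative of the loop-unrolling trade-off, not a coding style required by C2H.

#include <stdint.h>

/* Illustrative 4x-unrolled multiply-accumulate loop (assumes n is a
 * multiple of 4).  Each term in the body can map to a separate hard
 * multiplier, quadrupling work per clock at four times the multiplier
 * cost compared with the rolled version.                              */
int32_t dot_product_unrolled(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < n; i += 4) {
        acc += (int32_t)a[i]     * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
        acc += (int32_t)a[i + 2] * b[i + 2];
        acc += (int32_t)a[i + 3] * b[i + 3];
    }
    return acc;
}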

There are typically two target audiences for C-to-hardware technology – the hardware engineer looking for more productivity and the software developer looking to accelerate algorithms. Often, it is difficult to tell which audience a product is targeting. In Altera’s case, however, they have made it abundantly clear that C2H is aimed at the software developer. This matters because the two groups have distinctly different criteria for success and very different tolerances for inconveniences like manual intervention in the compilation process. A hardware engineer is generally looking for generated hardware that performs on par with hand-crafted RTL code. To achieve that goal, the hardware engineer is willing to endure a substantially more interactive compilation process, and he wants much more control over the micro-architecture of the generated hardware.

The software engineer, on the other hand, wants the generated hardware to run faster than equivalent code on a processor. He is not typically concerned with whether the hardware lives up to hand-crafted optimality. He values automation over control and would generally like the tool to behave like a software compiler – source code in, hardware accelerator out. He also doesn’t want to pay ASIC design tool prices for the capability.

Altera’s C2H is positioned squarely for the latter audience. Using C2H, you select the C functions you want to accelerate, then right-click to compile them into hardware; the system takes care of the rest. Altera’s Avalon switch fabric addresses one of the critical bottlenecks typically found in such acceleration schemes: the processor-to-accelerator connection. By giving the accelerator direct access to memory, data can move through the acceleration path without being handed off by the processor. The resources used by the accelerator can then be matched to the memory latency, eliminating the case where the accelerator is over-designed and can’t be fully utilized because of I/O bandwidth.
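
A hedged sketch of what that looks like from the programmer's side: the accelerated function below reads and writes its buffers through ordinary pointer arguments, the natural way to express direct memory access in plain C. Because the generated accelerator has its own path to memory over the Avalon fabric, the processor only needs to invoke the function and collect the result rather than shuttle every word of data itself. The function name and signature here are hypothetical.

#include <stdint.h>

/* Hypothetical accelerated function: saturating Q8 gain applied to a
 * block of samples.  The pointer arguments describe buffers in memory
 * that the generated accelerator can stream through directly, without
 * the Nios II processor copying data in and out.                      */
void scale_block(const int16_t *in, int16_t *out, int32_t gain, int length)
{
    int i;
    for (i = 0; i < length; i++) {
        int32_t v = ((int32_t)in[i] * gain) >> 8;  /* apply Q8 gain */
        if (v >  32767) v =  32767;                /* saturate high */
        if (v < -32768) v = -32768;                /* saturate low  */
        out[i] = (int16_t)v;
    }
}

/* From the main program, the call looks like any other C function call:
 *     scale_block(rx_buffer, tx_buffer, gain_q8, BLOCK_SIZE);           */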

Early customers report good results with C2H. Like any new methodology, it will probably take some time to mature and will work better for some applications than others. Regardless, it is a major step forward in the evolution of FPGAs as viable, high-performance embedded computing platforms. If tools like C2H can eliminate the engineering barriers to widespread adoption, the cost, performance, power, and flexibility characteristics of FPGAs will shine through. FPGA-based systems-on-chip will then compare very favorably with traditional discrete-processor-based embedded technology and will probably capture significant market share.
