
Are You Designing with Too Many Significant Figures?

Achieving timing closure in today’s increasingly large and complex digital integrated circuit designs – irrespective of whether they are realized using FPGA, Structured ASIC, or even Standard Cell ASIC fabric – is becoming ever more problematic, with the latest designs targeting aggressive clock speeds.

Most of today’s designers code in RTL using Verilog or VHDL. There is also some use of C/C++/SystemC coupled with behavioral synthesis technology in certain application areas. Unfortunately, both of these approaches have specific, but different, disadvantages associated with them. As we shall see, there is a new approach whose capabilities bridge these two domains.

Problems with Traditional RTL

When coding RTL in Verilog or VHDL, design engineers commence their portion of the development process by defining the micro-architecture of the design, including detailed control structures, bus structures, and primary data path elements. This micro-architecture also covers which operations are to be performed in parallel and which are to be executed sequentially, which portions of the design will be pipelined and the number of pipeline stages, and which resources (such as adders and multipliers) are to be shared between multiple operations.

Thus, design engineers using this approach have complete control over every aspect of the design. Theoretically, this means that the resulting implementations will be optimal in terms of using the smallest amount of silicon real-estate (or logic resources in the case of an FPGA implementation), providing the highest performance, consuming the smallest amount of power, and so forth.

Unfortunately, modifying (and re-verifying) the RTL in order to perform a series of “what if” evaluations on alternative micro-architectures is difficult, time-consuming, and prone to error. Simply increasing the width of a bus, adding an extra pipeline stage, or changing the priority of a set of control processes, for example, can take an inordinate amount of time and effort. In the real world, this means that the number of evaluations the design team can perform is limited; in turn, this often results in a less-than-optimal implementation.

Ultimately, in the case of an RTL-based design, the micro-architecture you start out with is very often the one you end up with, but different micro-architectures can return very different results. This past year, an experiment performed by a graduate-level hardware design course at MIT provided a dramatic illustration of the impact of alternative micro-architectures. For the purposes of this experiment, fifteen students were given an identical, highly constrained specification for a two-stage processor. All of these students used Verilog and identical tool suites, including Design Compiler from Synopsys. As shown in Figure 1, the silicon real-estate associated with their different micro-architectures varied by up to a factor of two, while the timing results varied by up to a factor of three. Although synthesis tools do have an impact on area and timing, the material differences hinge on the micro-architecture implementation decisions.

Figure 1. Silicon area and performance values
associated with alternative micro-architectures

C/C++/SystemC

To date, the predominant high-level alternative to RTL-based design has been to use sequential programming-based C/C++/SystemC representations in conjunction with some form of behavioral synthesis. These approaches can raise the level of abstraction of the design – but, for synthesizable designs, they are only effective for digital signal processing (DSP)-type algorithms that are characterized by tightly nested loops coupled with simple array-based indices. By comparison, such approaches do not do well with complex control structures or algorithms that involve sophisticated mixtures of data processing and control. (It is also possible to code RTL-type descriptions in SystemC, but this doesn’t provide any level of abstraction over Verilog or VHDL.)

Another common concern with this form of high-level synthesis is a perceived loss of control with regard to the process of timing closure. When compiling a “behavioral” C-like description into hardware, the tool is responsible for the hardware architecture and micro-architecture, and the designer loses visibility into the implementation. This means that it is difficult for the designer to work out what should be changed in the source to effect a particular desired improvement in the hardware, leaving the designer dependent on the tool to make these changes.

The Solution: Use a High Level of Abstraction that Retains Designer Control

The solution is to use an approach that raises the level of abstraction significantly above that of traditional Verilog/VHDL-based RTL, thereby providing the designer with the ability to quickly and easily make micro-architecture changes and perform different “what-if” evaluations while still retaining full control.

One emerging approach that does exactly this is Bluespec SystemVerilog (BSV), which augments standard SystemVerilog with rules and rule-based interfaces that support complex concurrency and control across multiple shared resources and across modules. BSV also features high-level abstract types; powerful parameterization, static checking, and static elaboration; and advanced clock specification and management facilities.

One of the key advantages of BSV is that the semantic model of the source (guarded atomic state transitions) maps very naturally into the semantic model of clocked synchronous hardware. This transparency allows the designer to make controlled changes to the source with predictable effects on timing. Furthermore, due to the extensive static checking in BSV, these changes can be more dramatic than the localized “tweaking” techniques favored when working with standard RTL, thereby allowing the designer to achieve timing goals sooner without compromising correctness.

A Simple Example

In order to provide a straightforward comparison of the effort involved in coding and modifying standard RTL (Verilog in this case), SystemC, and BSV, consider a simple example featuring a mixture of concurrency and shared resources, as illustrated in Figure 2.

Figure 2. A simple system with concurrency and shared resources.

Process 0 increments register X under some condition cond0; process 1 transfers a unit from register X to register Y under some condition cond1; and process 2 decrements register Y under some condition cond2. For the purposes of this example, we will assume that each register can be updated by only one process on each clock, and that the priority of the processes is defined as 2 > 1 > 0.

First let’s consider an operation-centric Verilog implementation as follows:
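What follows is an illustrative sketch only, assuming 8-bit registers, a module named example, and the three conditions brought in as inputs. Each process gets its own always block, with the 2 > 1 > 0 priority folded into the guards:

// Illustrative sketch: one always block per process, with the
// 2 > 1 > 0 priority folded into each guard (assumed 8-bit registers).
module example (
  input            clk,
  input            cond0, cond1, cond2,
  output reg [7:0] x,   // register X
  output reg [7:0] y    // register Y
);

  // Process 0: increment X (fires only if process 1 does not claim X)
  always @(posedge clk)
    if (cond0 && !(cond1 && !cond2))
      x <= x + 1;

  // Process 1: move a unit from X to Y (loses Y to process 2)
  always @(posedge clk)
    if (cond1 && !cond2) begin
      x <= x - 1;
      y <= y + 1;
    end

  // Process 2: decrement Y (highest priority)
  always @(posedge clk)
    if (cond2)
      y <= y - 1;

  // Note: x and y are each assigned from two always blocks, which
  // simulates correctly here but is rejected by most synthesis tools.
endmodule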

The advantage of the above operation-centric representation is that it is conceptually close to the specification; the disadvantage is that, although it is simulatable, it is not recommended for synthesis. The preferable alternative is to create a state-centric representation that is further from the specification, but that is both simulatable and synthesizable, as follows:
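Again as a sketch under the same assumptions, the state-centric version organizes the logic around the two registers rather than the three processes:

// Illustrative sketch: the same behavior organized around the two
// registers, with the 2 > 1 > 0 priority expressed as if/else chains.
module example (
  input            clk,
  input            cond0, cond1, cond2,
  output reg [7:0] x,   // register X
  output reg [7:0] y    // register Y
);

  always @(posedge clk) begin
    // Register X: process 1 (when not blocked by process 2) beats process 0
    if (cond1 && !cond2)
      x <= x - 1;
    else if (cond0)
      x <= x + 1;

    // Register Y: process 2 beats process 1
    if (cond2)
      y <= y - 1;
    else if (cond1)
      y <= y + 1;
  end
endmodule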

Now consider an equivalent representation in SystemC as follows (note the similarity to the state-centric Verilog representation):
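The sketch below assumes the same 8-bit registers, here exposed as sc_out ports, with the per-register priority logic placed in a single clocked SC_METHOD:

// Illustrative sketch: the state-centric description in SystemC, with
// the registers modeled as sc_out ports updated by a clocked SC_METHOD.
#include <systemc.h>

SC_MODULE(example) {
  sc_in<bool>          clk;
  sc_in<bool>          cond0, cond1, cond2;
  sc_out<sc_uint<8> >  x;   // register X
  sc_out<sc_uint<8> >  y;   // register Y

  void update() {
    // Register X: process 1 (when not blocked by process 2) beats process 0
    if (cond1.read() && !cond2.read())
      x.write(x.read() - 1);
    else if (cond0.read())
      x.write(x.read() + 1);

    // Register Y: process 2 beats process 1
    if (cond2.read())
      y.write(y.read() - 1);
    else if (cond1.read())
      y.write(y.read() + 1);
  }

  SC_CTOR(example) {
    SC_METHOD(update);
    sensitive << clk.pos();
  }
};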

In both the Verilog and SystemC representations, the main complexity derives from capturing the relative priority of the processes. By comparison, consider the equivalent BSV representation as follows:
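In the sketch below, the conditions are assumed to be passed in as module arguments and the registers are again taken to be 8 bits wide; the behavior reduces to three rules plus one urgency attribute:

// Illustrative sketch: the three processes written as rules, with the
// 2 > 1 > 0 priority captured by a single urgency attribute.
module mkExample #(Bool cond0, Bool cond1, Bool cond2) (Empty);
  Reg#(UInt#(8)) x <- mkReg(0);   // register X
  Reg#(UInt#(8)) y <- mkReg(0);   // register Y

  (* descending_urgency = "proc2, proc1, proc0" *)
  rule proc0 (cond0);   // process 0: increment X
    x <= x + 1;
  endrule

  rule proc1 (cond1);   // process 1: move a unit from X to Y
    x <= x - 1;
    y <= y + 1;
  endrule

  rule proc2 (cond2);   // process 2: decrement Y
    y <= y - 1;
  endrule
endmodule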

In this case, the three processes are defined as simple rules, while the process priority is captured using the descending_urgency attribute. The functional correctness of the BSV code follows directly from the rule semantics, the operation-centric BSV closely follows the original specification, and the scheduling logic will be automatically synthesized by the BSV compiler.

Modifying the Process Priorities

The synthesized BSV representation shown in the previous example will result in the same hardware as the state-centric Verilog RTL and SystemC representations. The difference is that the rule-based BSV description was much easier to capture and is much easier to modify. For example, let’s assume that the specification changes, and the new process priority is defined as 1 > 2 > 0. First consider the changes to the Verilog (differences from the original design are marked in the listing):
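The sketch below uses the same assumptions as the earlier Verilog listings, with the affected lines marked by comments:

// Illustrative sketch: the state-centric Verilog reworked for the new
// 1 > 2 > 0 priority.
module example (
  input            clk,
  input            cond0, cond1, cond2,
  output reg [7:0] x,
  output reg [7:0] y
);

  always @(posedge clk) begin
    // Register X: process 1 is no longer blocked by process 2
    if (cond1)                       // CHANGED: was (cond1 && !cond2)
      x <= x - 1;
    else if (cond0)
      x <= x + 1;

    // Register Y: process 1 now beats process 2
    if (cond1)                       // CHANGED: process 1 checked first
      y <= y + 1;
    else if (cond2)                  // CHANGED: process 2 moved to the else branch
      y <= y - 1;
  end
endmodule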

In this case, deciding on the changes to be made takes quite some thought, and actually convincing yourself that you got it right and that everything works as required will take some verification effort. By comparison, consider the equivalent change made to the BSV code as shown below:
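In the corresponding BSV sketch, only the urgency attribute changes:

// Illustrative sketch: the same priority change in BSV; only the
// urgency attribute is touched.
module mkExample #(Bool cond0, Bool cond1, Bool cond2) (Empty);
  Reg#(UInt#(8)) x <- mkReg(0);
  Reg#(UInt#(8)) y <- mkReg(0);

  (* descending_urgency = "proc1, proc2, proc0" *)   // CHANGED: now 1 > 2 > 0
  rule proc0 (cond0);
    x <= x + 1;
  endrule

  rule proc1 (cond1);
    x <= x - 1;
    y <= y + 1;
  endrule

  rule proc2 (cond2);
    y <= y - 1;
  endrule
endmodule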

Modifying the Process Actions

As an alternative scenario, let’s return to our original design and process priorities, but assume that we now require process 2 to increment register X in addition to decrementing register Y. First let’s consider the changes to the Verilog code as follows (once again, differences from the original design are marked in the listing):
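Again a sketch under the same assumptions, with the affected lines marked by comments:

// Illustrative sketch: the state-centric Verilog reworked so that
// process 2 also increments X (priority still 2 > 1 > 0).
module example (
  input            clk,
  input            cond0, cond1, cond2,
  output reg [7:0] x,
  output reg [7:0] y
);

  always @(posedge clk) begin
    // Register X: process 2 now writes X too, so it heads the chain
    if (cond2)                       // CHANGED: process 2 added
      x <= x + 1;                    // CHANGED
    else if (cond1)                  // CHANGED: was (cond1 && !cond2)
      x <= x - 1;
    else if (cond0)
      x <= x + 1;

    // Register Y: unchanged
    if (cond2)
      y <= y - 1;
    else if (cond1)
      y <= y + 1;
  end
endmodule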

As before, deciding on the changes to be made takes some thought, and actually convincing yourself that you got it right and that everything works as required will take some verification effort. By comparison, consider the equivalent change made to the BSV code as shown below:
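In the corresponding BSV sketch, only rule proc2 changes:

// Illustrative sketch: the same change in BSV; only rule proc2 is touched.
module mkExample #(Bool cond0, Bool cond1, Bool cond2) (Empty);
  Reg#(UInt#(8)) x <- mkReg(0);
  Reg#(UInt#(8)) y <- mkReg(0);

  (* descending_urgency = "proc2, proc1, proc0" *)
  rule proc0 (cond0);
    x <= x + 1;
  endrule

  rule proc1 (cond1);
    x <= x - 1;
    y <= y + 1;
  endrule

  rule proc2 (cond2);
    x <= x + 1;   // CHANGED: process 2 now also increments X
    y <= y - 1;
  endrule
endmodule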

Adding a New Process

As one final example, we will again return to our original design and process priorities, but this time we are going to add a new process called proc3 that decrements register Y by 2 and increments register X by 2 under some condition cond3. In this case, the process priorities are defined as 2 > 3 > 1 > 0. Initially, we’ll consider the changes to the Verilog code as follows (as usual, differences from the original design are marked in the listing):
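The sketch below assumes an additional condition input, cond3, with the affected lines marked by comments:

// Illustrative sketch: the state-centric Verilog with process 3 folded
// into both if/else chains for the 2 > 3 > 1 > 0 priority.
module example (
  input            clk,
  input            cond0, cond1, cond2, cond3,   // CHANGED: cond3 added
  output reg [7:0] x,
  output reg [7:0] y
);

  always @(posedge clk) begin
    // Register X: process 3 (when not blocked by process 2 on Y) beats 1 and 0
    if (cond3 && !cond2)             // CHANGED: process 3 added
      x <= x + 2;                    // CHANGED
    else if (cond1 && !cond2)
      x <= x - 1;
    else if (cond0)
      x <= x + 1;

    // Register Y: process 2 beats process 3, which beats process 1
    if (cond2)
      y <= y - 1;
    else if (cond3)                  // CHANGED: process 3 added
      y <= y - 2;                    // CHANGED
    else if (cond1)
      y <= y + 1;
  end
endmodule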

Contrast the complexity of the above to the simplicity of the modifications required to capture the same functionality in the BSV representation as shown below:
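In the corresponding BSV sketch, the additions amount to one new rule and one extra name in the urgency attribute:

// Illustrative sketch: the same extension in BSV; one new rule plus one
// extra name in the urgency attribute.
module mkExample #(Bool cond0, Bool cond1, Bool cond2, Bool cond3) (Empty);
  Reg#(UInt#(8)) x <- mkReg(0);
  Reg#(UInt#(8)) y <- mkReg(0);

  (* descending_urgency = "proc2, proc3, proc1, proc0" *)   // CHANGED: proc3 added
  rule proc0 (cond0);
    x <= x + 1;
  endrule

  rule proc1 (cond1);
    x <= x - 1;
    y <= y + 1;
  endrule

  rule proc2 (cond2);
    y <= y - 1;
  endrule

  rule proc3 (cond3);   // CHANGED: new process 3
    x <= x + 2;
    y <= y - 2;
  endrule
endmodule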

Conclusion

Although the examples shown above were very simple, it is clear that using rule-based design makes implementing micro-architecture changes fast and easy. Furthermore, with this approach the scheduling logic is automatically synthesized by the BSV compiler, while the functional correctness of the BSV code follows directly from the rule semantics. It’s also important to note that the examples shown here reflect only a few simple cases; BSV also helps designers perform a variety of tasks that would be extremely complicated, time-consuming, and error-prone using standard RTL. These tasks include:

• Adding a pipeline stage to an existing pipeline
• Adding a pipeline stage where pipelining was not anticipated
• Spreading a calculation over more clocks (longer iteration)
• Moving logic across a register stage (rebalancing)
• Restructuring combinational clouds for shallower logic
• Incorporating hand-optimized logic

Bluespec SystemVerilog (BSV) is unique amongst high-level synthesis approaches in that, while significantly raising the level of abstraction and providing correctness-by-construction, it retains the traditional hardware model of cooperating FSMs along with a high degree of transparency and predictability between the source and the generated RTL. The end result is that designers using BSV can continue to use tried-and-true techniques for improving timing, with the additional advantages that the high level of the language makes changes much easier and the extensive static checking ensures that these changes do not destroy the design’s “correctness.” Furthermore, the combination of BSV’s high level of abstraction and correctness-preservation properties enables more dramatic attacks on the timing problem than “local hill-climbing”.

by George Harper, Bluespec, Inc.

March 21, 2006

About the Author

George Harper is VP of Marketing at Bluespec, Inc. (www.bluespec.com), developers of the only ESL synthesis toolset for control logic and complex datapaths. He began his semiconductor career implementing MIPS processors at LSI Logic and has spent the bulk of his career at chip companies. He has a BSEE/MSEE from Stanford University and an MBA from Harvard University. George can be reached at gharper@bluespec.com.
