Sometimes you just need to go fast.
In the arena of processor IP for system-on-chip embedded designs, there is always an artful dance done by designers trying to match the perfect processor to their problem. System engineers seek to find the precise balance of performance, price, and power for their particular application. Sometimes, a tiny, efficient core is called for when simple control is needed. Other times, integer computations are key, and floating point performance is sacrificed to save silicon area.
Sometimes, however, compromise is not the order of the day. For some applications, you need all the processor core moxie you can muster. For those special “don’t spare the megahertz” problems, MIPS has just introduced their fastest processor core ever – the new 74K, a synthesizable high-performance 32-bit general-purpose processor core capable of over 1 GHz operation. The new 74K combines high-frequency operation with an asymmetric dual-issue architecture, DSP-specific instructions, and a number of optional and tunable features to provide a versatile processor for high-performance embedded SoC designs.
MIPS groups their processor IP into three tiers. The lowest tier consists of “Entry-level cores,” including the M4K, a compact 30K-gate processor; the 4KE, which brings 233 MHz to 130nm process geometry; and the 4KSd, which adds security features to the 4K family. The next tier up, the “Mid-performance cores,” includes the 24K, a 600+ MHz core (when implemented in 90nm technology); the 24KE, which adds DSP extensions for high-performance data-crunching operations; and the 34K (which we wrote about last year), which adds multi-threading to the DSP-enhanced 24KE. Now, the new 74K starts a third “High-performance” category that has room for potential enhanced versions of that core, perhaps to include multi-threading extensions as well.
MIPS says the new core was created for applications like HD DVD, H.264, HD Audio, 802.11n, PON, and WiMax. Their goal was to produce a core that could be easily implemented using standard synthesis flows and regular, free physical IP. By making a core that can reach lofty performance targets using standardized SoC tools, the company says they’re able to reduce time to market and development cost for their customers. SoC convergence is pulling more functionality, and therefore more performance demands, into many devices, and high-end processor cores like the 74K may be required to respond to those challenges.
For many multimedia and entertainment devices, it has been a challenge to deliver advanced audio processing unless a dedicated DSP core was included in the design. For many applications that previously required a DSP, the 74K may have enough performance so that the DSP can be dropped, saving significantly in IP licensing costs and integration complexity. Also, with the increase in the number and complexity of applications on a single device, more sophisticated operating systems are finding their way into embedded SoCs. With the arrival of more full-featured operating systems comes an even larger appetite for performance.
To reach its impressive performance potential, the 74K incorporates a whopping 17-stage pipeline with a superscalar, asymmetric dual-issue architecture. Asymmetry in a dual-issue architecture offers some advantage because each of the two pipelines can be optimized for specific tasks, and operations can be routed to the pipeline that is best suited to execute them. The challenge with that optimization is that you can end up with uneven loading of the two pipelines and potentially under-utilize processor resources. To compensate for this effect, the 74K uses out-of-order instruction dispatch and completion, with some additional logic to identify instructions that can be executed early without violating dependencies. To keep this logic from getting out of hand on the complexity front, out-of-order dispatch is bounded to an 8-instruction window per pipeline.
The 74K is asymmetric because one pipeline does address generation (loads, stores, and branches) and the other pipeline is the ALU pipeline which does pretty much everything else. MIPS claims that about half of the instructions are load-store and half are ALU, so the asymmetric architecture can be fully utilized. As a result, the 74K uses less area and operates at a higher frequency than a comparably-designed symmetric dual-issue unit. In addition, the 74K uses 128-bit L1 caches to improve packet forwarding, data processing, and memory copy. The processor is also binary compatible with previous MIPS cores.
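The effect of instruction mix and a bounded look-ahead window on a two-pipe design can be illustrated with a toy model. The sketch below is not MIPS's dispatch logic: it ignores data dependencies and pipeline depth, uses one shared window rather than one per pipeline, and the names `cycles_in_order` and `cycles_windowed` are made up for illustration. "M" stands for an instruction bound for the address-generation pipe (loads, stores, branches) and "A" for an ALU instruction.

```python
import random


def cycles_in_order(stream):
    """Strict in-order dual issue: the next two instructions pair up
    only if they need different pipes; otherwise one issues alone."""
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and stream[i] != stream[i + 1]:
            i += 2  # one "M" and one "A" issue together
        else:
            i += 1  # both want the same pipe, so only one issues
        cycles += 1
    return cycles


def cycles_windowed(stream, window=8):
    """Each cycle, issue the oldest "M" and the oldest "A" found among
    the `window` oldest pending instructions (dependencies ignored)."""
    pending = list(stream)
    cycles = 0
    while pending:
        head = pending[:window]
        picks = [head.index(kind) for kind in ("M", "A") if kind in head]
        for idx in sorted(picks, reverse=True):  # delete back to front
            del pending[idx]
        cycles += 1
    return cycles


# Hypothetical workload: a random 50/50 mix, as in the claim above.
rng = random.Random(0)
mix = [rng.choice("MA") for _ in range(2000)]
print("in-order IPC:", len(mix) / cycles_in_order(mix))
print("windowed IPC:", len(mix) / cycles_windowed(mix, window=8))
```

With a balanced mix, the windowed version can usually find one instruction for each pipe per cycle, while the strict in-order version stalls whenever two neighbors want the same pipe. Skewing the mix away from 50/50 caps both versions, because one pipe becomes the bottleneck.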
The 74K has a number of optional or configurable units that can be adjusted or excluded, depending on your requirements. These include an FPU, an MMU, I- and D-cache, Scratchpad RAM, and CorExtend. The ability to include and size these elements by application makes the processor core tunable in performance, area, and power consumption over a broad range.
Because the 74K is intended to subsume DSP responsibilities in many applications, it comes with a beefed-up set of DSP capabilities. The asymmetric dual-issue architecture works well for DSP because DSP loops typically combine load/store with ALU operations. MIPS claims that this architecture gives a 26% speedup for inner loops compared with a single-issue design. Because the two gains multiply, combining that speedup with the 30% frequency advantage that the 74K enjoys over the 24KE leads the company to estimate an overall DSP improvement of around 64% net. The processor also supports DSP ASE Revision 2, which is a superset of the DSP ASE Rev 1 that was included in the 24KE and 34K cores. Revision 2 includes 27 new instructions used in video- and image-processing algorithms, and these special instructions provide additional performance improvements.
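The 64% figure follows from multiplying the two quoted gains rather than adding them. A minimal check, assuming the two gains are independent (the helper name `combined_speedup` is ours, not MIPS's):

```python
def combined_speedup(*gains):
    """Combine independent fractional gains (0.26 means 26%) into one
    net fractional gain by multiplying the speedup factors."""
    factor = 1.0
    for gain in gains:
        factor *= 1.0 + gain
    return factor - 1.0


ISSUE_GAIN = 0.26  # dual-issue inner-loop speedup quoted by MIPS
FREQ_GAIN = 0.30   # frequency advantage over the 24KE quoted by MIPS

net = combined_speedup(ISSUE_GAIN, FREQ_GAIN)
print(f"net DSP improvement: {net:.1%}")
```

The product 1.26 x 1.30 is 1.638, which rounds to the 64% quoted above. Simply adding the percentages would give 56% and understate the estimate.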
The 74K family comes in two flavors: the 74Kc base integer core, and the 74Kf integer core with floating point. Both cores include the CorExtend capability for adding user-defined instructions that can be implemented in hardware accelerators. The FPU has two pipelines that support the asymmetric dual-issue architecture: a to/from pipe and an arithmetic pipe. The processor uses three 256-entry branch history tables and an 8-entry return prediction stack for branch prediction. The 74K also has clock-gating features at several levels of granularity for power optimization in power-sensitive designs.
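For readers unfamiliar with return prediction stacks, the general idea can be sketched in a few lines. This is a generic illustration, not the 74K's implementation: only the depth of 8 comes from the description above, while the class name `ReturnStack` and the policy of discarding the oldest entry on overflow are assumptions.

```python
from collections import deque


class ReturnStack:
    """Generic return-address predictor: calls push, returns pop."""

    def __init__(self, depth=8):
        # A deque with maxlen silently drops the oldest entry when a
        # push would exceed the depth (assumed overflow policy).
        self.entries = deque(maxlen=depth)

    def on_call(self, return_addr):
        """Record where the matching return is expected to land."""
        self.entries.append(return_addr)

    def predict_return(self):
        """Pop the predicted return target, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None


# Hypothetical usage: ten nested calls overflow an 8-entry stack.
stack = ReturnStack(depth=8)
for level in range(10):
    stack.on_call(0x1000 + 4 * level)
predictions = [stack.predict_return() for _ in range(10)]
```

In this sketch, the eight innermost returns are predicted correctly, and the two outermost get no prediction because their entries were pushed out. That is the usual trade-off of a small fixed-depth stack.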
When implemented in 65nm technology such as TSMC’s 65nm GP process with TSMC standard cells and low Vt, the core occupies a total area of 1.7 mm2 when synthesized for speed and 1.3 mm2 when synthesized for area. The two versions run at 1.04 GHz and 830 MHz, respectively. The high-speed version has a dynamic power of 0.76 mW/MHz, and the area-optimized version burns a slightly lower 0.63 mW/MHz. The architectural performance is estimated at 1.8 DMIPS/MHz.
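These per-MHz figures are easier to compare once they are scaled to each build's clock rate. The sketch below does that arithmetic under two assumptions of ours: that the 1.8 DMIPS/MHz estimate applies equally to both builds, and that dynamic power scales linearly up to the quoted frequency. Leakage is ignored, and the `Build` type is made up for illustration.

```python
from dataclasses import dataclass

DMIPS_PER_MHZ = 1.8  # architectural estimate quoted above


@dataclass(frozen=True)
class Build:
    name: str
    area_mm2: float
    freq_mhz: float
    mw_per_mhz: float

    @property
    def dmips(self) -> float:
        """Peak Dhrystone throughput at the quoted clock rate."""
        return DMIPS_PER_MHZ * self.freq_mhz

    @property
    def dynamic_mw(self) -> float:
        """Dynamic power at the quoted clock rate (leakage ignored)."""
        return self.mw_per_mhz * self.freq_mhz


SPEED = Build("speed-optimized", 1.7, 1040.0, 0.76)
AREA = Build("area-optimized", 1.3, 830.0, 0.63)

for b in (SPEED, AREA):
    print(f"{b.name}: {b.dmips:.0f} DMIPS, {b.dynamic_mw:.0f} mW dynamic, "
          f"{b.dmips / b.area_mm2:.0f} DMIPS/mm2, "
          f"{b.dmips / b.dynamic_mw:.2f} DMIPS/mW")
```

Under these assumptions, the speed-optimized build delivers roughly 1,870 DMIPS for about 790 mW of dynamic power. The area-optimized build delivers roughly 1,490 DMIPS for about 520 mW, so it comes out slightly ahead per square millimeter and per milliwatt.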
With many of the new embedded applications drinking up all the processing power they can get, MIPS is sure to find a market for this new super-fast core. The combination of the core's size and performance with the recently available benefits of 65nm ASIC technology creates a dramatic overall improvement in the processing per cost, per power, and per silicon area that can be dropped into your system-on-chip design.