
MIPS Goes Multithreaded

Boosting Performance the New-Fashioned Way

Although most designers don’t often consider it, there are different formulas for best overall system performance in embedded and standalone processors, too. Even though there’s no governing league making and changing the racing regulations, parameters like total system cost, power consumption, memory bandwidth, silicon area, and process technology rule the day when choosing a processor for your system design. The tradeoffs that yield the best mix of performance in a standalone processor can be completely different from those that give the best results in an embedded processor core.

When MIPS designed their new 34K core, which was announced this week, they clearly knew they were working under the usually unspoken embedded core racing formula. In an embedded core, cranking up the clock frequency runs up system cost and power consumption for your entire device, not just the processor portion. Heavily pipelined, superscalar, and very-long-instruction-word (VLIW) architectures directly consume more logic and are difficult to optimize, particularly in embedded applications. Multi-core methods come almost by default when you’re dealing with embedded processors, but the licensing fees for multiple cores run up the system cost tab again. Many of the solutions that work well for standalone processors are simply not well suited to the embedded core environment.

MIPS opted for a multi-threaded approach with the 34K, for seemingly sound reasons. Typically, an embedded core has a lot of downtime, waiting around for other parts of the system to finish their jobs. Also typically, there is more than one process needing attention at any given time. Of course, you can use an OS to handle process scheduling for you in software, but multithreading at the processor level can make much more efficient use of logic resources, leading to higher system throughput with lower cost and power consumption.

MIPS’s new 34K is based on their 24KE architecture. The 34K has a nine-stage pipeline “coupled with a small amount of hardware to handle the virtual processors, the thread contexts, and the quality of service (QoS) prioritization,” according to MIPS. Increasing system performance with multi-threading is all about optimizing resource utilization in the execution pipeline. When one thread is stalled waiting for memory, hanging out and killing time, maybe listening to some tunes on its little sub-micron “iPod femto,” another thread can charge ahead, keeping the hardware busy. If the scheduling process is efficient, and if the processor can swap contexts with little or no overhead (this is key), significant performance gains can be made over a single-threaded processor of the same architecture.
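To see why filling stall cycles matters, here’s a back-of-the-envelope model in C. The miss rate, stall length, and fill fraction are invented purely for illustration (they are not MIPS 34K figures); the point is only that letting a second ready thread use otherwise-dead issue slots cuts the effective cycles per instruction.

/* Toy model of why fine-grained multithreading helps: count how many
 * pipeline issue slots are wasted on memory stalls with one thread,
 * versus when a second ready thread can be swapped in each cycle.
 * All numbers below are made up for illustration. */
#include <stdio.h>

int main(void)
{
    const int instructions = 1000;    /* instructions per thread          */
    const double miss_rate = 0.05;    /* cache misses per instruction     */
    const int stall_cycles = 20;      /* cycles lost per miss             */

    /* Single thread: every miss leaves the pipeline idle. */
    double single = instructions * (1.0 + miss_rate * stall_cycles);

    /* Two threads: assume the other thread is usually ready, so most
     * stall cycles are filled with useful work from the second thread. */
    double fill_fraction = 0.8;       /* assumed, for illustration        */
    double dual = 2 * instructions *
                  (1.0 + miss_rate * stall_cycles * (1.0 - fill_fraction));

    printf("cycles per instruction, single thread: %.2f\n",
           single / instructions);
    printf("cycles per instruction, two threads:   %.2f\n",
           dual / (2.0 * instructions));
    return 0;
}

With those made-up numbers, the single-threaded core burns two cycles per instruction while the two-thread version gets back down near 1.2, which is the flavor of gain hardware multithreading is after.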

The 34K can be configured with up to five sets of thread context (TC) hardware. Each TC has its own instruction buffer with pre-fetching, its own register set, and its own program counter. This allows the 34K to switch between threads on a clock-by-clock basis, virtually eliminating context-swap overhead. Each TC shares some resources with other TCs within a larger structure called a “Virtual Processing Element” (VPE). A 34K core can be configured with up to two VPEs.
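A rough way to picture the hierarchy is as a data structure: the per-TC state (program counter, registers, instruction buffer) is replicated for every hardware thread, while the OS-visible CP0 context lives at the VPE level. The type and field names below are invented for clarity; they are not taken from MIPS documentation.

/* Illustrative sketch of the TC/VPE hierarchy described above --
 * names are invented, not MIPS 34K register or structure names. */
#include <stdint.h>

#define TCS_PER_CORE   5   /* thread contexts, per the configuration above */
#define VPES_PER_CORE  2   /* virtual processing elements                  */

struct thread_context {          /* replicated per hardware thread          */
    uint32_t pc;                 /* private program counter                 */
    uint32_t gpr[32];            /* private general-purpose registers       */
    uint32_t instr_buffer[8];    /* private, pre-fetched instructions       */
    int      vpe_id;             /* which VPE this TC belongs to            */
};

struct vpe {                     /* shared by the TCs assigned to it        */
    uint32_t cp0_regs[32];       /* one OS-visible CP0 context per VPE      */
};

struct core_34k {
    struct vpe            vpes[VPES_PER_CORE];
    struct thread_context tcs[TCS_PER_CORE];
};

The key point is that the expensive machinery (the execution pipeline, the caches, and the per-VPE CP0 context) is not replicated per thread, which is why adding TCs is so much cheaper in silicon than adding cores.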

VPEs can be assigned to completely different environments, even running different operating systems, because each VPE has its own copy of the CP0 registers used by OS kernels; within a single VPE, that copy is shared among the TCs. This allows us to partition an application into sections that are completely disparate but that use the same processor core. One might be a DSP- or QoS-critical task and another might be a user-time application running on a complex OS like Linux. The ability to run such disparate tasks on a single processor core saves system cost, power, and die area when compared with a multi-core solution.

The VPE concept could even allow two different embedded operating systems to be used in a single system and on a single 34K processor. One OS might handle user interface duties while the other manages hard-real-time processes such as digital signal processing. Each VPE would be managed differently, and each OS would work as if it were using its own dedicated processor. Because all of this is set through configuration options, a core can be tailored to provide almost exactly the resources your application needs.

Speaking of QoS, the 34K comes with a QoS engine that interleaves instructions from multiple threads for maximum throughput. If some threads have specific QoS requirements, it can allocate dedicated processor time to those threads, ensuring that load from non-QoS-critical tasks doesn’t interfere. This meets the QoS requirements while maintaining maximum overall throughput. The VPE structure also ensures that you don’t subject all of your threads to QoS restrictions just because some threads require them.
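One way to picture such a policy is a slot-reservation scheme: a QoS-critical thread is promised a fixed share of issue slots, and everything else competes round-robin for the remainder. The sketch below is a generic illustration of that idea, not a description of the 34K’s actual QoS engine, and the thread count and reservation period are arbitrary.

/* Rough sketch of slot-based thread interleaving with a QoS guarantee:
 * thread 0 is promised every Nth issue slot, and the remaining slots
 * round-robin among the other ready threads. */
#include <stdio.h>

#define NUM_THREADS 4
#define QOS_THREAD  0
#define QOS_PERIOD  4   /* guarantee thread 0 one slot in every 4 */

int main(void)
{
    int rr = 1;                          /* next non-QoS thread to try   */

    for (int cycle = 0; cycle < 12; cycle++) {
        int pick;
        if (cycle % QOS_PERIOD == 0) {
            pick = QOS_THREAD;           /* reserved slot                */
        } else {
            pick = rr;                   /* best-effort slot             */
            rr = (rr % (NUM_THREADS - 1)) + 1;  /* cycle through 1..3    */
        }
        printf("cycle %2d -> thread %d\n", cycle, pick);
    }
    return 0;
}

Run for a dozen cycles, the loop shows thread 0 claiming every fourth slot while threads 1 through 3 share the rest: the QoS-critical thread gets its guarantee without the other threads being throttled any further than necessary.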

MIPS claims that the 34K shows an application speedup of 60% over their previous-generation 24KE core, with a 14% increase in die size. If you’re looking to improve your application’s performance without adding another core, or if you want to consolidate a design that currently has multiple cores, such as a DSP and a general-purpose processor, the 34K might be a very attractive option. The improved hardware utilization of its multi-threaded architecture should reduce total system cost, power consumption, and die area compared with just about any alternative solution. Who could argue with that?
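Taken at face value, those numbers also imply better use of silicon, not just better raw performance: a 60% speedup for a 14% larger core works out to roughly 1.60 / 1.14 ≈ 1.4 times the throughput per unit of core area of the 24KE. That’s simple arithmetic on MIPS’s published figures rather than an additional MIPS claim, and it assumes the workload actually has enough parallel threads to keep the extra contexts busy.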

Until someone comes along and changes the formula for embedded processors, MIPS seems to be onto something. Remember, the embedded processor race goes not to the swift, but to those who use their logic resources most efficiently. As long as the formula calls for minimum system cost, lowest power consumption, maximum overall performance, and fastest time-to-market, MIPS’s new 34K core is probably a safe bet.
