feature article
Subscribe Now

Itanium Deathwatch Finally Over

Intel’s Itanium Receives its Official Death Warrant

“You miss 100% of the shots you don’t take.” – Wayne Gretzky

It’s not as though we didn’t see this coming, but it’s still a bittersweet moment. Intel’s Itanium, which has been on life support for years, was just given its official end-of-life (EOL) papers. Go ahead and engrave the date of January 30, 2020 on its massive silicon tombstone.

It’s easy with hindsight to poke fun at Intel and Hewlett Packard for creating Itanium in the first place. Armchair quarterbacks from every quarter (myself included) called it the “Itanic” and gleefully reported every hit below the waterline leading to its slow-motion sinking.

But you know what? I don’t want to do that. Sure, Itanium was an enormously expensive and embarrassing failure for both companies, but it was also a big, hairy, audacious endeavor. We need more of those. Good for Intel. Good for Hewlett Packard (now HPE). Bummer that your moonshot project didn’t reach its goal. But it didn’t blow up on the launchpad, either, and Intel, HPE, and we all learned something in the process. As the motivational posters say, you’re not failing so long as you’re learning.

Itanium was created for a lot of reasons, chief among them a desire to leapfrog the x86 in performance and sophistication. Even back in the era of Napster and LiveJournal, the x86 architecture was looking mighty tired. Intel and HPE both needed something better to replace it.

Fortunately, there was no shortage of better ideas. RISC, VLIW, massive pipelining, speculative execution, out-of-order dispatch, compiler optimization, big register sets, data preloading, shadow registers, commit buffers, multilevel caches, and all the other tricks of the CPU trade were surfacing around the same time. Pick any three and create your own CPU! It’ll be faster, cooler, and more academically stimulating than anything else out there. In the 1990s, it was hard not to design a new CPU architecture.

One underlying philosophy behind Itanium (and many other CPU children of the ’90s) was that software is smarter than hardware. Seems simple enough. Have you ever seen inside the branch-prediction logic of a modern CPU? We throw hundreds, then thousands, then millions of transistors at the task of flipping a coin. Will this branch be taken or not taken? Circuitry has just a few nanoseconds to decide.

How much simpler it would be to shift that task to the software. Compared to complex hardware in the critical path, a compiler has an infinite amount of time to deliberate. Compilers can see the whole program at once, not just a tiny runtime window. Compilers can take hints provided by the programmer. Compilers and analysis tools can model program flow and locate bottlenecks. And best of all, compilers are easier than hardware to change, improve, and update.

The same theology applies to parallelism. Hardware struggles to eke out a bit of parallelism where it can. But software can see the whole picture. Software can schedule loads, stores, arithmetic operations, branches, and the whole gamut of instructions for optimal performance. Software can sidestep bottlenecks before they even happen. It’s ludicrous to force runtime hardware to thread that needle when the compiler can do so at its leisure.

Yup, that settles it: we’re doing our optimization in software from now on. Pull that stuff out of the hardware’s critical path and fire up the compiler tools. Let me know when that new optimizing compiler is ready.

We’re still waiting.

And therein lies the problem. The compilers for Itanium never got good enough to deliver the leap in performance that we all just knew was there for the taking. C’mon, where’s my factor-of-ten performance jump?

It’s not coming from the compilers, that’s for sure, nor is it lurking in VLIW, EPIC, or superscalar hardware tricks. Yes, compilers have all the time in the world (compared to runtime hardware) to tease out hazards like data dependencies, load/use penalties, branch probabilities, and other details. And yes, the compiler can see more of the program than hardware can. The compiler does know more than the hardware, but it doesn’t know much more.

Some things are unknowable, and compilers aren’t omniscient. Even with the entire program to analyze, most branch prediction comes down to an educated coin toss. Even with all the source code as its disposal, finding parallelism is tricky beyond a small window of instructions, and the hardware can already do that.

The plan to throw the tough problems at the compiler guys was doomed from the start. Itanium’s compiler writers weren’t slacking off or underperforming. They didn’t need just a little more time, or just one more release. They were saddled with impossibly high expectations.  

Turns out, that convoluted branch-prediction hardware was already doing about as good as job as it’s possible to do. Sure, you can shift that task from hardware to software, but that’s just an implementation tradeoff. There’s no big gain to be had.

Same goes for wide and deep register sets, or bigger caches, or wide instruction words. Itanium bundled instructions and executed them in parallel where possible, through a combination of compiler directives and runtime hardware. Surely the software-directed parallelism will yield big results? Nope. Itanium’s compilers can format beautifully dense and efficient instruction blocks – but only if the program lends itself to such solutions. Apparently, few real-world programs do.

Data dependencies and load/use penalties are just as hard to predict in software as they are in hardware. Will the next instruction use the data from that previous one? Dunno; depends on the value, which isn’t known until runtime. Can the CPU “hoist” the load from memory to save time? Dunno; it depends on where the data is stored. Some things aren’t knowable until runtime, where hardware knows more than even the smartest compiler.

Itanium is like a rocket-powered Hot Wheels car running on its orange plastic track. It had (sorry, still has) awesome power, vast resources, and elaborate control systems. It’s just criminally hampered by its environment. It can do cool loops if it gets a running start but is otherwise stuck on its narrow track.

To borrow a baseball analogy, if you never swing the bat, you’ll never hit the ball. Sometimes you strike out. There’s no shame in that. If you don’t, you’re not trying hard enough, and Intel and HPE were certainly trying hard with Itanium. So long, Itanium, and good luck to its creators.

4 thoughts on “Itanium Deathwatch Finally Over”

  1. Still have over a dozen of these dual core quad processor, huge cache, computational servers in our cluster that are a decade+ old. It was worth updating all them to dual core a few years back, just because for some problems they beat other computational servers in our cluster hands down. Some are Intel built, most are Dell built using the same reference design.

    They are hot though, and do spin the power meter.

  2. I remember having to deal with PA-RISC, I didn’t like it. Someone said it only survived because the big cache gave it enough performance. I think those folks moved on to Itanium and it suffered a similar problems – the base architecture was flawed and irrecoverable (according to a compiler writer friend).

    Hardware guys can do stuff that is pretty cool and should work, but software engineers writing compilers have a different mindset, and if the two aren’t in sync you are out of luck.

    SPARC & Solaris, were infinitely preferable to PA-RISC & HPUX, and I never got close to Itanium in any form – and God knows people like to torture me with stupid processors…

    I think the main takeaway is that Intel will keep drinking their Kool-Aid well past it’s sell-buy date, and refuse to acknowledge things are failing. Hard to sell them new solutions if they don’t admit they’re in a hole…

    1. Agreed. Although Itanium lasted ~20 years, I think the outcome was clear after the first ten. The rest was a long, slow, glide path. It’s tough to decide whether to “fish or cut bait” when there’s that much money on the line.

Leave a Reply

featured blogs
Oct 21, 2020
You've traveled back in time 65 million years with no way to return. What evidence can you leave to ensure future humans will know of your existence?...
Oct 21, 2020
We'€™re concluding the Online Training Deep Dive blog series, which has been taking the top 15 Online Training courses among students and professors and breaking them down into their different... [[ Click on the title to access the full blog on the Cadence Community site. ...
Oct 20, 2020
In 2020, mobile traffic has skyrocketed everywhere as our planet battles a pandemic. Samtec.com saw nearly double the mobile traffic in the first two quarters than it normally sees. While these levels have dropped off from their peaks in the spring, they have not returned to ...
Oct 16, 2020
[From the last episode: We put together many of the ideas we'€™ve been describing to show the basics of how in-memory compute works.] I'€™m going to take a sec for some commentary before we continue with the last few steps of in-memory compute. The whole point of this web...

featured video

Demo: Low-Power Machine Learning Inference with DesignWare ARC EM9D Processor IP

Sponsored by Synopsys

Applications that require sensing on a continuous basis are always on and often battery operated. In this video, the low-power ARC EM9D Processors run a handwriting character recognition neural network graph to infer the letter that is written.

Click here for more information about DesignWare ARC EM9D / EM11D Processors

featured paper

An engineer’s guide to autonomous and collaborative industrial robots

Sponsored by Texas Instruments

As robots are becoming more commonplace in factories, it is important that they become more intelligent, autonomous, safer and efficient. All of this is enabled with precise motor control, advanced sensing technologies and processing at the edge, all with robust real-time communication. In our e-book, an engineer’s guide to industrial robots, we take an in-depth look at the key technologies used in various robotic applications.

Click here to download the e-book

Featured Chalk Talk

Keeping Your Linux Device Secure

Sponsored by Mentor

Embedded security is an ongoing process, not a one-time effort. Even after your design is shipped, security vulnerabilities are certain to be discovered - even in things like the operating system. In this episode of Chalk Talk, Amelia Dalton chats with Kathy Tufto from Mentor - a Siemens business, about how to make a plan to keep your Linux-based embedded design secure, and how to respond quickly when new vulnerabilities are discovered.

More information about Mentor Embedded Linux®