Tachyum Demos Four-Way Software Translation

Universal “Prodigy” Processor Runs x86, ARM, RISC-V in Emulation

“There is no prodigy in our profession.”  –  Luciano Pavarotti

So far, so good. That’s the status report from Tachyum, the ambitious startup creating Prodigy, a chip design it calls “the world’s first universal processor.” Prodigy is billed as being faster, cheaper, smaller, and more efficient than any other CPU. Oh, and it runs those other processors’ binaries, too. Is there anything this chip can’t do? 

Well, it’s a long way from shipping yet, so we’ll have to wait and see. But the company has made some progress on the software front, demonstrating its on-the-fly runtime binary translation. 

Tachyum is targeting server manufacturers with its wonder chip, so it knows that server-side software stacks will be important. For those customers, the famous LAMP set – Linux, Apache, MySQL, and PHP/Perl/Python – is the acid test. If you can run the LAMP stack you’re most of the way toward building a useful server. It’s the Lotus 1-2-3 or Microsoft Flight Simulator of compatibility tests. 

Fortunately, LAMP software is all open-sourced, which means it can be compiled for absolutely any processor. Unfortunately, not all server operators are interested in compiling, linking, and maintaining their own code. They want shrink-wrapped software, and they want it to run without hiccups. Besides, LAMP is just the start. There are usually proprietary applications running on top of LAMP, and those often can’t be recompiled. Result: backward compatibility with today’s x86 servers is a must-have. 

Tachyum started work on its x86 translator almost as soon as it started on Prodigy itself. The company knew that curious customers would want to see their own code running on Prodigy before placing any big bets on the unusual processor. The translator is designed to be transparent, detecting x86 binaries at load time and converting them on the fly. It’s dynamic, not static, translation, so no recompiled version is ever stored. It works much the way Apple’s PowerPC-to-x86 translation did, or Sun’s x86-to-SPARC. The hardest part, says Tachyum’s CEO Rado Danilak, was tweaking the Linux port to make the whole operation appear transparent. As far as the user knows, the machine is executing Prodigy binaries. 
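Tachyum hasn’t said how its loader tells one kind of binary from another, but on Linux the obvious clue is the ELF header, which records the target architecture in a 16-bit `e_machine` field at byte offset 18. The Python sketch below shows that one step only, as an illustration rather than Tachyum’s method. The function name `classify_binary` and the labels it returns are made up for this example; the `e_machine` constants come from the ELF specification.

```python
import struct

# e_machine values defined by the ELF specification (see elf.h).
EM_386, EM_ARM, EM_X86_64, EM_AARCH64, EM_RISCV = 3, 40, 62, 183, 243

# Hypothetical mapping from ELF machine type to the translator that would
# handle it. The labels are illustrative, not Tachyum's naming.
GUEST_ISAS = {
    EM_386: "x86",
    EM_X86_64: "x86",
    EM_ARM: "arm",
    EM_AARCH64: "arm",
    EM_RISCV: "riscv",
}


def classify_binary(header: bytes) -> str:
    """Classify a binary from the first 20 bytes of its file header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not-elf"
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine follows the 16-byte e_ident array and the 2-byte e_type field.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    return GUEST_ISAS.get(e_machine, "native-or-unknown")


if __name__ == "__main__":
    # Synthetic 64-bit little-endian header claiming to be an x86-64 executable.
    demo = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, EM_X86_64)
    print(classify_binary(demo))
```

A real loader would go on to hand the file to the matching translator. Stock Linux offers a similar hook in `binfmt_misc`, which matches magic bytes and invokes a registered interpreter; whether Prodigy’s Linux port uses it is not stated.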

But the company didn’t stop there. It has also developed ARM and RISC-V translators for Prodigy. There aren’t too many ARM-based servers out there, but the architecture is getting just enough traction in the market to warrant building a separate translator for it. Customers looking at something like Ampere’s Altra or Marvell’s ThunderX2 (both ARM-based) can now add Prodigy into the mix without having to rejigger any code. 

The RISC-V translator seems a bit odd, though. Why go to all that trouble for a CPU with virtually no server presence? Danilak blames politics. A few academic research teams and government projects have “open source” or “no x86” clauses written into their funding. This tends to drive people toward RISC-V, almost by default. To make the shortlist for those projects, Prodigy needs to make itself look like a RISC-V processor. 

The first Prodigy chips are easily a year away. The company hasn’t even built its massive FPGA-based prototype yet; that comes in November if all goes well. So, even native Prodigy code is running in emulation on development systems, which makes the other three all emulations of an emulation. 

Are these binary translators intended to be permanent, or just a short-term fix for customer evaluation? Both, says Danilak. Having translators at the ready allows customers to deploy new hardware with existing software, and he intends for the emulators to reflect favorably on Prodigy’s prowess. Eventually, he says, customers will recompile the parts they can and get better native performance. The parts they can’t recompile will continue to run in emulation for as long as necessary. His team is developing each binary translator just enough to be useful and reliable, without tweaking it endlessly for maximum performance. That kind of tuning rapidly reaches the point of diminishing returns, and the team’s talents could be better spent elsewhere, he says. 

Something is always lost in translation, and Tachyum’s x86 translator loses about 35% in performance versus the same code running on Intel hardware, according to the company. A one-third drop in performance isn’t bad, given how complicated x86 binary translation can be. You could even argue that Prodigy runs x86 code faster than a low-end Xeon processor. It all depends on what you choose to benchmark yourself against. On the other hand, all of these numbers are simulated, including Prodigy’s own native performance, so it’s hard to say whether that emulated emulation efficiency will translate to real hardware. Issues like memory latency or pipeline interlocks can lie hidden until well past the simulation stage. 

Emulating ARM’s instruction set is easier (no surprise there), so Prodigy loses only about 25–30% in emulation. No word on how well the RISC-V translator performs. 
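A percentage loss in performance understates how much longer a job takes. Assuming “loses 35%” means translated code delivers 65% of native throughput (the company’s figures don’t spell out the metric), the same work takes 1/0.65 times as long. A quick sketch of that arithmetic, with a made-up helper name:

```python
def runtime_multiplier(perf_loss: float) -> float:
    """Convert a fractional throughput loss into a run-time multiplier.

    Assumes the loss is a drop in throughput relative to native execution,
    which is an interpretation of the quoted figures, not a stated definition.
    """
    if not 0.0 <= perf_loss < 1.0:
        raise ValueError("perf_loss must be in the range [0, 1)")
    return 1.0 / (1.0 - perf_loss)


for label, loss in [("x86", 0.35), ("ARM, low end", 0.25), ("ARM, high end", 0.30)]:
    print(f"{label}: {runtime_multiplier(loss):.2f}x the native run time")
# x86: 1.54x the native run time
# ARM, low end: 1.33x the native run time
# ARM, high end: 1.43x the native run time
```

So under that reading, a one-third drop in performance means jobs run about half again as long.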

In its current state, the Prodigy processor – or at least, a box simulating its design – can run code from four wildly different CPU architectures, all at the same time. So far, the team seems pleased with their progress. Emulated behavior matches their C models. It passes short 1000-cycle tests. An in-house FPGA simulation box is on the horizon. Verification and integration of peripherals still lie ahead. Who knows? Maybe in a year we’ll see how prodigious this Prodigy really is. 
