One of the things I really enjoy is bacon sandwiches, but that’s not what I wanted to talk about here. Another thing I enjoy is watching a startup company evolve from being a twinkle in its founder’s eye to purveying its first product.
Way back in the mists of time we used to call May 2020 (which is two long years ago as I pen these words), Jim Turley wrote a column on the topic of Creating the Universal Processor here on EE Journal. The focus of this column was a new type of processor called Prodigy that was under development by a startup called Tachyum.
As Jim said at that time with respect to the folks at Tachyum: “Their goals are nothing if not audacious. Prodigy will be faster than Intel’s Xeon but consume one-tenth the power. It will have 4× lower cost of ownership (TCO) than current server processors. It will occupy less silicon than an ARM design. It will perform both AI and hyperscale server workloads with equal aplomb. A single version of the chip will scale between 16 and 128 cores, all in the same 6400-ball package. And it will be cheaper than Intel or AMD chips by a factor of three. That last one shouldn’t be hard to achieve. The others? We’ll have to wait and see.”
Just one year after Jim’s column — and one year before the piece you are currently perusing — I penned my own column posing the question: Are We Ready for Human Brain-Scale AI? I was just refreshing my memory regarding that column. I have it open on one of my screens as we speak. (When I say “speak,” I’m speaking metaphorically, metaphorically speaking, the saying of which reminds me of the old programmer’s saying: “In order to understand recursion, one must first understand recursion.”) In that column I may have mentioned Johnny and the Bomb by the late, great, Terry Pratchett (but “I think I got away with it,” as Basil Fawlty famously said).
The reason for my broaching this bodacious book is that it contains a character called Mrs. Tachyon — an old lady who pushes a supermarket trolley around while muttering unexpected utterances and uttering enigmatic mutterings that no one understands. It’s not going too far to say that Mrs. Tachyon puts the “eff” in “ineffable.” We eventually discover that her trolley acts as a time machine, which explains a lot of things that were puzzling us in the tale, but let’s try not to wander off into the weeds. Suffice it to say that whenever I hear news of Tachyum, it causes Mrs. Tachyon to pop into (what I laughingly call) my mind — just be grateful you aren’t in here with me (cue maniacal laughter).
The reason for my waffling here is that the wait is almost over and Tachyum has formally launched Prodigy, which is described as “the world’s first universal processor.” The underlying idea is that people are currently using different types of processors to perform different types of tasks — central processing units (CPUs) for general-purpose processing, graphics processing units (GPUs) for graphics and hardware acceleration of algorithms that process large blocks of data in parallel, and AI accelerators for artificial intelligence (AI) applications.
A really (REALLY) simplified way of visualizing this is that CPUs do their best work on scalar values, GPUs do their best work on vector values, and AI accelerators do their best work on matrix values. What Prodigy does is unify the functionality of a CPU, GPU, and TPU into a single architecture implemented on a single monolithic device.
Tachyum’s Prodigy is the world’s first universal processor
(Image source: Tachyum)
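The scalar/vector/matrix distinction above can be sketched in a few lines of plain Python. This is purely illustrative of the three workload shapes — it has nothing to do with how Prodigy itself is actually programmed:

```python
# Scalar work (CPU-style): one value at a time
scalar_result = 3.0 * 2.0 + 1.0              # 7.0

# Vector work (GPU-style): the same operation applied across many elements
v = [0.0, 1.0, 2.0, 3.0]
vector_result = [3.0 * x + 1.0 for x in v]   # [1.0, 4.0, 7.0, 10.0]

# Matrix work (AI-accelerator-style): the multiply-accumulate pattern
# at the heart of neural-network layers
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]    # 2x3 matrix
b = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # 3x2 matrix
matrix_result = matmul(a, b)               # [[3.0, 3.0], [3.0, 3.0]]
```

A chip specialized for one of these shapes tends to do poorly on the others, which is why data centers end up stuffed with three different kinds of silicon — and why unifying all three in one device is such an attractive proposition.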
This is probably a good time to note that, in the context of compute, the TLA (three-letter abbreviation) TPU is typically assumed to represent “Tensor Processing Unit.” This is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning, particularly using Google’s own TensorFlow software. Google began using TPUs internally in 2015 and made them available for third-party use in 2018. In the context of this column, however, we understand TPU to stand for “Tachyum Processing Unit.”
Implemented in an advanced 5nm process technology, Tachyum’s first commercial product — the Prodigy Cloud/AI/HPC supercomputer processor chip — offers 4x the performance of the fastest Xeon, 3x the raw performance of NVIDIA’s H100 on HPC workloads, and 6x the raw performance on AI training and inference workloads, with up to 10x the performance at the same power.
FPGA-based Prodigy prototype (Image source: Tachyum)
The folks at Tachyum say that Prodigy is poised to overcome the challenges of increasing data center power consumption, low server utilization, and stalled performance scaling. Some of the highlights of the newly launched Prodigy processor are as follows:
- 128 high-performance unified 64-bit cores running up to 5.7 GHz
- 16 DDR5 memory controllers
- 64 PCIe 5.0 lanes
- Multiprocessor support for 4-socket and 2-socket platforms
- Rack solutions for both air-cooled and liquid-cooled data centers
- SPECrate 2017 Integer performance of around 4x Intel 8380 and around 3x AMD 7763
- Double-Precision Floating-Point performance is 3x NVIDIA H100
- AI FP8 performance is 6x NVIDIA H100
Unlike other CPU and GPU solutions, Prodigy has been designed to handle vector and matrix processing from the ground up, rather than as an afterthought. Among Prodigy’s vector and matrix features are support for a range of data types (FP64, FP32, TF32, BF16, Int8, FP8, and TAI); 2×1024-bit vector units per core; AI sparsity and super-sparsity support; and no penalty for misaligned vector loads or stores when crossing cache lines. This built-in support delivers high performance for AI training and inference workloads while reducing memory utilization.
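Why that menu of narrow data types matters is easy to see in terms of memory alone: each halving of precision halves a model’s footprint (and its memory-bandwidth appetite). The sketch below uses Python’s stdlib `array` module as a stand-in — FP64, FP32, and Int8 have direct type codes there, while FP16, BF16, TF32, FP8, and TAI have no stdlib equivalents:

```python
from array import array

# A stand-in "weight tensor" of one million parameters
n = 1_000_000
weights_fp64 = array('d', [0.5] * n)   # 8 bytes per element
weights_fp32 = array('f', [0.5] * n)   # 4 bytes per element
weights_int8 = array('b', [64] * n)    # 1 byte per element

# Memory footprint of the same million parameters at each precision
fp64_mb = weights_fp64.itemsize * n / 1e6   # 8.0 MB
fp32_mb = weights_fp32.itemsize * n / 1e6   # 4.0 MB
int8_mb = weights_int8.itemsize * n / 1e6   # 1.0 MB
```

Scale that million parameters up to the billions found in modern AI models, and you can see why native hardware support for 8-bit formats — plus sparsity, which skips the zeros entirely — is such a big deal for training and inference alike.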
As the guys and gals at Tachyum delight in telling anyone who will listen, “Prodigy is significantly better than the best-performing processors currently available in hyperscale, HPC and AI markets. Prodigy delivers up to 3x the performance of the highest-performing x86 processors for cloud workloads, up to 3x compared to the highest-performing GPUs for HPC, and up to 6x for AI applications. By increasing performance while using less power, Prodigy solves the problem of sustainable data center growth by offering unparalleled carbon footprint reduction. This is especially important as the universality of AI continues to gain traction: Prodigy will enable unprecedented data center TCO savings as part of this new-world market.”
To back all of this up, the chaps and chapesses at Tachyum spent a lot of time presenting me with a mind-boggling array of charts and graphics, including the following:
- Prodigy vs. x86 (AMD 7763 & Intel 8380): FP64 Floating-Point Raw Performance
- Prodigy vs. Nvidia H100 GPU (H100 DP & H100 AI): HPC and AI
- Prodigy vs. AMD MI250X GPU (MI250X DP & MI250X AI): HPC and AI
- Prodigy vs. x86: SPECrate 2017 Integer (AMD 7763 Performance & Intel 8380 Performance)
- Prodigy vs. Nvidia H100: Rack-Level Comparison (H100 DGX POD vs. Prodigy air-cooled and liquid-cooled racks)
I’m not as stupid as I look (but, there again, who could be?). Using my prodigious wetware processor (which is one of my three favorite organs), I detected the subtle pattern that threaded its way through all of the bar charts, namely that all of the columns representing Tachyum’s Prodigy stand proud in the crowd compared to competitive offerings.
Sampling for Prodigy will begin later this year, with volume production taking place in 1H 2023. I don’t know about you, but I cannot wait to meet and greet my first Prodigy chip in the flesh, as it were. How about you? Do you have any thoughts you’d care to share?