
Are We Ready for Human Brain-Scale AI?

My poor old noggin is currently full of ideas ricocheting around like corn kernels in an over-enthusiastic popcorn machine. As a tempting teaser, are you aware that: “Tachyum is enabling human brain-scale AI and advancing the entire world to a greener era by delivering the world’s first universal processor”? If not, then I will be delighted to expound, explicate, and elucidate, but first…

I’m currently gnashing my teeth and rending my garb because I just missed the chance to commemorate my one millionth birthday. I did celebrate my one hundredth birthday this past holiday weekend (see also Eeek! It’s My 100th Birthday!), assuming — of course — that we are working in base-8 (octal) and we are not desperately clinging to our grandparents’ base-10 (decimal) number system. But then my chum Bob Zeidman pointed out that it was also my 1,000,000th birthday if we choose to work in base-2 (binary).

Ah well, that ship has sailed for this year, but now I have my sights set on a palindromic celebration next year when I’ll be 101 (base-8) or 1000001 (base-2). Of course, 64 is the new 40 if we happen to be working in base-16 (hexadecimal), but I’m going to hold hexadecimal in reserve until I’m 66 (base-10), which will be 42 (base-16). By which time I have every hope of knowing the answer to Life, the Universe, and Everything.
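For anyone who wishes to check my radix ruminations, a few lines of Python (my own quick sketch, not part of the official celebrations) will do the trick:

```python
# Verify the birthday arithmetic in assorted number bases.
age = 64  # my age in our grandparents' boring old base-10 (decimal)

print(oct(age))  # "0o100"     -> my "100th" birthday in base-8
print(bin(age))  # "0b1000000" -> my "1,000,000th" birthday in base-2
print(hex(age))  # "0x40"      -> 64 is the new 40 in base-16

# Next year's palindromic possibilities (65 in decimal):
assert oct(65) == "0o101" and bin(65) == "0b1000001"

# And at 66 (decimal), the answer to Life, the Universe, and Everything:
assert hex(66) == "0x42"
```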

As one more aside, I recently re-read Johnny and the Dead and Johnny and the Bomb by the late, great Terry Pratchett. Both of these books are centered on a 12-to-13-year-old boy called Johnny Maxwell who has unusual gifts, like the ability to see the dead (in a good way).

Until this past weekend, I had no idea that these had been made into TV mini-series, the former in 1995 and the latter in 2006. I watched Johnny and the Dead on YouTube yesterday evening. On the one hand, it has to be admitted that television techniques and technology have come a long way since 1995, and it's easy to see that this was a low-budget production. On the other hand, the plotline stayed really close to the book, and it was awesome to see the characters I'd read about spring to life, which is ironic in the case of the many cast members who are somewhat challenged in this department.

Now I’m looking forward to watching Johnny and the Bomb on YouTube, not least because it was filmed in 2006 and — from the snippet I just looked at — I have high hopes that the budget and the cinematography will be a feast for my orbs. Furthermore, this tale involves time travel (which I like to contemplate): Johnny and his friends travel back to WWII to prevent the deaths caused by a misaimed bomb in an air raid.

The reason the Johnny Maxwell stories popped into my mind is that a recurring character is an old bag lady called Mrs. Tachyon who pushes a supermarket trolley around while muttering unexpected utterances and uttering enigmatic mutterings that no one understands. It turns out that the trolley is a time machine, which goes a long way to explain the jars of pickles and the “fish and chips” being wrapped in decades-old newspaper, but “that’s all I have to say about that,” as Forrest Gump might say.

What do you mean, “What’s all this got to do with anything?” Aren’t you paying attention? How could a company called Tachyum not make one think about Mrs. Tachyon?

All this came about when the folks at Tachyum contacted me to tell me about the upcoming availability of their Prodigy Universal Processor FPGA-based Emulation Prototype. Of course, you may not find this news to be tremendously exciting unless you are already aware of the power of the Prodigy Universal Processor.

Let’s take a step back and ponder a few tidbits of trivia, such as the fact that data centers currently consume about 3% of the world’s total electricity supply (that’s 60% more power than the entire UK consumes). At the current 27% rate of growth, unless something disruptive happens to change things (like the Prodigy Universal Processor, for example), this will increase to 33% by 2030 and 50% by 2040. Eeek!
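If we read that 27% as an annual compound growth rate (my assumption for the sake of this back-of-the-envelope sketch), roughly a decade of uninterrupted compounding lands close to the quoted 33% figure. Note that the 2040 number implies the growth must taper off well before then, since unbroken 27% compounding would blow past 100% of the world’s supply:

```python
# Back-of-the-envelope projection of data centers' share of world electricity.
# Assumes 27% year-on-year compound growth from a ~3% share today.
share_now = 0.03   # ~3% of the world's total electricity supply
growth = 1.27      # assumed annual growth factor

# Roughly a decade of uninterrupted compounding:
share_2030 = share_now * growth ** 10
print(f"~{share_2030:.0%} by 2030")  # ~33%, matching the projection above
```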

Where is all this power going? Well, I don’t know about you, but as one simple example, when my wife (Gina the Gorgeous) and I are watching a program on television, we are constantly fact-checking things. Whenever I have a quick Google, I think of the vast numbers of servers in data centers around the world that are engaged in retrieving the information that I and countless others are constantly requesting. I must also admit to feeling a little guilty about the amount of energy that’s being consumed to satisfy my nonessential informational requests.

A really good book to help you wrap your brain around everything that is involved in this sort of thing is Tubes: A Journey to the Center of the Internet by Andrew Blum. And if you really want to get a feel for what’s happening in cyberspace (where no one can hear you scream), you may want to check out the Internet Live Stats website where you can watch a depiction of the amount of “stuff” that is happening on the internet each and every second (it’s alarming to watch the data-seconds slip by).

Now, I don’t plan to talk about the Prodigy Universal Processor in depth here because (a) that would require a column in its own right and (b) it doesn’t actually exist yet (more on this momentarily). Suffice it to say that it’s going to be a multicore device with 16-, 32-, 64-, and 128-core versions in the pipeline (no pun intended). Each core is claimed to be smaller than an ARM core and faster than the fastest Xeon while consuming one-tenth of the power.

Targeted at hyperscale data centers, the Prodigy Universal Processor architecture is predicted to outperform central processing units (CPUs), graphics processing units (GPUs), and tensor processing units (TPUs) for data center, artificial intelligence (AI), and high-performance computing (HPC) applications. For example, Prodigy will outperform NVIDIA’s fastest GPU in HPC, as well as AI training and inference tasks (125 HPC Prodigy racks can deliver 32 tensor EXAFLOPS). As the folks from Tachyum say:

Tachyum’s Prodigy can run HPC applications, convolutional AI, explainable AI, general AI, bio AI, and spiking neural networks, plus normal data center workloads, on a single homogeneous processor platform using existing and standard programming models. Without Prodigy, data center customers must use a combination of CPUs, GPUs, TPUs and other accelerators for these different workloads, creating inefficiency, expense and the complexity of maintaining separate hardware infrastructures. Using specific hardware dedicated to each type of workload (e.g., data center, AI, HPC) results in the significant underutilization of hardware resources and more challenging programming, support and maintenance environments. Prodigy’s ability to seamlessly switch among these various workloads dramatically changes the competitive landscape and drastically improves data center economics.

One thing I found to be particularly interesting is the fact that, as part of fulfilling its claim to be “the world’s first universal processor,” Prodigy will run legacy x86, ARM, and RISC-V binaries in addition to its native Prodigy code.

Some interesting reading from 2020 tells how Tachyum’s Reference Design Will Be Used in a 2021 AI/HPC Supercomputer and Tachyum Joins I4DI to Design World’s Fastest AI Supercomputer in Slovakia. Furthermore, just two days ago at the time of this writing, Tachyum announced that its Prodigy Universal Processor Has Successfully Transitioned to a 5 Nanometer Process.

Now, this is where we have to be a bit careful because — if you visit the Products Page on Tachyum’s website — your knee-jerk impression might well be that Prodigy devices are sitting on the shelf waiting for you to order them. In reality, however, we might be better off thinking of this as “A forward-looking prospectus presented for our perusal by means of a web-delivered medium” (sometimes I amaze even myself). What Tachyum actually has at the moment is a simulation-verified architecture that has now been physically verified by means of test devices.

The final step prior to full-blown production is physical emulation, which returns us to the fact that the folks at Tachyum are currently bouncing off the walls in excitement regarding the upcoming availability of their Prodigy Universal Processor FPGA-based Emulation Prototype.

Chi To, Director of Solutions Engineering, Tachyum, with the Prodigy Universal Processor FPGA-based Emulation Prototype (Image source: Tachyum)

The board in the image above is 14.5 inches by 16 inches (368.3 mm x 406.4 mm). This 24-layer bodacious beauty carries 5,948 components that are mounted on both sides of the substrate. In particular, the four large components with cooling fans are Intel Stratix 10 GX FPGAs, each containing 10+ million logic elements (LEs). This single board can be used to emulate eight Prodigy processor cores, including their vector and matrix fixed- and floating-point processing units. The complete hardware emulator consists of multiple FPGA and I/O boards connected by cables in a rack.

In addition to allowing the chaps and chapesses at Tachyum to perform final verification prior to full production tape-out, customers will be able to use Prodigy’s fully functional FPGA emulation for product evaluation and performance measurements, as well as for software development, debug, and compatibility testing. The Prodigy FPGA emulation system will help customers smooth the adoption curve for Prodigy in their existing or new data center and/or HPC systems that demand the combination of high performance, high utilization, and low power.

The human brain is an awesome organ (at least, mine is) that contains somewhere between 100 and 150 trillion neural connections (synapses) — let’s say 125 trillion “give or take” to provide a point of reference. Simply counting connections is a meaningless exercise, but meaningless exercises are what I do best. In 2018, Google introduced an AI model called BERT with 340 million connections for use in natural language processing (NLP). Today, just three years later, models like OpenAI’s GPT-3 have up to 175 billion connections, which is sufficient to act as an intelligent chatbot. By 2023, it is expected that high-end AI models will boast 100 trillion connections and demand 100+ AI exaflops of compute.
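Putting those numbers side by side (using the figures quoted above; comparing synapses to model connections is, as I admitted, a meaningless exercise, but a fun one) gives a feel for how fast the gap is closing:

```python
# Rough scale comparison using the figures quoted above.
brain_synapses = 125e12  # ~125 trillion connections, "give or take"
bert_2018 = 340e6        # Google's BERT (2018)
gpt3 = 175e9             # OpenAI's GPT-3
model_2023 = 100e12      # projected high-end AI model, 2023

print(f"GPT-3 vs. BERT:       {gpt3 / bert_2018:,.0f}x in three years")  # 515x
print(f"Brain vs. GPT-3:      {brain_synapses / gpt3:,.0f}x still to go")  # 714x
print(f"2023 model vs. brain: {model_2023 / brain_synapses:.0%}")  # 80%
```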

What we are talking about here is human brain-scale AI. The more I mull over all this, (a) the more scared I get (see also The Artificial Intelligence Apocalypse — Is It Time to Be Scared Yet?) and (b) the more I think that the guys and gals at Tachyum are in the right place at the right time with their Prodigy Universal Processor. What say you? Are you excited or terrified by what appears to be heading our way?

2 thoughts on “Are We Ready for Human Brain-Scale AI?”

  1. Hi Max, my initial reaction is “how will this increase in performance actually be achieved?” I’m not expecting an answer because that answer would reveal the company’s IP or “secret sauce”! So something that you may well be able to answer is where we currently stand in MIPS per Watt (or another metric that defines the energy required to perform a single, simple unit of computation)? And following on, how far is left to go before we hit some fundamental limit? Where does Tachyum sit/stand or otherwise pose on this scale?
    Am I scared? Probably – history is full of examples of where potentially beneficial technology is not used to the best for humanity. Also – just take a glance around at the current political landscape!! Better health diagnosis and weather/climate change forecasting, or more targeted advertising?

    1. Hi RBD — great questions as always — I will ask the folks at Tachyum if they would care to comment. For myself, I’m hoping to get an in-depth briefing and then write a full-up column on the Prodigy processor — I’ll ask the folks at Tachyum about that also.
