
Are We Ready for Human Brain-Scale AI?

My poor old noggin is currently full of ideas ricocheting around like corn kernels in an over-enthusiastic popcorn machine. As a tempting teaser, are you aware that: “Tachyum is enabling human brain-scale AI and advancing the entire world to a greener era by delivering the world’s first universal processor”? If not, then I will be delighted to expound, explicate, and elucidate, but first…

I’m currently gnashing my teeth and rending my garb because I just missed the chance to commemorate my one millionth birthday. I did celebrate my one hundredth birthday this past holiday weekend (see also Eeek! It’s My 100th Birthday!), assuming — of course — that we are working in base-8 (octal) and we are not desperately clinging to our grandparents’ base-10 (decimal) number system. But then my chum Bob Zeidman pointed out that it was also my 1,000,000th birthday if we choose to work in base-2 (binary).

Ah well, that ship has sailed for this year, but now I have my sights set on a palindromic celebration next year when I’ll be 101 (base-8) or 1000001 (base-2). Of course, 64 is the new 40 if we happen to be working in base-16 (hexadecimal), but I’m going to hold hexadecimal in reserve until I’m 66 (base-10), which will be 42 (base-16). By which time I have every hope of knowing the answer to Life, the Universe, and Everything.
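For anyone who wants to check my birthday arithmetic, here is a minimal Python sketch. The helper name `in_base` is my own invention for illustration; the ages are the ones mentioned above.

```python
# Illustrative helper (hypothetical name): write a non-negative integer in a given base.
def in_base(n: int, base: int) -> str:
    """Return n written in the given base (2 to 16), without any prefix."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(digits[remainder])
    return "".join(reversed(out))

# Ages in decimal, shown in octal, binary, and hexadecimal.
for age in (64, 65, 66):
    print(age, in_base(age, 8), in_base(age, 2), in_base(age, 16))
```

Running this shows 64 as 100 (octal), 1000000 (binary), and 40 (hexadecimal); 65 as the palindromic 101 and 1000001; and 66 as 42 in hexadecimal.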

As one more aside, I recently re-read Johnny and the Dead and Johnny and the Bomb by the late, great, Terry Pratchett. Both of these books are centered on a young 12-to-13-year-old boy called Johnny Maxwell who has unusual gifts, like the ability to see the dead (in a good way).

Until this past weekend, I had no idea that these had been made into TV mini-series-movies, the former in 1995 and the latter in 2006. I watched Johnny and the Dead on YouTube yesterday evening as I pen these words. On the one hand, it has to be admitted that television techniques and technology have come a long way since 1995; also, it’s easy to see that this was a low-budget production. On the other hand, the video plotline was really close to the book and it was awesome to see the characters I’d read about spring to life, which is ironic in the case of many of the cast who are somewhat challenged in this department.

Now I’m looking forward to watching Johnny and the Bomb on YouTube, not least because it was filmed in 2006 and, judging from the snippet I just looked at, I have high hopes that the budget and the cinematography will be a feast for my orbs. Furthermore, this tale involves time travel (which I like to contemplate) because Johnny and his friends return to WWII to prevent the deaths caused by a misaimed bomb in an air raid.

The reason the Johnny Maxwell stories popped into my mind is that a recurring character is an old bag lady called Mrs. Tachyon who pushes a supermarket trolley around while muttering unexpected utterances and uttering enigmatic mutterings that no one understands. It turns out that the trolley is a time machine, which goes a long way to explain the jars of pickles and the “fish and chips” being wrapped in decades-old newspaper, but “that’s all I have to say about that,” as Forrest Gump might say.

What do you mean, “What’s all this got to do with anything?” Aren’t you paying attention? How could a company called Tachyum not make one think about Mrs. Tachyon?

This all came about when the folks at Tachyum contacted me to tell me about the upcoming availability of their Prodigy Universal Processor FPGA-Based Emulation Prototype. Of course, you may not find this news to be tremendously exciting unless you are already aware of the power of the Prodigy Universal Processor.

Let’s take a step back and ponder a few tidbits of trivia, such as the fact that data centers currently consume about 3% of the world’s total electricity supply (that’s 60% more power than the entire UK). At the current 27% rate of growth, unless something disruptive happens to change things (like the Prodigy Universal Processor, for example), this will increase to 33% by 2030 and 50% by 2040. Eeek!
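As a rough sanity check on these numbers, here is a small sketch of simple compound growth. It rests on two assumptions of my own that the figures above do not state: that the 3% share applies to a 2020 baseline, and that the 27% rate compounds annually on the share itself. The function name `projected_share` is hypothetical.

```python
# Simple compound growth; the baseline year and annual compounding are assumptions.
def projected_share(start_share: float, annual_growth: float, years: int) -> float:
    """Return the share (in percent) after compounding for the given number of years."""
    return start_share * (1 + annual_growth) ** years

# Assumed baseline: 3% in 2020, growing 27% per year for 10 years.
share_2030 = projected_share(3.0, 0.27, 10)
print(f"Projected share in 2030: {share_2030:.1f}%")
```

Under those assumptions the result lands close to the 33% quoted for 2030. The same formula cannot simply be run out to 2040 (it would exceed 100%), so the 50% figure presumably assumes that growth slows along the way.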

Where is all this power going? Well, I don’t know about you, but as one simple example, when my wife (Gina the Gorgeous) and I are watching a program on television, we are constantly fact-checking things. Whenever I have a quick Google, I think of the vast numbers of servers in data centers around the world that are engaged in retrieving the information that I and countless others are constantly requesting. I must also admit to feeling a little guilty about the amount of energy that’s being consumed to satisfy my nonessential informational requests.

A really good book to help you wrap your brain around everything that is involved in this sort of thing is Tubes: A Journey to the Center of the Internet by Andrew Blum. And if you really want to get a feel for what’s happening in cyberspace (where no one can hear you scream), you may want to check out the Internet Live Stats website where you can watch a depiction of the amount of “stuff” that is happening on the internet each and every second (it’s alarming to watch the data-seconds slip by).

Now, I don’t plan to talk about the Prodigy Universal Processor in depth here because (a) that would require a column in its own right and (b) it doesn’t actually exist yet (more on this momentarily). Suffice it to say that it’s going to be a multicore device with 16-, 32-, 64-, and 128-core versions in the pipeline (no pun intended). Also, that each core is claimed to be smaller than an ARM and faster than the fastest Xeon while consuming one-tenth of the power.

Targeted at hyperscale data centers, the Prodigy Universal Processor architecture is predicted to outperform central processing units (CPUs), graphics processing units (GPUs), and tensor processing units (TPUs) for data center, artificial intelligence (AI), and high-performance computing (HPC) applications. For example, Tachyum says that Prodigy will outperform NVIDIA’s fastest GPU in HPC, as well as in AI training and inference tasks (125 HPC Prodigy racks can deliver 32 tensor EXAFLOPS). As the folks from Tachyum say:

Tachyum’s Prodigy can run HPC applications, convolutional AI, explainable AI, general AI, bio AI, and spiking neural networks, plus normal data center workloads, on a single homogeneous processor platform using existing and standard programming models. Without Prodigy, data center customers must use a combination of CPUs, GPUs, TPUs and other accelerators for these different workloads, creating inefficiency, expense and the complexity of maintaining separate hardware infrastructures. Using specific hardware dedicated to each type of workload (e.g., data center, AI, HPC) results in the significant underutilization of hardware resources and more challenging programming, support and maintenance environments. Prodigy’s ability to seamlessly switch among these various workloads dramatically changes the competitive landscape and drastically improves data center economics.

One thing I found to be particularly interesting is the fact that, as part of fulfilling its claim to be “the world’s first universal processor,” Prodigy will run legacy x86, ARM, and RISC-V binaries in addition to its native Prodigy code.

Some interesting reading from 2020 tells how Tachyum’s Reference Design Will Be Used in a 2021 AI/HPC Supercomputer and Tachyum Joins I4DI to Design World’s Fastest AI Supercomputer in Slovakia. Furthermore, just two days ago at the time of this writing, Tachyum announced that its Prodigy Universal Processor Has Successfully Transitioned to a 5 Nanometer Process.

Now, this is where we have to be a bit careful because — if you visit the Products Page on Tachyum’s website — your knee-jerk impression might well be that Prodigy devices are sitting on the shelf waiting for you to order them. In reality, however, we might be better off thinking of this as “A forward-looking prospectus presented for our perusal by means of a web-delivered medium” (sometimes I amaze even myself). What Tachyum actually have at the moment is a simulation-verified architecture that has now been physically verified by means of test devices.

The final step prior to full-blown production is physical emulation, which returns us to the fact that the folks at Tachyum are currently bouncing off the walls in excitement regarding the upcoming availability of their Prodigy Universal Processor FPGA-Based Emulation Prototype.

Chi To, Director of Solutions Engineering, Tachyum, with the Prodigy Universal Processor FPGA-based Emulation Prototype (Image source: Tachyum)

The board in the image above is 14.5 inches by 16 inches (368.3 mm x 406.4 mm). This 24-layer bodacious beauty carries 5,948 components that are mounted on both sides of the substrate. In particular, the four large components with cooling fans are Intel Stratix 10 GX FPGAs, each containing 10+ million logic elements (LEs). This single board can be used to emulate eight Prodigy processor cores, including their vector and matrix fixed- and floating-point processing units. The complete hardware emulator consists of multiple FPGA and I/O boards connected by cables in a rack.

In addition to allowing the chaps and chapesses at Tachyum to perform final verification prior to full production tape-out, customers will be able to use Prodigy’s fully functional FPGA emulation for product evaluation and performance measurements, as well as for software development, debug, and compatibility testing. The Prodigy FPGA emulation system will help customers smooth the adoption curve for Prodigy in their existing or new data center and/or HPC systems that demand the combination of high performance, high utilization, and low power.

The human brain is an awesome organ (at least, mine is) that contains somewhere between 100 and 150 trillion neural connections (synapses); let’s say 125 trillion “give or take” to provide a point of reference. Simply counting connections is a meaningless exercise, but meaningless exercises are what I do best. In 2018, Google introduced an AI model called BERT with 340 million connections (parameters) for use in natural language processing (NLP). Today, just three years later, models like OpenAI’s GPT-3 have up to 175 billion connections, which is sufficient to act as an intelligent chatbot. By 2023, it is expected that high-end AI models will boast 100 trillion connections capable of performing 100+ AI exaflops.
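To put those connection counts side by side, here is a short Python sketch. It uses only the figures quoted above, with my 125 trillion “give or take” as the reference point; the names `BRAIN_CONNECTIONS` and `fraction_of_brain` are hypothetical, and counting connections remains the same meaningless exercise it was a paragraph ago.

```python
# Reference point from the text: roughly 125 trillion synapses (an approximation).
BRAIN_CONNECTIONS = 125e12

# Connection (parameter) counts as quoted in the text.
models = {
    "BERT (2018)": 340e6,
    "GPT-3": 175e9,
    "Projected 2023 model": 100e12,
}

def fraction_of_brain(connections: float) -> float:
    """Return the model's connection count as a fraction of the brain reference."""
    return connections / BRAIN_CONNECTIONS

for name, connections in models.items():
    print(f"{name}: {fraction_of_brain(connections):.4%} of brain scale")

# Growth factor from BERT to GPT-3 over roughly three years.
bert_to_gpt3 = models["GPT-3"] / models["BERT (2018)"]
print(f"GPT-3 is about {bert_to_gpt3:.0f}x the size of BERT")
```

On these figures, GPT-3 sits at a small fraction of one percent of brain scale, while the projected 100-trillion-connection model would reach 80% of my reference point.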

What we are talking about here is human brain-scale AI. The more I mull over all this, (a) the more scared I get (see also The Artificial Intelligence Apocalypse — Is It Time to Be Scared Yet?) and (b) the more I think that the guys and gals at Tachyum are in the right place at the right time with their Prodigy Universal Processor. What say you? Are you excited or terrified by what appears to be heading our way?

2 thoughts on “Are We Ready for Human Brain-Scale AI?”

  1. Hi Max, my initial reaction is “how will this increase in performance actually be achieved?” I’m not expecting an answer because that answer would reveal the company’s IP or “secret sauce”! So something that you may well be able to answer is where we currently stand in MIPS per Watt (or other metric that defines the energy required to perform a single, simple unit of computation)? And following on, how far is left to go before we hit some fundamental limit? Where does Tachyum sit/stand or otherwise pose on this scale?
    Am I scared? Probably – history is full of examples of where potentially beneficial technology is not used to the best for humanity. Also – just take a glance around at the current political landscape!! Better health diagnosis and weather/climate change forecasting or more targeted advertising?

    1. Hi RBD — great questions as always — I will ask the folks at Tachyum if they would care to comment. For myself, I’m hoping to get an in-depth briefing and then write a full-up column on the Prodigy processor — I’ll ask the folks at Tachyum about that also.
