Are We Ready for Human Brain-Scale AI?

My poor old noggin is currently full of ideas ricocheting around like corn kernels in an over-enthusiastic popcorn machine. As a tempting teaser, are you aware that: “Tachyum is enabling human brain-scale AI and advancing the entire world to a greener era by delivering the world’s first universal processor”? If not, then I will be delighted to expound, explicate, and elucidate, but first…

I’m currently gnashing my teeth and rending my garb because I just missed the chance to commemorate my one millionth birthday. I did celebrate my one hundredth birthday this past holiday weekend (see also Eeek! It’s My 100th Birthday!), assuming — of course — that we are working in base-8 (octal) and we are not desperately clinging to our grandparents’ base-10 (decimal) number system. But then my chum Bob Zeidman pointed out that it was also my 1,000,000th birthday if we choose to work in base-2 (binary).

Ah well, that ship has sailed for this year, but now I have my sights set on a palindromic celebration next year when I’ll be 101 (base-8) or 1000001 (base-2). Of course, 64 is the new 40 if we happen to be working in base-16 (hexadecimal), but I’m going to hold hexadecimal in reserve until I’m 66 (base-10), which will be 42 (base-16). By which time I have every hope of knowing the answer to Life, the Universe, and Everything.
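For anyone who wants to check my birthday arithmetic, here is a minimal Python sketch. The helper names `age_in_bases` and `is_palindrome` are my own inventions for illustration; the only thing doing any real work is Python's built-in `format` function.

```python
def age_in_bases(age: int) -> dict:
    """Return the same age written in binary, octal, decimal, and hexadecimal."""
    return {
        "base-2": format(age, "b"),
        "base-8": format(age, "o"),
        "base-10": format(age, "d"),
        "base-16": format(age, "x"),
    }


def is_palindrome(digits: str) -> bool:
    """True if the digit string reads the same forwards and backwards."""
    return digits == digits[::-1]


# Ages 64, 65, and 66 (decimal) are the ones discussed above.
for age in (64, 65, 66):
    print(age, age_in_bases(age))
```

Running this for 65 should show that both the octal and binary forms read the same in either direction, which is what makes next year's celebration palindromic.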

As one more aside, I recently re-read Johnny and the Dead and Johnny and the Bomb by the late, great, Terry Pratchett. Both of these books are centered on a young 12-to-13-year-old boy called Johnny Maxwell who has unusual gifts, like the ability to see the dead (in a good way).

Until this past weekend, I had no idea that these had been made into TV mini-series-movies, the former in 1995 and the latter in 2006. I watched Johnny and the Dead on YouTube yesterday evening (relative to when I pen these words). On the one hand, it has to be admitted that television techniques and technology have come a long way since 1995; also, it’s easy to see that this was a low-budget production. On the other hand, the video plotline was really close to the book and it was awesome to see the characters I’d read about spring to life, which is ironic in the case of many of the cast who are somewhat challenged in this department.

Now I’m looking forward to watching Johnny and the Bomb on YouTube, not the least because it was filmed in 2006 and — from the snippet I just looked at — I have high hopes that the budget and the cinematography will be a feast for my orbs. Furthermore, this tale involves time travel (which I like to contemplate) because Johnny and his friends return to WWII to prevent the deaths caused by a misaimed bomb in an air raid.

The reason the Johnny Maxwell stories popped into my mind is that a recurring character is an old bag lady called Mrs. Tachyon who pushes a supermarket trolley around while muttering unexpected utterances and uttering enigmatic mutterings that no one understands. It turns out that the trolley is a time machine, which goes a long way to explain the jars of pickles and the “fish and chips” being wrapped in decades-old newspaper, but “that’s all I have to say about that,” as Forrest Gump might say.

What do you mean, “What’s all this got to do with anything?” Aren’t you paying attention? How could a company called Tachyum not make one think about Mrs. Tachyon?

This all came about when the folks at Tachyum contacted me to tell me about the upcoming availability of their Prodigy Universal Processor FPGA-Based Emulation Prototype. Of course, you may not find this news to be tremendously exciting unless you are already aware of the power of the Prodigy Universal Processor.

Let’s take a step back and ponder a few tidbits of trivia, such as the fact that data centers currently consume about 3% of the world’s total electricity supply (that’s 60% more power than the entire UK). At the current 27% rate of growth, unless something disruptive happens to change things (like the Prodigy Universal Processor, for example), this will increase to 33% by 2030 and 50% by 2040. Eeek!

Where is all this power going? Well, I don’t know about you, but as one simple example, when my wife (Gina the Gorgeous) and I are watching a program on television, we are constantly fact-checking things. Whenever I have a quick Google, I think of the vast numbers of servers in data centers around the world that are engaged in retrieving the information that I and countless others are constantly requesting. I must also admit to feeling a little guilty about the amount of energy that’s being consumed to satisfy my nonessential informational requests.

A really good book to help you wrap your brain around everything that is involved in this sort of thing is Tubes: A Journey to the Center of the Internet by Andrew Blum. And if you really want to get a feel for what’s happening in cyberspace (where no one can hear you scream), you may want to check out the Internet Live Stats website where you can watch a depiction of the amount of “stuff” that is happening on the internet each and every second (it’s alarming to watch the data-seconds slip by).

Now, I don’t plan to talk about the Prodigy Universal Processor in depth here because (a) that would require a column in its own right and (b) it doesn’t actually exist yet (more on this momentarily). Suffice it to say that it’s going to be a multicore device with 16-, 32-, 64-, and 128-core versions in the pipeline (no pun intended). Each core is also claimed to be smaller than an ARM core and faster than the fastest Xeon core while consuming one-tenth of the power.

Targeted at hyperscale data centers, the Prodigy Universal Processor architecture is predicted to outperform central processing units (CPUs), graphics processing units (GPUs), and tensor processing units (TPUs) for data center, artificial intelligence (AI), and high-performance computing (HPC) applications. For example, Tachyum claims that Prodigy will outperform NVIDIA’s fastest GPU in HPC, as well as in AI training and inference tasks (125 HPC Prodigy racks are said to deliver 32 tensor EXAFLOPS). As the folks from Tachyum say:

Tachyum’s Prodigy can run HPC applications, convolutional AI, explainable AI, general AI, bio AI, and spiking neural networks, plus normal data center workloads, on a single homogeneous processor platform using existing and standard programming models. Without Prodigy, data center customers must use a combination of CPUs, GPUs, TPUs and other accelerators for these different workloads, creating inefficiency, expense and the complexity of maintaining separate hardware infrastructures. Using specific hardware dedicated to each type of workload (e.g., data center, AI, HPC) results in the significant underutilization of hardware resources and more challenging programming, support and maintenance environments. Prodigy’s ability to seamlessly switch among these various workloads dramatically changes the competitive landscape and drastically improves data center economics.

One thing I found to be particularly interesting is the fact that, as part of fulfilling its claim to be “the world’s first universal processor,” Prodigy will run legacy x86, ARM, and RISC-V binaries in addition to its native Prodigy code.

Some interesting reading from 2020 tells how Tachyum’s Reference Design Will Be Used in a 2021 AI/HPC Supercomputer and Tachyum Joins I4DI to Design World’s Fastest AI Supercomputer in Slovakia. Furthermore, just two days ago at the time of this writing, Tachyum announced that its Prodigy Universal Processor Has Successfully Transitioned to a 5 Nanometer Process.

Now, this is where we have to be a bit careful because — if you visit the Products Page on Tachyum’s website — your knee-jerk impression might well be that Prodigy devices are sitting on the shelf waiting for you to order them. In reality, however, we might be better off thinking of this as “A forward-looking prospectus presented for our perusal by means of a web-delivered medium” (sometimes I amaze even myself). What Tachyum actually have at the moment is a simulation-verified architecture that has now been physically verified by means of test devices.

The final step prior to full-blown production is physical emulation, which returns us to the fact that the folks at Tachyum are currently bouncing off the walls in excitement regarding the upcoming availability of their Prodigy Universal Processor FPGA-Based Emulation Prototype.

Chi To, Director of Solutions Engineering, Tachyum, with the Prodigy Universal Processor FPGA-based Emulation Prototype (Image source: Tachyum)

The board in the image above is 14.5 inches by 16 inches (368.3mm x 406.4mm). This 24-layer bodacious beauty carries 5,948 components that are mounted on both sides of the substrate. In particular, the four large components with cooling fans are Intel Stratix 10 GX FPGAs, each containing 10+ million logic elements (LEs). This single board can be used to emulate eight Prodigy processor cores, including their vector and matrix fixed- and floating-point processing units. The complete hardware emulator consists of multiple FPGA and I/O boards connected by cables in a rack.  
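To put those numbers in perspective, here is a back-of-the-envelope sketch. It assumes that emulation capacity scales linearly with the number of FPGA boards, and it ignores the separate I/O boards entirely; both are my assumptions, not anything Tachyum has stated. The constant and function names are hypothetical.

```python
import math

FPGAS_PER_BOARD = 4             # the four Stratix 10 GX devices on one board
MIN_LES_PER_FPGA = 10_000_000   # "10+ million" logic elements, so a lower bound
CORES_PER_BOARD = 8             # Prodigy cores emulated by a single board
MM_PER_INCH = 25.4


def boards_needed(cores: int) -> int:
    """FPGA boards needed to emulate `cores` cores (assumes linear scaling)."""
    return math.ceil(cores / CORES_PER_BOARD)


# Lower bound on the FPGA logic elements available per emulated core.
min_les_per_core = FPGAS_PER_BOARD * MIN_LES_PER_FPGA // CORES_PER_BOARD

# Sanity check on the quoted board dimensions (inches to millimeters).
board_mm = (round(14.5 * MM_PER_INCH, 1), round(16 * MM_PER_INCH, 1))

for cores in (16, 32, 64, 128):
    print(f"{cores}-core Prodigy: {boards_needed(cores)} FPGA board(s)")
print(f"At least {min_les_per_core:,} logic elements per emulated core")
print(f"Board size: {board_mm[0]} mm x {board_mm[1]} mm")
```

Under these assumptions, emulating the largest planned device would take a rack with a good many of these boards, which fits with the emulator being described as multiple boards cabled together.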

In addition to allowing the chaps and chapesses at Tachyum to perform final verification prior to full production tape-out, customers will be able to use Prodigy’s fully functional FPGA emulation for product evaluation and performance measurements, as well as for software development, debug, and compatibility testing. The Prodigy FPGA emulation system will help customers smooth the adoption curve for Prodigy in their existing or new data center and/or HPC systems that demand the combination of high performance, high utilization, and low power.

The human brain is an awesome organ (at least, mine is) that contains somewhere between 100 and 150 trillion neural connections (synapses); let’s say 125 trillion “give or take” to provide a point of reference. Simply counting connections is a meaningless exercise, but meaningless exercises are what I do best. In 2018, Google introduced an AI model called BERT with 340 million connections (parameters) for use in natural language processing (NLP). Today, just three years later, models like OpenAI’s GPT-3 have up to 175 billion connections, which is sufficient to act as an intelligent chatbot. By 2023, it is expected that high-end AI models will boast 100 trillion connections capable of performing 100+ AI exaflops.
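Since meaningless exercises are what I do best, here is a small sketch that compares those connection counts with my 125 trillion reference point. The figures are the ones quoted above; the names `BRAIN_SYNAPSES`, `MODELS`, and `brain_ratio` are made up for this illustration, and treating a model parameter as equivalent to a synapse is a very loose assumption.

```python
BRAIN_SYNAPSES = 125e12  # midpoint of the 100 to 150 trillion range quoted above

# Connection (parameter) counts as quoted in the text.
MODELS = {
    "BERT (2018)": 340e6,
    "GPT-3": 175e9,
    "Predicted high-end model (2023)": 100e12,
}


def brain_ratio(connections: float) -> float:
    """How many times more connections the reference brain has than a model."""
    return BRAIN_SYNAPSES / connections


for name, connections in MODELS.items():
    print(f"{name}: the brain has about {brain_ratio(connections):,.2f}x as many connections")
```

On these numbers, the gap shrinks from several hundred thousand times (BERT) to several hundred times (GPT-3) to near parity for the predicted 2023 models.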

What we are talking about here is human brain-scale AI. The more I mull over all this, (a) the more scared I get (see also The Artificial Intelligence Apocalypse — Is It Time to Be Scared Yet?) and (b) the more I think that the guys and gals at Tachyum are in the right place at the right time with their Prodigy Universal Processor. What say you? Are you excited or terrified by what appears to be heading our way?

2 thoughts on “Are We Ready for Human Brain-Scale AI?”

  1. Hi Max, my initial reaction is “how will this increase in performance actually be achieved?” I’m not expecting an answer because that answer would reveal the company’s IP or “secret sauce”! So something that you may well be able to answer is where we currently stand in MIPS per Watt (or other metric that defines the energy required to perform a single, simple unit of computation)? And following on, how far is left to go before we hit some fundamental limit? Where does Tachyum sit/stand or otherwise pose on this scale?
    Am I scared? Probably – history is full of examples of where potentially beneficial technology is not used to the best for humanity. Also – just take a glance around at the current political landscape!! Better health diagnosis and weather/climate change forecasting or more targeted advertising?

    1. Hi RBD — great questions as always — I will ask the folks at Tachyum if they would care to comment. For myself, I’m hoping to get an in-depth briefing and then write a full-up column on the Prodigy processor — I’ll ask the folks at Tachyum about that also.
