
There’s Exciting News on the Multi-Modal AI SoC Front

As is often the case, I’m amazed by how so many things seem to be interrelated and interconnected. I’m sorry… I feel an aside coming on… I cannot help myself… just saying “interconnected” reminds me of the book “Dirk Gently’s Holistic Detective Agency,” which was written by the late great Douglas Adams.

As you may recall, Dirk is an unconventional detective who believes in the “fundamental interconnectedness of all things.” The idea is that everything in the universe is interrelated, meaning that even seemingly random events or trivial details can have a meaningful connection. Dirk employs this approach to solve cases by embracing bizarre coincidences, odd insights, and intuition, which leads him to surprising and often cosmic truths that would otherwise seem unrelated.

On the off chance you were wondering, there have been a couple of TV interpretations that are loosely based (or not) on the original books, which were Dirk Gently’s Holistic Detective Agency and The Long Dark Tea-Time of the Soul. The 2010-2012 TV series originated in Britain and starred Stephen Mangan as holistic detective Dirk Gently and Darren Boyd as his sidekick Richard MacDuff. By comparison, the 2016-2017 TV series (well, two series, really) originated in the United States and starred Samuel Barnett as Dirk and Elijah Wood as his reluctant sidekick Todd.

But we digress…

The first thing that triggered my meandering musings on the interconnectedness of things is that, in a recent column, Arrggghhh! Now I Want an NI mioDAQ! (Ignore the ‘!’), I made mention of the fact that oscilloscopes back in the day were big, clunky, and horrendously expensive.

Well, I just read Steve Leibson’s column: The Rise and Fall of Heathkit – Part 1: Early Days. All I can say is that it’s fascinating to hear how the Heath Company evolved into the form we came to know and love when I was coming of age. Steve’s column is based on his interview with Chas Gilmore, who joined the Heath Company in 1966 as a design engineer. Chas explained that the first kit from Heath was an oscilloscope called the O-1, which sold for only around $39.50 circa 1947. As Chas says, “… an oscilloscope at that stage of the game was one expensive instrument, and you know, $39.50? You’ve got to be kidding me. I mean, that must have been a tenth to a hundredth the cost of most oscilloscopes at that stage of the game.”

The second thing that caused my cogitations and ruminations on the interconnectedness of things involved a trio, triad, or troika, if you will, in the form of a column, a case study, and a press release. Let’s take these one at a time: 

The Column: I recently realized that, although anyone involved in the design of large digital silicon chips is familiar with the term Network-on-Chip (NoC), relatively few people are cognizant of the underlying concepts, which caused me to write a column for the Ojo-Yoshida Report titled Welcome to the Wonderful World of NoCs.

I ended that column by introducing a new NoC-based soft tiling capability that was recently launched by the folks at Arteris IP. This is of particular interest for people designing system-on-chip (SoC) devices targeted at artificial intelligence (AI) and machine learning (ML) applications.

The idea is that these SoCs often include 2D arrays of processor clusters (where each cluster contains multiple processor cores), and these clusters are connected using a coherent NoC. Meanwhile, any AI or ML blocks, such as neural processing units (NPUs), may involve 2D arrays of processing elements (PEs), and these PEs are connected by a non-coherent NoC.

Let’s use the term processing units (PUs) to embrace both processor clusters and PEs. The traditional way of implementing an array of PUs is to create the initial PU by hand, replicate it (think “cut-and-paste”) into an array, generate the NoC, and then hand-configure the network interface units (NIUs) associated with the PUs. Each PU has an NIU, and each NIU requires a unique ID/address so that the packets of data flying around the NoC know where they are coming from and where they are going.
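To make the bookkeeping concrete, here’s a minimal Python sketch of the sort of hand-maintained NIU map described above. To be clear, this has nothing to do with any actual Arteris (or other vendor’s) tool flow; the NIU_MAP table and the check_niu_map() helper are purely hypothetical illustrations of why hand-typed IDs and address windows invite copy-and-paste mistakes.

```python
# Hypothetical hand-maintained NIU map for a 3 x 3 array of PUs.
# In a real flow this information lives in the NoC configuration,
# not in a Python dict; this is purely illustrative.
NIU_MAP = {
    (0, 0): {"id": 0, "base": 0x0000_0000, "size": 0x0100_0000},
    (0, 1): {"id": 1, "base": 0x0100_0000, "size": 0x0100_0000},
    (0, 2): {"id": 2, "base": 0x0200_0000, "size": 0x0100_0000},
    (1, 0): {"id": 3, "base": 0x0300_0000, "size": 0x0100_0000},
    (1, 1): {"id": 4, "base": 0x0400_0000, "size": 0x0100_0000},
    (1, 2): {"id": 4, "base": 0x0400_0000, "size": 0x0100_0000},  # oops: cut-and-paste error
    (2, 0): {"id": 6, "base": 0x0600_0000, "size": 0x0100_0000},
    (2, 1): {"id": 7, "base": 0x0700_0000, "size": 0x0100_0000},
    (2, 2): {"id": 8, "base": 0x0800_0000, "size": 0x0100_0000},
}

def check_niu_map(niu_map):
    """Flag duplicate NIU IDs and overlapping address windows."""
    errors = []
    seen_ids = {}
    for xy, niu in niu_map.items():
        if niu["id"] in seen_ids:
            errors.append(f"duplicate NIU id {niu['id']} at {xy} and {seen_ids[niu['id']]}")
        else:
            seen_ids[niu["id"]] = xy
    # Sort the address windows and check each neighboring pair for overlap.
    windows = sorted((n["base"], n["base"] + n["size"], xy) for xy, n in niu_map.items())
    for (lo1, hi1, a), (lo2, hi2, b) in zip(windows, windows[1:]):
        if lo2 < hi1:
            errors.append(f"address windows of {a} and {b} overlap")
    return errors

for err in check_niu_map(NIU_MAP):
    print("ERROR:", err)
```

Running this flags the duplicated ID and the overlapping windows that the cut-and-paste slip introduced at tile (1, 2), which is exactly the sort of thing that slides past a tired human eyeball in a table with hundreds of entries.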

All this hand configuring is resource-intensive, prone to error, and frustrating, especially if—just when you’ve finished—the boss says something like, “we’ve decided to make a small modification to the original PU” (to which one might be forgiven for responding “Arrggghhh!”).

The idea behind NoC-based soft tiling is that, after creating the original PU, you simply tell the NoC tools the required X-Y dimensions for your array, at which point they auto-replicate the PUs, auto-generate the NoC (either coherent or non-coherent, as required), and auto-configure the NIUs, all in a matter of seconds or minutes.
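By way of contrast, here’s an equally hypothetical sketch of what the soft-tiling idea boils down to from the user’s point of view: given the X-Y dimensions of the array, the PU instances, their mesh links, and their unique NIU IDs and address windows are all derived programmatically, so resizing the array (or tweaking the original PU) becomes a one-line change. The tile_array() function, the NIU dataclass, and the address-window scheme below are my own inventions for illustration; this is not how the Arteris tools are actually driven.

```python
from dataclasses import dataclass

@dataclass
class NIU:
    """Network interface unit attached to one processing unit (PU)."""
    x: int
    y: int
    niu_id: int
    base: int
    size: int

def tile_array(cols, rows, window=0x0100_0000, coherent=False):
    """Hypothetical soft-tiling sketch: replicate a PU into a cols x rows
    mesh, give each NIU a unique ID and address window, and derive the
    mesh links between neighboring tiles."""
    nius = []
    links = []
    for y in range(rows):
        for x in range(cols):
            niu_id = y * cols + x                     # unique ID from (x, y) position
            nius.append(NIU(x, y, niu_id, base=niu_id * window, size=window))
            if x > 0:
                links.append(((x - 1, y), (x, y)))    # horizontal mesh link
            if y > 0:
                links.append(((x, y - 1), (x, y)))    # vertical mesh link
    kind = "coherent" if coherent else "non-coherent"
    print(f"Generated {len(nius)} NIUs and {len(links)} {kind} mesh links")
    return nius, links

# A 4 x 4 non-coherent mesh for an NPU-style array of PEs...
tile_array(4, 4, coherent=False)
# ...and a 2 x 2 coherent mesh for an array of processor clusters.
tile_array(2, 2, coherent=True)
```

Run it and you get a 4 x 4 non-coherent mesh (16 NIUs, 24 links) and a 2 x 2 coherent mesh (4 NIUs, 4 links), with no hand-edited ID tables anywhere in sight. When the boss asks for a bigger array or a modified PU, you change the arguments and regenerate.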

The Case Study: There’s a very interesting SiMa.ai Case Study on the Arteris website. This describes how—way back in the mists of time we used to call 2022—the folks at SiMa.ai developed and released the world’s first software-centric, purpose-built machine learning system-on-chip (MLSoC) platform that delivered an astounding 10X better performance per watt than its nearest competitive solution.

To be honest, I was so enthused by the contents of this case study that (and I know you are going to be surprised when you hear this) I wrote my How to Build a Multi-Billion-Transistor SoC column about it.

The point here is that, in order to create their MLSoC, the guys and gals at SiMa.ai used NoC technology provided by the chaps and chapesses at Arteris. In particular, the case study ended with a quote that caught my eye: “We’ve already started work on our next-generation device, and—with respect to the NoC—we didn’t even think of looking elsewhere because FlexNoC from Arteris was an automatic and obvious choice!” — Srivi Dhruvanarayan, VP of Hardware Engineering, SiMa.ai

The Press Release: All the above leads us to a recent press release: SiMa.ai Expands ONE Platform for Edge AI with MLSoC Modalix, a New Product Family for Generative AI. 

This press release informs us that the industry’s first multi-modal edge AI product family, SiMa.ai’s MLSoC Modalix, supports CNNs, Transformers, LLMs, LMMs, and Generative AI (GenAI) at the edge, and that it delivers industry-leading performance of more than 10X the performance per watt of alternatives.

Also, we are informed that: “SiMa.ai MLSoC Modalix is the second generation of the successful, commercially deployed first generation MLSoC. MLSoC Modalix is offered in 25 (Modalix 25 or “M25”), 50 (Modalix 50 or “M50”), 100 (Modalix 100 or “M100”) and 200 (Modalix 200 or “M200”) TOPS configurations, in multiple form factors, and is purpose-built to provide effortless deployment of Generative AI for the embedded edge ML market. Fully software compatible with first generation MLSoC, the MLSoC Modalix product family was designed to enable the capability to run DNNs, as well as advanced Transformer models, including LLMs, LMMs and Generative AI. Samples of MLSoC Modalix will be available to customers in Q4 of 2024.”

Meet the MLSoC Modalix family (Source: SiMa.ai)

When we visit the MLSoC Modalix page on the SiMa.ai website, we discover that this truly is, as they say, “A Complete System-on-Chip.” In addition to a “super-secret sauce” machine learning accelerator, this device boasts (nay, flaunts) a cornucopia of high- and low-speed I/O subsystems to interface with external devices and sensors; multimedia processing with video encode, decode, and a programmable DSP; boot security, system management, and debugging; huge amounts of on-chip memory along with access to humongous amounts of off-chip memory; an application processor comprising eight Arm A65 cores plus an image signal processor; and a network-on-chip and TrustZone security extensions.

The MLSoC Modalix is a complete system-on-chip (Source: SiMa.ai)

Now I’m wondering if the ML accelerator in this device is implemented as an array of processing elements connected by a mesh NoC. If so, I bet its creators are looking at the new Arteris soft tiling technology with awe and desire (perhaps accompanied by some gnashing of teeth and rending of garments), wishing it had been available when they were working on their Modalix devices. Oh well, perhaps they will avail themselves of this technology for their next-generation designs.

In Conclusion

The aforementioned press release made note of the fact that the rise of generative AI is changing the way humans and machines work together. It also states that “The next wave of the AI technology revolution will advance multi-modal machines with the ability to understand and process multiple forms of inputs across text, image, audio and visual. This shift will ripple across every industry, from agriculture and logistics, to medicine, defense, transportation and more.”

I totally agree. I’m also blown away by how fast the folks at SiMa.ai are moving. And, as usual, I’m left wanting to know more. On what technology node are these devices implemented? How many transistors are in an M200? What will the world look like in 10, 20, 50, and 100 years’ time? And, most importantly, how much will a bacon sandwich cost me in 2050? How about you? Do you have any thoughts you’d care to share on any of this?

