This week, I’m excited to welcome Sandra Rivera from VSORA! We dive into why inference is essential to deploying AI at scale, focusing on how VSORA’s patented software architecture addresses the “memory wall” by collapsing the memory layers between the processor and external memory. We also explore the company’s recent tape-out, the need for low latency and high determinism, plans for OEM modules and MLPerf benchmarking, and even take a brief detour to Sandra’s family llama farm.
Links for April 10, 2026
Amelia’s Favorite Fish Fry Episodes
Click here to check out the Fish Fry Archive.
Click here to subscribe to Fish Fry via Podbean
Click here to get the Fish Fry RSS Feed
Click here to subscribe to Fish Fry via Apple Podcasts
Click here to subscribe to Fish Fry via Spotify
Amelia’s Weekly Fish Fry – Episode 676
Release Date: April 10, 2026
Hello there, everyone, and welcome to episode number 676 of this electronic engineering podcast, Amelia’s Weekly Fish Fry, brought to you by EEJournal.com and written, produced, and hosted by yours truly, Amelia Dalton.
Folks, I have been excited to share this interview for some time. My guest is an absolute rock star in the world of electronic engineering—Sandra Rivera joins me this week. Sandra and I chat about what drew her to VSORA, an emerging AI accelerator startup from France that is taking the world of AI inference by storm, the importance of inference for AI proliferation, and what sets VSORA apart from the rest of the pack. Oh yeah—and a little bit about llamas, too.
So without further ado, please welcome Sandra to Fish Fry.
Amelia Dalton: Hi Sandra, thank you so much for joining me.
Sandra Rivera: Thank you for having me on, Amelia.
Amelia Dalton: Absolutely. Okay, so let’s talk about VSORA. What attracted you to this emerging AI accelerator startup from France that wants to take on the world of AI inference?
Sandra Rivera: Well, I think you just said it best—an emerging AI inference chip company based in France. Those things don’t typically go together. When I was first introduced to the company and its CEO, it was really out of curiosity. I’ve built these kinds of chips in my previous roles at large semiconductor companies, and I had never heard of VSORA.
When I met the team, I was fascinated. Here’s a company working on bleeding-edge process technology, with leading-edge packaging capabilities, building a chip for what is arguably the most exciting growth area of AI: deployment in the inference space. Training is limited to the few who can afford those massive models, but inference is where AI actually gets used.
What really stood out was that this small, nimble team had been working together for years and had already delivered around 14 successful chips to market. That’s highly differentiated. Many startups have brilliant engineers, but not a track record of delivering products together. VSORA does.
Amelia Dalton: So why has inference become so important to AI proliferation, and what makes VSORA’s architecture especially valuable for inference?
Sandra Rivera: Great question. Inference is the fastest-growing part of the AI continuum because it’s what happens when AI models are actually deployed—whether in data centers, enterprise environments, or edge devices like robots, autonomous vehicles, and drones.
These environments bring very different constraints: cost per token, power, area, weight, latency, and determinism. Unlike training, which can rely on massive compute and energy resources, inference must operate efficiently in constrained environments.
Latency (time to first response) and determinism (consistent, predictable outcomes) are critical. In robotic surgery, for example, you need the same response every time.
This is why inference is so different from training. There’s also a broader demand for customized, application-specific solutions. General-purpose GPUs can handle inference, but they’re less efficient, more power-hungry, and more expensive compared to purpose-built architectures.
Amelia Dalton: The inference market for data center chips is closely watched and highly competitive. So how is VSORA different?
Sandra Rivera: VSORA has focused heavily on solving the “memory wall” problem. In AI systems, you have compute and memory, and moving data between them creates latency, power inefficiency, and cost.
What VSORA has done, through a patented software architecture, is collapse many of the memory layers between the processor and external memory. By fusing operations and optimizing instruction flow in its compiler, the architecture makes memory behave more like near-memory or registers.
This reduces data movement distance, improves efficiency, and minimizes idle compute time. The result is faster execution, lower power consumption, and better overall performance.
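To make the memory-wall idea concrete, here is a minimal, hypothetical sketch in Python/NumPy. It is not VSORA’s compiler or instruction set; it only illustrates why fusing a chain of operations into a single pass cuts round trips to external memory:

```python
import numpy as np

N = 1_000_000
x = np.random.randn(N).astype(np.float32)

def scale_add_relu_unfused(x):
    # Each step materializes a full-size intermediate array, so the
    # data makes multiple round trips between compute and memory.
    t1 = x * 2.0                # write t1 out, then read it back
    t2 = t1 + 1.0               # write t2 out, then read it back
    return np.maximum(t2, 0.0)  # final pass

def scale_add_relu_fused(x):
    # One logical pass over the data. A fusing compiler would emit a
    # single kernel in which the intermediates (2*x and 2*x + 1) live
    # in registers and never touch external memory. NumPy still creates
    # temporaries under the hood; this line just expresses the fused form.
    return np.maximum(x * 2.0 + 1.0, 0.0)

# Same math either way; the difference is data movement, not results.
assert np.allclose(scale_add_relu_unfused(x), scale_add_relu_fused(x))
```

The same principle, applied across an entire inference graph by a compiler, is what shrinks the distance data travels and keeps the compute units from sitting idle.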
Amelia Dalton: Can you give us an update on VSORA’s recent successful tape-out?
Sandra Rivera: Absolutely—this is a big year for us. The team has a strong history of delivering chips, both in previous ventures and since founding VSORA in 2015. Originally focused on automotive and autonomous vehicles, we pivoted to AI inference in 2022.
We taped out our latest chip at the end of last year and expect silicon back from TSMC in early May. Based on our simulations, we’re anticipating up to 3× performance improvement over leading GPUs at half the power.
This year, we’ll integrate the chip into OEM modules and work with server and rack providers to deliver deployable solutions. We’ll also publish measured results and MLPerf benchmarks later in the year, with customer deployments expected by year’s end.
So yes—2026 is a critical and exciting year for VSORA.
Amelia Dalton: I love it. Okay, before I let you go, let’s talk about your family llama farm. How did that come about?
Sandra Rivera: My husband is a big outdoorsman, and our family loves camping and hiking. On a hunting trip in Colorado, he realized how challenging it was to carry all the gear—especially on the way back. That’s when he discovered llamas as pack animals.
We started with four llamas—“the Fab Four,” as we called them. The kids named them: Dolly Llama, Llama Bean, Drama Llama, and Michello Llama. They carried our tents, food, and water on long hikes through the Sierra Nevada mountains. It was fantastic.
But what started as a hobby turned into an obsession. Today, we have 32 llamas on our farm in the East Bay of Silicon Valley. They all have names and personalities. My husband knows them all—I try to keep up!
The community loves them too. We bring them to local schools, company events, and more. They’re wonderful, gentle animals and surprisingly easy to care for.
Amelia Dalton: That is amazing. Well, Sandra, it was a pleasure having you on. Thank you so much for joining me.
Sandra Rivera: Thank you for having me, Amelia. It was delightful talking with you.
Well folks, I must say—this interview is one of my favorite episodes of all time.
Did you know I have an “Amelia’s Favorite Episodes” playlist on YouTube? It includes my 500th episode with Mayman Aerospace CEO David Mayman, where we discuss jetpacks and the evolution of the Speeder VTOL air utility vehicle.
You’ll also find my conversation with Evan Coopersmith on advancements in brain-computer interface technology, including the Neural Latents benchmark challenge and AE Studio’s work in machine learning.
And of course, there’s the Great White Shark Café episode, where Rich Stump from Fathom and I explore 3D-printed tracking devices for great white sharks in collaboration with the Monterey Bay Aquarium Research Institute.
Another favorite is the Elephant Edge Challenge episode with Adam Benzion, where we discuss the Open Collar Initiative and efforts to protect vulnerable elephant populations.
Plus, there’s an oldie but a goodie—Vision Tech’s Dirty Dealings in ICs, where I investigate a counterfeit component scandal that impacted over a thousand customers.
You can find all of these and more in the playlist linked on this week’s Fish Fry page or on the EE Journal YouTube channel.
And if you’d like more information about VSORA, I’ve included links in the show notes as well.
Hey, have you checked out EEJournal on social media? You can find us on Facebook, LinkedIn, Bluesky, Mastodon, and of course YouTube, where you’ll find tons of tech content, including our popular Chalk Talk series.
Thanks for tuning in! If you’ve got a hot tip on new technology—or just want to chat—drop me a line at amelia@eejournal.com or leave a comment on EEJournal.
For the week of April 10th, 2026, I’m Amelia Dalton—and you’ve been fried.