
First Text-to-Speech, Then Text-to-Image, Now Text-to-3D-Animation

After seeing what I’ve just seen, you can consider (what I laughingly call) my mind to be well and truly blown. My poor old noggin is now full of ideas, each one triggering a cascade of considerations. Some of these meandering musings may even be germane to what I’m about to reveal. As usual, of course, we will all have to weed through my rambling waffling and make our own decisions as to what is relevant… or not… as the case might be.

When I was a kid, the height of sophistication with respect to children’s television entertainment was The Flower Pot Men. This featured two little men called Bill and Ben who were made from flowerpots. Each lived in a big flowerpot at the bottom of an English suburban garden. Between their flowerpot homes was a third character called Little Weed. All three were puppets. Even though you could see the strings, I still thought they were real, living creatures. This show was presented in glorious black-and-white. If you are English and in a nostalgic frame of mind, you can pause, peruse, and ponder the very first episode, Seeds, on YouTube.

As an aside, I bet the creators of this program in 1952 would never have thought in a thousand years that it would still be available for anyone in the world (apart from people’s paradises like China, Russia, and North Korea, of course) to watch 72 years in their future on devices like smartphones and personal computers connected to the globe-spanning internet.

Later, circa the early 1960s, I used to love cartoons like Popeye the Sailor, The Flintstones, The Bugs Bunny Show, Top Cat, The Yogi Bear Show, Deputy Dawg, and The Jetsons, to name but a few. I don’t know if these all started in black-and-white or if they were in color. This is because anyone we knew who had a TV at all had only a black-and-white model.

The thing about these cartoons was that they were all painstakingly hand-created on a frame-by-frame basis. This took lots of people, with some drawing the backgrounds, others creating the outlines of the characters, and still others filling/shading (or coloring) those outlines. I’m a little fuzzy on the details, but I think it was sometime in the 1970s that digital computers started to be used to “fill in the gaps.” By this I mean that an animator could draw a character at the beginning of a motion, like jumping in the air, and again at the end of the motion, and then a computer could be used to automatically interpolate and generate the intermediate frames.
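If you’re curious about what that interpolation (animators call it “in-betweening” or “tweening”) boils down to, here’s a minimal sketch in Python. The pose representation—a handful of named joints with 2D positions—is purely illustrative; real systems interpolate far richer character rigs and rarely settle for simple linear blends.

```python
# A minimal sketch of computer "in-betweening": given a character's pose at two
# keyframes, generate the intermediate frames by linear interpolation.
# The pose format (a few named joints with 2D positions) is purely illustrative.

def lerp(a: float, b: float, t: float) -> float:
    """Linearly interpolate between a and b for t in [0, 1]."""
    return a + (b - a) * t

def tween(start_pose: dict, end_pose: dict, num_inbetweens: int) -> list:
    """Return the in-between poses that separate two keyframes."""
    frames = []
    for i in range(1, num_inbetweens + 1):
        t = i / (num_inbetweens + 1)   # fraction of the way through the motion
        pose = {}
        for joint in start_pose:
            (x0, y0), (x1, y1) = start_pose[joint], end_pose[joint]
            pose[joint] = (lerp(x0, x1, t), lerp(y0, y1, t))
        frames.append(pose)
    return frames

# Example: a character jumping, with three in-between frames between two keys.
crouch = {"hand": (0.0, 1.0), "head": (0.0, 2.0)}
apex   = {"hand": (0.5, 4.0), "head": (0.5, 5.0)}
for frame in tween(crouch, apex, 3):
    print(frame)
```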

I still like 2D cartoons as an art form—I’m not sure if they are easier for young kids to understand than their 3D counterparts—but I have to say that I really love 3D animations. The first fully 3D animated TV series was VeggieTales, which came out in 1993. I just watched the first episode, Where’s God When I’m S-Scared?, on YouTube.

Meanwhile, the first 3D animated feature film was Toy Story, which took the public consciousness by storm in 1995. There was a lot of behind-the-scenes wrangling about this film, including production shutdowns and a complete transformation of many of the characters. Most of what I vaguely remember about this I learned in the Steve Jobs biography by Walter Isaacson.

I know that computers were so limited in memory and raw computational power at that time (which is only around 30 years ago as I pen these words) that the animators had 100+ computers running 24 hours a day. Each frame could take anywhere from 45 minutes to 30 hours to render depending on how complex it was. As a result, Pixar was able to render less than 30 seconds of film per day. Furthermore, they didn’t have the computational capability or time to generate shadows (did you even realize that there are no shadows in Toy Story 1?). Ultimately, Toy Story required 800,000 machine hours and 114,240 frames of animation in total. These were divided across 1,561 shots that totaled over 77 minutes of finished film. 
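Those numbers hang together, by the way. Here’s a quick back-of-envelope check in Python using the figures above; the farm size (I’ve assumed 110 machines for “100+”) and the standard 24 fps film frame rate are my own assumptions.

```python
# Back-of-envelope check of the Toy Story render numbers quoted above.
# The machine-hour, frame, and running-time figures come from the article;
# the farm size and frame rate are assumptions for illustration.

total_machine_hours = 800_000          # quoted total render time
total_frames        = 114_240          # quoted frame count
machines            = 110              # assumption: "100+ computers"
fps                 = 24               # assumption: standard film frame rate

hours_per_frame   = total_machine_hours / total_frames   # ~7 machine-hours per frame
machine_hours_day = machines * 24                         # farm capacity per day
frames_per_day    = machine_hours_day / hours_per_frame   # ~377 frames per day
seconds_per_day   = frames_per_day / fps                  # ~16 seconds of finished film per day

print(f"Average render time per frame: {hours_per_frame:.1f} machine-hours")
print(f"Finished film per day:         {seconds_per_day:.1f} seconds")
# Consistent with the article's "less than 30 seconds of film per day."
```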

Now, of course, we are used to seeing 3D graphics—with shadows—being rendered on the fly for such applications as computer games, virtual reality (VR), and mixed reality (MR) (see Are You Ready for Mixed Reality?).

And, of course, artificial intelligence (AI) is now making its presence felt all over the place. For example, when I attended Intel Architecture Day 2021 (see Will Intel’s New Architectural Advances Define the Next Decade of Computing?), out of the myriad things that boggled my brain, one that really stuck out was that—instead of rendering computer games at 4K resolution—they had a graphics chip that could render at 1080p and then use on-chip AI to upscale to 4K on a frame-by-frame basis… in real-time!!! As you can see in this video, the results are astonishingly good.
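To give a feel for what that kind of upscaling involves, here’s a toy PyTorch sketch contrasting naive bilinear interpolation with a tiny learned sub-pixel (ESPCN-style) upscaler. To be clear, this is not Intel’s pipeline, and the toy network below is untrained; a real upscaler learns from pairs of low- and high-resolution frames (often with motion data thrown in).

```python
# A minimal PyTorch sketch of the idea behind AI upscaling: render at a lower
# resolution, then let a small learned network produce the higher-resolution
# frame. Purely illustrative and untrained; NOT Intel's actual technology.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUpscaler(nn.Module):
    def __init__(self, scale: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, kernel_size=3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into a larger image

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.features(x))

frame_1080p = torch.rand(1, 3, 1080, 1920)   # stand-in for a rendered 1080p frame

naive_4k   = F.interpolate(frame_1080p, scale_factor=2, mode="bilinear", align_corners=False)
learned_4k = ToyUpscaler(scale=2)(frame_1080p)   # same shape, produced by the network

print(naive_4k.shape, learned_4k.shape)          # both: torch.Size([1, 3, 2160, 3840])
```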

One of the first in a growing suite of “text-to-xxx” applications was text-to-speech. Noriko Umeda et al. developed the first general English text-to-speech system in 1968 at the Electrotechnical Laboratory in Japan. On the one hand, this was amazing; on the other hand, it was only a hint of a sniff of a whiff of what was to come. Consider today’s Generative Voice AI offering from the guys and gals at ElevenLabs, for example. Bounce over to their website and play a few samples. Personally, I wouldn’t be able to tell whether this was a person or a program doing the talking.
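If you fancy playing with the basic “text in, audio out” idea yourself, the open-source pyttsx3 library will drive whatever speech engine is already installed on your machine. This is just a toy illustration, of course; it is nothing like ElevenLabs’ generative voice models, which are accessed through their own service.

```python
# A minimal text-to-speech sketch using the open-source pyttsx3 library, which
# drives the platform's built-in speech engine. For illustration only; this is
# not ElevenLabs' generative voice AI.

import pyttsx3

engine = pyttsx3.init()                      # picks the platform's default TTS backend
engine.setProperty("rate", 160)              # speaking rate in words per minute
engine.say("First text to speech, then text to image, now text to 3D animation.")
engine.runAndWait()                          # blocks until the utterance has been spoken
```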

A more recent development is text-to-image, such as the Generative AI Stable Diffusion model. Almost unbelievable, at least to me, is the fact that, as I wrote in Generative AI Is Coming to the Edge, it’s now possible to get your own personal Stable Diffusion running on a USB-based “stick” equipped with 16GB of memory and an Ara-2 AI chip from the chaps and chapesses at Kinara. I’m hoping to lay my hands on one of these bodacious beauties to help me create pencil sketch illustrations for the Life of Clive book I’m currently writing to tell the tale of my formative years.
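For anyone who hasn’t tried text-to-image, the flow looks something like the sketch below, which uses Hugging Face’s diffusers library on an ordinary CUDA-equipped GPU rather than Kinara’s stick. The checkpoint name and prompt are just illustrative.

```python
# A hedged sketch of text-to-image with Stable Diffusion via Hugging Face's
# diffusers library on a CUDA GPU. This simply illustrates the "prompt in,
# image out" flow; the Kinara stick runs the model on its own accelerator.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",        # a commonly used public checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "pencil sketch of a young boy building a crystal radio, 1960s England"
image = pipe(prompt).images[0]               # runs the denoising loop and decodes the image
image.save("pencil_sketch.png")
```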

And so, finally, we come to the crux of this column. The consumption of 3D animation content has increased dramatically over the past few years. In fact, the 3D animation market is expected to grow at a CAGR of nearly 12%, reaching more than $62B by 2032. Furthermore, the way in which 3D animation content is created is evolving significantly thanks to new technologies that can be used by anyone, from any device, driven by video or by simple text in the form of text-to-3D-animation.

This is the point where I’d like to introduce you to a company called DeepMotion. I was just chatting with Kevin He, who is the Founder and CEO. Prior to DeepMotion, Kevin served as CTO of Disney’s mobile game studio, Technical Director at ROBLOX, and Senior Engine Developer on World of Warcraft at Blizzard, so he knows a thing or two.

The tagline on DeepMotion’s website is “Bringing Digital Humans to Life With AI.” Their first offering was Animate 3D, which uses AI to create 3D animations from video. All I can say is that you must see this to believe it, so it’s fortunate that I’m in a position to show you a video.

To be honest, if this were all DeepMotion had to offer, I’d still say it’s more than enough. I’m gasping in astonishment and squealing in delight, but there’s more. The folks at DeepMotion have recently announced their text-to-3D-animation offering in the form of SayMotion. Yes, of course there’s a video.

This really is rather amazing. You select a character, type in a text prompt, and “Bob’s your uncle” (or aunt, depending on your family dynamic). I’m speechless, which isn’t something I expect to say often (no pun intended), so I’ll turn things over to you. Do you have any thoughts you’d care to share on any of this?

5 thoughts on “First Text-to-Speech, Then Text-to-Image, Now Text-to-3D-Animation”

  1. And “text-to-article” is what, maybe a couple weeks or months away? 🙄

    AI engines are cropping up like mushrooms after the rain.
    I’ve just seen groq do this:
    [Groq Labs: Project Know-It-All](https://www.youtube.com/watch?v=QE-JoCg98iU)
