Gruesome Gambols Gripping Generative AI (Part 3)

What the What (WTW)? This is my new favorite expression. I picked it up yesterday when my wife (Gina the Gorgeous) and I binge-watched the 4-part Shiny Happy People: Duggar Family Secrets documentary on Amazon Prime Video. This was a mix of happy (a little), sad (a lot), and frightening (a whole lot) for us.

Sad because Gina and I both enjoyed watching the original 19 Kids and Counting programs (we thought the Duggar family was a little wackadoodle, but in a nice enough way). Frightening because we now know how the family was embroiled in a controversial Christian fundamentalist organization and homeschooling empire called the Institute in Basic Life Principles (IBLP), which was founded by a sleazy slimeball called Bill Gothard. Just the small part of the curriculum I saw made me want to cry because any kids who were homeschooled using this program essentially had no useful education whatsoever. And happy (to a very small extent) because at least some people have managed to extract themselves from the grips of the IBLP, although I fear they will be forever damaged and traumatized by their experiences.

Speaking about being happy, sad, and frightened at the same time leads us to the main thrust of this column, because what I’m seeing with respect to generative artificial intelligence (AI) is eliciting all of these emotions in me. In Part 1 of this mini-series, we briefly touched on the origins of today’s AI systems, along with the fact that we are now in an era in which the computational requirements for high-end AI models are doubling every 3.4 months (eek!). Also, we introduced the concept of generative AI, which refers to an AI capable of generating text, images, or other media in response to prompts.
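To put that 3.4-month doubling figure into perspective, here is a quick back-of-the-envelope sketch in Python. The function name and the time spans are my own choices for illustration, and the sum assumes the doubling rate stays constant, which is a big assumption.

```python
def growth_factor(months, doubling_period_months=3.4):
    """How many times larger the compute requirement becomes after
    `months`, assuming it doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


# Illustrative time spans only; the 3.4-month figure is from the text.
for label, months in [("one year", 12), ("two years", 24)]:
    print(f"After {label}: roughly {growth_factor(months):.0f}x")
```

In other words, if the trend holds, a doubling every 3.4 months works out to a little over an 11-fold increase in a single year, and that compounds year on year.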

In Part 2, we briefly touched on some of the cool things and some of the worrying things associated with the generative AI called ChatGPT. One of the amazing things is that, although ChatGPT was first presented to the world in November 2022, which is only around six months ago as I pen these words, its impact was so great that—within only a couple of weeks following its release—almost everyone on the planet had heard about it and was talking about it, including my 92-year-old mother and her friends!

For the remainder of this column, I would love to be able to present a coherent story with an underlying thread that links all my points in a logical and seamless manner. Sadly, that’s not going to happen, because new items of AI-related news are now being announced multiple times a day. It’s as much as I can do to ride the crest of the wave and not get sucked under. Things aren’t helped by the fact that myriad thoughts are bouncing around my poor old noggin, each clamoring for their 15 seconds of fame, as it were. So, let’s simply plunge headfirst into the fray with gusto and abandon (and aplomb, of course) and see where the winds of fate take us.

Although this first tidbit of trivia isn’t strictly related to what we now think of as generative AI per se, I think it was one of the first things that set a small warning bell tinkling in the back of my mind. This was back in 2017, six years ago, when I heard about Google’s experiments using neural networks to “enhance” pixelated images.

Examples of the “Pixel Recursive Super Resolution” process (Source: Google)

In this case, an AI model is trained on a bunch of higher resolution images. Later, when presented with highly pixelated (8×8) images, the model uses what it learned from the images on which it was trained to generate higher resolution versions (32×32 in this case). (My understanding is that it looks through countless real-world images to work out what combinations of higher resolution pixels it would take to generate the lower resolution pixels in the pixelated images.) When we compare these generated images to the “ground truth” images (the images of the real people), I must admit I would find it hard to say which was the original and which was the generated version. On the one hand, this is very clever indeed. On the other hand, I can easily imagine negative scenarios involving “Big Brother” governments (China, Russia, North Korea … even America, if we aren’t careful) in the not-so-distant future.
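One reason the model has to "imagine" the missing detail is that pixelating an image throws information away: many different higher resolution images collapse to exactly the same pixelated one. The following toy sketch illustrates this with tiny grayscale grids. It is not Google's method; the `downsample` helper, the block-averaging scheme, and the 4×4 test images are all my own assumptions for illustration.

```python
def downsample(image, factor):
    """Pixelate a square grayscale image (a list of rows) by averaging
    each factor-by-factor block of pixels into a single pixel."""
    size = len(image) // factor
    result = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# Two clearly different 4x4 "high resolution" images (made-up values).
checkerboard = [
    [0, 200, 0, 200],
    [200, 0, 200, 0],
    [0, 200, 0, 200],
    [200, 0, 200, 0],
]
flat_gray = [[100] * 4 for _ in range(4)]

# Both collapse to the same 2x2 pixelated image.
print(downsample(checkerboard, 2) == downsample(flat_gray, 2))
```

Since the pixelated version cannot tell these two originals apart, any "enhanced" result is a plausible guess informed by the training images rather than a recovery of what was really there, which is exactly why the Big Brother scenarios give me pause.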

Also in 2017, I started to hear about an AI called Lyrebird that can listen to someone speaking (or a recording of someone speaking) for a couple of minutes, extract a “digital signature” for that voice, and then be used in a text-to-speech mode to sound like the targeted person saying whatever you want them to say. Early vocal versions purporting to be Barack Obama, Donald Trump, and Hillary Clinton can be heard in this TechCrunch column from 2017. I can only imagine how much more realistic they sound now.

I’m not sure when I first started hearing about deepfake videos. One incarnation of this technology involves digitally manipulating a video to replace one person’s likeness convincingly with that of another, as depicted in this YouTube video from 2021.

Another approach, which I feel to be more insidious, involves training an AI using multiple videos of one person, like a politician, for example. In addition to watching the video, the AI also listens to the words, deciding whether the speaker is happy, angry, or sad, and tying these emotions to expressions, eye blinks, muscle twitches, etc. The AI also determines which phonetic vocal sounds are associated with various muscle movements. Later, when given a new speech (which could have been generated by an AI like Lyrebird, for example), the deepfake AI can generate a corresponding video of the talking head on a frame-by-frame basis.

When photography was invented, people used to say, “A photo never lies.” It didn’t take long before they realized the error of their ways. More recently, the term “photoshopped” entered our vernacular. As a result, we now think nothing of seeing the strangest images. For example, look at The Winners of the Greatest Photoshop Battles Ever (100 Pics) article on Bored Panda from a few years ago. Today, as we discussed in Part 1, it’s possible to use AI-enhanced tools to swap out a dull and dreary sky for a spectacular skyscape, or quickly and easily remove objects and people from photos. And now we have deepfake videos. Where is this all going to end?

I don’t know about you, but I try to be polite to anyone who calls me on the phone, even if it’s someone who wishes to inform me that my car’s warranty has run out (it hasn’t) and they have a great deal to offer me. One problem is that it’s getting harder and harder to know if you are talking to a real person or a talkative chatbot. If I feel I may not be talking to a human, I tend to drop a surrealist question into the conversation, along the lines of: “Do you think the cabbages are flying south early this year?” (According to the Lifehacker column How to Tell If You’re Chatting with a Bot, this technique is called “Pulling a ‘Crazy Ivan’.”) If the voice on the other end pauses, and then picks up where it left off, I can be pretty sure I’m conversing with a chatbot. Alternatively, if the response is, “What on Earth are you waffling about?” then I know I’m talking to a real person (well, that’s true today, but who knows how things will be tomorrow?).

Did you hear that, as reported by Gizmodo, Wendy’s Is Bringing a Google-Powered AI Chatbot to Its Drive-Thru? In addition to cutting back on expensive human workers, this chatbot will do its best to upsell you (“I bet some hash browns would go really well with that sausage biscuit” – I’m sorry, now I come to think about it, this is what I always say to myself).

A few years ago, I heard about an AI-powered application that could be integrated into your glasses. It would listen to the same things you were listening to and determine if you were conversing with a person or another AI. If the latter, it would alert you to this fact by using the Peltier effect to cool the temple tips of the arms on your glasses.

I love reading science fiction books and watching science fiction movies. These books and movie scripts were written by people. As I pen these words, the members of the Writers Guild of America are on strike. There are many reasons for this strike, including problems with pay and conditions. There’s also the fact that, although many of these writers admit to using chatbots as an aid, they are concerned about the possibility that companies will start to use AIs to generate complete scripts.

This isn’t as far-fetched as it sounds. Just a couple of weeks ago, I read a New York Post article titled Author Uses AI Generators, Including ChatGPT, to Write Nearly 100 Books in Less Than a Year (I feel we are using the word “author” in its loosest sense). My understanding is that these AI creations aren’t great in many areas, like character development, but they are getting better day-by-day. Suppose that, a couple of years from now, anyone can use an AI to write a complete book in a couple of hours or generate a complete movie in a couple of days. Is it art? (What is art?) Would the Mona Lisa be as intrinsically wonderful as it is had it been generated by an AI? (If not, why not?)

I had so many plans for these columns. I’ve been jotting copious notes down for the past couple of weeks. The scary thing is that I haven’t yet used any of these notes. I’m going to try to be strong (and brave) and make my next column the last in this series (at least, for now). In the meantime, do you have any thoughts you’d care to share?