
The JOYCE Project to Equip Machines with Human-Like Perception

Did you ever watch the British television science fiction comedy Red Dwarf? The stage for this tale is the eponymous spaceship Red Dwarf, an enormous mining vessel that is 6 miles (10 km) long, 5 miles (8 km) tall, and 4 miles (6 km) wide. Series 1 through Series 8 originally aired on BBC2 between 1988 and 1999 (somewhat reminiscent of Whac-A-Mole, there were revivals in 2009, 2012, 2016, 2017, and 2020).

The underlying premise follows low-ranking technician Dave Lister, who awakens after being in suspended animation for three million years to find he is the last living human. Dave’s only companions are Holly (the ship’s computer), Arnold Rimmer (a hologram of Lister’s incredibly annoying deceased bunkmate), Cat (a life form that evolved from Lister’s pregnant cat), and Kryten (a pathologically honest service mechanoid they meet on their travels).

All of the characters have… let’s call them foibles. For example, Holly prides himself on the fact he has an IQ of 6,000. Unfortunately, after three million years by himself, he’s become “computer senile,” or as he puts it, “a bit peculiar.”

In one episode, Kryten tries to be helpful by fixing Lister’s “Talkie Toaster,” only to discover that it’s the most annoying machine in the universe (even more so than Marvin the Paranoid Android in The Hitchhiker’s Guide to the Galaxy).

Red Dwarf quickly gained a cult following, which — technically — means I’m a member of a cult. I’m not sure my dear old mother is going to be best pleased to hear this news, so let’s not tell her.

The reason for my dropping Talkie Toaster into the conversation is that, when I’m presenting at an embedded systems conference and talking about augmenting household appliances with embedded speech and embedded vision capabilities, one of the examples I typically use is that of an electric toaster.

I don’t think it will be long before you can pick up a new speech and vision-equipped toaster from your local appliance store (or have it flown in by an Amazon drone — see Dystopian Dirigible Deploys Delivery Drones). When you unpack this device and power it on, the first thing it will do is introduce itself and ask your name.

When you eventually come to pop a couple of slices of bread into the toaster, it will recognize who you are, it will identify the type of bread product you are waving around, and it will ask you how you would like this product to be toasted, to which you might reply something like “a little on the darkish side.” Since this is your first foray into toasting together, when the machine returns your tasty delight, it might ask how satisfied you are with its efforts. In turn, you might reply “That’s just right,” or you may offer a suggestion like “Perhaps a hair lighter in the future” or “mayhap a shade darker next time.”

Thereafter, dialog between you and your toaster will be kept to a minimum unless you are bored and wish to strike up a conversation, or you decide to introduce a new element into the mix, like sneaking up on it with a bagel, a croissant, or even a frozen waffle in your hand, in which case it will once again engage you in a little light banter to ascertain your toasting preferences for these new food items.

Similarly, the toaster will do its best to learn the ways in which the various members of your family prefer their toasted consumables to be presented to them. No longer will I forget to check the current settings after my son (Joseph the Common Sense Challenged) has used the household toaster, only to be subjected to the indignity of barely warmed bread. As John Lennon famously said, “You may say that I’m a dreamer, but I’m not the only one.”
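Just for giggles, let’s sketch how such a preference-learning loop might look under the hood. This is a purely hypothetical Python doodle of my own devising (the toaster, the browning scale, and the canned feedback phrases are all figments of my imagination), but it captures the idea of nudging a per-person, per-bread setting in response to spoken feedback.

# Hypothetical sketch: a toaster that remembers per-user, per-bread preferences
# and nudges them in response to feedback like "a hair lighter next time."

class PreferenceMemory:
    def __init__(self, default_level=5):
        self.default_level = default_level   # 1 = barely warmed bread, 10 = charcoal
        self.levels = {}                     # (user, bread_type) -> browning level

    def level_for(self, user, bread_type):
        # Fall back to the default the first time we meet this user/bread combination
        return self.levels.get((user, bread_type), self.default_level)

    def apply_feedback(self, user, bread_type, feedback):
        # Map fuzzy spoken feedback onto a small adjustment of the stored level
        adjustments = {"just right": 0, "a hair lighter": -1, "a shade darker": +1}
        new_level = self.level_for(user, bread_type) + adjustments.get(feedback, 0)
        new_level = max(1, min(10, new_level))
        self.levels[(user, bread_type)] = new_level
        return new_level

memory = PreferenceMemory()
memory.apply_feedback("Max", "bagel", "a shade darker")
print(memory.level_for("Max", "bagel"))   # 6 -- one notch darker than the default

Of course, a real appliance would have to cope with far woollier feedback than three canned phrases, but you get the gist.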

Did you happen to catch Amelia Dalton’s recent Fish Fry featuring Bomb-Sniffing Cyborg Locusts and the First Humanoid Robot with Intelligent Vision? The reason I ask is that the robot segment featured Immervision, which is a Montreal-based developer and licensor of patented, wide-angle optics and imaging technology. The thing is that I was recently chatting with Alessandro Gasparini and Alain Paquin from Immervision, where Alessandro is VP of operations and Alain is head of the JOYCE project.

Visualization of the namesake of the JOYCE project (Image source: Immervision)

“The JOYCE project,” I hear you say (metaphorically speaking) in a quizzical voice, “what’s a JOYCE project when it’s at home?” Well, I’m glad you asked, because I’m just about to tell you. First, however, it’s worth noting that Immervision has a 20-year history in optical system design and image processing, with more PhDs per square foot than you can swing a stick at. In addition to the physicists working on the optics, they have a mix of specialists in math, GPUs, and DSPs working on the image processing.

Immervision’s lenses appear all over the place, such as in broadcast, space probes, robots, surveillance systems, smartphones, wearables, home appliances, and medical systems, including the endoscopes certain demented doctors delight in inserting into any obtainable orifice (I try not to be bitter).

The folks from Immervision say that many companies in the machine vision arena have brilliant people, but that they force them to work in silos. By comparison, since its inception, all of the scientists, technologists, and engineers at Immervision have worked collaboratively to solve challenges. Since they’ve been working this way on myriad designs for the past 20 years, the result is a crack team that can tackle any optical and image processing problem.

The exciting news at the moment is that the folks at Immervision are currently on a mission, which is to equip machines, including robots, with human-like perception. The idea is that to be truly useful to us, machines need to fully understand their environment. You only have to watch a Roomba Robot Vacuum bump into the same chair leg for the nth time to realize that we are currently far from this goal.

All of this leads us to the JOYCE project, which is going to produce the first humanoid robot developed collaboratively by the computer vision community.

Immervision unveiled its JOYCE-in-a-box development kit at the recent Embedded Vision Summit, which took place 15-16 September 2020. This development kit is available to developers, universities, and technology companies, allowing them to add sensors, software, and artificial intelligence (AI) algorithms that enhance JOYCE’s perception and understanding of her environment and help solve computer vision challenges. To bring intelligent vision to computer vision, JOYCE comes equipped with three ultra-wide-angle panomorph cameras calibrated to give 2D hemispheric, 3D stereoscopic hemispheric, or full 360 x 360 spherical capture and viewing of the environment.

Of course, we all know that the past 5+ years have seen a tremendous surge in artificial intelligence and machine learning, with machine vision (object detection and recognition) being a prime example, but we still have a long, long way to go (see also What the FAQ are AI, ANNs, ML, DL, and DNNs?).

And it’s not enough to just be able to see something — true human perception involves the fusion of sight, sound, taste, touch, smell, and all of the other senses available to us. What? You thought there were only five? You can’t trust everything your teachers tell you, is all I can say. In reality, we have at least nine senses, and possibly as many as twenty or more (see also People with More Than Five Senses).

In the case of machines, we can equip them with all sorts of additional sensing capabilities, including radar and lidar and the ability to see in the infrared and ultraviolet and… the list goes on. There’s also a lot of work going on with regard to sensor fusion — that is, the combining of sensory data derived from disparate sources such that the resulting information has less uncertainty than would be possible when these sources are used individually.

For example, if you feel things go a little “wobbly,” you might worry that you were having a “bit of a turn,” but if you see all of the people around you are physically wobbling, then you might take a wild guess that you were experiencing an earthquake (I know whereof I speak on this one). Similarly, if a robot detects vibration via an accelerometer, it could be experiencing an internal error (e.g., bad servo, slipped gear, stripped cogs), but if it observes other things vibrating in its immediate vicinity, then it may come to a different conclusion.
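To put a number on that “less uncertainty” claim, here’s a minimal sensor-fusion sketch in Python. It’s my own toy example (nothing to do with Immervision’s algorithms): two noisy estimates of the same quantity are combined by inverse-variance weighting, and the fused estimate always ends up with a smaller variance than either source on its own.

# Minimal sensor-fusion sketch: inverse-variance weighting of two noisy
# measurements of the same quantity. The fused variance is always smaller
# than either input variance, which is the whole point of fusion.

def fuse(measurement_a, variance_a, measurement_b, variance_b):
    weight_a = 1.0 / variance_a
    weight_b = 1.0 / variance_b
    fused_value = (weight_a * measurement_a + weight_b * measurement_b) / (weight_a + weight_b)
    fused_variance = 1.0 / (weight_a + weight_b)
    return fused_value, fused_variance

# Example: an accelerometer says the vibration amplitude is 0.9 (variance 0.04),
# while a vision-based estimate of how much the scenery is shaking says 1.1 (variance 0.09).
value, variance = fuse(0.9, 0.04, 1.1, 0.09)
print(f"fused estimate = {value:.3f}, variance = {variance:.3f}")
# fused estimate ~ 0.962, variance ~ 0.028 -- lower uncertainty than either sensor alone

Real systems use rather fancier machinery (Kalman filters, Bayesian inference, and the like), but the underlying principle is the same.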

One problem here is maintaining the chronological relationship between the data from the various sensors. You might think this is easy, but let me offer some perspective (no pun intended). I don’t know about you, but I remember the days when you could watch people talking on television and the words coming out of their mouths were synchronized to the movement of their lips. These days, by comparison, watching a program on TV is sometimes reminiscent of watching a badly dubbed Japanese action film. You would think that with things like today’s high-definition television systems and associated technologies, we could at least ensure that the sounds and images have some sort of temporal relationship to each other, but such is not necessarily the case. The problem is that the sound and image data are now processed and propagated via different channels and computational pipelines.
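To illustrate one common way of tackling this, here’s a toy Python sketch (my own concoction, not JOYCE’s actual pipeline) in which every reading carries the timestamp at which it was captured, and each camera frame is paired with the accelerometer sample captured closest in time, regardless of when either one finally arrived at its destination.

# Toy illustration of time-aligning two sensor streams that arrive via
# different pipelines with different latencies. Each reading carries the
# timestamp at which it was *captured*, not the time at which it arrived.

def align(frames, samples):
    """For each camera frame, pick the accelerometer sample captured closest in time."""
    aligned = []
    for frame_time, frame in frames:
        nearest_time, nearest_sample = min(samples, key=lambda s: abs(s[0] - frame_time))
        aligned.append((frame, nearest_sample, abs(nearest_time - frame_time)))
    return aligned

camera_frames = [(0.000, "frame_0"), (0.033, "frame_1"), (0.066, "frame_2")]   # ~30 fps
accel_samples = [(0.001, 0.02), (0.011, 0.03), (0.021, 0.15), (0.031, 0.90),
                 (0.041, 0.85), (0.051, 0.10), (0.061, 0.04), (0.071, 0.03)]   # ~100 Hz

for frame, sample, skew in align(camera_frames, accel_samples):
    print(f"{frame}: accel={sample:.2f} (captured {skew*1000:.0f} ms apart)")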

So, one very clever aspect of all this is the way in which JOYCE employs the latest and greatest in data-in-picture technology, in which meta-information is embedded directly into the pixels forming the images. By means of data-in-picture technology, each of JOYCE’s video frames can be enriched with data from a wide array of sensors providing contextual information that can be used by AI, neural networks, computer vision, and simultaneous localization and mapping (SLAM) algorithms to help increase her visual perception, insight, and discernment.
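Immervision hasn’t shared the nuts and bolts of its data-in-picture encoding with me, so take the following Python doodle as nothing more than an illustration of the general concept: a few bytes of sensor metadata are serialized and written into the frame’s top row of pixels, which means the data travels inside the frame itself and cannot drift out of sync with the image.

# Conceptual sketch of embedding sensor metadata directly into a frame's pixels.
# This is NOT Immervision's data-in-picture format -- just an illustration of the
# idea that metadata riding inside the frame can never lose sync with the image.

import json
import numpy as np

def embed_metadata(frame, metadata):
    """Serialize metadata to JSON and write its bytes into the frame's top row."""
    payload = json.dumps(metadata).encode("utf-8")
    stamped = frame.copy()
    stamped[0, 0, 0] = len(payload)                       # first pixel stores the payload length
    stamped[0, 1:1 + len(payload), 0] = list(payload)     # payload bytes follow, one per pixel
    return stamped

def extract_metadata(frame):
    """Recover the metadata from the frame's top row."""
    length = int(frame[0, 0, 0])
    payload = bytes(frame[0, 1:1 + length, 0].tolist())
    return json.loads(payload.decode("utf-8"))

frame = np.zeros((480, 640, 3), dtype=np.uint8)            # a dummy 640x480 RGB frame
stamped = embed_metadata(frame, {"t": 12.345, "accel": [0.0, 0.0, 9.8]})
print(extract_metadata(stamped))                            # {'t': 12.345, 'accel': [0.0, 0.0, 9.8]}

A real implementation would obviously need to survive compression and scaling, and would use something sturdier than a single one-byte length field, but the “metadata lives inside the frame” idea is the point.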

Immervision is encouraging members of the computer vision community to add their technologies to upgrade JOYCE in a series of international challenges that will help to bring her true value to life. If you wish to follow JOYCE’s progress and the collaboration within the community, you can visit her at JOYCE.VISION and follow her on her social channels.

On the one hand, I’m jolly excited by all of this. On the other hand, I’m reminded of my earlier columns: The Artificial Intelligence Apocalypse — Is It Time to Be Scared Yet? Part 1, Part 2, and Part 3. What do you think? Is the time drawing nigh for me to dispatch the butler to retrieve my brown corduroy trousers?

 
