
The JOYCE Project to Equip Machines with Human-Like Perception

Did you ever watch the British television science fiction comedy Red Dwarf? The stage for this tale is the eponymous spaceship Red Dwarf, which is an enormous mining vessel that is 6 miles (10 km) long, 5 miles (8 km) tall, and 4 miles (6 km) wide. Series 1 through Series 8 originally aired on BBC 2 between 1988 and 1999 (somewhat reminiscent of Whac-A-Mole, there were reboots in 2009, 2012, 2016, 2017, and 2020).

The underlying premise follows low-ranking technician Dave Lister, who awakens after being in suspended animation for three million years to find he is the last living human. Dave’s only companions are Holly (the ship’s computer), Arnold Rimmer (a hologram of Lister’s incredibly annoying deceased bunkmate), Cat (a life form that evolved from Lister’s pregnant cat), and Kryten (a pathologically honest service mechanoid they meet on their travels).

All of the characters have… let’s call them foibles. For example, Holly prides himself on the fact he has an IQ of 6,000. Unfortunately, after three million years by himself, he’s become “computer senile,” or as he puts it, “a bit peculiar.”

In one episode, Kryten tries to be helpful by fixing Lister’s “Talkie Toaster,” only to discover that it’s the most annoying machine in the universe (even more so than Marvin the Paranoid Android in The Hitchhiker’s Guide to the Galaxy).

Red Dwarf quickly gained a cult following, which — technically — means I’m a member of a cult. I’m not sure my dear old mother is going to be best pleased to hear this news, so let’s not tell her.

The reason I’ve dropped Talkie Toaster into the conversation is that, when I present at embedded systems conferences, an electric toaster is one of the examples I typically use to illustrate the concept of augmenting household appliances with embedded speech and embedded vision capabilities.

I don’t think it will be long before you can pick up a new speech and vision-equipped toaster from your local appliance store (or have it flown in by an Amazon drone — see Dystopian Dirigible Deploys Delivery Drones). When you unpack this device and power it on, the first thing it will do is introduce itself and ask your name.

When you eventually come to pop a couple of slices of bread into the toaster, it will recognize who you are, it will identify the type of bread product you are waving around, and it will ask you how you would like this product to be toasted, to which you might reply something like “a little on the darkish side.” Since this is your first foray into toasting together, when the machine returns your tasty delight, it might ask how satisfied you are with its efforts. In turn, you might reply “That’s just right,” or you may offer a suggestion like “Perhaps a hair lighter in the future” or “mayhap a shade darker next time.”

Thereafter, dialog between you and your toaster will be kept to a minimum unless you are bored and wish to strike up a conversation, or you decide to introduce a new element into the mix, like sneaking up on it with a bagel, a croissant, or even a frozen waffle in your hand, in which case it will once again engage you in a little light banter to ascertain your toasting preferences for these new food items.

Similarly, the toaster will do its best to learn the ways in which the various members of your family prefer their toasted consumables to be presented to them. No longer will I forget to check the current settings after my son (Joseph the Common Sense Challenged) has used the household toaster, only to be subjected to the indignity of barely warmed bread. As John Lennon famously sang, “You may say I’m a dreamer, but I’m not the only one.”

Did you happen to catch Amelia Dalton’s recent Fish Fry featuring Bomb-Sniffing Cyborg Locusts and the First Humanoid Robot with Intelligent Vision? The reason I ask is that the robot segment featured Immervision, which is a Montreal-based developer and licensor of patented, wide-angle optics and imaging technology. The thing is that I was recently chatting with Alessandro Gasparini and Alain Paquin from Immervision, where Alessandro is VP of operations and Alain is head of the JOYCE project.

Visualization of the namesake of the JOYCE project (Image source: Immervision)

“The JOYCE project,” I hear you say (metaphorically speaking) in a quizzical voice, “what’s a JOYCE project when it’s at home?” Well, I’m glad you asked, because I’m just about to tell you. First, however, it’s worth noting that Immervision has a 20-year history in optical system design and image processing, with more PhDs per square foot than you can swing a stick at. In addition to the physicists working on the optics, they have a mix of specialists in math, GPUs, and DSPs working on the image processing.

Immervision’s lenses appear all over the place, such as in broadcast, space probes, robots, surveillance systems, smartphones, wearables, home appliances, and medical systems, including the endoscopes certain demented doctors delight in inserting into any obtainable orifice (I try not to be bitter).

The folks from Immervision say that many companies in the machine vision arena have brilliant people, but that they force them to work in silos. By comparison, since its inception, all of the scientists, technologists, and engineers at Immervision have worked collaboratively to solve challenges. Since they’ve been working this way on myriad designs for the past 20 years, the result is a crack team that can tackle any optical and image processing problem.

The exciting news at the moment is that the folks at Immervision are currently on a mission, which is to equip machines, including robots, with human-like perception. The idea is that to be truly useful to us, machines need to fully understand their environment. You only have to watch a Roomba Robot Vacuum bump into the same chair leg for the nth time to realize that we are currently far from this goal.

All of this leads us to the JOYCE project, which is going to be the first humanoid robot to be developed as a collaboration by the computer vision community.

Immervision unveiled its JOYCE-in-a-box development kit at the recent Embedded Vision Summit, which took place 15-16 September 2020. This development kit is available to developers, universities, and technology companies, who can add sensors, software, and artificial intelligence (AI) algorithms to enhance JOYCE’s perception and understanding of her environment and to solve computer vision challenges. To bring intelligent vision to computer vision, JOYCE comes equipped with three ultra-wide-angle panomorph cameras calibrated to give 2D hemispheric, 3D stereoscopic hemispheric, or full 360 x 360 spherical capture and viewing of the environment.

Of course, we all know that the past 5+ years have seen a tremendous surge in artificial intelligence and machine learning, a prime example being machine vision and object detection and recognition, but we still have a long, long way to go (see also What the FAQ are AI, ANNs, ML, DL, and DNNs?).

And it’s not enough to just be able to see something — true human perception involves the fusion of sight, sound, taste, touch, smell, and all of the other senses available to us. What? You thought there were only five? You can’t trust everything your teachers tell you, is all I can say. In reality, we have at least nine senses, and possibly as many as twenty or more (see also People with More Than Five Senses).

In the case of machines, we can equip them with all sorts of additional sensing capabilities, including radar and lidar and the ability to see in the infrared and ultraviolet and… the list goes on. There’s also a lot of work going on with regard to sensor fusion — that is, the combining of sensory data derived from disparate sources such that the resulting information has less uncertainty than would be possible when these sources are used individually.
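
To put some numbers on the "less uncertainty" part of that definition, here is a minimal sketch of one textbook fusion rule, inverse-variance weighting. This is purely illustrative and is not taken from Immervision's work; the function name `fuse` and the lidar and radar readings are made up, and the rule assumes the two sensors' errors are independent.

```python
def fuse(estimates):
    """Fuse independent (value, variance) estimates by inverse-variance weighting."""
    # A sensor with a smaller variance (less uncertainty) gets a larger weight.
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    # The fused variance is smaller than that of any individual input.
    return value, 1.0 / total


# Hypothetical readings of the same distance in meters: (value, variance)
lidar = (2.00, 0.01)
radar = (2.10, 0.04)
fused_value, fused_variance = fuse([lidar, radar])
```

The fused estimate lands closer to the more trustworthy sensor, and its variance is lower than either input's, which is the whole point of the exercise.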

For example, if you feel things go a little “wobbly,” you might worry that you were having a “bit of a turn,” but if you see all of the people around you are physically wobbling, then you might take a wild guess that you were experiencing an earthquake (I know whereof I speak on this one). Similarly, if a robot detects vibration via an accelerometer, it could be experiencing an internal error (e.g., bad servo, slipped gear, stripped cogs), but if it observes other things vibrating in its immediate vicinity, then it may come to a different conclusion.
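
The robot's reasoning in that example can be sketched as a toy decision rule. Everything here is an assumption for illustration: the function name, the threshold value, and the idea that a vision pipeline hands us a simple yes/no answer as to whether nearby objects are moving. A real system would need to be far more careful (for one thing, a shaking robot has a shaking camera).

```python
def explain_vibration(accel_rms_g, neighbors_moving, accel_threshold_g=0.05):
    """Toy rule that cross-checks an accelerometer reading against a visual cue."""
    if accel_rms_g < accel_threshold_g:
        return "steady"
    if neighbors_moving:
        # Other things are shaking too, so the cause is probably outside the robot.
        return "external disturbance"
    # Only the robot is shaking: suspect a bad servo, slipped gear, and so on.
    return "possible internal fault"
```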

One problem here is maintaining the chronological relationship between the data from the various sensors. You might think this is easy, but let me offer some perspective (no pun intended). I don’t know about you, but I remember the days when you could watch people talking on television and the words coming out of their mouths were synchronized to the movement of their lips. These days, by comparison, watching a program on TV is sometimes reminiscent of watching a badly dubbed Japanese action film. You would think that with things like today’s high-definition television systems and associated technologies, we could at least ensure that the sounds and images have some sort of temporal relationship to each other, but such is not necessarily the case. The problem is that the sound and image data are now processed and propagated via different channels and computational pipelines.
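
To see why this takes some care, consider what a system has to do when frames and sensor samples arrive via separate pipelines: it must pair them up again after the fact using timestamps. The following is a minimal sketch of that pairing, assuming both streams carry timestamps from a shared clock in milliseconds; the function names and the `max_skew` parameter are hypothetical.

```python
from bisect import bisect_left


def nearest_sample(timestamps, t):
    """Return the index of the timestamp closest to t (list sorted ascending, non-empty)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if (after - t) < (t - before) else i - 1


def align(frame_times, sensor_times, max_skew):
    """Pair each frame with its nearest sensor sample, or None if the gap is too large."""
    if not sensor_times:
        return [None] * len(frame_times)
    pairs = []
    for ft in frame_times:
        j = nearest_sample(sensor_times, ft)
        pairs.append(j if abs(sensor_times[j] - ft) <= max_skew else None)
    return pairs


# Hypothetical timestamps in milliseconds
matches = align([0, 33, 67, 100], [5, 30, 70, 200], max_skew=10)
```

Here the last frame ends up with no partner because the nearest sensor sample is 30 ms away, which is exactly the sort of gap that shows up as lips and words going their separate ways.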

So, one very clever aspect of all this is the way in which JOYCE employs the latest and greatest in data-in-picture technology, in which meta-information is embedded directly into the pixels forming the images. By means of data-in-picture technology, each of JOYCE’s video frames can be enriched with data from a wide array of sensors providing contextual information that can be used by AI, neural networks, computer vision, and simultaneous localization and mapping (SLAM) algorithms to help increase her visual perception, insight, and discernment.
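
To give a feel for the general idea, here is a deliberately simple sketch that writes a small block of metadata into the first pixels of an 8-bit grayscale frame and reads it back. This is emphatically not Immervision's format, whose details are not described here; the length-prefixed JSON layout, the function names, and the metadata fields are all assumptions made for illustration.

```python
import json
import struct


def embed(frame, width, metadata):
    """Return a copy of the frame with metadata written into the start of the top row."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    if len(payload) + 2 > width:
        raise ValueError("metadata does not fit in one pixel row")
    # Two-byte big-endian length header, followed by the JSON bytes.
    packet = struct.pack(">H", len(payload)) + payload
    out = bytearray(frame)
    out[:len(packet)] = packet
    return out


def extract(frame):
    """Read the length header, then decode the metadata that follows it."""
    (length,) = struct.unpack(">H", bytes(frame[:2]))
    return json.loads(bytes(frame[2:2 + length]).decode("utf-8"))


# Hypothetical 64 x 4 frame and sensor readings
width, height = 64, 4
frame = bytearray(width * height)
meta = {"t_ms": 1200, "gyro_z": -3}
tagged = embed(frame, width, meta)
recovered = extract(tagged)
```

Because the metadata rides inside the frame itself, it cannot drift out of step with the image the way data in a separate channel can. A scheme this naive would not survive lossy compression or scaling, so a production format would need to be considerably more robust.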

Immervision is encouraging members of the computer vision community to add their technologies to upgrade JOYCE in a series of international challenges that will help to bring her true value to life. If you wish to follow JOYCE’s progress and the collaboration within the community, you can visit her at JOYCE.VISION and follow her on her social channels.

On the one hand, I’m jolly excited by all of this. On the other hand, I’m reminded of my earlier columns: The Artificial Intelligence Apocalypse — Is It Time to Be Scared Yet? Part 1, Part 2, and Part 3. What do you think? Is the time drawing nigh for me to dispatch the butler to retrieve my brown corduroy trousers?

