
Synthetic Images for AI Training

Tools and Assets from Mindtech

The upshot: Mindtech provides a capability for creating fully annotated synthetic training images to complement real images for improved AI training.

We’ve spent a lot of time looking at AI training and AI inference and the architectures and processes used for each of those. Where the AI task involves images, we’ve blithely referred to the need for training sets; that’s easy, right?

After all, if you’re trying to train your algorithm to recognize a dog, then just give it a bunch of pictures of dogs (OK, tag them with, “This one contains a dog”) and then a bunch of pictures without dogs (“This one contains no dog”), and off you go! Right? 

And the behemoths like Google and Facebook have oodles of images and videos (videos being collections of frames, each of which is an image), thanks to the free stuff willingly served up by unsuspecting users (including pairs of photos taken now and 10 years ago, which help improve aging algorithms). That will easily give those guys an edge in AI so that they can, at no cost, once and for all, eliminate all competition for anything ever. Right?

Harder than it Looks

Well… maybe not quite right… for any of those. Even with the last one, the fact that they have images is a benefit, but there’s still a ton of work to do. According to Mindtech, there are six big challenges with assembling a training set of images.

  • It can cost a lot to assemble and annotate a set of images. Yeah, the big guys may already have free images, but they aren’t useful until annotated.
  • Annotation takes a long time. Mindtech says that it can take 75 minutes per frame to annotate a video. You’d think you could just tag it with “dog,” but no. To make the video usable for multiple jobs, you have to tag everything in every frame. So, one or more dogs; a car; a house; stairs; a window… all called out in each frame with the locations of each item. For 1D items, you have to specify pixel by pixel.
  • You need to be sure that the images are free of copyright infringements and that it’s ok to use and store them without running afoul of GDPR rules (at least where Europe or Europeans are concerned).
  • You have to make sure that the training set doesn’t introduce bias into the resulting trained model. If most of the images are of golden retrievers, that’s a poor way to train for dog recognition.
  • Annotation accuracy is hard to get 100% right. It takes more time and money to do that; mistakes will mistrain a model.
  • It can be hard to capture some real-world situations on camera. For automotive model training, Mindtech gave a couple of examples where staging the scenario is either hard or unsafe:
    • Creating a video showing kids riding bicycles right along the edge of a sidewalk (important for a model to catch, since a bike that close to the edge is more likely to fall into the roadway).
    • Having a horse run across a roadway. Probably not a likely scenario in the middle of town (unless a police horse or one of those tour carriage horses got loose); it’s much more relevant in the country (along with cows, deer, elk, etc.)
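To put the 75-minutes-per-frame figure in perspective, here is a back-of-the-envelope calculation. The 10-second clip at 30 frames per second is an assumed example, not a Mindtech number, and the function name is hypothetical.

```python
# Rough manual-annotation effort, using Mindtech's figure of up to
# 75 minutes per frame. Clip length and frame rate are assumptions.
MINUTES_PER_FRAME = 75


def annotation_hours(clip_seconds: float, fps: int,
                     minutes_per_frame: float = MINUTES_PER_FRAME) -> float:
    """Return the hours needed to hand-annotate every frame of a clip."""
    frames = int(clip_seconds * fps)
    return frames * minutes_per_frame / 60


# A hypothetical 10-second clip at 30 frames per second: 300 frames.
hours = annotation_hours(clip_seconds=10, fps=30)
print(f"{hours:.0f} hours")  # 300 frames x 75 min = 375 hours
```

Even if the real per-frame time is a fraction of that figure, a few minutes of footage quickly runs into weeks of labeling work.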

So… what’s the answer? Is this just a tough problem that’s part of the AI reality? Not according to Mindtech. With their Chameleon tools, they provide a capability for creating synthetic images that will be suitable for training. Does this mean you can punt entirely on the real images? Well, that’s not necessarily what they’re recommending. They see synthetic images as a complement to real images, but in a way that can significantly bulk up a training set with less – but definitely not no – work.

Synthetic Images

To understand what they do, we need to define a few terms that, for them, have very specific meanings. You start by obtaining what they call “assets”: models of… things. Buildings, people, cars, light poles, animals – pretty much anything. You’re going to use these in a simulation.

(Image courtesy Mindtech.)

With assets in hand, you create a “scene”: this is, if you will, the background or the locus of activity in the simulation. The scene is built out of assets, but it’s static. Next, you select “actors” from the assets. These are elements that will do something. You might have a child that runs into the road. There might be a ball that bounces across the road. There might be a car parked along the side of the road that suddenly opens its door in front of a driver. Yeah, lots of these are automotive. They’re not restricted to automotive, but it’s a hot field these days, and it’s easy to visualize.

Once you have the scene and actors placed, then you create a “scenario”: this is where the actors perform some specified act. Because the assets include behaviors, you can have them “do” something. For instance, that car parked along the side of the road can open the driver-side door into the roadway. If it’s a really good model, you’ll even see the interior of the car once the door has been opened (useful when training an algorithm that has to recognize an opening door in the real world).
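The asset/scene/actor/scenario vocabulary maps naturally onto a small data model. The sketch below is purely illustrative: the class names, fields, and behavior strings are hypothetical and are not Chameleon’s actual API. It shows how a scenario can reject an action the underlying asset does not support.

```python
from dataclasses import dataclass, field

# Hypothetical data model mirroring the article's terms.
# Illustrative only; NOT Chameleon's classes or API.


@dataclass
class Asset:
    name: str                          # e.g. "parked_car", "building"
    behaviors: tuple[str, ...] = ()    # actions this asset can perform


@dataclass
class Scene:
    assets: list[Asset]                # static backdrop built from assets


@dataclass
class Actor:
    asset: Asset                       # an asset chosen to "do" something
    position: tuple[float, float, float]  # placement in the scene, metres


@dataclass
class Scenario:
    scene: Scene
    actors: list[Actor]
    actions: list[tuple[str, str]] = field(default_factory=list)

    def add_action(self, actor_name: str, behavior: str) -> None:
        """Schedule a behavior, rejecting ones the actor's asset lacks."""
        actor = next((a for a in self.actors if a.asset.name == actor_name), None)
        if actor is None:
            raise ValueError(f"no actor named {actor_name!r}")
        if behavior not in actor.asset.behaviors:
            raise ValueError(f"{actor_name!r} has no behavior {behavior!r}")
        self.actions.append((actor_name, behavior))


# Usage: a street scene in which a parked car opens its driver-side door.
car = Asset("parked_car", behaviors=("open_driver_door",))
street = Scene(assets=[Asset("building"), Asset("light_pole")])
scenario = Scenario(scene=street, actors=[Actor(car, (3.0, 0.0, 12.0))])
scenario.add_action("parked_car", "open_driver_door")
print(scenario.actions)  # [('parked_car', 'open_driver_door')]
```

Keeping behaviors on the asset, rather than on the scenario, reflects the article’s point that the assets themselves carry what they can “do.”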

Finally, you establish some initial conditions that can be automatically implemented. You may set the time of day (for varying the illumination); you can assign a geographic location; you can set the weather. Once you have this, you run a simulation. That simulation executes the behaviors in a synthetic video. Because all of the assets are explicitly placed and operated, the simulator can identify where everything is within each frame and annotate it accordingly.
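Why can a simulator annotate for free? Because it knows each object’s 3D position, it can project that geometry into the camera image. Below is a minimal sketch assuming an ideal pinhole camera with made-up intrinsics (1000-pixel focal length, 1920×1080 image), objects given as axis-aligned boxes already in camera coordinates (x right, y down, z forward), and no occlusion or clipping to the image border. Mindtech has not said how Chameleon computes its annotations; this is just the textbook technique.

```python
from itertools import product


def project_point(x, y, z, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (metres) to pixels."""
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + cx, fy * y / z + cy


def bbox_2d(center, size, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0):
    """Project an axis-aligned 3D box to a 2D box (u_min, v_min, u_max, v_max).

    Intrinsics default to an assumed 1920x1080 camera. Occlusion and
    clipping to the image border are ignored in this sketch.
    """
    (x0, y0, z0), (w, h, d) = center, size
    # The eight corners of the box, one per sign combination.
    corners = [
        (x0 + sx * w / 2, y0 + sy * h / 2, z0 + sz * d / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]
    pts = [project_point(x, y, z, fx, fy, cx, cy) for x, y, z in corners]
    us, vs = zip(*pts)
    return min(us), min(vs), max(us), max(vs)


# A hypothetical pedestrian 15 m ahead: 0.5 m wide, 1.8 m tall, 0.3 m deep.
print(bbox_2d(center=(1.0, 0.6, 15.0), size=(0.5, 1.8, 0.3)))
```

Run once per actor per frame, this yields exact boxes with no human in the loop, which is the leverage the simulation approach is counting on.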

(Image courtesy Mindtech.)

All of this takes some time to do. Creating assets – 3D models of things – is also time-consuming, although these may already be available from Mindtech or a partner. But you get leverage from the time spent. Mindtech ran the following numbers: “… for one basic scene, with 5 different scenarios, each being replayed with ten different times of day and four different weather settings, you get 200 variations. Capture 1000 images from each variation and you get 200k images from that single simulation run. If you place multiple cameras, that number is multiplied up again.”
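Mindtech’s arithmetic is a simple Cartesian product, which also doubles as the list of runs to hand the simulator. In the sketch below, the counts (5 scenarios, 10 times of day, 4 weather settings, 1000 images per variation) come from the quote; the labels and function names are placeholders, and the three-camera case is an assumed example.

```python
from itertools import product


def enumerate_variations(scenarios, times_of_day, weathers):
    """Return every (scenario, time of day, weather) combination to simulate."""
    return list(product(scenarios, times_of_day, weathers))


def total_images(num_variations, images_per_variation, num_cameras=1):
    """Each camera captures images_per_variation frames per variation."""
    return num_variations * images_per_variation * num_cameras


# Counts from Mindtech's example (one scene); the labels are placeholders.
scenarios = [f"scenario_{i}" for i in range(5)]
times_of_day = [f"time_{i}" for i in range(10)]
weathers = ["clear", "overcast", "rain", "fog"]

variations = enumerate_variations(scenarios, times_of_day, weathers)
print(len(variations))                                     # 200
print(total_images(len(variations), 1000))                 # 200000
print(total_images(len(variations), 1000, num_cameras=3))  # 600000
```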

These synthetic images can then be merged with real images to improve the training data set. From there, training proceeds as normal on the combined set. Reporting helps to assess the quality of the set, including bias. It’s up to the person creating the scenario to add diversity so as to reduce bias, but Mindtech suggests that it’s easier to do this synthetically than to create real-world images with little bias. I have a niggling feeling that the more images you generate from a single scene, the more the collection is biased toward that scene, so you probably need to generate lots of scenes to keep that from happening.
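One crude way to check whether synthetic data is rebalancing a set is to compare each class’s share before and after the merge. The sketch below is not Mindtech’s reporting; it is a hypothetical per-class share check with an assumed 10% threshold, using the golden-retriever example from earlier and made-up counts. The same counting could be applied per scene to probe the single-scene bias concern.

```python
from collections import Counter


def class_report(real_labels, synthetic_labels, min_share=0.10):
    """Merge label lists and report each class's share of the combined set.

    Returns {label: (share, under_represented)}. A class whose share falls
    below min_share is flagged; this is a rough proxy for bias only.
    """
    counts = Counter(real_labels) + Counter(synthetic_labels)
    total = sum(counts.values())
    return {label: (n / total, n / total < min_share)
            for label, n in sorted(counts.items())}


# Hypothetical counts: a real set dominated by golden retrievers,
# topped up with synthetic beagles and poodles.
real = ["golden_retriever"] * 80 + ["beagle"] * 15 + ["poodle"] * 5
synthetic = ["beagle"] * 30 + ["poodle"] * 40

print(class_report(real, []))         # poodle flagged as under-represented
print(class_report(real, synthetic))  # no class flagged after the merge
```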

(Image courtesy Mindtech.)

Improved AI Results

They showed some results both for facial recognition and for automotive applications. When recognizing faces, adding one million synthetic images to some number of real images gave a significant boost in accuracy. Note that using both real and synthetic images provides better results than synthetic alone, even though the number of synthetic images is larger – sometimes far larger – than the number of real images. It’s probably obvious that the more real images there are as a percentage of the total, the less improvement the synthetic images provide.

It’s important to note that the data below was not generated by Mindtech; it’s from a paper unrelated to their specific product (see the attribution note). Mindtech’s goal in citing it was to show, from independent work, that using synthetic images is a legitimate way to improve accuracy.

(Image courtesy Mindtech. Data source: “Can Synthetic Faces Undo the Damage of Dataset Bias To Face Recognition and Facial Landmark Detection?” Kortylewski et al., https://arxiv.org/pdf/1811.08565.pdf)

Meanwhile, they compared automobile and pedestrian detection using only an industry benchmark set (KITTI*) versus supplementing with synthetic images created from their Highwai pack of assets. They didn’t specify the number of images, but over half of them were synthetic. Car detection improved by 6.9%; pedestrian detection by 8.4%.

(Image courtesy Mindtech.)

I’m going to take it on faith that the quality of results will be a function of the quality of the assets. If people are drawn as stick figures and trees as lollipops, then it’s probably not going to be helpful. A wide variety of realistic images drawn from complex scenes and scenarios takes more time to create, but you get a better bang for that buck than you would by trying to find, get permission to use, and annotate real images that cover all of the possibilities you can get by rolling your own.


*KITTI is a combination of KIT – Karlsruhe Institute of Technology – and TTI – Toyota Technological Institute.



Sourcing credit:

Chris Longstaff, VP Product Management, Mindtech


