Digital Twins Promote Predictive Maintenance

Industrial facilities are jam-packed with expensive machines of all shapes and sizes. If something goes pear-shaped with one of these little scamps, it can bring a production line to its metaphorical knees. Fortunately, the folks at MathWorks have the 21st century equivalent of a crystal ball that can help predict when the machines might fail.

I love the smell of fresh machine oil in the morning. It may sound strange, but I really enjoy wandering around an industrial facility heeding the sounds, observing the machines in action, and having a multitude of heavy engineering aromas playfully tickle my nostrils.

There are trillions upon trillions of dollars’ worth of machines in factories scattered around the world. These bodacious beauties range from simple (albeit potentially ginormous) pumps, motors, and generators, to humongously complex bodacious beasts that measure tens of meters in size, to sophisticated robots performing tasks too numerous to mention.

Of course, there are also countless machines running wild and free outside the confines of a factory. Many of these little rascals live a hard life toiling in extreme conditions. All any of these machines ask for is a little tender loving care in the form of maintenance to keep them up and running, but how is one to know which machines need what maintenance? “Ah, therein lies the rub,” as the Bard might have said had he turned his attention to the problems inherent in keeping an industrial society up and running.

Grizzled Old Curmudgeonly Engineers

One technique that has served us well for generations is to have humans with domain expertise on hand. I’ve spent more time than was good for me wandering round factories with grizzled old curmudgeonly engineers who seem to have an uncanny ability to determine when something is not as it should be (much like my dear 90-year-old mother, whose memory remains so sharp that she sometimes remembers things that haven’t even happened yet).

Anything and everything can provide clues to these folks – a small puddle of fluid glistening in a corner, an unexpected stain around a gasket, an unusual smell, or a tiny puff of steam from 50 paces. Sometimes they pause and cup their ear because they’ve heard something that “isn’t quite right” – possibly a low-frequency throb or a high-pitched squeak. Alternatively, they may casually rest a hand on a machine in passing, leading them to detect an unusual vibration. Sometimes they are alerted by something subliminal informing them that something, somewhere is going awry.

The problem is that these guys and gals are not as plentiful as one might hope, especially if you are the manager of a factory and your much anticipated (and possibly pre-spent) bonus depends on keeping everything up and running.

Reactive, Pre-emptive, and Predictive Maintenance

As an alternative, or perhaps an adjunct, to sprockets of curious curmudgeonly engineers ambling around – where “sprocket” is the collective noun for a gaggle of engineers (other acceptable alternatives are “an awkward,” “a design,” and “a geek”) – the simplest maintenance strategy is called “reactive maintenance” (a.k.a. “run-to-fail” or “run-to-failure”). In this case, the machines are deliberately allowed to run until they crash and burn, at which point the maintenance engineers arrive on the scene to pick up the pieces.

Another traditional approach is to employ “pre-emptive maintenance.” In this case, parts are checked and/or replaced at regular time intervals or after a predetermined number of working hours. To minimize the risk of unscheduled downtime, pre-emptive maintenance needs to occur more frequently than the average time between failures. This means that, whilst a pre-emptive maintenance strategy is effective, it also incurs a cost in terms of components, time, and resources, because many parts are replaced while they still have useful life left in them.

A technique that is more closely aligned with 21st century aspirations is that of “predictive maintenance.” The idea here is to use sensors to continuously monitor the machine’s operation and – based on their observations – estimate the current state of health of the system and predict how it will behave in the future. An example of an ideal case would be for the predictive maintenance algorithm to detect an anomaly or a gradual change in some system parameter, and to issue a report along the lines of: “Component X in machine Y is experiencing wear and needs to be replaced within 72 hours.” In addition to optimizing engineering resources and lowering costs, since parts are replaced only on an “as needed” basis, predictive maintenance facilitates the scheduling of maintenance activities to fit in with production requirements.

Artificial Intelligence, Machine Learning, and Deep Learning

Relative newcomers to the predictive maintenance party are artificial intelligence (AI), machine learning (ML), and deep learning (DL). For the purposes of these discussions and to keep things simple, we will take the term AI to encompass the concepts of ML and DL (see also What the FAQ are AI, ANNs, ML, DL, and DNNs?).

There is, of course, much more to all of this than we can hope to cover here. Suffice it to say that training an AI system for the purposes of predictive maintenance typically requires a lot of sensor data to be gathered from the machine in question. Before anything else occurs, this data needs to be “cleaned,” which includes removing noise, outliers, and invalid values, and also interpolating missing values as required.
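As a rough illustration of the cleaning step, here is a minimal Python sketch. The function name clean_signal, the 3.5 cutoff, and the sample readings are all assumptions made up for this example; they don't come from any particular tool. The sketch treats missing and non-finite readings as invalid, flags outliers using a median-based score, and fills the resulting gaps by linear interpolation.

```python
import math
import statistics


def clean_signal(samples, z_limit=3.5):
    """Drop invalid values and outliers, then fill the gaps by interpolation.

    `samples` may contain None or NaN for missing readings. The z_limit
    of 3.5 is an illustrative assumption, not a universal constant.
    """
    valid = [x for x in samples if x is not None and math.isfinite(x)]
    if not valid:
        raise ValueError("no valid samples to work with")

    # Median and median absolute deviation (MAD) are robust to outliers.
    med = statistics.median(valid)
    mad = statistics.median([abs(x - med) for x in valid])

    def is_good(x):
        if x is None or not math.isfinite(x):
            return False
        if mad == 0:
            return True  # no spread, so nothing to judge outliers against
        return 0.6745 * abs(x - med) / mad <= z_limit

    kept = [x if is_good(x) else None for x in samples]
    good_idx = [i for i, x in enumerate(kept) if x is not None]

    cleaned = []
    for i, x in enumerate(kept):
        if x is not None:
            cleaned.append(x)
            continue
        left = max((j for j in good_idx if j < i), default=None)
        right = min((j for j in good_idx if j > i), default=None)
        if left is None:        # gap at the start: copy the next good value
            cleaned.append(kept[right])
        elif right is None:     # gap at the end: copy the last good value
            cleaned.append(kept[left])
        else:                   # gap in the middle: linear interpolation
            frac = (i - left) / (right - left)
            cleaned.append(kept[left] + frac * (kept[right] - kept[left]))
    return cleaned


# Hypothetical temperature readings with a dropout, a spike, and a NaN.
raw = [20.0, 20.2, None, 20.6, 95.0, 21.0, float("nan"), 21.4]
cleaned = clean_signal(raw)
```

A real pipeline would likely also filter noise and handle long dropouts differently from short ones, but the overall shape (validate, reject, fill) is the same.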

The next problem is the sheer amount of data that has to be processed. If you have tens of sensors being sampled hundreds of times a second over periods of hours, days, weeks, or months, you can easily end up with gigabytes or terabytes of data. One solution here is “feature extraction,” in which the original data is analyzed and a set of derived values called “features” is extracted. In some cases, a terabyte of data can be boiled down into a few hundred features. Performing feature extraction facilitates data storage, data transmission, data search and retrieval, the ability to compare with other data, and the ability to train AI systems.
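To make “features” a little more concrete, the following Python sketch boils a window of 1,000 samples down to five numbers. The choice of features (mean, standard deviation, RMS, peak, and crest factor) is an assumption for illustration; which features actually matter depends on the machine and the sensor, and frequency-domain features are often added as well.

```python
import math


def extract_features(window):
    """Summarize one window of sensor samples as a handful of numbers."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak = max(abs(x) for x in window)
    crest = peak / rms if rms > 0 else 0.0  # "spikiness" of the signal
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "crest_factor": crest,
    }


# Stand-in for a vibration signal: ten full cycles of a unit sine wave.
window = [math.sin(2 * math.pi * 10 * i / 1000) for i in range(1000)]
features = extract_features(window)
```

Here 1,000 samples become 5 features, which hints at how a terabyte of raw data might shrink to something far more manageable.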

One more issue is that of collecting “bad data.” We typically start by collecting “good data” from a happy and healthy machine. Following deployment, the AI can detect anomalies and deviations from this good data, and it can also spot trends along the lines of, “The temperature from this sensor is rising at X°C each day.” If the AI is aware of the maximum allowable temperature, it can augment its initial observations with a prognosis along the lines of, “Unless something is done, this machine will fail within 96 hours.”
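The trend-plus-prognosis idea can be sketched in a few lines of Python. This is a deliberately simple stand-in (a straight-line least-squares fit extrapolated to a limit), not a description of how any particular AI tool does it. The function names, the readings, and the 74°C limit are all assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours, temps, limit):
    """Extrapolate the fitted trend to the limit.

    Returns the hours remaining after the last reading, or None if the
    temperature is not rising.
    """
    slope, intercept = fit_line(hours, temps)
    if slope <= 0:
        return None
    crossing_hour = (limit - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical readings: one per day, rising 2 degrees C each day.
hours = [0, 24, 48, 72]
temps = [60.0, 62.0, 64.0, 66.0]
remaining = hours_until_limit(hours, temps, limit=74.0)
print(f"Estimated time to limit: {remaining:.1f} hours")  # roughly 96 hours
```

Real degradation is rarely this linear, so a production system would typically use a richer model and report a confidence range rather than a single number.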

In addition to good data, the capabilities of the AI can be dramatically enhanced if it is also provided with bad data captured from a distressed machine. In some cases, this is relatively easy. For example, I’m currently in the process of building an AI model using NanoEdge AI Studio from Cartesiam (see also Why, Hello FPGA and AI — How Nice to See You Together!).

This little ragamuffin (my AI model, not NanoEdge AI Studio), which is being deployed on an Arduino Nano 33 IoT, uses a current sensor to monitor the waveform profile of the current being used to power my household vacuum cleaner. I’m also hoping to use the Wi-Fi capabilities of the Nano 33 IoT. The idea is that if, by some strange quirk of fate, my son (Joseph the Common-Sense Challenged) happens to be hoovering the house, my AI system will send me an email or a text message if the dust collecting chamber needs to be emptied. I can only imagine Joseph’s surprise when I call him and say, “I think this would be a good time for you to empty the vacuum cleaner.”

The good data was easy enough to capture. The problem was that – not thinking – I’d emptied the canister before I commenced. Since I had taken the vacuum cleaner into my office for the purposes of this experiment, I was loath to have anyone come into the rec room to find me rooting around in the trash container gathering dust to stuff back into my vacuum cleaner (they think I’m odd enough already). As an alternative, I simply used a disk of paper to obstruct the main filter intake. (I’m going to compare this data with that gathered from an overfilled container later.)

The point is that it was relatively easy for me to fake bad data with my vacuum cleaner, but what do you do in the case of a ginormous (in size) and humongous (in cost) industrial machine? Generally speaking, it would be impractical to wait for the beast to fail of its own accord, not least because – if you wait for it to fail – then having the ability to predict its failing might be considered to be superfluous to requirements.

Using a Testbench Machine

Despite what I just said, in some cases, using a real machine to generate “known bad data” may end up being the optimum solution.

Although not particularly common, it’s also not unknown for companies to take a known good machine, mount it on a test bench, festoon it with sensors, and then inject faults and monitor the results. These results are captured, cleaned, analyzed, feature-extracted, tagged, and used to train the AI. The idea is that each type of fault will generate its own unique “signature” embedded in the sensor data. In the future, when the AI observes a similar signature in the field, it can say, “Ah Ha! I’ve seen this sort of thing before. This means the screws attaching the main sprocket assembly are working loose.”
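One very simple way to picture signature matching is a nearest-centroid classifier: average the feature vectors recorded for each injected fault, then assign a new observation to whichever average it sits closest to. The Python sketch below assumes this technique purely for illustration (real systems typically use more capable models), and the fault names and feature values are invented.

```python
import math


def train_centroids(labeled_vectors):
    """Average the feature vectors for each fault label."""
    grouped = {}
    for label, vec in labeled_vectors:
        grouped.setdefault(label, []).append(vec)
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }


def classify(centroids, vec):
    """Return the label whose centroid is nearest to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical test-bench data: (vibration RMS, dominant frequency in Hz).
# In practice the features would be scaled first so that no single one
# dominates the distance calculation.
training = [
    ("healthy", (0.10, 50.0)),
    ("healthy", (0.12, 51.0)),
    ("loose_screws", (0.45, 120.0)),
    ("loose_screws", (0.50, 118.0)),
    ("worn_bearing", (0.30, 300.0)),
    ("worn_bearing", (0.28, 310.0)),
]
centroids = train_centroids(training)
diagnosis = classify(centroids, (0.47, 121.0))
```

With these made-up numbers, the new observation lands closest to the “loose_screws” signature, which is the software equivalent of “Ah Ha! I’ve seen this sort of thing before.”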

Using Deployed Machines

Earlier, I said that it would be impractical to wait for a machine to fail of its own accord. Actually, this isn’t strictly true, especially if we are talking about large numbers of such machines deployed around the world.

The idea here is that, over a longer period of time, we monitor the health of lots of machines and store the data in the cloud. Whenever a machine experiences a problem that has to be resolved by a maintenance team, that problem is associated with the captured data. Ideally, it will be possible to extract features from the data that are uniquely related to this particular type of problem.

Given a sufficient number of machines, each suffering health problems in its own way, it will be possible to build a rich repository of “bad data” over time, where this bad data will help protect other machines in the future.
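Associating problems with captured data is essentially a labeling exercise. The Python sketch below tags each stored window of features with a fault name if a maintenance record for the same machine follows within a lookback period. The record layout, the field names, and the 72-hour lookback are all assumptions for this example.

```python
def label_windows(windows, work_orders, lookback_hours=72):
    """Tag feature windows that shortly precede a recorded repair.

    windows:     dicts with "machine", "end_hour", and "features"
    work_orders: dicts with "machine", "hour", and "fault"
    """
    labeled = []
    for window in windows:
        label = "healthy"
        for order in work_orders:
            same_machine = order["machine"] == window["machine"]
            lead_time = order["hour"] - window["end_hour"]
            if same_machine and 0 <= lead_time <= lookback_hours:
                label = order["fault"]
                break
        labeled.append({**window, "label": label})
    return labeled


# Hypothetical fleet data: two pumps, one repair logged at hour 300.
windows = [
    {"machine": "pump-A", "end_hour": 100, "features": (0.10, 50.0)},
    {"machine": "pump-A", "end_hour": 200, "features": (0.14, 52.0)},
    {"machine": "pump-A", "end_hour": 290, "features": (0.41, 97.0)},
    {"machine": "pump-B", "end_hour": 290, "features": (0.11, 50.0)},
]
work_orders = [{"machine": "pump-A", "hour": 300, "fault": "seal_leak"}]
labeled = label_windows(windows, work_orders)
labels = [w["label"] for w in labeled]
```

Only the pump-A window recorded shortly before the repair picks up the fault label; everything else stays tagged as healthy.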

Create a Digital Twin

The term “digital twin” refers to a digital replica of a living or non-living physical entity. In the context of these discussions, the simplest path to a digital twin would be to create a physics-based model of a machine.

Ideally, such a model will replicate the physical aspects and attributes of its real-world counterpart as accurately as possible. Now, as opposed to training an AI system using the real machine, it can be trained using that machine’s digital twin. In addition to good data, bad data can be generated by injecting faults into the digital twin.
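As a toy illustration of fault injection, consider a drastically simplified physics-based model: a single-lump thermal model of a motor, in which the temperature rises with the power dissipated and falls as heat leaks to the surroundings. Everything here is an assumption for the sake of the sketch (the model itself, the parameter values, and the idea of representing a blocked cooling path as a doubled thermal resistance); a real digital twin would be far more detailed.

```python
def simulate_motor_temperature(r_th, power_w=100.0, c_th=500.0,
                               t_ambient=25.0, duration_s=4000, dt=1.0):
    """Euler integration of dT/dt = (P - (T - T_ambient) / R_th) / C_th.

    r_th: thermal resistance to ambient (K/W); higher means worse cooling
    c_th: thermal capacitance (J/K)
    Returns the temperature trace in degrees C, one value per time step.
    """
    temp = t_ambient
    trace = []
    for _ in range(int(duration_s / dt)):
        heat_out_w = (temp - t_ambient) / r_th
        temp += dt * (power_w - heat_out_w) / c_th
        trace.append(temp)
    return trace


# "Good data" from the healthy twin.
healthy = simulate_motor_temperature(r_th=0.4)

# "Bad data": inject a fault by doubling the thermal resistance, as if
# the cooling path were partially blocked.
blocked = simulate_motor_temperature(r_th=0.8)
```

In this sketch the healthy twin settles near 65°C and the faulty one near 105°C, so the two traces provide labeled good and bad examples without any real machine having to be damaged.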

Furthermore, as machines fail in the real world, either on test benches, in factories, or in other locations, in addition to augmenting the AI’s training, the faults can be replicated in the digital twin. In many cases, it may be advantageous to “tweak” the digital twin to more accurately match the real-world results. In a strange quirk of fate, the optimization of the model’s myriad parameters may be best effected by another AI, but that’s a story for a future column.
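Tweaking a model to match reality is, at heart, a parameter-fitting problem. The Python sketch below uses a plain grid search (not an AI, just the simplest thing that could work) to find the thermal resistance that makes a toy thermal model best match a “measured” temperature trace. The model, the parameter range, and the measured data (generated synthetically here as a stand-in for real readings) are all assumptions for illustration.

```python
def simulate(r_th, power_w=100.0, c_th=500.0, t_ambient=25.0, steps=600):
    """Toy thermal model: one temperature value per one-second step."""
    temp = t_ambient
    trace = []
    for _ in range(steps):
        temp += (power_w - (temp - t_ambient) / r_th) / c_th
        trace.append(temp)
    return trace


def squared_error(model_trace, measured_trace):
    """Sum of squared differences between model and measurement."""
    return sum((m - y) ** 2 for m, y in zip(model_trace, measured_trace))


def calibrate(measured_trace, candidates):
    """Return the candidate r_th whose simulation best fits the data."""
    return min(candidates,
               key=lambda r: squared_error(simulate(r), measured_trace))


# Stand-in for real sensor data: a machine whose true r_th is 0.55 K/W.
measured = simulate(r_th=0.55)

# Try r_th values from 0.30 to 0.80 in steps of 0.01.
candidates = [round(0.30 + 0.01 * k, 2) for k in range(51)]
best_r_th = calibrate(measured, candidates)
```

With noisy real-world data and many interacting parameters, a grid search quickly becomes impractical, which is where more sophisticated optimizers earn their keep.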

According to Wikipedia, a fully-fledged digital twin will “…integrate IoT, artificial intelligence, machine learning and software analytics with spatial network graphs to create living digital simulation models that update and change as their physical counterparts change. A digital twin continuously learns and updates itself from multiple sources to represent its near real-time status, working condition or position. This learning system learns from itself, using sensor data that conveys various aspects of its operating condition; from human experts, such as engineers with deep and relevant industry domain knowledge; from other similar machines; from other similar fleets of machines; and from the larger systems and environment of which it may be a part. A digital twin also integrates historical data from past machine usage to factor into its digital model.” I couldn’t have said it better myself.

Predicting the Future with MathWorks

The reason for my wafflings here is that I was just chatting with Aditya Baru, Senior Product Marketing Manager at MathWorks, whose flagship product is MATLAB. Even engineers who haven’t used MATLAB are typically aware that it’s a multi-paradigm numerical computing environment and programming language that facilitates things like matrix manipulations, implementation of algorithms, creation of user interfaces, interfacing with programs written in other languages, and the plotting of functions and data.

MATLAB’s capabilities can be enhanced by the concept of Toolboxes. Now, although I was aware of things like the Curve Fitting Toolbox, the Control System Toolbox, and the DSP System Toolbox, I was surprised to learn that the folks at MathWorks have added new capabilities, such as the Statistics and Machine Learning Toolbox, the Deep Learning Toolbox, and … wait for it … wait for it … the Predictive Maintenance Toolbox.

And then there’s Simulink, which can either drive MATLAB or be scripted from it. Simulink is a MATLAB-based graphical programming environment for modeling, simulating, and analyzing multidomain dynamical systems. Simulink’s primary interface is a graphical block diagramming tool and a customizable set of block libraries. In this case, I was surprised to learn that there’s an add-on to MATLAB and Simulink called Simscape, which provides tools for modeling and simulating multidomain physical systems.

The combination of MATLAB, Simulink, and Simscape allows us to do pretty much everything we discussed above. We can take humongous data files collected from multiple sensors, import this data into MATLAB, clean the data, perform feature extraction, and use the results to train AI systems. We can also create digital twins of the machines and use these models to both train and refine our AI systems, thereby equipping them with the ability to provide extremely sophisticated predictive maintenance capabilities.

I, for one, am tremendously excited by all of this. In addition to the obvious advantages of predictive maintenance for industry, I don’t think it will be long before every appliance in our homes – microwave ovens, dishwashers, washing machines, dryers, water heaters, HVACs, the list goes on – is equipped with predictive maintenance capabilities. Personally, I’ve had enough cold showers to last me a lifetime, so I cannot wait for this brave new world.

As always, I would love to hear your comments, questions, and suggestions. In the meantime, if you are interested in further reading, may I suggest the following: