Advanced simulations, the key to successful Deep Reinforcement Learning-based AI deployments

Deep reinforcement learning (DRL) is an AI training methodology that, instead of relying on a preexisting training data set, lets the AI learn on its own through trial and error. Because it is almost never possible to run that trial and error on the live system, DRL requires advanced simulators to effectively pre-train the AI agent before deployment.

There are multiple ways to build such simulators, and the most appropriate approach for a project depends on the particular use case. At the highest level of abstraction, simulator types fall into five main approaches: three are in-house “build” strategies and two are “buy” strategies.

5 strategies to build advanced simulators:

  1. Physics-based
  2. Custom software
  3. Off-the-shelf simulation software packages
  4. Custom-built deep learning AI
  5. Digital twins

The remainder of this article will provide a brief introduction to the five approaches, and the embedded video will allow you to explore each of those further.

Simulation strategies to train AI agents using Deep Reinforcement Learning 

Physics-based simulations 

When systems are of limited complexity and well understood, one option is to use physics-based simulators. This approach leverages well-known laws of physics to build a very accurate simulation of a real-life system. However, these simulations can quickly become extremely complex when the system encompasses more than one device, process, or piece of equipment.

An example of such an approach is this robotic arm simulator that Sberbank research labs used to train their AI agent.  
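
As a minimal sketch of what a physics-based simulator can look like, the Python example below models a single frictionless pendulum driven by a motor torque, using the standard pendulum equation of motion and semi-implicit Euler integration. The class name, the reset/step interface, the parameter values, and the reward are illustrative assumptions; they are not taken from the Sberbank simulator or from any specific product.

```python
import math


class PendulumSim:
    """Hypothetical physics-based simulator: a torque-driven pendulum."""

    def __init__(self, length=1.0, mass=1.0, gravity=9.81, dt=0.01):
        self.length = length
        self.mass = mass
        self.gravity = gravity
        self.dt = dt
        self.reset()

    def reset(self, angle=0.1, velocity=0.0):
        # angle is in radians, measured from the downward vertical
        self.angle = angle
        self.velocity = velocity
        return (self.angle, self.velocity)

    def step(self, torque):
        # theta'' = -(g / L) * sin(theta) + torque / (m * L^2)
        inertia = self.mass * self.length ** 2
        accel = -(self.gravity / self.length) * math.sin(self.angle) + torque / inertia
        # semi-implicit Euler: update the velocity first, then the angle
        self.velocity += accel * self.dt
        self.angle += self.velocity * self.dt
        # illustrative reward (assumption): stay close to the rest position
        reward = -abs(self.angle)
        return (self.angle, self.velocity), reward


sim = PendulumSim()
state = sim.reset(angle=0.5)
for _ in range(100):
    # zero torque stands in for the action an AI agent would choose
    state, reward = sim.step(torque=0.0)
```

Even this one-device model needs an equation of motion and a numerical integrator, which gives a sense of why the approach becomes complex once several devices interact.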

Custom software simulations 

When the system does not require advanced physics to be simulated, it can be relatively simple to build custom simulators using standard programming languages such as Python. 

However, very rarely are real-life systems simple enough for that approach to be a viable solution.  
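
For a system that is simple enough, such a custom simulator can fit in a few dozen lines of Python. The sketch below simulates a single-product warehouse with random demand. The scenario, the class name, the cost coefficients, and the placeholder reorder rule are all hypothetical and chosen only for illustration.

```python
import random


class InventorySim:
    """Hypothetical custom simulator: one warehouse, one product."""

    def __init__(self, capacity=100, max_demand=20, seed=0):
        self.capacity = capacity
        self.max_demand = max_demand
        self.rng = random.Random(seed)  # seeded for reproducible episodes
        self.reset()

    def reset(self):
        self.stock = self.capacity // 2
        return self.stock

    def step(self, order_quantity):
        # receive the order, capped by the warehouse capacity
        self.stock = min(self.capacity, self.stock + max(0, order_quantity))
        # serve random customer demand from the available stock
        demand = self.rng.randint(0, self.max_demand)
        sold = min(demand, self.stock)
        self.stock -= sold
        # illustrative reward (assumption): revenue minus holding and stock-out costs
        reward = 2.0 * sold - 0.1 * self.stock - 1.0 * (demand - sold)
        return self.stock, reward


sim = InventorySim()
stock = sim.reset()
total_reward = 0.0
for _ in range(30):
    # placeholder policy: reorder when stock is low; a DRL agent would learn this
    order = 20 if stock < 30 else 0
    stock, reward = sim.step(order)
    total_reward += reward
```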

Off-the-shelf simulation software packages 

The most popular approach is to leverage existing software packages that provide extensive libraries to simulate a broad range of system types, including discrete processes, process manufacturing, supply chains, and more.

There are, of course, quite a few players in that space. However, two of the most popular ones used for Autonomous Systems DRL training are AnyLogic and Simulink.


An important element to keep in mind is that these platforms support a wide variety of modeling techniques, including the aforementioned physics-based and custom models as well as many others. Deciding whether to use them is therefore a simulation strategy decision, not a simulation technique selection. Project leads need to decide whether “building” from scratch or “buying” from simulation experts is the most appropriate option from a business and technology strategy standpoint.

Custom-built deep learning AI simulations 

However, not every system can be modeled using physics-based simulators or simulation software packages. In these situations, one option is to develop a custom AI that does not simulate the behavior of every element in the system, but only the outputs the system produces for every input.

This kind of black-box approach obviously requires a large amount of training data, a requirement that can by itself be quite limiting for certain use cases. However, it allows the simulator to abstract away the system’s complexity while still delivering an effective simulation for DRL training purposes.

To get around the issue of training data, the best option is to measure the real-life inputs and outputs of the system as it operates today, in order to quickly create a large-scale training data set. Capturing these measurements may require additional technologies, for instance a vision AI that captures the visual characteristics of an output.
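
The black-box idea can be sketched as follows: fit a model on logged input/output pairs, then use it as the simulator's step function. In this minimal Python example, a one-variable least-squares fit stands in for the deep neural network a real project would train, and the logged values are synthetic numbers made up for illustration. The class and variable names are hypothetical.

```python
class SurrogateSim:
    """Hypothetical black-box simulator fitted on logged (input, output) pairs."""

    def fit(self, inputs, outputs):
        # closed-form least squares for: output ~= slope * input + intercept
        # (a stand-in for training a deep neural network on the same data)
        n = len(inputs)
        mean_x = sum(inputs) / n
        mean_y = sum(outputs) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
        var = sum((x - mean_x) ** 2 for x in inputs)
        self.slope = cov / var
        self.intercept = mean_y - self.slope * mean_x
        return self

    def step(self, control_input):
        # predict the system output without modeling the system's internals
        return self.slope * control_input + self.intercept


# synthetic log, made up for illustration: (machine setting, measured output)
logged_inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
logged_outputs = [2.1, 3.9, 6.2, 7.8, 10.1]

sim = SurrogateSim().fit(logged_inputs, logged_outputs)
predicted = sim.step(3.5)
```

The fitted model knows nothing about how the system works internally; its quality depends entirely on how well the logged data covers the inputs the agent will try during training.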

For instance, in the Cheetos customer story, Neal Analytics built an AI to simulate the combination of the extruder and baking process. This was the only effective and workable solution to train the Project Bonsai AI brain. To train this simulator, the Neal team leveraged a custom vision AI developed by the PepsiCo team to programmatically measure the visual characteristics of the Cheetos coming out of the oven.

To learn more about this project, please refer to this customer story:

Cheetos image from Microsoft AI

Digital twins 

The last type of simulation strategy is to leverage existing digital twins that manufacturers may provide when they supply their equipment. However, those twins will only be available for certain pieces of equipment, certain manufacturers, and most likely only for their most recent devices.  

Also, even if a digital twin is available, the system the AI agent needs to control often comprises multiple pieces of equipment. Therefore, for digital twins to work for DRL training purposes, a mechanism must be found to stitch all those twins together into one overarching simulation. This is often a hard proposition, especially as some elements might be missing.

For instance, if the system has three components but only two of them have a digital twin, not only will the customer need to create a dedicated simulation for the third element, but they will also need to find a way to combine the inputs and outputs of the three simulators into one overarching model.
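
The three-component case can be sketched in Python as a wrapper that feeds each component's output into the next one. Everything here is hypothetical: the component names, their made-up behavior, and the assumption that the components form a simple chain. Real digital twins typically expose richer, vendor-specific interfaces, which is what makes the stitching hard in practice.

```python
class CompositeSim:
    """Hypothetical wrapper that chains component simulators into one model."""

    def __init__(self, components):
        # components run in order; each output feeds the next input
        self.components = components

    def step(self, system_input):
        signal = system_input
        trace = []  # intermediate outputs, useful for debugging the chain
        for component in self.components:
            signal = component(signal)
            trace.append(signal)
        return signal, trace


# stand-ins for two vendor-supplied digital twins (made-up behavior)
def mixer_twin(flow_rate):
    return 0.9 * flow_rate


def oven_twin(mass_in):
    return mass_in - 0.5


# stand-in for the dedicated simulation of the component that has no twin
def conveyor_custom(mass_in):
    return min(mass_in, 10.0)


system = CompositeSim([mixer_twin, conveyor_custom, oven_twin])
output, trace = system.step(8.0)
```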

Soon, as more digital twins are developed and the way they are built becomes more standardized, it should become easier to create digital twin-based system-level simulators.

Video: Using simulations for Deep Reinforcement Learning training

This video, the fourth one in our five-part series on Autonomous Systems, provides more details about the five types of simulations used to train AI agents with deep reinforcement learning for Autonomous Systems. These simulators can then be integrated into the Microsoft Project Bonsai platform as part of the end-to-end AI agent design, training, and deployment process.

Additional reference material: