Bringing AI from research labs to real-life scenarios with Deep Reinforcement Learning

What is Deep Reinforcement Learning?

Deep Reinforcement Learning, or DRL, is a key technological foundation of Autonomous Systems. DRL leverages advanced process simulations and a trial-and-error approach to AI training. It enables companies to design, train, and deploy AI agents to tackle a broad scope of real-life scenarios.

“Deep” in DRL refers to the use of Deep Learning (DL), that is, Deep Neural Networks (DNN), as opposed to simple (or shallow) neural networks and statistics-based machine learning algorithms. “Reinforcement Learning” refers to a training approach in which the machine learns through positive and negative reinforcement, based on the difference between the AI's actual and expected behavior.


Using DRL to train AI agents is driving a real shift toward AI democratization. It allows any company to design, train, and deploy custom AI agents without the need for in-house AI research capabilities.

Traditional versus reinforcement learning for AI training

Since the breakthrough in DNNs in the early 2010s, the most common way to train an AI has been to tune the DNN's parameters by comparing its outputs for a set of training inputs with the expected outputs.

For instance, the picture of a dog would be associated with the tag “dog,” the sound “he-lo” with the word “Hello,” and the Italian “Come stai?” with the English “How are you?”. 
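
As a rough illustration of this "compare the output with the expected one, then tune" cycle, here is a minimal Python sketch. It is not a real DNN: a single weight stands in for the network's parameters, and the tagged data (pairs following y = 2x), the learning rate, and all variable names are assumptions made for the example.

```python
# Hypothetical tagged data: each input is paired with the output we expect (here y = 2x).
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # one parameter standing in for a network's many weights
learning_rate = 0.05  # assumed step size for the example

for epoch in range(200):
    for x, expected in training_data:
        predicted = weight * x        # the model's current output
        error = predicted - expected  # gap between actual and expected output
        # The gradient of the squared error 0.5 * error**2 with respect to
        # the weight is error * x, so step the weight against it.
        weight -= learning_rate * error * x

print(round(weight, 3))
```

After training, the weight approaches 2, the value implied by the tagged examples. The point of the sketch is the dependency it exposes: without the tagged pairs, there is nothing to compare the output against.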

This approach requires massive amounts of human-tagged data, which is both hard to source and error-prone. A typical DNN requires from several million to tens of millions of individual tagged training examples to train from scratch (i.e., starting from random coefficients in the neural net).

Unfortunately, most real-life business applications do not have an economical way to gather these training data points.

Why DRL works for real-life business applications 

DRL offers a more practical approach in these situations by letting the AI self-train using advanced process simulation. The AI will execute (initially random) actions on the simulator. The simulator's reaction to each action (i.e., its “state”) will be the AI agent's input for a new cycle. The state will also feed the so-called “reward function,” whose signal is used to tune the AI parameters (i.e., the neural net node weights) in a direction more likely to produce a better outcome.

The loop will then run hundreds of thousands to millions of times until it converges. This happens much faster than it would with a real-life system. Depending on system complexity and simulator performance, we have seen full AI agents (or “brains,” as the Microsoft Project Bonsai platform dubs them) train in a few hours or less.
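
The act, observe, reward, tune loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Bonsai or any production platform works: the simulator (`LineSimulator`), the reward values, and the hyperparameters are all hypothetical, and a small lookup table trained with Q-learning stands in for the deep neural network to keep the example self-contained.

```python
import random


class LineSimulator:
    """Hypothetical toy simulator: the agent walks positions 0..4 and must reach 4."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(self.size - 1, max(0, self.position + move))
        done = self.position == self.size - 1
        return self.position, done


def reward_function(state, done):
    # Assumed reward: +1 for reaching the goal, a small penalty for every other step.
    return 1.0 if done else -0.01


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    sim = LineSimulator()
    # The table of action values plays the role of the neural net weights.
    q = [[0.0, 0.0] for _ in range(sim.size)]
    for _ in range(episodes):
        state = sim.reset()
        for _ in range(100):  # cap the episode length
            # Act randomly on ties or with probability epsilon, else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, done = sim.step(action)          # simulator reacts
            reward = reward_function(next_state, done)   # state feeds the reward function
            # Q-learning update: nudge the value toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state                           # new state is the next input
            if done:
                break
    return q


q_table = train()
# Greedy action per non-terminal position (0 = left, 1 = right).
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(policy)
```

In this toy setting the learned policy moves right from every position, which is the shortest path to the goal. A DRL setup follows the same loop but replaces the table with a neural network and the five-position line with a full process simulation.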

AI training platform for Deep Reinforcement Learning  

To effectively design and train a DRL-based AI agent, it is critical to use a platform optimized for this approach.

Of course, “handmade” approaches using Python, external or custom-made simulators, and additional tools are possible. However, it is much more effective and efficient to leverage a platform such as Microsoft Project Bonsai. Built specifically for DRL, scalable, running on the hyper-scale Azure cloud, and natively supporting leading simulation platforms, Bonsai offers a ready-to-use platform to help accelerate the design, training, and deployment of Autonomous Systems.


Learn more about Deep Reinforcement Learning

This video, the third of our four-part series, will provide you with more information about DRL. 

