Keys for developing an RL-ready simulator  

Before looking at how simulators help solve real-world Reinforcement Learning (RL) problems, let’s cover the basics of RL.

What is Reinforcement Learning? 

Reinforcement Learning (RL) is an area of machine learning in which an agent learns by trial and error in an interactive environment (physical or simulated). The agent takes actions, and the environment responds with feedback in the form of rewards or penalties. The agent’s goal is to maximize its cumulative reward.

For example, in the game of PacMan, the goal of the agent (PacMan) is to eat the food in the grid while avoiding the ghosts. Here, the grid world is the interactive environment. The agent receives a reward for eating food and is penalized if a ghost catches it (it loses the game). The states are the different positions of PacMan in the grid world, and the agent maximizes its total cumulative reward by winning the game.
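
The reward loop described above can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical stand-in for PacMan, not the actual game: a one-dimensional corridor with a ghost at one end and food at the other. The names `step`, `run_episode`, `random_policy`, and `always_right`, and the reward values, are assumptions made for illustration.

```python
import random

# Hypothetical 1-D corridor: cells 0..4, ghost at cell 0, food at cell 4,
# and the agent starting in the middle at cell 2.
GHOST, FOOD, START = 0, 4, 2

def step(position, action):
    """Apply an action (-1 = left, +1 = right); return (new_position, reward, done)."""
    new_position = position + action
    if new_position == FOOD:
        return new_position, 1.0, True    # reward for eating the food
    if new_position == GHOST:
        return new_position, -1.0, True   # penalty for being caught by the ghost
    return new_position, 0.0, False       # no feedback yet, episode continues

def run_episode(policy, max_steps=50):
    """Run one episode and return the cumulative reward the agent collected."""
    position, total_reward = START, 0.0
    for _ in range(max_steps):
        position, reward, done = step(position, policy(position))
        total_reward += reward
        if done:
            break
    return total_reward

def random_policy(position):
    """Pure trial and error: pick a direction at random."""
    return random.choice([-1, 1])

def always_right(position):
    """A policy that has 'learned' where the food is."""
    return 1
```

Calling `run_episode(random_policy)` many times gives a mix of wins and losses, while `run_episode(always_right)` always collects the food. An RL algorithm's job is to move the agent from the first behavior toward the second.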

As RL requires a very high volume of “trial and error” episodes or interactions within an environment, using simulators is a good strategy to achieve results in a cost-effective and timely way.  

What is a simulator? 

A simulator is a program that depicts a real-life situation using a virtual environment, often for the purpose of instruction or experiment, such as a vehicle simulator.  

During the last 20 years, increasing computing power and data have allowed simulations to significantly increase in accuracy and value. Simulations are widely used in various industries and processes like manufacturing, robotics, energy, HVAC, and supply chain. 

Why is a simulation necessary for RL?   

    • Acts as a proxy for a real-life process   
    • Allows an RL agent to explore in a sandboxed environment (won’t break equipment)   
    • Much faster than a real process  

What makes a simulator RL-ready?  



    • Quick runtime: RL agents need many hundreds of thousands of iterations to properly learn. Long runtimes run up the training cost and turnaround time.   
    • Allow for input of actions: The RL agent will provide a set of actions or instructions for the simulator to perform. The simulator must be able to receive these inputs and act accordingly.   
    • Provide visibility into the internal state: Just as the simulator needs to accept inputs from the agent, it must also report information about its state back to the agent. This information is used to calculate things like the reward function and terminal conditions.   
    • State variables and observations must match the capabilities of the real system: We typically have access to much more information in a simulator than we do in the real world. If we train an agent on observations from the sim that we can’t actually measure or observe in real life, deployment will fail because the agent relies on observations it can no longer get.   
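
The requirements above can be illustrated with a minimal sketch of a simulator interface. Everything here is hypothetical: the `RoomSimulator` class, its dynamics, and its constants are invented for illustration and do not come from any particular platform. The sketch assumes a room heater where the room temperature can be measured in the real system but the wall temperature cannot.

```python
from dataclasses import dataclass

@dataclass
class RoomSimulator:
    """Hypothetical room-heating simulator; all names and constants are illustrative."""
    room_temp: float = 15.0   # measurable in the real system (thermostat sensor)
    wall_temp: float = 12.0   # internal to the sim; assumed to have no real sensor
    target: float = 21.0
    steps: int = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.room_temp, self.wall_temp, self.steps = 15.0, 12.0, 0
        return self.observe()

    def observe(self):
        """Expose only what the real system could measure."""
        return {"room_temp": self.room_temp}

    def step(self, heater_power):
        """Accept the agent's action and advance the simulation by one step."""
        heater_power = min(max(heater_power, 0.0), 1.0)  # clamp to valid range
        self.room_temp += 2.0 * heater_power - 0.1 * (self.room_temp - self.wall_temp)
        self.wall_temp += 0.05 * (self.room_temp - self.wall_temp)
        self.steps += 1
        # Reward and terminal condition are computed from the simulator state.
        reward = -abs(self.room_temp - self.target)
        done = self.steps >= 100 or self.room_temp > 40.0
        return self.observe(), reward, done
```

The simulator tracks `wall_temp` internally because the dynamics need it, but `observe()` leaves it out, so an agent trained against this interface never learns to depend on a value it could not read after deployment.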

Common pitfalls  

    • The simulator doesn’t accurately represent the process: This seems like a no-brainer, but you need to make sure the “sim to real gap” is as small as possible. 
    • Steady-state timing: If a real-life process takes about 15 minutes to reach steady-state, the simulator must take this into account (it doesn’t have to take 15 real minutes, but it must take 15 “sim” minutes). The simulator can’t just suddenly reflect the new state, or the agent won’t have a realistic understanding of the impact of its actions on the environment. 
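
One way to picture the steady-state point is a sketch that advances simulated time in fixed steps and lets the process value approach its new setpoint gradually. The first-order lag model, the function name `simulate_response`, and the 3-minute time constant are all assumptions chosen so that the response settles in roughly 15 simulated minutes (about five time constants); a real process may behave quite differently.

```python
import math

def simulate_response(setpoint, start=0.0, tau_min=3.0, dt_min=1.0, total_min=15.0):
    """Return (sim_minute, value) pairs for a first-order lag toward `setpoint`.

    tau_min is an assumed time constant in simulated minutes. The loop runs
    as fast as the CPU allows; only the simulated clock advances by dt_min.
    """
    value = start
    trace = [(0.0, start)]
    steps = int(round(total_min / dt_min))
    decay = math.exp(-dt_min / tau_min)   # fraction of the gap left after one step
    for i in range(1, steps + 1):
        value = setpoint + (value - setpoint) * decay
        trace.append((i * dt_min, value))
    return trace
```

With `simulate_response(100.0)`, the value one simulated minute after the action is still well under half of the setpoint, and it only gets within 1% of the setpoint at the 15-minute mark. An agent trained on this trace sees the delay between acting and seeing the effect, which is what it will face on the real process.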



Simulators are critical for training AI agents to solve complex RL problems. Choosing the right simulator strategy for your system also has a large effect on training performance, speed, and cost.

Although simulators offer many advantages to data scientists and engineers developing a system, you also have to consider the pitfalls that can emerge along the way.

The Microsoft Project Bonsai platform helps enterprises build a BRAIN (an AI model), connect the simulator of their choice, and train the brain to learn the desired behavior. Neal Analytics’ Autonomous Systems experts have worked with Microsoft AI and Sberbank to develop an AI robot control system that moves heavy coin bags with an accuracy of 95% by leveraging the Microsoft Project Bonsai platform.


Additional reference material: