Understanding the difference between supervised and reinforcement learning for deep neural networks
The traditional AI training method is rarely applicable in real-life business use cases simply because it requires an amount of labeled data seldom available to businesses. Thanks to the self-training mechanism of deep reinforcement learning, Autonomous Systems built with Microsoft Project Bonsai open the door to countless applications of AI without the need for this preexisting training dataset.
So, what is the difference between traditional supervised learning vs. deep reinforcement learning?
Understanding the traditional labeled training data-based supervised learning approach
A Deep Neural Network (DNN) based AI is composed of thousands to billions (with a “b”) of nodes (aka “neurons”) organized in potentially tens of layers. They are structured in an ever-growing variety of architectures: from the decades-old perceptron, to convolutional networks particularly effective for vision AI use cases, to LSTMs (Long Short-Term Memory networks), to transformer architectures that have become the norm for language processing use cases such as machine translation, and more.
Regardless of the architecture an AI developer selects, the most common approach to train these models has been to use a large set of labeled data, aka training data, to teach the AI by comparing expected outputs (the human-labeled or verified data) with the AI’s actual outputs. This difference is then fed through a traditional gradient descent algorithm to tune the DNN parameters until the model produces the best results on a separate set of human-labeled test data.
Gradient descent is a traditional algorithm that finds the minimum of a cost function by “descending” along this function’s slope (i.e., its “gradient”). Conceptually, it is like a marble using gravity (gradient) to reach the lowest point (descent) of a convex surface (cost function). In AI, this cost function is the mathematical representation of the gap between the neural network’s output and its expected output.
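The marble analogy can be sketched in a few lines of Python. The quadratic cost function below is a hypothetical stand-in for a real network’s loss, chosen because its minimum is known in advance:

```python
def gradient_descent(grad, start, lr=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient (slope)."""
    x = start
    for _ in range(steps):
        x = x - lr * grad(x)  # move "downhill" along the gradient
    return x

# Example: minimize the convex cost f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
# The "marble" should settle near x = 3, the lowest point of the surface.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

In a real DNN, `x` is replaced by millions of parameters and `grad` is computed by backpropagation, but the descent step is the same idea.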
Reinforcement learning vs. supervised learning
As indicated above, supervised learning will use pre-existing human-created (e.g., a human translation), labeled (e.g., a picture description), or verified (e.g., crowdsourced captioning) data to train an AI model.
The illustration below shows what this means in three typical examples:
- For the acoustic model of a speech recognition AI, the sound of someone saying, “I like bananas” is labeled by a human as “I li ke ba na na s” (using the appropriate phonetic model, of course). The sound and its associated phonemes form one pair out of the millions needed to train a speech recognition model.
- Similarly, a vision model to tag a fruit image is trained with pictures of bananas associated with an existing human-generated tag of “banana”. We also need tagged images covering various banana angles, sizes, colors, partial visibility, etc. Likewise, we need similarly tagged pictures of strawberries, oranges, and any other fruit the model should be able to recognize.
- For machine translation, millions of full human-translated sentences are needed to train a model.
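The labeled-pair mechanism behind all three examples can be sketched with the decades-old perceptron mentioned earlier: a single “neuron” compares its output with the human label and adjusts its parameters whenever they disagree. The fruit-length feature and all numbers below are made up for illustration:

```python
# Hypothetical labeled training data: fruit length in cm -> +1 for "banana",
# -1 for "strawberry". Each pair plays the role of a human-labeled example.
data = [(18.0, 1), (20.0, 1), (17.0, 1), (3.0, -1), (2.5, -1), (4.0, -1)]

w, b = 0.0, 0.0  # the "network": a single neuron with one weight and a bias

# Training loop: compare the model's answer with the human label and adjust
# the parameters on every disagreement (the classic perceptron update).
for _ in range(100):
    for x, label in data:
        if label * (w * x + b) <= 0:  # model output disagrees with the label
            w += label * x            # nudge the weight toward the label
            b += label                # nudge the bias toward the label
```

After training, the sign of `w * x + b` separates “banana” lengths from “strawberry” lengths; a real vision or speech model does the same thing with millions of parameters and labeled pairs.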
Besides supervised learning, there are two other possible approaches to train an AI: unsupervised learning and reinforcement learning (or deep reinforcement learning, when applied to deep neural networks).
Unsupervised learning is only applicable for a limited subset of AI use cases. It uses a large amount of untagged data and lets the system learn from this data. For instance, it can be very effective to train an AI to classify items such as pictures. It won’t know that these 5,000 pictures are pictures of tables, but it will know they’re of a similar object, vs. these 3,000 of – say – a car. Its exact mechanisms and applicability are outside of this article’s scope, so we will not drill further into unsupervised learning here.
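To make the “similar objects, no labels” idea concrete, here is a minimal one-dimensional k-means sketch, a classic unsupervised algorithm. The values are made up for illustration; the point is that the system groups them without ever being told what the groups mean:

```python
def kmeans_1d(values, centers, iters=20):
    """Group untagged 1-D values around the given initial cluster centers."""
    for _ in range(iters):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Untagged measurements: the system never learns these are "tables" vs. "cars",
# only that they fall into two groups of similar items.
data = [1.0, 1.2, 0.9, 1.1, 8.0, 8.3, 7.9, 8.1]
centers, clusters = kmeans_1d(data, centers=[0.0, 10.0])
```

The result is two clusters and their centers; a human would still have to supply the names “table” and “car” afterwards.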
Conversely, deep reinforcement learning (DRL) is an exciting approach that broadens the scope of AI applicability to countless real-life business use cases.
In a nutshell, instead of training an AI by comparing an ideal output with the actual one like supervised learning does, DRL lets the system learn by itself through trial and error.
The DRL approach
As depicted below, and contrary to the supervised learning approach, in DRL the AI is trained using a so-called reward function. This function evaluates the system state that follows an action taken by an AI agent, and uses a reward and punishment mechanism to provide the feedback that lets the AI agent self-tune its parameters.
The initial AI agent actions will be very erratic, as every new DNN starts with random parameters or, at best, with parameters from a roughly similar but not identical process. Also, the action-state-reward training loop will run for hundreds of thousands to millions of cycles. Therefore, it is not practical to train an AI agent on a real-life production system. So, DRL training always involves the use of advanced process simulators to (pre)train the AI agent.
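The action-state-reward loop can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. Everything here is a made-up miniature: the “simulator” is a single temperature-like state, the reward function is a hypothetical punishment for drifting from a setpoint, and a real DRL agent would use a DNN instead of a lookup table:

```python
import random

random.seed(0)  # reproducible trial and error

TARGET = 50  # hypothetical setpoint the agent must learn to reach and hold

def simulate(state, action):
    """Toy process simulator: action 0 nudges the state down, action 1 up."""
    return state + (1 if action == 1 else -1)

def reward(state):
    """Reward function: punishment grows with distance from the target state."""
    return -abs(state - TARGET)

# Tabular Q-learning: the agent self-tunes its action values ("parameters")
# purely from the action-state-reward feedback loop.
Q = {}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(500):
    state = random.randint(30, 70)           # simulator reset
    for step in range(40):                   # action-state-reward loop
        if random.random() < epsilon:
            action = random.choice([0, 1])   # explore: try a random action
        else:                                # exploit: best action known so far
            action = max([0, 1], key=lambda a: Q.get((state, a), 0.0))
        next_state = simulate(state, action)
        r = reward(next_state)
        best_next = max(Q.get((next_state, a), 0.0) for a in (0, 1))
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (r + gamma * best_next - old)
        state = next_state

def greedy_action(state):
    """The trained policy: pick the highest-valued action for this state."""
    return max([0, 1], key=lambda a: Q.get((state, a), 0.0))
```

Note that even this toy needs 20,000 loop iterations, and the early episodes are pure noise, which is exactly why DRL training happens against a simulator rather than a live production system.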
Obviously, DRL is not applicable to every AI use case. However, it is an exciting approach for many process optimization situations such as:
- Production Yield Optimization such as extruder operations, robotic control, chemical processes, and more.
- Supply chain optimization
- Building energy management
- And many more
Designing, training, and deploying DRL-based AI agents
Designing, training (with a simulator), and deploying a deep reinforcement learning-trained AI is a complex proposition. Everything from designing the right AI architecture, to defining the appropriate reward function, to building an accurate simulator, to training the AI agent at scale and deploying it requires the right tools and competencies.
Neal Analytics leverages the Microsoft Project Bonsai platform to manage its DRL-built Autonomous Systems AI agent projects. Building on 10 years of cloud, data, AI, and IoT/Edge experience, we know how to help customers leverage the power of deep reinforcement learning to help solve concrete business challenges with AI.
- Learn how PepsiCo makes the perfect Cheetos with the help of Autonomous Systems
- How DRL works for real-life business applications
- Machine Teaching and DRL provide new opportunities to manufacturers aiming to optimize complex processes
- More about Autonomous Systems
- Learn more about Microsoft Project Bonsai