Understanding the difference between supervised and reinforcement learning for deep neural networks

Traditional supervised training is often impractical for real-life business use cases because it requires an amount of labeled data that is seldom available to businesses. However, AI agents can also be trained using the self-training mechanism of deep reinforcement learning. This approach opens the door to many industrial AI applications without needing a preexisting training dataset.

So, what is the difference between traditional supervised and deep reinforcement learning?   

Understanding the traditional supervised learning approach based on labeled training data

A Deep Neural Network (DNN) based AI has thousands to billions (with a “b”) of parameters, the weights that connect its nodes (aka “neurons”), which are organized in potentially tens of layers. In the most complex cases, such as the GPT-3 model, the number of parameters reaches 175 billion. The trillion-parameter mark is a real possibility for future versions of such models.

Those DNNs are built with an ever-growing variety of architectures: from the decades-old perceptron, to convolutional networks that are particularly effective for vision AI use cases, to LSTM (Long Short-Term Memory) networks, to transformers. The latter have proven effective well beyond their original application in Natural Language Processing (NLP) use cases such as machine translation.

Regardless of the DNN architecture an AI developer selects, the traditional AI training approach uses a large set of labeled data (training data) to train the AI by comparing expected outputs (human-labeled or verified data) with actual AI outputs. A gradient descent algorithm then uses this difference to tune the DNN parameters.

Gradient descent is a traditional algorithm that finds a minimum of a cost function by “descending” along this function’s slope (i.e., its “gradient”). Conceptually, it is like a marble rolling down (descent) the slope (gradient) of a convex surface (cost function) until it reaches the lowest point. In AI, this cost function is the mathematical representation of the gap between the neural network’s actual output and its expected one.
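To make the marble analogy concrete, here is a minimal sketch of gradient descent in Python. It is an illustration under stated assumptions rather than how a production DNN is trained: the "network" is a single linear unit with only two parameters (w and b), the labeled training data is invented for the example, and the cost function is assumed to be the mean squared error between actual and expected outputs.

```python
# Hypothetical labeled training data: each input is paired with its expected output.
# The labels follow y = 2x + 1, so the ideal parameters are w = 2 and b = 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def cost(w, b):
    """Mean squared error between actual outputs (w*x + b) and expected outputs."""
    n = len(training_data)
    return sum((w * x + b - y) ** 2 for x, y in training_data) / n


def gradient(w, b):
    """Slope of the cost function with respect to each of the two parameters."""
    n = len(training_data)
    dw = sum(2 * (w * x + b - y) * x for x, y in training_data) / n
    db = sum(2 * (w * x + b - y) for x, y in training_data) / n
    return dw, db


def gradient_descent(w=0.0, b=0.0, learning_rate=0.05, steps=2000):
    """Repeatedly take a small step downhill, like the marble rolling down."""
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w, b = gradient_descent()
print(f"w={w:.3f}, b={b:.3f}, cost={cost(w, b):.6f}")
```

The learning rate and the number of steps are arbitrary values chosen for this toy problem. Real training pipelines rely on automatic differentiation and on variants of this update rule, but the principle of stepping against the gradient is the same.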

The graphic below illustrates this concept for a two-parameter cost function. As mentioned above, real DNNs are vastly larger, with up to billions of parameters, but the mathematical concept is the same.

Gradient descent method

Source: Medium – Gradient descent  

Reinforcement learning vs. supervised learning 

Supervised learning uses preexisting human-created (e.g., a human translation), labeled (e.g., a picture description), or verified (e.g., crowdsourced captioning) data to train an AI model.  

The illustration below shows what this means in three typical examples:  

  • For the acoustic model of a speech recognition AI, the sound of someone saying, “I like bananas,” is labeled by a human as “I li ke ba na na s” (using the appropriate phonetic alphabet, of course). Both the sound and its associated phonemes will be one pair out of the millions needed to train a speech recognition model.
  • Similarly, a vision model to tag a fruit image is trained with pictures of bananas associated with an existing human-generated tag of “banana.” We also need tagged photos of bananas at various angles, sizes, colors, degrees of partial visibility, etc. Likewise, we need similar tagged pictures for strawberries, oranges, and any other fruits the model should be able to recognize.
  • For machine translation, millions of human-translated sentences are needed to train a model. 


Supervised learning concept

Besides supervised learning, there are a few other possible approaches to training an AI: unsupervised, adversarial, and reinforcement learning. The latter is called deep reinforcement learning (DRL) when applied to deep neural networks.   

Unsupervised learning is only applicable to a limited subset of AI use cases. It uses a large amount of untagged data and lets the system find structure in this data by itself. For instance, it can be very effective to train an AI to group similar items such as pictures. It won’t know that these 5,000 pictures are pictures of tables, but it will know they show a similar object, as opposed to these 3,000 pictures of, say, cars. Its exact mechanisms and applicability are outside this article’s scope, however.
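As a rough illustration of the grouping idea, the sketch below uses k-means clustering, one common unsupervised technique that is chosen here as an assumption; the article does not prescribe a specific algorithm. The two-number "pictures" are hypothetical feature vectors invented for the example, and no labels are provided at any point.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means_two_groups(points, iterations=10):
    """Split unlabeled points into two groups of similar items."""
    # Deterministic start: the first point, plus the point farthest from it.
    farthest = max(points, key=lambda p: squared_distance(p, points[0]))
    centroids = [points[0], farthest]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min((0, 1), key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the center of its group (keep it if the group is empty).
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return groups


# Hypothetical, unlabeled feature vectors: nothing says "table" or "car".
pictures = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
groups = k_means_two_groups(pictures)
print(groups)
```

The algorithm ends up with two groups of similar items without ever learning what either group represents, which mirrors the tables-versus-cars example above.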

Adversarial neural networks are DNNs trained by pitting one AI against its “twin.” This is, for instance, how AlphaGo was trained. It’s unlikely that this approach could apply in any real industrial scenario.

The third option, deep reinforcement learning (DRL), is an exciting approach that broadens the scope of AI applicability to countless real-life business use cases.   

In a nutshell, instead of training an AI by comparing an ideal output with the actual one, as supervised learning does, DRL lets the system learn by itself through trial and error.  

The DRL approach 

As depicted in the diagram below, and contrary to the supervised learning approach, in DRL, the AI is trained using a so-called reward function. This function measures a system state following an action taken by an AI agent. It then uses a reward and punishment mechanism to provide positive or negative feedback to the AI agent. This feedback allows the agent to self-tune its parameters over time.

The initial AI agent actions will be erratic, as every new DNN starts with random parameters or, at best, with parameters trained for a similar but not identical process. Also, the action-state-reward training loop runs for hundreds of thousands to millions of cycles. Therefore, training an AI agent directly on a real-life production system is not practical, and DRL training relies on advanced process simulators to (pre)train the AI agent.
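The action-state-reward loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a DRL implementation: a small lookup table (tabular Q-learning) stands in for the deep neural network, and the simulator, the target level, the reward function, and all hyperparameters are hypothetical values made up for the example.

```python
import random

TARGET = 5             # hypothetical setpoint of the simulated process
N_STATES = 11          # the process level can be 0..10
ACTIONS = (-1, 0, +1)  # lower, hold, or raise the level


def simulate(state, action):
    """Toy process simulator: apply the action and return the next state."""
    return min(N_STATES - 1, max(0, state + action))


def reward(state):
    """Reward function: the closer the state is to the target, the higher the reward."""
    return -abs(state - TARGET)


def train(episodes=5000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Lookup table standing in for the DNN: estimated long-term reward
    # of each action in each state.
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)  # each episode starts in a random state
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: try a random action
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            next_state = simulate(state, ACTIONS[a])
            r = reward(next_state)
            # Positive or negative feedback adjusts the estimate for this action.
            q[state][a] += alpha * (r + gamma * max(q[next_state]) - q[state][a])
            state = next_state
    return q


def best_action(q, state):
    """Action the trained agent prefers in a given state."""
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[state][i])]


q_table = train()
policy = [best_action(q_table, s) for s in range(N_STATES)]
print(policy)
```

Even this toy agent runs 100,000 simulated action-state-reward cycles before its policy settles on pushing the process toward the target, which gives a sense of why training happens against a simulator rather than on a production system.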

Supervised Learning vs. DRL comparison architecture

DRL is not applicable to all AI use cases. However, it is an exciting approach for many process optimization situations.

Designing, training, and deploying DRL-based AI agents 

Designing, training (with a simulator), and deploying a DRL-trained AI is a complex process. Designing the right AI architecture, defining the appropriate reward function, building an accurate simulator, training the AI agent at scale, and deploying it all require the right tools and skills.

Building on over ten years of cloud, data, AI, and IoT/Edge experience, we know how to help clients use deep reinforcement learning to solve concrete business challenges with AI.

This blog was originally published on 6/8/2021 and has since been updated.