Image Segmentation using Deep Learning
The evolution of IoT (Internet of Things) devices and the advancement of edge computing with AI capabilities have laid a convincing path for implementing computer vision tasks with machine learning.
Let us understand the process of analyzing images for localization, detection, and segmentation.
An image is a grid of pixels. Image segmentation groups pixels with similar attributes. It is the next step after image classification and localization.
What is image segmentation?
Image segmentation can be classified into two groups:
- Semantic segmentation: Every pixel is assigned either to a desired class or to the background class, i.e., all pixels belonging to the same class are represented by the same color.
- Instance segmentation: Every pixel is assigned to a class, but separate objects of the same class are given different colors.
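The difference between the two label formats can be illustrated with a small sketch. The label map and the `label_instances` helper below are hypothetical, and connected-component labelling is used here only to show what an instance map looks like; real instance segmentation models predict instances directly and can also separate objects that touch.

```python
import numpy as np

# Hypothetical 5x7 semantic map: 0 = background, 1 = object class.
# Two separate objects share the same class ID.
semantic = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
])

def label_instances(mask):
    """Give each 4-connected blob of non-zero pixels its own integer ID."""
    instances = np.zeros_like(mask)
    next_id = 0
    for start in zip(*np.nonzero(mask)):
        if instances[start]:
            continue  # pixel already belongs to an earlier blob
        next_id += 1
        instances[start] = next_id
        stack = [start]
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not instances[nr, nc]:
                    instances[nr, nc] = next_id
                    stack.append((nr, nc))
    return instances

instance = label_instances(semantic)
print(np.unique(semantic))  # class IDs present in the semantic map
print(np.unique(instance))  # instance IDs present in the instance map
```

In the semantic map both objects carry the same value, while the instance map gives each object its own ID.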
There are two broad families of image segmentation methods: traditional approaches and deep learning-based approaches. Traditional approaches mostly rely on local differences and gradients in pixel values.
Traditional approaches to image segmentation
Traditional approaches can perform better than deep learning-based approaches when only limited datasets are available, because they need little or no training data. Some of the popular traditional techniques are:
Threshold method: A threshold is set to divide pixels into two classes. Pixels with a value below the threshold are set to 0, and pixels with a higher value are set to 1.
Region-based segmentation: This technique looks for similarities between adjacent pixels and groups them under the same class.
Cluster-based segmentation: A clustering algorithm, such as K-means, takes all pixels into consideration and clusters pixels with common attributes together into “K” classes.
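As a minimal sketch of the threshold and cluster-based techniques above, the NumPy code below segments a tiny hypothetical grayscale array. The function names, the threshold of 100, and the evenly spaced initial cluster centers are assumptions made for illustration; a pixel exactly equal to the threshold is placed in class 1 here, which the description above leaves open.

```python
import numpy as np

def threshold_segment(gray, threshold):
    """Pixels below the threshold become 0, all others become 1."""
    return (gray >= threshold).astype(np.uint8)

def kmeans_segment(gray, k, iterations=10):
    """Cluster pixel intensities into k classes with plain K-means."""
    pixels = gray.reshape(-1).astype(float)
    # Deterministic start: spread the initial centers across the value range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iterations):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of the pixels assigned to it.
        for i in range(k):
            if np.any(labels == i):
                centers[i] = pixels[labels == i].mean()
    return labels.reshape(gray.shape), centers

# Hypothetical 3x4 grayscale image with dark, mid-gray, and bright areas.
gray = np.array([
    [10,  12, 200, 210],
    [11, 120, 125, 205],
    [ 9, 118, 122, 198],
])

binary = threshold_segment(gray, threshold=100)
clusters, centers = kmeans_segment(gray, k=3)
print(binary)
print(clusters)
```

Thresholding can only split the image into two classes, whereas K-means with K = 3 separates the dark, mid-gray, and bright pixels into three groups.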
Deep learning-based approaches to image segmentation for object detection
A deep learning-based approach uses a Convolutional Neural Network (CNN) and gives a faster way of detecting objects. Several architectures exist for segmenting images.
The diagram below depicts the use of a CNN for object detection:
In a CNN-based approach, the image is passed to the network, where it goes through a series of convolution and pooling layers before the network outputs the object class. The CNN-based approach extracts features automatically by convolving the image, though it can also be more computationally expensive.
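To make the convolution and pooling steps concrete, here is a minimal NumPy sketch of one convolution, one ReLU activation, and one max-pooling step. The 6x6 image and the hand-written edge kernel are assumptions for illustration; a trained CNN learns its kernels from data and stacks many such layers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    This is cross-correlation, which is what most CNN libraries compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Replace negative responses with zero."""
    return np.maximum(x, 0)

def max_pool2x2(x):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 6x6 image: dark on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge kernel (an assumption, not a learned filter).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features.shape)
```

The 6x6 input shrinks to 4x4 after the unpadded 3x3 convolution and to 2x2 after pooling, which is the size reduction the convolution and pooling layers perform at each stage.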
Region-based CNN (RCNN): RCNN divides the image into multiple regions and then applies a convolutional network to each region to classify it. Running the network on every region separately takes a lot of computation and makes the process slow.
Fast RCNN: Whereas RCNN runs a convolutional network on every region, Fast RCNN passes the entire image through the network once and then extracts the regions of interest from the resulting feature map. A pooling layer reshapes each region of interest to a fixed size and passes it to fully connected layers, and a softmax layer gives the output classes.
Faster RCNN: It uses a Region Proposal Network (RPN) in place of selective search to generate regions of interest. The RPN takes image feature maps as input and outputs a set of object proposals.
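The pooling step that Fast RCNN applies to each region of interest can be sketched as follows. This is a simplified single-channel illustration, not the exact implementation from any library: the `roi_max_pool` name, the region coordinates, and the 2x2 output size are assumptions, and each region is assumed to be at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of interest to a fixed output size.

    roi is (row_start, col_start, row_end, col_end) in feature-map
    coordinates, with exclusive end indices.
    """
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    out_h, out_w = output_size
    # Bin edges that split the region into an out_h x out_w grid.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.zeros(output_size)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cell.max()
    return pooled

# Hypothetical 6x6 feature map computed once for the whole image.
feature_map = np.arange(36, dtype=float).reshape(6, 6)

# Two regions of different sizes are pooled to the same 2x2 shape.
pooled_a = roi_max_pool(feature_map, (0, 0, 4, 4))
pooled_b = roi_max_pool(feature_map, (1, 2, 6, 6))
print(pooled_a.shape, pooled_b.shape)
```

Because every region comes out with the same shape, regions of any size can be fed to the same fully connected layers while the convolutional features are computed only once per image.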
Fast and effective image segmentation for object detection
With object detection, we build a bounding box for each object in the image, but we learn nothing about the shape of the objects. Consider a self-driving car: a rectangular box drawn around a turn in the road is of little use, because it says nothing about the shape of the turn.
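A small sketch shows how much shape information a bounding box discards. The diagonal mask below is a hypothetical stand-in for a thin, slanted object such as a road marking.

```python
import numpy as np

# Hypothetical 6x6 mask of a thin diagonal object: 1 = object, 0 = background.
mask = np.eye(6, dtype=np.uint8)

rows, cols = np.nonzero(mask)
# Tightest axis-aligned box: (top, left, bottom, right), inclusive.
box = (rows.min(), cols.min(), rows.max(), cols.max())

box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
mask_area = int(mask.sum())
fill_ratio = mask_area / box_area  # share of the box that is actually object

print(box_area, mask_area, fill_ratio)
```

Here the box covers 36 pixels although the object occupies only 6 of them, so the box alone cannot tell a diagonal line from a filled square. A segmentation mask keeps that distinction.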
Object detection techniques
Refining object detection techniques leads to faster and more effective segmentation of images.
Mask RCNN: Mask RCNN extends the Faster RCNN object detection architecture. It adds another branch that outputs an object mask along with the identified class and bounding box coordinates. Besides general instance segmentation, it can be used in many applications, for example cattle counting. The diagram below depicts the architecture and working of Mask RCNN.
Source: Mask RCNN example
In Mask RCNN, the image runs through the CNN to generate a feature map, and the Region Proposal Network generates Regions of Interest (RoIs). The features of each RoI are then passed through fully connected layers for classification and bounding box prediction, and through a separate mask branch for mask prediction. A key point to note is that anchor boxes are used to detect multiple and overlapping objects. Each anchor box has a predefined scale and aspect ratio, so the set of anchors can match objects of different sizes and shapes.
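The role of anchor boxes can be sketched with a few lines of NumPy. The scales, aspect ratios, center point, and ground-truth box below are hypothetical values chosen for illustration, and `make_anchors` and `iou` are illustrative helper names rather than functions from a particular library.

```python
import numpy as np

def make_anchors(center, scales, aspect_ratios):
    """Return anchors as (x1, y1, x2, y2) for every scale/ratio pair.

    Each anchor has area scale**2 and width / height equal to the ratio.
    """
    cx, cy = center
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            w = scale * np.sqrt(ratio)
            h = scale / np.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Six anchors at one feature-map location: 2 scales x 3 aspect ratios.
anchors = make_anchors(center=(50, 50), scales=(32, 64),
                       aspect_ratios=(0.5, 1.0, 2.0))

# Hypothetical ground-truth box for a small square object.
ground_truth = (34, 34, 66, 66)
scores = [iou(anchor, ground_truth) for anchor in anchors]
best = int(np.argmax(scores))
print(best, scores[best])
```

The anchor with the highest overlap is the one whose scale and aspect ratio match the object. Placing several anchors at every location is what lets a detector respond to multiple objects of different sizes, including overlapping ones.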
U-Net Architecture: Semantic segmentation typically uses an encoder-decoder architecture. On the encoder side, a combination of convolutional and downsampling layers compresses the image into a compact feature representation, and the decoder then reconstructs pixel-level information. U-Net was initially developed for semantic segmentation of medical images. It uses successive contracting layers followed by upsampling layers to produce higher-resolution outputs.
U-Net is similar to SegNet; the architecture is shaped like a ‘U’, which gives U-Net its name. As shown in the diagram above, the input image is passed through convolutional layers with ReLU activation functions. The use of unpadded convolutions reduces the spatial dimensions. The encoder is on the left side and the decoder block is on the right side. With max pooling and an increasing number of filters per layer, the encoder block reduces the image size. The decoder then gradually upscales the feature maps with a decreasing number of filters.
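The effect of unpadded convolutions and pooling on image size can be traced with simple arithmetic. The sketch below assumes the layout of the original U-Net paper (four levels, two 3x3 unpadded convolutions per block, 2x2 max pooling, and 2x2 up-convolutions); these defaults come from that paper rather than from this article, and the function name is illustrative.

```python
def unet_output_size(input_size, depth=4, convs_per_block=2, kernel=3):
    """Trace the spatial size through a U-Net with unpadded convolutions."""
    shrink = convs_per_block * (kernel - 1)  # pixels lost per block
    size = input_size
    skip_sizes = []
    for _ in range(depth):          # contracting path (encoder)
        size -= shrink              # two unpadded 3x3 convolutions
        skip_sizes.append(size)     # feature map kept for the decoder
        if size % 2:
            raise ValueError("size must be even before 2x2 pooling")
        size //= 2                  # 2x2 max pooling halves the size
    size -= shrink                  # bottleneck block
    for _ in range(depth):          # expanding path (decoder)
        size *= 2                   # 2x2 up-convolution doubles the size
        size -= shrink              # two unpadded 3x3 convolutions
    return size, skip_sizes

output_size, skip_sizes = unet_output_size(572)
print(output_size, skip_sizes)  # prints: 388 [568, 280, 136, 64]
```

A 572x572 input therefore produces a 388x388 output map, which is why the segmentation map of a U-Net with unpadded convolutions is smaller than the input image.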
A key point to note is the skip connections, which connect the outputs of encoder layers to the corresponding layers in the decoder blocks. Skip connections preserve spatial detail that would otherwise be lost during downsampling, which helps the model converge faster and produce better results. In the original U-Net, the final convolution layer has two filters, one per output class, and produces the output segmentation map. The final layer can be changed as per requirements.
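A skip connection can be sketched as a crop followed by a channel-wise concatenation. The array shapes below are small hypothetical values, and `center_crop` and `skip_concat` are illustrative helper names; the crop is needed only when unpadded convolutions make the encoder feature map larger than the decoder feature map.

```python
import numpy as np

def center_crop(features, target_hw):
    """Crop a (channels, height, width) array around its center."""
    _, h, w = features.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return features[:, top:top + th, left:left + tw]

def skip_concat(encoder_features, decoder_features):
    """Crop the encoder output to the decoder size and stack the channels."""
    cropped = center_crop(encoder_features, decoder_features.shape[1:])
    return np.concatenate([cropped, decoder_features], axis=0)

# Hypothetical feature maps: 4 channels each, encoder larger than decoder.
encoder_features = np.ones((4, 12, 12))
decoder_features = np.zeros((4, 8, 8))

merged = skip_concat(encoder_features, decoder_features)
print(merged.shape)
```

The merged array has 8 channels: the first 4 carry fine spatial detail from the encoder and the last 4 carry the upsampled decoder features, so the following convolutions can use both.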
Image segmentation has numerous applications, because it divides visual data into segments on which segment-specific tasks can be performed. Some of the uses of image segmentation are:
Robotics: Image segmentation aids the perception and movement of robots or machines by pointing out the objects in their path of motion, so that they can understand the environment and change paths.
Medical imaging: Image segmentation helps doctors locate features quickly and accurately. It can be used for cancer cell detection, tumor detection, etc.
Self-driving cars: Semantic and instance segmentation play a key role in identifying road patterns, turns, potholes, vehicles on the road, and drivable surfaces.
Now that you know how image segmentation, detection, and localization work, along with the complexity and use cases of each process, it is easier to decide on requirements and implementation. Neal Analytics has already provided similar solutions to PepsiCo and can help you choose the right tool and develop end-to-end solutions.
External links and references:
- Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
- Automated cattle counting using Mask R-CNN in quadcopter vision system
- U-Net: Convolutional Networks for Biomedical Image Segmentation