Deep learning in computer vision starts with data – Part I

Deep learning in computer vision starts with data – Part I

Applied deep learning problems in computer vision start as data problems.  

In the planning stages of a deep learning problem, the team is usually excited to talk about algorithms and deployment infrastructure. More effort is needed to discuss the tradeoffs between various approaches and algorithms. Eventually, the project gets off the ground, but then the team often runs into a roadblock. They realize that the data available to train the deep learning models is insufficient to achieve good model performance. To move forward, the team needs to collect more data.  

Sound familiar?  

Whether the problem deals with image classification, object detection, or localization, at the core of every deep-learning vision algorithm is a large collection of labeled images. But developing a data collection strategy is a step that is often overlooked when tackling deep learning problems in computer vision. Make no mistake – compiling a quality dataset is one of the largest challenges in a successful applied deep learning project. In this article, we will cover the characteristics of a high-quality dataset for deep-learning computer vision models. 

What makes a good training dataset for deep learning in computer vision? 

A good dataset for deep learning projects has three keys: quality, quantity, and variety. 

Key considerations for a good training dataset


What does it mean for a vision dataset to be high quality? A high-quality vision dataset is one that accurately reflects the scenario being modeled. It means that the data should represent the environment, lighting conditions, camera angles, and distances found in the target location.   

A quality dataset has distinguishable examples of the target subject. As a general guideline, if you cannot identify your target subject by looking at an image, then neither can an algorithm. There are notable exceptions to this guideline, such as recent advances in facial recognition, but it’s a good starting point. If the target object is tough to see, consider adjusting the lighting or camera angle. You may also consider adding a camera with optical zoom to enable closer images with greater detail of the subject. 

quality comparison of lego blocks for computer vision


Generally, the more labeled instances available for training models, the better. Instances refer to not just the number of images, but the examples of a subject contained in each image. Sometimes an image may contain only one instance, as is typical in classification problems such as problems classifying images of cats and dogs. In other cases, there may be multiple instances of a subject in each image. For an object detection algorithm, having a handful of images with multiple instances is much better than having the same number of images with just one instance in each image. The reason is that each instance provides additional value to the algorithm. 

single vs multiple instance examples of lego block images


The more variety a dataset has, the more value that dataset can provide to the algorithm. A deep learning vision model needs variety to generalize to new examples and scenarios in production. Failure to collect a dataset with variety can lead to overfitting and poor performance when the model encounters new scenarios. For example, a model that is trained based on daytime lighting conditions may show good performance on images captured in the day but will struggle under nighttime conditions. Models may also be biased if one group or class is overrepresented in the dataset. This is common in face detection models. In a recent study, the National Institute of Standards and Technology (NIST) found that most facial-recognition algorithms show inconsistent performance across subjects that vary by age, gender, and race. Having a dataset with good variety not only leads to good performance but also helps address potential issues related to consistent performance across the full range of subjects. 

low vs high variety image examples of legos

Building a high-quality dataset is essential for successful deep-learning projects in computer vision. A good dataset should have a balance of quality, quantity, and variety. Having quality means data should be representative of the target scenario, with distinguishable examples of the target subject. Quantity is important to provide enough labeled instances for training vision models, while variety is crucial for generalizing to new examples and scenarios in production. Collecting a good dataset can be a challenging task, but the effort can pay off greatly because data is often the key to improved performance in deep learning models. In the next article, I will cover strategies to make data collection more effective and efficient. 


Further reading: 

This blog was last published on 7/2/2020 and has since been updated.