Deep learning in computer vision starts with data

Applied deep learning problems in computer vision start as data problems.

In the planning stages of a deep learning project, the team is usually excited to talk about algorithms and deployment infrastructure. Much effort is spent discussing the tradeoffs between various approaches. Eventually, the project gets off the ground, but then the team often runs into a roadblock: the data available to train the deep learning models is not sufficient to achieve good model performance. To move forward, the team needs to collect more data.

Sound familiar?

Whether the problem deals with image classification, object detection, or localization, at the core of every deep learning vision algorithm is a large collection of labeled images. But developing a data collection strategy is a step that is often overlooked when tackling deep learning problems in computer vision. Make no mistake – compiling a quality dataset is one of the largest challenges in a successful applied deep learning project.

In this article we will cover:

  • What makes a good dataset for deep learning in computer vision.
  • How to build a good dataset to enable your deep learning projects.

What makes a good dataset for deep learning in computer vision?

A good dataset for deep learning projects has three key properties: quality, quantity, and variety.


Quality

What does it mean for a vision dataset to be high quality? To start, the data must be representative of the scenario the model will face. Quality images replicate the lighting, angles, and camera distances found in the target location.

A quality dataset has clearly distinguishable examples of the target subject. As a general guideline, if you cannot identify the target subject by looking at an image, then neither can an algorithm. There are notable exceptions, such as recent advances in facial recognition, but the guideline is a good starting point. If the target object is hard to see, consider adjusting the lighting or camera angle. You may also consider adding a camera with optical zoom to capture closer images with greater detail of the subject.

[Figure: quality comparison of LEGO block images for computer vision]


Quantity

Generally, the more labeled instances available for training vision models, the better. "Instances" refers not just to the number of images but to the examples of a subject contained in each image. Sometimes an image contains only one instance, as is typical in classification problems such as telling images of cats from images of dogs. In other cases, each image contains multiple instances of a subject. For an object detection algorithm, a handful of images with multiple instances each is much better than the same number of images with just one instance each, because every labeled instance gives the algorithm another example to learn from.

[Figure: single-instance vs. multiple-instance examples of LEGO block images]
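
The difference between counting images and counting instances can be checked with a few lines of code. The sketch below is only an illustration: the record layout (one dict per image with a `labels` list), the file names, and the class names are assumptions, not a format used by any particular labeling tool.

```python
from collections import Counter

# Hypothetical annotation records: one dict per image, listing the class
# label of every tagged instance in that image.
annotations = [
    {"image": "img_001.jpg", "labels": ["red_brick", "red_brick", "blue_brick"]},
    {"image": "img_002.jpg", "labels": ["blue_brick"]},
    {"image": "img_003.jpg", "labels": []},  # image with nothing tagged
]


def summarize_instances(records):
    """Return (image_count, instance_count, per_class_counts)."""
    per_class = Counter()
    for record in records:
        per_class.update(record["labels"])
    return len(records), sum(per_class.values()), per_class


n_images, n_instances, per_class = summarize_instances(annotations)
print(f"{n_images} images, {n_instances} labeled instances")
for label, count in per_class.most_common():
    print(f"  {label}: {count}")
```

Tracking the per-class instance count, rather than the image count alone, gives a more honest picture of how much the model has to learn from.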


Variety

The more variety a dataset has, the more value it can provide to the algorithm. A deep learning vision model needs variety in order to generalize to new examples and scenarios in production. Failure to collect a varied dataset can lead to overfitting and poor performance when the model encounters new scenarios. For example, a model trained only on images captured in daytime lighting may perform well on daytime images but struggle at night. Models may also be biased if one group or class is overrepresented in the dataset. This is common in face detection models. In a recent study, the National Institute of Standards and Technology (NIST) found that most facial-recognition algorithms show inconsistent performance across subjects that vary by age, gender, and race [1]. A dataset with good variety not only leads to good performance but also helps the model perform consistently across the full range of subjects.

[Figure: low-variety vs. high-variety image examples of LEGO blocks]
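
One simple way to spot missing variety is to tag each image with the condition it was captured under and look for conditions that make up only a small share of the dataset. The sketch below assumes such tags exist; the tag names, the counts, and the 10% threshold are all illustrative assumptions rather than recommended values.

```python
from collections import Counter

# Hypothetical capture metadata: one tag per image describing a condition
# that matters in production (lighting here; it could be angle, site, etc.).
capture_conditions = ["day"] * 90 + ["dusk"] * 8 + ["night"] * 2


def underrepresented(tags, min_share=0.10):
    """Return {tag: share} for tags whose share of the dataset is below min_share."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items() if n / total < min_share}


for tag, share in underrepresented(capture_conditions).items():
    print(f"'{tag}' makes up only {share:.0%} of the images")
```

The same check works for class labels or any other attribute you expect to vary in production.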

How to build a good dataset to enable your deep learning projects

Don’t underestimate the difficulty of collecting a high-quality dataset. Collecting enough examples can be time-consuming and expensive. Even with a good data collection process, it could take weeks or months to collect enough instances to achieve good model performance across all representative classes. This is particularly true when you are trying to capture examples of rare events, such as quality defects on a manufacturing line.

Here are five strategies to help you build a good dataset for deep learning:

  • Make data collection part of the business process
  • Accelerate data collection by creating examples artificially
  • Simulate examples if possible
  • Build quality assurance into the tagging process
  • Monitor progress

[Figure: infographic of 5 strategies to build a good dataset for deep learning]

Make data collection and image tagging part of the business process

If you can build data collection and labeling into the normal operation of your business, the work will be less disruptive and cost less. To do so, find ways to involve people familiar with the day-to-day operation of your business, since they know the nuances of the problem best. They can often identify key examples that dedicated labelers might miss. This is especially true in cases like visual quality inspection, where differences between good and bad quality may not be apparent to an external labeler. Involving the subject matter experts in your business helps ensure that your training data is representative of the problem that you are trying to solve.

Accelerate data collection by creating examples artificially

The pace of data collection depends on how often the events you are trying to detect occur. If the target event occurs infrequently, then collecting a good set of images could take weeks or even months.

One way to accelerate this process is to create training examples artificially. For example, if your goal is to train a deep learning model to detect stockouts on a retail store shelf, you can temporarily remove products from the shelves to simulate stockout events. For quality inspection in a factory, this could involve purposefully assembling a product incorrectly or simulating wear or damage. A few hours spent generating examples can dramatically reduce the time needed to build a good dataset and, in turn, to develop a deep learning model.
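
A rough back-of-the-envelope estimate shows why staged examples help. The sketch below assumes examples arrive at a steady daily rate, which is a simplification; the function name and all of the numbers are hypothetical and chosen only for illustration.

```python
import math


def days_to_collect(target_examples, natural_per_day, staged_per_day=0.0):
    """Estimate the collection days needed to reach target_examples.

    natural_per_day: examples that occur on their own each day (assumed steady).
    staged_per_day: examples created deliberately each day (assumed steady).
    """
    rate = natural_per_day + staged_per_day
    if rate <= 0:
        raise ValueError("need a positive number of examples per day")
    return math.ceil(target_examples / rate)


# Illustrative numbers only: 200 examples wanted, 2 natural events per day.
print(days_to_collect(200, natural_per_day=2))
# Same target when 50 additional examples are staged each day.
print(days_to_collect(200, natural_per_day=2, staged_per_day=50))
```

Staged examples may not look exactly like naturally occurring ones, so it is worth keeping some natural examples aside for evaluation.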

Consider using simulations

Great progress has been made in recent years in simulating realistic images. Simulators have been used to help train models for self-driving cars and robotics problems. These simulations have become good enough that the resulting images can support training deep learning models for computer vision. Simulated images can augment your dataset and, in some cases, even replace collected training data. This is an especially powerful technique for deep reinforcement learning, where the model must learn from a wide variety of examples.

Build quality assurance into your labeling process

Many applications for deep learning in vision require labels that identify objects or classes within the training images. Labeling takes time and requires consistency and careful attention to detail. Labeling quality can suffer for several reasons, and each one can lead to poor model performance. Untagged instances and inconsistent bounding boxes or labels are two examples of poor labeling quality.

To help ensure labeling quality, build a “review” step into the labeling process. Have each label reviewed by at least one person other than the labeler to help protect against bad quality in the labeling process.

[Figure: labeling quality comparison for computer vision]
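
Part of the review step can be automated when a reviewer redraws or adjusts a bounding box: a low overlap between the labeler's box and the reviewer's box is a sign of inconsistency. The sketch below uses intersection over union (IoU), a standard overlap measure. The `(x_min, y_min, x_max, y_max)` box format, the `needs_review` helper, and the 0.8 threshold are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def needs_review(labeler_box, reviewer_box, threshold=0.8):
    """Flag a label when the two boxes overlap less than the threshold."""
    return iou(labeler_box, reviewer_box) < threshold


# A small disagreement passes; a large one is flagged for discussion.
print(needs_review((10, 10, 50, 50), (12, 11, 50, 52)))
print(needs_review((10, 10, 50, 50), (30, 30, 70, 70)))
```

A check like this does not catch untagged instances, so a visual pass by the reviewer is still needed.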

Monitor your progress

Even with just a few dozen training examples, it is possible to train a deep learning model using transfer learning to get an idea of performance. Let your first model serve as a baseline. As more training images become available, take the time to train and evaluate new models. Evaluating each new model as the dataset grows shows how performance is improving and lets you gauge the value of collecting more training images.
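
One way to gauge the value of more images is to record the dataset size and a validation metric for each model you train, then look at how much the metric improved per added image. The sketch below assumes every model is evaluated on the same held-out validation set; the history values are invented for illustration and are not results from any real project.

```python
# Hypothetical evaluation history: (training images used, validation accuracy).
history = [(50, 0.62), (100, 0.71), (200, 0.78), (400, 0.80)]


def gain_per_100_images(runs):
    """Accuracy gained per 100 added images between consecutive runs."""
    gains = []
    for (n_prev, acc_prev), (n_curr, acc_curr) in zip(runs, runs[1:]):
        gains.append((n_curr, 100 * (acc_curr - acc_prev) / (n_curr - n_prev)))
    return gains


for n_images, gain in gain_per_100_images(history):
    print(f"up to {n_images} images: {gain:+.3f} accuracy per 100 added images")
```

When the gain per added image flattens out, more of the same kind of image is unlikely to help much, and adding variety may be the better investment.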

Next steps

Once you have a large, high-quality dataset, you can focus on model training, tuning, and deployment. At this point, the hard work of collecting and labeling images can be translated into a working model that helps solve your computer vision problem. After spending days or even weeks collecting images, the training process will go fast by comparison.

Good luck with training your deep learning models for computer vision.

This article was also published on LinkedIn.