
Azure labeling projects: How to add data to an existing dataset
When working on an Azure Data Labeling project, it is common to have a dreaded realization in the middle of your labeling efforts: “I have more data that needs to go into my dataset.” Rather than creating a new dataset and splitting your data between two locations, it is best to add the new data to your existing dataset so your labeling effort stays centralized in one location. This gives you a single view of your entire labeling project, helps prevent duplicate images from being labeled, and lets you keep making progress on the project while you wait for new data to be delivered.
You can accomplish this in three key steps:
- Identify where your data is stored
- Find where it is stored in Blob storage
- Upload your new data
Step 1: Identify where your data is stored
First, identify where your dataset is stored within your Azure Resource Group. Go to your labeling project and click “Details”, then click “Datasets”, and then click the name of the dataset you are using.
This page shows which datastore your dataset is saved to, as well as the relative path where it lives within that datastore.
Note: Blob storage is usually the default datastore when a dataset is first created.
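If you prefer to look this up from code, the Azure Machine Learning Python SDK v2 expresses data locations as datastore URIs of the form `azureml://datastores/<datastore_name>/paths/<relative_path>`. The minimal sketch below assumes that URI shape and splits it into the two values shown on the Datasets page. The helper name `parse_datastore_uri` and the example URI are hypothetical, and older (v1) datasets may report their location differently.

```python
import re
from typing import NamedTuple

# Hypothetical helper. Assumes the Azure ML v2 datastore URI shapes:
#   azureml://datastores/<name>/paths/<relative/path>
#   azureml://subscriptions/.../workspaces/<ws>/datastores/<name>/paths/<relative/path>
_DATASTORE_URI = re.compile(
    r"^azureml://(?:.*/)?datastores/(?P<name>[^/]+)/paths/(?P<path>.+)$"
)


class DatastoreLocation(NamedTuple):
    datastore: str      # name of the datastore, e.g. a Blob storage datastore
    relative_path: str  # path inside the datastore, without leading/trailing "/"


def parse_datastore_uri(uri: str) -> DatastoreLocation:
    """Split an azureml:// datastore URI into datastore name and relative path."""
    match = _DATASTORE_URI.match(uri)
    if match is None:
        raise ValueError(f"not a recognised azureml datastore URI: {uri!r}")
    return DatastoreLocation(match["name"], match["path"].strip("/"))


if __name__ == "__main__":
    # Hypothetical URI built from this article's example relative path.
    loc = parse_datastore_uri(
        "azureml://datastores/workspaceblobstore/paths/UI/04-19-2021_081209_UTC/"
    )
    print(loc.datastore, loc.relative_path)
```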
Step 2: Find where your data is stored in Blob storage
Next, go to your Azure Resource Group, select the relevant storage account, and then select “Containers” to see your available data storage locations.
Then select your Blob storage container (or the relevant datastore if you are using something other than Blob storage).
From here, follow the relative path to the location of your dataset. In the example from our previous image, the relative path is UI/04-19-2021_081209_UTC.
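Blob storage has no true folders: the “relative path” is just a shared prefix on the names of the blobs in the container. As a sketch, assuming the public Azure cloud endpoint (`https://<account>.blob.core.windows.net`), the folder you navigate to corresponds to a URL like the one built below. The helper `blob_folder_url` is hypothetical, and the account and container names are placeholders, not values from this project.

```python
from urllib.parse import quote


def blob_folder_url(account: str, container: str, relative_path: str) -> str:
    """Return the Blob storage URL of the "folder" (name prefix) backing a dataset.

    Assumes the public-cloud endpoint https://<account>.blob.core.windows.net;
    sovereign clouds use a different domain suffix.
    """
    # A folder is only a name prefix, so normalise it to end with exactly one "/".
    prefix = relative_path.strip("/") + "/"
    # Percent-encode each path segment while keeping the "/" separators.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(prefix, safe='/')}"
```

The URL identifies the location unambiguously; opening it directly still requires appropriate access (for example, a SAS token) unless the container permits anonymous reads.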
Step 3: Upload your new data
Now that you are at your dataset’s location, click “Upload” to begin adding new data to your labeling project. Once you have uploaded your additional images, return to your labeling project.
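If you have many files, uploading from a script can be less error-prone than the portal’s Upload button. The sketch below assumes the `azure-storage-blob` package and a storage account connection string; `plan_uploads`, `upload_new_images`, and the set of image extensions are hypothetical names and choices for illustration. It uploads only top-level image files from a local folder under the dataset’s relative path and passes `overwrite=False`, so a blob name that already exists in the container is skipped rather than replaced.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed image types; adjust to the file types your labeling project uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def plan_uploads(local_dir: str, relative_path: str) -> List[Tuple[Path, str]]:
    """Pair each top-level image in local_dir with its blob name under the dataset prefix."""
    prefix = relative_path.strip("/")
    plan = []
    for file in sorted(Path(local_dir).iterdir()):
        if file.is_file() and file.suffix.lower() in IMAGE_SUFFIXES:
            plan.append((file, f"{prefix}/{file.name}"))
    return plan


def upload_new_images(conn_str: str, container: str, plan: Iterable[Tuple[Path, str]]) -> int:
    """Upload each planned file, skipping blob names that already exist. Returns the upload count."""
    # Imported here so plan_uploads() works without the Azure SDK installed.
    from azure.core.exceptions import ResourceExistsError
    from azure.storage.blob import ContainerClient

    uploaded = 0
    with ContainerClient.from_connection_string(conn_str, container_name=container) as client:
        for local_file, blob_name in plan:
            with open(local_file, "rb") as data:
                try:
                    # overwrite=False: an existing name raises ResourceExistsError,
                    # so earlier (possibly already labeled) files are left untouched.
                    client.upload_blob(name=blob_name, data=data, overwrite=False)
                    uploaded += 1
                except ResourceExistsError:
                    continue
    return uploaded
```

For example, `plan_uploads("./new_images", "UI/04-19-2021_081209_UTC")` followed by `upload_new_images(conn_str, container_name, plan)` would add the images from a hypothetical local `new_images` folder to the example dataset path. Note that this only guards against reusing a blob name; two identical images saved under different file names would both be uploaded.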
When you return, you may see that the total number of items in the project is unchanged. The project needs to be refreshed before it picks up the new data.
To do this, you can either:
- Disable and re-enable your project. When the project is re-enabled, it will identify any changes to the dataset and update the project accordingly.
- Wait 24 hours for your project to automatically refresh. This is a good option if you need to avoid turning your project off.
The end result should show an increase in the amount of data in your labeling project without affecting the results of your previously labeled data.
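If the project total still looks wrong after a refresh, it can help to check how many images actually sit under the dataset’s relative path in storage. The sketch below assumes the `azure-storage-blob` package and counts blobs by file extension; `count_images`, `count_images_in_container`, and the extension list are hypothetical, and the labeling project’s own count may differ if it includes other file types.

```python
from typing import Iterable

# Assumed image types; match these to your project's file types.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".bmp")


def count_images(blob_names: Iterable[str], relative_path: str) -> int:
    """Count image blobs stored under the dataset's relative path (its name prefix)."""
    prefix = relative_path.strip("/") + "/"
    return sum(
        1
        for name in blob_names
        if name.startswith(prefix) and name.lower().endswith(IMAGE_SUFFIXES)
    )


def count_images_in_container(conn_str: str, container: str, relative_path: str) -> int:
    """Same count, reading blob names from Azure with azure-storage-blob."""
    from azure.storage.blob import ContainerClient  # lazy import; needs azure-storage-blob

    prefix = relative_path.strip("/") + "/"
    with ContainerClient.from_connection_string(conn_str, container_name=container) as client:
        # list_blobs yields BlobProperties objects; .name is the full blob name.
        names = (blob.name for blob in client.list_blobs(name_starts_with=prefix))
        return count_images(names, relative_path)
```

Running the count before and after an upload gives the increase you should expect to see in the project once it has refreshed.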
Conclusion
Hopefully, this tutorial will help you save time and effort on your next Azure labeling project. By following these three steps to add data to an existing dataset, rather than splitting it between two locations, you’ll maintain one view into your data labeling project, prevent duplicate images from being labeled, and be able to continuously add data to your project.