What is the best way to transfer huge files to Azure Data Lake Storage?
There are many ways to upload files to Azure Data Lake Storage (ADLS) Gen2. In this article, we will compare two popular ways for an organization to upload files to ADLS, using the parameters below to identify which one best suits your needs:
- Performance: Speed by which the file is being uploaded
- Ease: How easy it is to set up and use
- Automation: Whether any manual intervention is needed after operationalization
In this tutorial, we will rate each approach on the three parameters above on a scale from 1 to 5, with 1 being the lowest and 5 the highest.
We will test these two approaches by uploading a 10 GB test file:
- AzCopy
- Azure Storage Explorer
Without further ado, let’s get started.
To transfer a file using AzCopy, you will first need the AzCopy executable, which you can download from here.
Once AzCopy is downloaded, let’s create an ADLS Gen2 storage account for our tutorial. I have already created one, along with a container for the AzCopy test:
To use AzCopy to transfer a file to the container, we need a SAS token. Let’s generate one:
Click “Shared access signature” under “Security + networking” within the storage account.
I have selected a 1-day range for the SAS token to be active. Then click the “Generate SAS and Connection string” button.
Copy the SAS token from the generated URLs. It should look like below:
Now that we have the SAS token, let’s build our AzCopy command. Below is the syntax for the AzCopy command line:
azcopy copy '<local-file-path>' 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<blob-name>'
TIP: This example encloses path arguments in single quotes (''). Use single quotes in all command shells except the Windows Command Shell (cmd.exe). If you're using the Windows Command Shell (cmd.exe), enclose path arguments in double quotes ("") instead of single quotes ('').
I have placed both the sample file and the AzCopy application in a folder named “AzCopy” at the root of the C drive. Replacing the values in the syntax above, this is what we get (note the straight double quotes, which the Windows Command Shell requires):
azcopy copy "c:\AzCopy\10GB.bin" "https://tuttransfer.blob.core.windows.net/azcopy/10GB.bin?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacupx&se=2021-09-30T04:15:53Z&st=2021-09-28T20:15:53Z&spr=https&sig=w%2Bn8Jl%2BsnOe1Xk5A2CfjM6%2BDuHz8kb4ZFQugmgtOcfQ%3D"
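Before starting a long transfer, it can help to confirm that the SAS in the destination URL is still valid and grants write access. The following is a minimal Python sketch of that check, not part of AzCopy itself. The function names `describe_sas` and `is_usable_for_upload` are hypothetical, and the sketch assumes the URL carries both `st` and `se` timestamps in the full-second UTC format shown in the command above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def describe_sas(sas_url: str) -> dict:
    """Return the permission string and validity window found in a SAS URL."""
    query = parse_qs(urlsplit(sas_url).query)

    def parse_time(field: str) -> datetime:
        # Assumption: timestamps look like 2021-09-30T04:15:53Z (UTC, full seconds).
        value = query[field][0]
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)

    return {
        "permissions": query["sp"][0],  # e.g. "rwdlacupx"
        "starts": parse_time("st"),
        "expires": parse_time("se"),
    }


def is_usable_for_upload(sas_url: str, now: datetime) -> bool:
    """True if `now` falls inside the SAS window and write ("w") is granted."""
    info = describe_sas(sas_url)
    in_window = info["starts"] <= now <= info["expires"]
    return in_window and "w" in info["permissions"]
```

Calling `is_usable_for_upload(url, datetime.now(timezone.utc))` just before the upload gives an early warning if the token has already expired.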
AzCopy is a command-line utility, so we need to open a command prompt and run the azcopy copy command above. At the end of the execution, AzCopy provides a summary of the transfer. Let’s execute:
So, it took 16.34 minutes to transfer the 10 GB file using AzCopy with SAS token authorization.
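If you run this kind of transfer often, the azcopy copy command can also be assembled programmatically instead of edited by hand. Below is a small Python sketch under that assumption. The helper names `build_destination_url` and `build_azcopy_command` are made up for this illustration, and the `sig=PLACEHOLDER` value in the usage note stands in for a real signature.

```python
from typing import List


def build_destination_url(account: str, container: str, blob_name: str,
                          sas_token: str, endpoint: str = "blob") -> str:
    """Combine storage account, container, blob name, and SAS token into one URL."""
    # The SAS token becomes the query string; drop a leading "?" if it has one.
    token = sas_token.lstrip("?")
    return (
        f"https://{account}.{endpoint}.core.windows.net/"
        f"{container}/{blob_name}?{token}"
    )


def build_azcopy_command(local_path: str, destination_url: str) -> List[str]:
    """Return the argument list for `azcopy copy <source> <destination>`."""
    return ["azcopy", "copy", local_path, destination_url]
```

For example, `build_destination_url("tuttransfer", "azcopy", "10GB.bin", "?sv=2020-08-04&sig=PLACEHOLDER")` yields a URL with the same shape as the one used in this test. Passing the resulting list to `subprocess.run` avoids the shell quoting differences mentioned in the tip.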
Now let’s try using Storage Explorer.
Azure Storage Explorer has a graphical user interface for interacting with storage accounts on Azure, which makes it more user-friendly. Here, however, we are comparing which of the two options is best for transferring big files to ADLS Gen2. Let’s download Storage Explorer from here. Run the downloaded executable and follow the installation wizard.
Once installed, Storage Explorer should start automatically with the setup screen for connecting to Azure Storage. Select the “ADLS Gen2 container or directory” option:
We already have the SAS token we generated during the AzCopy test, so let’s select the SAS URL option on the next screen and then click “Next”:
Enter a friendly name in Display name, paste the SAS URL, and click “Next”:
Click “Connect” on the Summary screen:
This step connects to your storage container. The next step is simply to use the GUI: click “Upload”, select the file, and start uploading:
We can monitor the transfer on the bottom pane:
The total time taken by Storage Explorer was 18.03 minutes.
Let’s use our parameters to identify which of the two approaches is best for transferring big files to Azure Storage.
Performance: Speed by which the file is being uploaded
I would rate the performance of AzCopy higher than that of Storage Explorer: it finished about 1.7 minutes sooner (16.34 versus 18.03 minutes):
- AzCopy – 5 points
- Storage Explorer – 3 points
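To put the two timings in perspective, here is a short Python sketch that converts them into average throughput. It assumes the reported times are decimal minutes and that the 10 GB file is 10 × 1024 MiB. Both are assumptions on my part, and `throughput_mib_per_s` is a hypothetical helper name.

```python
def throughput_mib_per_s(size_gib: float, minutes: float) -> float:
    """Average transfer rate in MiB/s for a file of `size_gib` moved in `minutes`."""
    return (size_gib * 1024) / (minutes * 60)


# Timings reported in this test (assumed to be decimal minutes).
AZCOPY_MINUTES = 16.34
EXPLORER_MINUTES = 18.03
FILE_SIZE_GIB = 10  # assumption: 10 GB treated as 10 * 1024 MiB

azcopy_rate = throughput_mib_per_s(FILE_SIZE_GIB, AZCOPY_MINUTES)
explorer_rate = throughput_mib_per_s(FILE_SIZE_GIB, EXPLORER_MINUTES)
time_saved = EXPLORER_MINUTES - AZCOPY_MINUTES

print(f"AzCopy:           {azcopy_rate:.1f} MiB/s")
print(f"Storage Explorer: {explorer_rate:.1f} MiB/s")
print(f"Time saved:       {time_saved:.2f} minutes")
```

Under these assumptions, AzCopy averages roughly 10.4 MiB/s against roughly 9.5 MiB/s for Storage Explorer. Both figures depend heavily on the network connection used for the test.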
Ease: How easy it is to set up and use
Storage Explorer is much easier to use than AzCopy because of its user-friendly GUI, while AzCopy is a command-line utility that requires you to construct a command before you can run it.
- AzCopy – 3 points
- Storage Explorer – 5 points
Automation: Whether any manual intervention is needed after operationalization
AzCopy can be automated using Windows Task Scheduler, PowerShell, and other orchestration tools with parameterized executions. Storage Explorer is completely manual: we cannot automate jobs or tasks through the Storage Explorer GUI.
- AzCopy – 4 points
- Storage Explorer – 1 point
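As an illustration of the automation point, below is a minimal Python wrapper that a scheduler could run unattended. It is only a sketch: the environment variable names `UPLOAD_SOURCE_PATH` and `UPLOAD_DEST_SAS_URL` are hypothetical, `azcopy` is assumed to be on the PATH, and failure is assumed to be signalled by a non-zero exit code.

```python
import os
import subprocess


def upload_with_azcopy(local_path: str, destination_url: str,
                       runner=subprocess.run) -> bool:
    """Run `azcopy copy` once and report whether it exited with code 0."""
    # An argument list (not a shell string) sidesteps shell quoting rules.
    result = runner(["azcopy", "copy", local_path, destination_url])
    return result.returncode == 0


def main() -> int:
    """Read the source path and SAS URL from the environment and upload."""
    # Hypothetical variable names for this sketch; keeping the SAS URL out of
    # the script avoids storing a secret in source control.
    local_path = os.environ["UPLOAD_SOURCE_PATH"]
    destination_url = os.environ["UPLOAD_DEST_SAS_URL"]
    return 0 if upload_with_azcopy(local_path, destination_url) else 1
```

A scheduled job would call `sys.exit(main())` so that the scheduler can see whether the transfer succeeded. The `runner` parameter exists only so the function can be exercised without actually launching AzCopy.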
Both options have pros and cons, but AzCopy scores higher overall (12 points against 9 for Storage Explorer). AzCopy can cover all the scenarios Storage Explorer supports for transfers to and from ADLS Gen2, while some use cases, especially automation, are not achievable with Storage Explorer.