What is the best way to transfer huge files to Azure Data Lake Storage?

There are many ways to upload files to Azure Data Lake Storage (ADLS) Gen2. In this article, we compare two popular options an organization can use to upload files to ADLS. We compare them on the following parameters to identify which one best suits your needs:

  1. Performance: Speed at which the file is uploaded
  2. Ease: How easy it is to set up and use
  3. Automation: Whether any manual intervention is needed after operationalization

In this tutorial, we rate each of the three parameters on a scale from 1 to 5, with 1 being the lowest and 5 the highest.

We will test these two approaches by uploading a 10 GB test file:

  1. AzCopy
  2. Azure Storage Explorer

Without further ado, let’s get started.

AzCopy

To transfer files using AzCopy, you first need the AzCopy executable, which you can download from here.

Once AzCopy is downloaded, let’s create an ADLS Gen2 storage account for our tutorial. I have already created one for this purpose, along with a container for the AzCopy test:

Screenshot example of AzCopy ADLS Gen2 setup

To transfer a file to the container with AzCopy, we need a SAS token. Let’s generate one:

Click “Shared access signature” under “Security + networking” within the storage account:

Screenshot example of SAS token generation

I have selected a 1-day range for the SAS token to be active. Then click the “Generate SAS and connection string” button:

Screenshot example range for SAS token

Copy the SAS token from the generated values. It should look like the one below:


?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacupx&se=2021-09-30T04:15:53Z&st=2021-09-28T20:15:53Z&spr=https&sig=w%2Bn8Jl%2BsnOe1Xk5A2CfjM6%2BDuHz8kb4ZFQugmgtOcfQ%3D
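To make the token less opaque, here is a small Python sketch that splits it into its query parameters. The field meanings in the comments follow the usual account SAS conventions (for example, `ss` for the signed services and `se` for the expiry time); the helper names `parse_sas` and `parse_utc` are made up for this illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

# SAS token copied from the example above (leading "?" included).
SAS_TOKEN = (
    "?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacupx"
    "&se=2021-09-30T04:15:53Z&st=2021-09-28T20:15:53Z"
    "&spr=https&sig=w%2Bn8Jl%2BsnOe1Xk5A2CfjM6%2BDuHz8kb4ZFQugmgtOcfQ%3D"
)


def parse_sas(token: str) -> dict:
    """Split a SAS token into its fields, percent-decoding each value."""
    fields = parse_qs(token.lstrip("?"))
    return {name: values[0] for name, values in fields.items()}


def parse_utc(stamp: str) -> datetime:
    """Parse the UTC timestamp format used by the st and se fields."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


sas = parse_sas(SAS_TOKEN)
start, expiry = parse_utc(sas["st"]), parse_utc(sas["se"])

print("API version (sv):      ", sas["sv"])
print("services (ss):         ", sas["ss"])   # b=blob, f=file, q=queue, t=table
print("resource types (srt):  ", sas["srt"])  # s=service, c=container, o=object
print("permissions (sp):      ", sas["sp"])
print("allowed protocol (spr):", sas["spr"])
print("validity window:       ", expiry - start)
```

Checking `st` and `se` this way before a long transfer is a cheap guard against starting an upload with a token that is about to expire.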

Now that we have the SAS token, let’s build our AzCopy command. Below is the syntax of the AzCopy command line:


azcopy copy '<local-file-path>' 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<blob-name>'

// TIP This example encloses path arguments with single quotes (''). 
// Use single quotes in all command shells except for the Windows Command Shell (cmd.exe). 
// If you're using a Windows Command Shell (cmd.exe), enclose path arguments with double quotes ("") instead of single quotes ('').

I have placed both the sample file and the AzCopy executable in a folder called “AzCopy” in the root of the C drive. Filling in the values in the syntax above and appending the SAS token to the destination URL, we get:


azcopy copy "c:\AzCopy\10GB.bin" "https://tuttransfer.blob.core.windows.net/azcopy/10GB.bin?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacupx&se=2021-09-30T04:15:53Z&st=2021-09-28T20:15:53Z&spr=https&sig=w%2Bn8Jl%2BsnOe1Xk5A2CfjM6%2BDuHz8kb4ZFQugmgtOcfQ%3D"
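Assembling the destination URL by hand is error-prone, so here is a minimal Python sketch of how the command above is put together from its parts. The function name `build_azcopy_command` is hypothetical, and the sketch assumes the blob endpoint (`blob.core.windows.net`) used in this tutorial.

```python
def build_azcopy_command(local_path, account, container, blob_name, sas_token):
    """Return the argument list for an `azcopy copy` upload of one local file."""
    # Destination = container URL + blob name + "?" + SAS token.
    destination = (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{blob_name}?{sas_token.lstrip('?')}"
    )
    # An argument list avoids the shell quoting differences described in the tip above.
    return ["azcopy", "copy", local_path, destination]


command = build_azcopy_command(
    local_path=r"c:\AzCopy\10GB.bin",
    account="tuttransfer",
    container="azcopy",
    blob_name="10GB.bin",
    sas_token=(
        "?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacupx"
        "&se=2021-09-30T04:15:53Z&st=2021-09-28T20:15:53Z"
        "&spr=https&sig=w%2Bn8Jl%2BsnOe1Xk5A2CfjM6%2BDuHz8kb4ZFQugmgtOcfQ%3D"
    ),
)
print(" ".join(command))
```

If `azcopy` is on the PATH, the resulting list could be handed to `subprocess.run(command, check=True)` to start the transfer.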

 

Because AzCopy is a command-line utility, we need to open a command prompt and execute the command above. At the end of the execution, AzCopy prints a summary of the transfer. Let’s execute it:

Screenshot example command line summary of ADLS transfer with AzCopy and SAS token authorization

So, it took 16.34 minutes to transfer the 10 GB file using AzCopy with SAS token authorization.

Now let’s try Storage Explorer.

Storage Explorer

Azure Storage Explorer provides a graphical user interface for interacting with storage accounts on Azure, which makes it more user friendly. Here, however, we want to find out which of the two options is better for transferring big files to ADLS Gen2. Let’s download Storage Explorer from here. Run the downloaded installer and follow the installation wizard.

Once installed, Storage Explorer should start automatically with the setup screen for connecting to Azure Storage. Select the “ADLS Gen2 container or directory” option:

Screenshot example Azure Storage Explorer to ADLS Gen2 setup

Since we already have the SAS token generated during the AzCopy test, let’s select the SAS URL option on the next screen and then click “Next”:

Screenshot example Azure Storage Explorer select connection method

Enter a friendly name in “Display name”, paste the SAS URL, and click “Next”:

Screenshot example Azure Storage Explorer connection info

Click “Connect” on the Summary screen:

Screenshot example summary screen

This step connects Storage Explorer to your storage container. The next step is simply to click “Upload” in the GUI, select the file, and start uploading:

Screenshot example Storage Explorer upload

We can monitor the transfer in the bottom pane:

Screenshot example Storage Explorer monitor transfer

The total time taken by Storage Explorer was 18.03 minutes.

Conclusion

Let’s use our parameters to identify which of the two approaches is better for transferring big files to Azure Storage.

Performance: Speed at which the file is uploaded

I would rate the performance of AzCopy higher than that of Storage Explorer: it finished in 16.34 minutes versus 18.03 minutes, a little under 2 minutes faster:

  • AzCopy – 5 points
  • Storage Explorer – 3 points
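To put the two timings in perspective, the following Python sketch converts them into effective throughput. It rests on two assumptions that the measurements do not spell out: that the 10 GB test file is 10 × 1024 MiB, and that the reported times are decimal minutes.

```python
# Assumption: the 10 GB test file is 10 * 1024 MiB.
FILE_SIZE_MIB = 10 * 1024

# Transfer times reported in this article, read as decimal minutes (assumption).
timings_min = {"AzCopy": 16.34, "Storage Explorer": 18.03}


def throughput_mib_per_s(minutes: float) -> float:
    """Effective throughput for uploading the test file in the given time."""
    return FILE_SIZE_MIB / (minutes * 60)


rates = {tool: throughput_mib_per_s(m) for tool, m in timings_min.items()}
saved_min = timings_min["Storage Explorer"] - timings_min["AzCopy"]

for tool, rate in rates.items():
    print(f"{tool}: {rate:.2f} MiB/s")
print(f"AzCopy finished {saved_min:.2f} minutes sooner")
```

Both figures come from a single run on one network connection, so they are best read as a rough indication rather than a benchmark.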

Ease: How easy it is to set up and use

Storage Explorer is much easier to use than AzCopy because of its user-friendly GUI, while AzCopy is a command-line utility and requires you to compose a command before executing it:

  • AzCopy – 3 points
  • Storage Explorer – 5 points

Automation: Whether any manual intervention is needed after operationalization

AzCopy can be automated using Windows Task Scheduler, PowerShell, and other orchestration tools with parameterized executions, while Storage Explorer is completely manual: we cannot automate jobs or tasks through the Storage Explorer GUI.

  • AzCopy – 4 points
  • Storage Explorer – 1 point
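As one illustration of what such automation could look like, here is a minimal Python sketch that wraps the `azcopy copy` call so a scheduler can run it unattended. The function name `upload_with_azcopy` and the environment variable `ADLS_SAS_TOKEN` are made up for this example, and the sketch assumes `azcopy` is available on the PATH.

```python
import os
import subprocess
from pathlib import PureWindowsPath


def upload_with_azcopy(local_path, container_url, runner=subprocess.run):
    """Upload one file to a container by invoking `azcopy copy`.

    The SAS token is read from the (hypothetical) ADLS_SAS_TOKEN environment
    variable so that it does not have to be hard-coded in the script.
    """
    # A missing variable raises KeyError, so the job fails before any transfer starts.
    sas = os.environ["ADLS_SAS_TOKEN"].lstrip("?")
    # Reuse the local file name as the blob name.
    blob_name = PureWindowsPath(local_path).name
    destination = f"{container_url.rstrip('/')}/{blob_name}?{sas}"
    command = ["azcopy", "copy", local_path, destination]
    # check=True makes a non-zero AzCopy exit code raise CalledProcessError.
    return runner(command, check=True)


# Example call, for instance from a script started by a scheduler:
# upload_with_azcopy(r"c:\AzCopy\10GB.bin",
#                    "https://tuttransfer.blob.core.windows.net/azcopy")
```

The `runner` parameter exists only so the function can be exercised without actually starting a transfer; in normal use the default `subprocess.run` is what launches AzCopy.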

AzCopy vs Storage Explorer comparison table
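For a quick overall picture, the ratings given above can be tallied with a few lines of Python. Weighting the three parameters equally is an assumption made for this sketch; if automation matters more to your organization, weight it accordingly.

```python
# Ratings assigned in the three sections above.
scores = {
    "AzCopy": {"Performance": 5, "Ease": 3, "Automation": 4},
    "Storage Explorer": {"Performance": 3, "Ease": 5, "Automation": 1},
}

# Assumption: every parameter counts equally toward the total.
totals = {tool: sum(ratings.values()) for tool, ratings in scores.items()}

for tool, total in totals.items():
    print(f"{tool}: {total} out of 15")
```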

Though both options have pros and cons, AzCopy can cover every Storage Explorer scenario for transferring files to and from ADLS Gen2, while some use cases, especially automation, might not be achievable with Storage Explorer.