
Reducing cloud application operational cost with Neal Analytics’ Azure Storage Optimizer
Neal Analytics offers a solution to a common business pain point: the rising cost of Azure Blob Storage. The result? Our Azure Storage Optimizer (ASO) solution.
Azure Storage Optimizer
In multiple customer projects, we have seen the requirement to impose rules on storage accounts that move files from costly to less expensive tiers. Currently, developers can only move blobs older than a specific number of days, filtered by prefixes, keys, and other available criteria. And a single wrong file move can catastrophically impact downstream pipelines and applications.
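For reference, this is the general shape of Azure's built-in age-based lifecycle rule, shown here as a Python dict mirroring the JSON policy schema (the rule name, prefix, and day threshold are illustrative):

```python
# Sketch of an Azure Blob lifecycle management rule, expressed as a Python dict
# that mirrors the JSON policy schema; names and values are illustrative.
age_based_rule = {
    "name": "age-based-tiering",
    "enabled": True,
    "type": "Lifecycle",
    "definition": {
        "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
        "actions": {
            # Move block blobs to Cool once they are 90 days past last modification.
            "baseBlob": {"tierToCool": {"daysAfterModificationGreaterThan": 90}}
        },
    },
}
```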
There are, however, scenarios where rules should be based on the last access time instead of the last modified time.
We built this functionality with more customizable filters so customers can move files quickly and confidently, and we developed a Machine Learning-driven approach that considers each file's usage pattern to predict file inactivity more reliably.
Azure has a (preview) feature that tracks the last access time of any blob in a storage account.
Using the Get Blob APIs, we can retrieve the last access time of all blobs (see the sketch after this list). This allowed us to build a tool that could:
- Provide customizations on filters
- Analyze storage accounts at different granularity levels
- Predict files that users can move to less expensive, more cost-effective storage tiers
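A minimal sketch of reading this property with the Python SDK (azure-storage-blob), assuming a connection string, an illustrative container name, and that the (preview) last access time tracking feature is enabled on the account:

```python
from azure.storage.blob import BlobServiceClient

# Assumes a storage connection string and that the (preview) last access time
# tracking feature is enabled on the account; the container name is illustrative.
service = BlobServiceClient.from_connection_string("<connection-string>")

for blob in service.get_container_client("data").list_blobs():
    # Get Blob Properties returns tier, last modified, and last accessed metadata.
    props = service.get_blob_client("data", blob.name).get_blob_properties()
    print(props.name, props.blob_tier, props.last_modified, props.last_accessed_on)
```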
Our Azure Storage Optimizer solution allows customers to:
- Perform analysis on their Azure Blob Storage with a wide variety of metrics, such as the total number of files in an account, the number of files per tier (Hot, Cool, Archive), growth of storage size over time, and more.
- Filter files by accessed and modified time for the specified date range (a minimal filter sketch follows this list).
- Move rarely accessed files from one tier to another (for example, from Hot tier to Archive tier).
- Perform predictive analysis to segment active and inactive files based on historical file usage trends.
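As an illustration of the date-range filter, here is a hedged sketch built on the same SDK calls as above; the function name, container, and dates are hypothetical:

```python
from datetime import datetime, timezone
from azure.storage.blob import BlobServiceClient

def blobs_accessed_between(container_name: str, start: datetime, end: datetime):
    """Return blobs in a container whose last access time falls in [start, end]."""
    service = BlobServiceClient.from_connection_string("<connection-string>")
    return [b for b in service.get_container_client(container_name).list_blobs()
            if b.last_accessed_on and start <= b.last_accessed_on <= end]

# Example: candidate files last accessed in the first half of 2022.
stale = blobs_accessed_between(
    "data",
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2022, 6, 30, tzinfo=timezone.utc),
)
```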
ROI of the ASO solution
Using synthetic data, we estimated that a model-based approach could achieve up to three times the savings of the default Azure rule-based approach.
It also provides additional value for customers with very large files, as it consumes only the files' metadata and tracks their usage patterns to calculate their probable future inactivity.
Model-based approach:

| | Predicted_Active | Predicted_Inactive |
| --- | --- | --- |
| Actually_Active | 1196 | 12 |
| Actually_Inactive | 612 | 166 |

Rule-based approach:

| | Predicted_Active | Predicted_Inactive |
| --- | --- | --- |
| Actually_Active | 1193 | 15 |
| Actually_Inactive | 731 | 47 |

On this synthetic data, the model correctly flags 166 of the 778 truly inactive files, versus 47 for the rule-based approach, i.e., roughly 3.5 times as many candidates for tier movement.
Typically, moving files from the hot to the cool tier is based on days since the last access time; such a rule does not consider each file's individual usage pattern. ASO instead provides a model-driven approach to predict file inactivity, allowing more cost savings and a more reliable file-management process.
For example, consider the approximate storage cost for 100 TB of blob data*:
- Hot (online) tier: 4,500 USD/month
- Cool (online) tier: 2,500 USD/month
- Archive (offline) tier: 400 USD/month
The simple savings equation for a 100 TB storage is calculated as IF × CS, where
- IF is the expected share of Inactive Files (in %) and
- CS is the monthly Cost Savings of moving the full 100 TB down a tier, i.e., $4,500 – $2,500 = $2,000
For instance, if out of the 100 TB of blob data a user moves 30% of the files from the hot tier to the cool tier, the savings will be 30% × $2,000 = $600/month.
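A quick sanity check of that arithmetic in Python, using the prices from the list above:

```python
# Sanity check of the savings formula above (USD per month, 100 TB of data).
hot_cost = 4500           # Hot (online) tier, per month
cool_cost = 2500          # Cool (online) tier, per month
inactive_fraction = 0.30  # IF: expected share of inactive files

cs = hot_cost - cool_cost         # CS = $2,000 per month
savings = inactive_fraction * cs  # 0.30 * 2,000 = $600 per month
print(f"Expected savings: ${savings:,.0f}/month")
```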
*Calculated for Read-access geo-redundant storage (RA-GRS) redundancy
Solution architecture and process flow
The steps the solution takes are as follows:
- Application users interact with the web application and enter their storage account details.
- Via an HTTP-triggered Azure Function, the application fetches the metadata of all blobs in the storage account.
- The application stores the blobs' metadata in Azure SQL Database.
- For every metadata load in the database, the application runs the prediction model via a timer-triggered Azure Function and stores the output back into Azure SQL.
- Using Microsoft Power BI on the Azure SQL data, users can access reports to analyze their storage and the prediction results.
- Finally, the Power BI report is embedded into the UI of a web application.
Implementation
Using Microsoft's Get Blob APIs, the solution fetches the last access time of all blobs along with other metadata. HTTP-triggered and timer-triggered Azure Functions extract the metadata of all blobs at the required frequency (daily, weekly, monthly, etc.) and store it in the Azure SQL Database.
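A minimal sketch of such a timer-triggered function, assuming the v1 Python programming model (the schedule lives in the accompanying function.json) and a placeholder connection string; the SQL write is only indicated:

```python
# scan_blobs/__init__.py -- sketch of a timer-triggered Azure Function
# (v1 programming model; the schedule lives in the accompanying function.json).
import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(timer: func.TimerRequest) -> None:
    service = BlobServiceClient.from_connection_string("<connection-string>")
    rows = []
    for container in service.list_containers():
        for blob in service.get_container_client(container.name).list_blobs():
            rows.append((container.name, blob.name, blob.size, blob.blob_tier,
                         blob.last_modified, blob.last_accessed_on))
    # Persist the metadata snapshot to Azure SQL, e.g., with pyodbc executemany()
    # (connection handling omitted for brevity).
```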
The next step is to run the solution’s prediction model to calculate the probability of every file becoming inactive in the next N days/weeks/months. These predictions are stored in the Azure SQL Database as well.
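The post does not detail the model itself; purely as an illustration, a per-file inactivity classifier over usage features might look like the following (the features, training data, and horizon N are hypothetical):

```python
# Illustrative only -- the actual ASO model is not described in this post.
# A minimal per-file inactivity classifier over hypothetical usage features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per file: [days_since_last_access, accesses_last_30d, size_gb]
X_train = np.array([[2, 40, 1.2], [90, 0, 3.4], [10, 12, 0.5], [200, 1, 8.0]])
y_train = np.array([0, 1, 0, 1])  # 1 = file became inactive within the next N days

model = LogisticRegression().fit(X_train, y_train)

p_inactive = model.predict_proba(np.array([[45, 2, 2.0]]))[0, 1]
print(f"Predicted inactivity probability: {p_inactive:.2f}")
```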
For the end user, we have built a Power BI report on top of the fetched and predicted data so that they can perform their own analysis. We have also developed a web app that embeds this Power BI report.
With this web app, users can move files from a more expensive tier to a more cost-effective one based on their filter selection and a button click. The application passes the selected filters as parameters to the Set Blob APIs to perform tier movement operations (a simplified sketch follows). A user can also store this filter information and automate the movement daily, if needed.
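A simplified sketch of that tier movement with the Python SDK, assuming a hardcoded 90-day inactivity filter and an illustrative container name in place of the web app's user-selected filters:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("data")  # container name is illustrative
cutoff = datetime.now(timezone.utc) - timedelta(days=90)  # filter: 90 days idle

for blob in container.list_blobs():
    # Fall back to last_modified when last access time tracking is unavailable.
    last_touched = blob.last_accessed_on or blob.last_modified
    if blob.blob_tier == "Hot" and last_touched < cutoff:
        # Set Blob Tier moves the blob from the Hot tier to the Cool tier.
        container.get_blob_client(blob.name).set_standard_blob_tier("Cool")
```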
Step-by-step demo
First, the user must enter their storage account details, such as the storage account name and its primary key.
Once the credentials are validated, the application displays a Power BI report for analyzing the account and performing any required movement of files from one tier to another.
From this report, the user can select the set of files, from any container or of any type, that they would like to move from one tier to another based on the analysis. The report also provides the flexibility to choose a date range for file accessed and modified dates, and to choose source and destination tiers from a dropdown menu. Clicking the Move button moves the files according to the selected tiers.
Roadmap
Based on initial feedback, and although this is not fully set in stone, here are a few additional capabilities our team is looking to add to the solution in the future:
- Provide an automation feature in the embedded Power BI report to automate moving filtered files from one tier to another.
- Make the tool capable of handling multiple customer requests and their storage accounts simultaneously. Currently, it handles only one storage account at a time.
- Automate daily movement of files based on specific filters applied by the customer.
- Include a representation of predicted storage account size based on historical trends.
- Optimize our system settings for different data size scenarios.
Conclusion
Our Azure Storage Optimizer solution allows organizations to effortlessly identify and transfer their files across tiers, potentially helping them save up to 30% of storage costs when moving 30% of their total files to a lower tier, as in the Hot-to-Archive case of the pricing example above. If you're struggling with tier management, we are ready to help. Our team has the right expertise to address storage cost issues and has assisted many organizations with their cloud migrations. If you're curious to learn more, don't hesitate to get in touch with us.
References:
- Microsoft Documentation – Get Blob properties
- Microsoft Documentation – Set Blob properties
- Microsoft Documentation – Timer trigger for Azure Functions
- Migration and modernization solution
This blog is co-authored by Gaurav Ghorpade, Vidhi Saxena, and Yash Mittal.