Top 4 things to consider when moving AI workloads to the edge

Microsoft recently announced several updates to its Azure Stack Hub and Azure Stack Edge product lines. Azure Stack Hub is now available with GPUs, enabling it to run compute-intensive AI workloads such as vision models. Microsoft also announced the Azure Stack Edge Pro and Azure Stack Edge Pro R, both equipped with the NVIDIA T4 Tensor Core GPU. These releases are the latest steps in an important trend: moving AI workloads closer to the point of data collection, which enables AI applications in a wider range of environments.

Considerations when moving AI workloads to edge devices

Moving AI workloads to edge devices has several advantages and some disadvantages. In this post, I will cover four tradeoffs to weigh when deciding where to deploy AI models: feasibility, time to action, cost transparency, and scalability.


Feasibility

Feasibility has been an important driver of inference at the edge for several years. The issue stems from limited mobile bandwidth combined with the high bandwidth needed to transfer vision or voice data with low latency. Inference on edge devices eases this mobile bottleneck by letting engineers be selective about the data they transmit and centralize. With edge devices, engineers can process and store large quantities of raw data locally and transmit only the inference results over the mobile link as needed. The inference results are usually much smaller than the raw images or audio and can be readily transmitted even where mobile bandwidth is limited.
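
To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch. The frame size, result size, and frame rate are hypothetical figures chosen only for illustration, and required_mbps is a helper name introduced here, not part of any Azure tooling.

```python
def required_mbps(payload_bytes: float, messages_per_second: float) -> float:
    """Sustained uplink needed for a stream of messages, in megabits per second."""
    return payload_bytes * 8 * messages_per_second / 1_000_000


# Hypothetical figures for a single camera (assumptions, not measurements).
RAW_FRAME_BYTES = 200_000   # one compressed video frame, about 200 KB
RESULT_BYTES = 500          # one small detection result, e.g. a short JSON message
FRAMES_PER_SECOND = 15

# Uplink needed if every raw frame is sent to a central location.
raw_mbps = required_mbps(RAW_FRAME_BYTES, FRAMES_PER_SECOND)

# Uplink needed if inference runs locally and only results are sent.
result_mbps = required_mbps(RESULT_BYTES, FRAMES_PER_SECOND)

print(f"raw frames:        {raw_mbps:.2f} Mbps")
print(f"inference results: {result_mbps:.2f} Mbps")
```

Under these assumed numbers, sending results instead of frames reduces the required uplink by a factor equal to the ratio of the two payload sizes. Real deployments should substitute measured payload sizes and rates.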

Time to action

Time to action is another important consideration in scenarios where actions or decisions must take place within seconds. Automatic control of equipment such as self-driving loaders or mining equipment is one class of scenarios where network latency or long inference times can lead to equipment errors or damage. In such cases, performance can be greatly improved by moving inference workloads to the equipment's location at the edge, where the time needed to translate data into action can be minimized.
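
One way to reason about this is a simple latency budget: add up the network round trip, the inference time, and the time to actuate, then compare the total to the deadline. The sketch below assumes these three components are additive, and every number in it is hypothetical.

```python
def time_to_action_ms(inference_ms: float,
                      network_round_trip_ms: float = 0.0,
                      actuation_ms: float = 0.0) -> float:
    """Total time from data capture to action, assuming the parts simply add up."""
    return network_round_trip_ms + inference_ms + actuation_ms


def meets_deadline(total_ms: float, deadline_ms: float) -> bool:
    """True if the action happens within the allowed time."""
    return total_ms <= deadline_ms


DEADLINE_MS = 200.0  # hypothetical control deadline

# Hypothetical cloud path: fast inference, but a slow mobile round trip.
cloud_ms = time_to_action_ms(inference_ms=30.0,
                             network_round_trip_ms=400.0,
                             actuation_ms=50.0)

# Hypothetical edge path: slower inference hardware, no network round trip.
edge_ms = time_to_action_ms(inference_ms=80.0, actuation_ms=50.0)

print("cloud:", cloud_ms, "ms, within deadline:", meets_deadline(cloud_ms, DEADLINE_MS))
print("edge: ", edge_ms, "ms, within deadline:", meets_deadline(edge_ms, DEADLINE_MS))
```

The point of the sketch is that an edge device can meet a deadline even with slower inference hardware, because it removes the network round trip from the budget.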

Cost transparency and scalability

Performing inference at the edge can also improve cost transparency by bringing hardware and operating costs into closer alignment with the physical location they serve. Consider a security algorithm designed to detect human trespassing at a remote site. When inference runs in a central location or the cloud, its costs must be divided across multiple sites to arrive at a cost per site. When inference runs at the edge, compute costs are naturally aligned to individual sites and linked to either that location’s CapEx or OpEx budget. Costs scale linearly with each new site, making cost modeling more straightforward.
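
The difference between the two cost models can be sketched as follows. Splitting a shared bill by usage hours is only one possible allocation rule, and the site names and dollar amounts are hypothetical.

```python
def central_cost_per_site(shared_monthly_cost: float,
                          usage_hours_by_site: dict[str, float]) -> dict[str, float]:
    """Split one shared bill across sites in proportion to usage (an assumed rule)."""
    total_hours = sum(usage_hours_by_site.values())
    if total_hours <= 0:
        raise ValueError("total usage must be positive to allocate costs")
    return {site: shared_monthly_cost * hours / total_hours
            for site, hours in usage_hours_by_site.items()}


def edge_total_cost(monthly_cost_per_site: float, number_of_sites: int) -> float:
    """With one device per site, total cost grows linearly with the site count."""
    return monthly_cost_per_site * number_of_sites


# Hypothetical central deployment: one bill, allocated after the fact.
allocated = central_cost_per_site(
    9000.0, {"north": 100.0, "south": 200.0, "west": 300.0})
print(allocated)

# Hypothetical edge deployment: each site carries its own known cost.
print(edge_total_cost(monthly_cost_per_site=1200.0, number_of_sites=3))
print(edge_total_cost(monthly_cost_per_site=1200.0, number_of_sites=4))
```

In the central case the per-site figure depends on the allocation rule and on every other site's usage. In the edge case it is a direct property of the site itself.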

On the flip side, performing inference at the edge gives up some of the cost benefits of pooling compute resources. As a result, you need to be more deliberate in matching the capabilities and cost of the hardware to the portfolio of scenarios deployed at each site. Depending on its capabilities, an edge device may be able to handle a single AI scenario or several scenarios running in parallel. With inference at the edge, sizing the hardware to the needs of each site becomes an important consideration, because unnecessary equipment costs can quickly erode the value of a business case.
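
A rough sizing check might look like the sketch below. It assumes each scenario's GPU memory and GPU utilization can simply be summed, which is a simplification, and the scenario names, resource figures, and headroom threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    gpu_memory_gb: float     # memory the model needs while loaded
    gpu_utilization: float   # average share of one GPU, between 0 and 1


def fits_on_device(scenarios: list[Scenario],
                   device_memory_gb: float,
                   max_utilization: float = 0.8) -> bool:
    """Check that the combined scenarios stay within memory and leave compute headroom."""
    total_memory = sum(s.gpu_memory_gb for s in scenarios)
    total_utilization = sum(s.gpu_utilization for s in scenarios)
    return total_memory <= device_memory_gb and total_utilization <= max_utilization


# Hypothetical scenario portfolio for one site.
trespass = Scenario("trespass detection", gpu_memory_gb=4.0, gpu_utilization=0.30)
safety = Scenario("safety gear check", gpu_memory_gb=6.0, gpu_utilization=0.35)
counting = Scenario("vehicle counting", gpu_memory_gb=8.0, gpu_utilization=0.30)

DEVICE_MEMORY_GB = 16.0  # assumed capacity of a single-GPU edge device

print(fits_on_device([trespass, safety], DEVICE_MEMORY_GB))
print(fits_on_device([trespass, safety, counting], DEVICE_MEMORY_GB))
```

A check like this helps flag both undersized devices and sites where a large device would sit mostly idle. Measured figures from a pilot deployment should replace the assumed ones.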

Whether and how to incorporate edge computing devices into your infrastructure strategy is an important question for many companies. Using these devices well requires carefully weighing the tradeoffs above against your potential use cases. If you want to learn more about how edge devices may fit into your infrastructure strategy, please contact us to discuss.
