Autoscale optimizations - Challenges with memory-based metrics

This post is the third of a three-part series discussing autoscale and its related concepts. In the previous posts, we discussed flapping and identified effective scale-in metrics to help optimize autoscale configurations. This post focuses on the challenges that memory-based metrics pose during scale-in operations and builds a model to help optimize them. We will build on top of the mathematical models and observations from the previous posts, so I encourage you to take a detour if you haven't read them already.

Services like image/video processing apps can be memory-intensive, requiring autoscale configurations to be backed by memory usage metrics. Calculating the resulting average metric can be a challenge when relying on memory. A detailed discussion of this challenge is available here: https://medium.com/@jonfinerty/flapping-and-anti-flapping-dcba5ba92a05.

To summarize: when running an application on a Virtual Machine or Azure App Service plan, the memory metrics account for application and operating system usage combined. When scaling out, fresh instances of the application are added, which also add operating system footprints. When average memory usage across all instances starts receding below the scale-in threshold, autoscale tries to predict the resulting metric (average memory utilization) based on the distribution of the current metric. This prediction carries a margin of error, because the operating system footprint belonging to the excess instances will be wiped out after the scale-in operation even though it is still part of the current metric. This error triggers the anti-flapping mechanisms, delaying the scale-in operation.

Scaling in

We have the following model from the previous post, which calculates the resulting metric during a scale-in operation.

ri = (x * m) / (x - s)

Here, x represents the allocated instances, m represents the memory utilization at any given time, s represents the number of instances being removed during scale-in and ri represents the resulting scale-in metric.

In this calculation, m is the instantaneous memory usage, which includes the operating system footprint. The operating system memory used by the instances being scaled in (removed) should be discarded from this calculation to get a better approximation of the resulting memory metric.

It is reasonable for autoscale to face challenges in differentiating operating system and application memory. Therefore, the onus is on us to estimate the operating system's memory footprint. This can help us identify system states (allocated instances, scale-out and scale-in thresholds, etc.) where scale-in/out can be challenging. We modify the model above to account for the memory being used by the operating system as a percentage (b) and exclude the portion belonging to the instances being scaled in.

ri = [(x * m) - (b * s)] / (x - s)
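As a quick sketch, the two scale-in formulas can be written as plain Python functions. The function names and signatures here are my own illustrations, not part of any autoscale API.

```python
def resulting_scale_in_naive(x, m, s):
    """Average memory % that autoscale projects after removing s of the
    x allocated instances, given the current average utilization m (%)."""
    return (x * m) / (x - s)


def resulting_scale_in_adjusted(x, m, s, b):
    """Same projection, but excluding the operating system footprint b (%)
    of the s instances that are about to be removed."""
    return ((x * m) - (b * s)) / (x - s)
```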

The plot (black) below shows the resulting memory usage, calculated with the model above, against allocated instances along the X-axis. Note the scale-out threshold (red) at 60%, the scale-in threshold (green) at 50% and the current memory utilization (purple) at 49%. The number of instances to scale in (s) is set to 2. You can open the Desmos link and adjust the values using sliders to understand their relationships with the resulting metric. The shaded area indicates allocated instance counts that will experience flapping at the given m.

The plot (blue dotted) indicates the actual resulting metric after excluding the operating system memory usage of the scaled-in instances. This reference helps illustrate the margin of error in the calculated resulting metric.

From the plot, we can observe the following:

  • At 6 instances, if memory usage drops to 49%, autoscale will calculate the resulting memory usage as roughly 73% (see the quick check after this list). This blocks the scale-in operation, as autoscale predicts it would immediately trigger a scale-out.
  • For scale-in to occur at 6 instances, the current memory utilization will have to drop to 39% (try sliding m to 39%).
  • Though there is an error in the resulting metric at 11 instances and above, both metrics lie below the scale-out threshold, indicating that the above configuration is safe to use for applications that normally operate on 11+ instances with average memory utilization around 50%.
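As a quick, self-contained sanity check of the first two observations, the numbers can be plugged straight into the naive scale-in formula:

```python
# 6 allocated instances, scaling in 2, current average memory utilization 49%
x, s = 6, 2
print((x * 49) / (x - s))   # 73.5 -> above the 60% scale-out threshold, so scale-in is blocked

# utilization has to fall to about 39% before the projection clears the threshold
print((x * 39) / (x - s))   # 58.5 -> below 60%, so scale-in can proceed
```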

Scaling out

The above effect on average metric calculations is observed during scale-out too. We apply the following model to calculate the resulting scale-out metric (ro).

ro = (x * m) / (x + s)

This again has a margin of error, as the resulting metric does not include the additional memory usage added by the scaled-out instances (s). Let's repeat the process and factor that in, considering the memory being used by the operating system as a percentage (b).

ro = [(x * m) + (b * s)] / (x + s)
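Expressed as a sketch in the same style as the scale-in functions above (again, the names are illustrative, not any real autoscale API):

```python
def resulting_scale_out_naive(x, m, s):
    """Average memory % that autoscale projects after adding s instances
    to the x currently allocated ones, at average utilization m (%)."""
    return (x * m) / (x + s)


def resulting_scale_out_adjusted(x, m, s, b):
    """Same projection, but including the operating system footprint b (%)
    that each of the s new instances brings with it."""
    return ((x * m) + (b * s)) / (x + s)
```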

The plot below is similar to the one above, but is designed to indicate the resulting scale-out metrics. The only difference is that the shaded area highlights allocated instance counts, for a given m, where the actual memory utilization due to operating system overheads is bound to cause further scale-out operations.

We can observe the following:

  • If memory utilization spikes to 65% when 15 or more instances are allocated, multiple scale-out events are bound to happen in a short duration to cover the operating system overheads added by the new instances.
  • The autoscale configuration is safe to use for services that will not exceed 15 instances.

While multiple scaling events or temporarily holding excess resources are normal in autoscale operations, the plots above help identify the limits within which our configurations will operate optimally. We can identify the normal operating range of instances and the estimated operating system memory usage, and then pick scale-in and scale-out thresholds such that the resulting metrics calculated by autoscale and the actual metrics are as close to each other as possible. The sketch below shows one way to automate that check.
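As a minimal sketch of that idea (the threshold, footprint and instance-range values here are illustrative assumptions, not taken from any particular plot), the following sweep flags the instance counts where autoscale's naive scale-in projection would block a scale-in even though the OS-adjusted projection is already below the scale-out threshold:

```python
def scale_in_flapping_range(m, s, b, scale_out_threshold, max_instances=30):
    """Instance counts where autoscale's naive projection blocks scale-in
    (it exceeds the scale-out threshold) even though the OS-adjusted
    projection says the scale-in would actually be safe."""
    flagged = []
    for x in range(s + 1, max_instances + 1):
        naive = (x * m) / (x - s)
        adjusted = ((x * m) - (b * s)) / (x - s)
        if naive > scale_out_threshold >= adjusted:
            flagged.append(x)
    return flagged


# Illustrative values: m = 49%, s = 2, assumed OS footprint b = 25%, scale-out threshold 60%
print(scale_in_flapping_range(m=49, s=2, b=25, scale_out_threshold=60))  # [7, 8, 9, 10]
```

A similar sweep over the scale-out formulas would highlight the instance counts where a single utilization spike is likely to trigger repeated scale-out events.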

Ideas presented here are based on my personal observations. Please exercise caution when applying these configurations in your own cloud environments. Your results might vary. Got feedback or ideas? Drop a comment or email om [at] 0x8 dot in.