0x8 — Autoscale optimizations - Identifying effective scale-in metrics

This post is the second in a three-part series on autoscale and its related concepts. In the previous post, we discussed flapping during autoscale scale-in operations and built a model to help optimize autoscale configurations. In this post, we identify the effective scale-in threshold once the anti-flapping mechanism is taken into account. If you haven’t read the previous post, I encourage you to take a detour, as we build on the information established there.

Scale-outs are usually simple: you define an upper bound on your metric and autoscale takes care of the rest. Scale-ins are harder to predict, because anti-flapping mechanisms will prevent unsafe scale-ins from happening. An incorrect configuration can lead to unnecessary costs. You can be a lot more confident about your autoscale configuration if you know precisely when a scale-in is going to occur, keeping in mind all the factors that can prevent it.

We have identified the following model, which calculates the resulting scale-in metric (r_i) from the current instance count (x), the applied scale-in and scale-out thresholds (t_i and t_o respectively) and the scale-in instance count (s). We use t_i because this is the metric value at which the scale-in intent is triggered (see the plots in the previous post).

r_i = (x * t_i) / (x - s)

A more generalized form of the above equation would be

r_i = (x * m) / (x - s)

Where m is the metric upon which autoscale is being configured.
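As a quick illustration, the generalized equation can be written as a small function. This is a minimal sketch: the function name `resulting_metric` and the default of `s = 1` are assumptions made for the example, not part of any autoscale API.

```python
def resulting_metric(x: int, m: float, s: int = 1) -> float:
    """Projected metric r_i after removing s of the x running instances.

    Implements r_i = (x * m) / (x - s): the same total load is spread
    over the x - s instances that remain after the scale-in.
    """
    if not 0 < s < x:
        raise ValueError("need 0 < s < x so that at least one instance remains")
    return (x * m) / (x - s)


if __name__ == "__main__":
    # 6 instances at 80% each, scale in by 1: 6 * 80 / 5
    print(resulting_metric(x=6, m=80.0))  # 96.0
```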

Flapping will occur when the metric falls to or below the scale-in threshold t_i and the resulting scale-in metric r_i is greater than or equal to the scale-out threshold t_o.

r_i ≥ t_o when m ≤ t_i
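The condition above can be sketched as a simple check. The name `would_flap` and the sample thresholds (t_i = 80, t_o = 90, s = 1) are illustrative assumptions; substitute the values from your own configuration.

```python
def would_flap(x: int, m: float, t_i: float, t_o: float, s: int = 1) -> bool:
    """True when a scale-in intent exists (m <= t_i) but the projected
    metric after the scale-in would reach the scale-out threshold."""
    if m > t_i:
        return False  # metric is above t_i, so no scale-in is attempted
    r_i = (x * m) / (x - s)  # projected metric on the remaining instances
    return r_i >= t_o


if __name__ == "__main__":
    # Assumed thresholds for illustration: t_i = 80, t_o = 90, s = 1.
    print(would_flap(x=6, m=80.0, t_i=80.0, t_o=90.0))   # True  (r_i = 96.0)
    print(would_flap(x=10, m=80.0, t_i=80.0, t_o=90.0))  # False (r_i is about 88.9)
```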

However, this doesn’t block scale-in; it only delays it until the metric falls further. This takes us to the important question: for the allocated instances, what value should the metric reach to definitely cause a scale-in?

We need to find a metric value m, for a given instance count x and scale-in instance count s, that upon scale-in leads to a resulting scale-in metric r_i just below the scale-out threshold t_o, i.e. t_o - 1.

Let’s build on what we have established for r_i:

(x * m) / (x - s) = t_o - 1
∴ m = [(t_o - 1) * (x - s)] / x

For any given x, t_o and s, we can now calculate the effective scale-in threshold. Note that the scale-in threshold t_i still matters: whenever the calculated m is above t_i, the effective scale-in threshold is t_i itself, because no scale-in is attempted until the metric reaches t_i.

m = [(t_o - 1) * (x - s)] / x where { m < t_i }
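A minimal sketch of this calculation follows. The cap at t_i is expressed with `min`, and the function name and sample thresholds (t_i = 80, t_o = 90, s = 1) are assumptions chosen for illustration.

```python
def effective_scale_in_threshold(x: int, t_i: float, t_o: float, s: int = 1) -> float:
    """Metric value below which a scale-in from x instances can go through.

    m = ((t_o - 1) * (x - s)) / x, capped at the configured threshold t_i.
    """
    if not 0 < s < x:
        raise ValueError("need 0 < s < x so that at least one instance remains")
    m = ((t_o - 1) * (x - s)) / x
    return min(m, t_i)  # above t_i there is no scale-in intent at all


if __name__ == "__main__":
    # Assumed thresholds for illustration: t_i = 80, t_o = 90, s = 1.
    for x in (3, 6, 9, 12):
        print(x, round(effective_scale_in_threshold(x, t_i=80.0, t_o=90.0), 1))
```

With these assumed values, small instance counts give an effective threshold well below t_i, and the result settles at t_i once the instance count is large enough.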

Let us plot a graph for m (orange). This indicates the effective scale-in threshold, i.e. scale-in will occur only if the metric falls below this value for the given system state. The shaded area indicates the region where m will cause flapping. For optimal autoscale configurations, the instance count x during average operating hours should not fall within the shaded area.

Some observations that can be made:

Flapping will not occur for scale-ins beyond 10 instances.
At 6 instances, the metric will have to fall below 74% for a scale-in to occur.
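The text does not list the parameters behind the plot. The sketch below assumes t_i = 80, t_o = 90 and s = 1, values picked only because they are consistent with the two observations above. Rearranging (t_o - 1) * (x - s) / x ≥ t_i gives x ≥ s * (t_o - 1) / (t_o - 1 - t_i), which is the smallest instance count at which the effective threshold is no longer pushed below t_i. The function name is hypothetical.

```python
import math


def min_instances_without_flapping(t_i: float, t_o: float, s: int = 1) -> int:
    """Smallest x for which (t_o - 1) * (x - s) / x >= t_i.

    Derived by rearranging the inequality to
    x >= s * (t_o - 1) / (t_o - 1 - t_i).
    """
    gap = (t_o - 1) - t_i
    if gap <= 0:
        raise ValueError("t_o - 1 must be greater than t_i")
    return math.ceil(s * (t_o - 1) / gap)


if __name__ == "__main__":
    # Assumed parameters, not taken from the original plot.
    t_i, t_o, s = 80.0, 90.0, 1
    print(min_instances_without_flapping(t_i, t_o, s))  # 10
    # Effective threshold at 6 instances: 89 * 5 / 6, roughly 74%
    print(round((t_o - 1) * (6 - s) / 6, 1))  # 74.2
```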

We can now identify if this meets our normal operating ranges or optimize the autoscale configuration to ensure that effective scale-in thresholds are met.

Feel free to open the Desmos link and play around with the parameters configured to explore further.

In the next part we will dive into challenges faced when using memory based metrics for autoscale configurations.

Ideas presented here are based on my personal observations. Please maintain caution when applying configurations on your own cloud environments. Your results might vary. Got feedback or ideas? Drop a comment or email om [at] 0x8 dot in.
