Autoscale optimizations - Identifying effective scale-in metrics

This post is the second in a three-part series on autoscale and its related concepts. In the previous post, we discussed flapping during autoscale scale-in operations and built a model to help optimize autoscale configurations. In this post, we identify the effective scale-in threshold, taking the anti-flapping mechanism into account. If you haven’t read the previous post, I encourage you to take a detour, as we build on the information established there.

Scale-outs are usually simple: you define an upper bound on your metric and autoscale takes care of the rest. Scale-ins are harder to predict, because anti-flapping mechanisms will prevent unsafe scale-ins from happening, so it is scale-ins that need to be estimated more carefully. An incorrect configuration can lead to unnecessary costs. You can be a lot more confident about your autoscale configuration if you know precisely when a scale-in is going to occur, keeping in mind all the factors that can prevent it.

We have identified the following model, which calculates the resulting scale-in metric (ri) from the applied scale-in and scale-out thresholds (ti and to respectively), the current instance count (x) and the number of instances removed per scale-in (s). We use ti because this is the metric value at which the scale-in intent is triggered (see the plots in the previous post).

ri = (x * ti) / (x - s)

A more generalized form of the above equation would be

ri = (x * m) / (x - s)

Where m is the metric upon which autoscale is being configured.
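
As a quick sanity check, the generalized equation can be sketched in Python. The function name resulting_metric and the sample values (6 instances at a metric of 80, removing 1 instance) are assumptions made for illustration and do not come from any autoscale API.

```python
def resulting_metric(x: int, m: float, s: int) -> float:
    """Estimate the per-instance metric after removing s of x instances.

    Follows ri = (x * m) / (x - s): the total load x * m is assumed
    to be spread evenly over the remaining x - s instances.
    """
    if not 0 < s < x:
        raise ValueError("s must be positive and smaller than x")
    return (x * m) / (x - s)


# Illustrative values only: 6 instances at a metric of 80, scaling in by 1.
print(resulting_metric(6, 80, 1))  # 96.0
```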

Flapping will occur when the metric falls to or below the scale-in threshold ti and the resulting scale-in metric ri is greater than or equal to the scale-out threshold to.

ri ≥ to when m ≤ ti
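
The flapping condition can be expressed as a small predicate. This is a minimal sketch: the name would_flap and the thresholds used in the example (ti = 80, to = 90, s = 1) are hypothetical values picked for illustration.

```python
def would_flap(x: int, m: float, s: int, ti: float, to: float) -> bool:
    """Return True when a scale-in at metric m would be held back as flapping.

    Condition from the text: m <= ti (scale-in intent is triggered) and
    the resulting metric ri = (x * m) / (x - s) is >= to.
    """
    if not 0 < s < x:
        raise ValueError("s must be positive and smaller than x")
    ri = (x * m) / (x - s)
    return m <= ti and ri >= to


# Hypothetical thresholds: scale in at 80, scale out at 90, remove 1 instance.
print(would_flap(6, 80, 1, ti=80, to=90))   # True  (ri = 96.0)
print(would_flap(12, 80, 1, ti=80, to=90))  # False (ri is about 87.3)
```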

However, this doesn’t block scale-in; it only delays it until the metric falls further. This takes us to the important question: for the allocated instances, what value should the metric reach to definitely cause a scale-in?

We need to find the metric value m, for a given instance count x and scale-in instance count s, which upon scale-in leads to a resulting scale-in metric ri that is just below the scale-out threshold to, i.e. to - 1.

Let’s build on what we have established for ri:

(x * m) / (x - s) = to - 1
∴ m = [(to - 1) * (x - s)] / x

For any given x, to and s, we are now able to calculate the effective scale-in threshold. Note that the scale-in threshold ti still matters: whenever the calculated m is above ti, the effective scale-in threshold is simply ti, because the scale-in intent is only triggered once the metric reaches ti.

m = [(to - 1) * (x - s)] / x where { m < ti }
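
The constrained formula can be sketched as follows. The helper name effective_scale_in_threshold is hypothetical, and capping the result with min() is my reading of the rule that the threshold never exceeds ti. The thresholds in the example (ti = 80, to = 90, s = 1) are assumed values, not ones stated in this post.

```python
def effective_scale_in_threshold(x: int, s: int, ti: float, to: float) -> float:
    """Metric value below which a scale-in of s instances actually happens.

    m = (to - 1) * (x - s) / x, capped at the configured scale-in threshold ti.
    """
    if not 0 < s < x:
        raise ValueError("s must be positive and smaller than x")
    m = (to - 1) * (x - s) / x
    return min(ti, m)


# Assumed configuration: scale in at 80, scale out at 90, remove 1 instance.
print(round(effective_scale_in_threshold(6, 1, ti=80, to=90), 2))   # 74.17
print(round(effective_scale_in_threshold(20, 1, ti=80, to=90), 2))  # 80
```

In the second call the calculated m (84.55) is above ti, so the configured threshold of 80 applies unchanged.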

Let us plot a graph for m (orange). This indicates the effective scale-in threshold, i.e. scale-in will occur only if the metric falls below this value for the given system state. The shaded area indicates the region where m will cause flapping. For an optimal autoscale configuration, the instance count x during average operating hours should not fall within the shaded area.

Some observations that can be made:

  • Flapping will not occur for scale-ins beyond 10 instances.
  • At 6 instances, the metric will have to fall below 74% for scale-in to occur.
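
Observations like these can also be derived without a plot. The sketch below searches for the smallest instance count at which the calculated threshold is no longer below ti, i.e. where flapping stops delaying scale-in. The parameters behind the plot are not stated in this post, so ti = 80, to = 90 and s = 1 are assumed values; first_flap_free_count and max_instances are hypothetical names.

```python
from typing import Optional


def first_flap_free_count(s: int, ti: float, to: float,
                          max_instances: int = 100) -> Optional[int]:
    """Smallest instance count x where (to - 1) * (x - s) / x >= ti.

    From that count onwards the effective scale-in threshold equals ti,
    so anti-flapping no longer delays scale-in. Returns None if no such
    count exists up to max_instances.
    """
    for x in range(s + 1, max_instances + 1):
        if (to - 1) * (x - s) / x >= ti:
            return x
    return None


# Assumed configuration, chosen only for illustration.
print(first_flap_free_count(s=1, ti=80, to=90))  # 10
```

With these assumed values the result lines up with the observations above, but treat that as a plausible guess at the plotted configuration, not a confirmation of it.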

We can now check whether this fits our normal operating range, or optimize the autoscale configuration so that the effective scale-in thresholds are met.

Feel free to open the Desmos link and play around with the parameters configured to explore further.

In the next part, we will dive into the challenges faced when using memory-based metrics for autoscale configurations.

Ideas presented here are based on my personal observations. Please maintain caution when applying configurations on your own cloud environments. Your results might vary. Got feedback or ideas? Drop a comment or email om [at] 0x8 dot in.