*This post is the second in a three-part series on autoscale and its related concepts. In the previous post, we discussed flapping during autoscale scale-in operations and built a model to help optimize autoscale configurations. In this post, we identify the effective scale-in threshold, taking the anti-flapping mechanism into account. If you haven’t read the previous post, I encourage you to take a detour, as we build on the information established there.*

Scale-outs are usually simple: you define an upper bound on your metric and autoscale takes care of the rest. Scale-ins need more careful estimation, because anti-flapping mechanisms will prevent unsafe scale-ins from happening. An incorrect configuration can lead to unnecessary costs. You can be far more confident in your autoscale configuration if you know precisely when a scale-in will occur, keeping in mind all the factors that can prevent it.

We have identified the following model, which calculates the resulting **scale-in metric (r_{i})** from the applied **scale-in and scale-out thresholds (t_{i} and t_{o} respectively)** and the **scale-in instance count (s)**, where *x* is the current instance count. We use t_{i} because this is the metric value at which the scale-in intent is triggered (see the plots in the previous post).

r_{i} = (x * t_{i}) / (x - s)

A more generalized form of the above equation would be

r_{i} = (x * m) / (x - s)

Where *m* is the **metric** on which autoscale is configured.

Flapping will occur when the metric falls to or below the scale-in threshold t_{i} and the resulting scale-in metric r_{i} would be at or above the scale-out threshold t_{o}.

r_{i} ≥ t_{o} *when* m ≤ t_{i}
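The model and the flapping condition above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and the example values (t_{i} = 80, t_{o} = 90, s = 1) are assumptions chosen for demonstration, not values taken from any particular autoscale setup.

```python
def resulting_metric(x: int, m: float, s: int) -> float:
    """r_i = (x * m) / (x - s): the metric expected after removing s of x instances."""
    if x <= s:
        raise ValueError("instance count x must be greater than scale-in count s")
    return (x * m) / (x - s)


def would_flap(x: int, m: float, s: int, t_i: float, t_o: float) -> bool:
    """True when scale-in is intended (m <= t_i) but r_i would reach t_o."""
    return m <= t_i and resulting_metric(x, m, s) >= t_o


# Assumed example values: t_i = 80, t_o = 90, s = 1.
print(resulting_metric(6, 80, 1))     # 96.0
print(would_flap(6, 80, 1, 80, 90))   # True: 96 is at or above 90
print(would_flap(12, 80, 1, 80, 90))  # False: 960 / 11 is below 90
```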

However, this doesn’t block scale-in; it only delays it until the metric falls further. This takes us to the important question: for the allocated instances, what value should the metric reach to definitely cause a scale-in?

For a given instance count *x* and scale-in instance count *s*, we need to find the metric value *m* for which the resulting scale-in metric r_{i} lands just below the scale-out threshold t_{o}, i.e. at t_{o} - 1.

Let’s build on what we have established for r_{i}

(x * m) / (x - s) = t_{o} - 1

∴ m = [(t_{o} - 1) * (x - s)] / x

For any given *x*, t_{o} and *s*, we are now able to calculate the effective scale-in threshold. Note that the scale-in threshold t_{i} is still important to consider: for any value of *m* that falls above t_{i}, the effective scale-in threshold will always be t_{i}.

m = [(t_{o} - 1) * (x - s)] / x where { m < t_{i} }
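As a rough sketch, the effective scale-in threshold can be computed as follows. One way to read the constraint above is to take the smaller of t_{i} and the derived value of *m*; that reading, the function name, the `margin` parameter (the "- 1" in the formula) and the example values t_{i} = 80, t_{o} = 90, s = 1 are all assumptions made for illustration.

```python
def effective_scale_in_threshold(
    x: int, s: int, t_i: float, t_o: float, margin: float = 1.0
) -> float:
    """Metric value below which a scale-in from x instances is expected to go through.

    m = (t_o - margin) * (x - s) / x, capped at the configured threshold t_i.
    """
    if x <= s:
        raise ValueError("instance count x must be greater than scale-in count s")
    m = (t_o - margin) * (x - s) / x
    return min(t_i, m)


# Assumed example values: t_i = 80, t_o = 90, s = 1.
print(round(effective_scale_in_threshold(6, 1, 80, 90), 2))  # 74.17
print(effective_scale_in_threshold(12, 1, 80, 90))           # 80 (capped at t_i)
```

With these assumed values, the configured threshold of 80 only takes effect once there are enough instances for the derived *m* to rise above it; below that, the anti-flapping mechanism effectively lowers the threshold.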

Let us plot a graph for *m* (orange). This indicates the effective scale-in threshold, i.e. scale-in will occur only if the metric falls below this value for the given system state. The shaded area indicates the region where *m* will cause flapping. For optimal autoscale configurations, the instance count *x* during average operating hours should not fall within the shaded area.

Some observations that can be made:

- Flapping will not occur for scale-ins beyond 10 instances.
- At 6 instances, the metric will have to fall below 74% for scale-in to occur.

We can now identify if this meets our normal operating ranges or optimize the autoscale configuration to ensure that effective scale-in thresholds are met.
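To check a configuration against your operating range without a plot, one option is to scan instance counts for the first one at which the derived threshold *m* reaches t_{i}, so that the configured scale-in threshold applies as-is. This is a minimal sketch: the function name, the `max_instances` search limit and the example values (t_{i} = 80, t_{o} = 90, s = 1) are assumptions for illustration.

```python
from typing import Optional


def first_flap_free_count(
    s: int, t_i: float, t_o: float, margin: float = 1.0, max_instances: int = 1000
) -> Optional[int]:
    """Smallest instance count x at which m = (t_o - margin) * (x - s) / x >= t_i.

    Returns None if no such count exists up to max_instances.
    """
    for x in range(s + 1, max_instances + 1):
        m = (t_o - margin) * (x - s) / x
        if m >= t_i:
            # (x - s) / x grows with x, so every larger count also qualifies.
            return x
    return None


# Assumed example values: t_i = 80, t_o = 90, s = 1.
print(first_flap_free_count(1, 80, 90))  # 10
# Thresholds too close together: m can never reach t_i.
print(first_flap_free_count(1, 89, 90))  # None
```

If the returned count is higher than the number of instances you normally run, widening the gap between t_{i} and t_{o} or lowering *s* are the levers the model exposes.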

Feel free to open the Desmos link and play around with the parameters configured to explore further.

*In the next part we will dive into challenges faced when using memory based metrics for autoscale configurations.*

*Ideas presented here are based on my personal observations. Please maintain caution when applying configurations on your own cloud environments. Your results might vary. Got feedback or ideas? Drop a comment or email om [at] 0x8 dot in.*