Week 43, 2019 - CloudWatch Anomaly Detection; Some AWS Container Updates


AWS introduced CloudWatch Anomaly Detection as a way to get smarter alarms for your metrics. There were also a couple of smaller updates to the AWS container services.

CloudWatch Anomaly Detection

Almost a year ago, AWS introduced Predictive Scaling for EC2 instances, which lets them scale automatically based on past behaviour. The newly released CloudWatch Anomaly Detection does something similar: it looks at past behaviour to determine whether an alarm should be triggered.

The concept is straightforward: CloudWatch looks back over a training period you define and learns which behaviour is expected and which isn’t. So, if you have a regular spike because a cron job runs every day at noon, it will take that into account when deciding whether to send an alert.

I’m pretty confident that it won’t surprise you that under the hood there’s a lot of machine learning going on to make these predictions, and you can use it for any metric you can measure in CloudWatch. You’ve got quite a bit of control over the training period as well, and you can exclude specific time ranges (such as a deployment that used a lot of CPU) so they don’t skew what the model learns.
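To make that concrete, here’s a minimal sketch using boto3 of attaching an anomaly detector to an instance’s CPU metric while excluding a deployment window from the training data. The instance ID and the time range are made up for illustration:

```python
from datetime import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

# Train an anomaly detection model on the average CPU of one instance,
# but exclude the window where a heavy deployment ran so it doesn't
# skew what the model considers "normal".
cloudwatch.put_anomaly_detector(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical instance
    Stat="Average",
    Configuration={
        "ExcludedTimeRanges": [
            {
                "StartTime": datetime(2019, 10, 21, 12, 0),  # hypothetical deployment window
                "EndTime": datetime(2019, 10, 21, 13, 0),
            }
        ],
        "MetricTimezone": "UTC",
    },
)
```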

Ok, you might ask, why is this actually interesting? After all, you could already set alarms to trigger when resources went out of control. There are a couple of reasons, so let’s examine them a bit.

Let’s take a hypothetical example: you have a single instance that serves traffic to the internet¹. Let’s further assume you originally had an alarm set to trigger when CPU went over 80%, but over time you found it would occasionally have a short spike to 90%. Because of this, you’d adjust your alert to trigger only if it stays over 90% for a sustained period. With Anomaly Detection you can instead get alerted when it goes over 80% outside the periods where a spike is expected.
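As a sketch of what such an alarm could look like (again boto3; the alarm name, instance ID, and SNS topic are made up), you point the alarm at an ANOMALY_DETECTION_BAND expression instead of a static threshold:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when CPU goes above the upper edge of the expected band,
# rather than above a fixed 80% or 90% threshold.
cloudwatch.put_metric_alarm(
    AlarmName="cpu-above-expected-band",  # hypothetical name
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=3,
    ThresholdMetricId="band",
    Metrics=[
        {
            "Id": "cpu",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
                },
                "Period": 300,
                "Stat": "Average",
            },
            "ReturnData": True,
        },
        {
            "Id": "band",
            # 2 = width of the band in standard deviations
            "Expression": "ANOMALY_DETECTION_BAND(cpu, 2)",
            "Label": "Expected CPU range",
            "ReturnData": True,
        },
    ],
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],  # hypothetical topic
)
```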

Similarly, you can find out a lot earlier that things are going wrong. In another hypothetical example, you have an instance that doesn’t get a lot of traffic during the day (for example, it’s running in a different region), so you do your deployments during the day. Now, if there is an issue with the new code that increases CPU usage by quite a bit, you wouldn’t usually get an alert until it crosses the threshold you set for the busy times. Anomaly Detection, however, would notice that instead of the usual 20% CPU the instance was suddenly using 40% and alert on that, giving you time to investigate before anything goes wrong.

Some AWS Container Updates

As Anomaly Detection took up quite a bit of this note, I’ll finish up with a couple of smaller, but potentially interesting container updates in the AWS world.

As Kubernetes 1.14 officially supports running Windows nodes, it won’t come as a big surprise that this is now also supported in EKS clusters running that version or higher². I don’t have much to say about this: if you were waiting for it because you absolutely need it, you now have it. In every other case³, I still recommend looking for alternatives.

On the native orchestration side, ECS now supports image SHA (digest) tracking. This means you can more easily track where a specific container image has been deployed. Something like this is useful for auditing purposes, but also for finding out whether older versions of an image are still running in your environment.
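As a rough sketch of how you might use this (boto3 again; the cluster name is assumed), you can walk the running tasks and print the image digest of each container to spot stale images:

```python
import boto3

ecs = boto3.client("ecs")
cluster = "production"  # hypothetical cluster name

# List running tasks and print which image digest each container is using,
# which makes stale images easy to spot.
task_arns = ecs.list_tasks(cluster=cluster, desiredStatus="RUNNING")["taskArns"]
if task_arns:
    tasks = ecs.describe_tasks(cluster=cluster, tasks=task_arns)["tasks"]
    for task in tasks:
        for container in task["containers"]:
            print(container["name"], container["image"], container.get("imageDigest"))
```

Note that list_tasks returns at most 100 task ARNs per call, so a larger cluster would need pagination.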

  1. No, this isn’t a good design. Obviously you should have an autoscaling group etc., but let’s keep the example simple. ↩︎

  2. Ok, until a newer version is released that means only on 1.14. ↩︎

  3. And also if you need it. ↩︎
