Simon Bisson
Contributor

Kubernetes autoscaling for event-driven workloads

Analysis
26 Nov 2019 | 6 mins
Open Source | Software Development

A Microsoft and Red Hat open source collaboration, KEDA, brings event-driven autoscaling to any Kubernetes cluster


Kubernetes, in all its many forms, is a powerful tool for building distributed systems. There’s one big problem though: Out of the box it’s only designed to offer resource-based scaling. If you look at its history (coming from Google’s internal Borg service as a response to AWS), that decision isn’t surprising. Most of the applications and services it was designed to work with were resource-bound, working with large amounts of data, and dependent on memory and CPU.

Not all distributed applications are like that. Many, especially those that work with IoT (Internet of things) systems, need to respond rapidly to events. Here it’s I/O that’s most important, providing events and messages that trigger processes on demand. It’s a model that works well with what we’ve come to call serverless compute. Much of the serverless model depends on rapidly spinning up new compute containers on demand, something that works well on dedicated virtual infrastructures with their own controllers but isn’t particularly compatible with Kubernetes’ resource-driven scaling.

Introducing KEDA: Kubernetes-based event-driven autoscaling

Microsoft and Red Hat have been collaborating on a means of adding event-driven scaling to Kubernetes, announcing their open source KEDA project at Microsoft’s Build conference back in May 2019. That initial KEDA code quickly got a lot of traction, and the project recently unveiled its 1.0 release, with the intent of having the project adopted by the Cloud Native Computing Foundation.

KEDA can be run on any Kubernetes cluster, adding support for a new set of metrics that can be used to drive scaling. Instead of only responding to CPU and memory load, you’re now able to respond based on the rate of received events, reducing the risk of queuing delays and lost event data. Since message volumes and CPU demands aren’t directly linked, a KEDA-enabled cluster can spawn new instances as messages arrive, well before traditional Kubernetes metrics would have responded. It can also support clusters scaling down to zero when queues are empty, keeping costs to a minimum and allowing Kubernetes clusters to behave like Azure Functions.

Deploying KEDA

Like many of Microsoft’s recent distributed application tools, KEDA is explicitly platform agnostic. That’s not surprising, given the Kubernetes approach to supporting as many platforms as possible and an informal agreement to avoid forking the platform between distributions. Microsoft provides a way of installing KEDA as part of Azure Functions, or you can add it to any Kubernetes installation using Helm charts or YAML.

Deploying with Helm is probably your best option, as it can easily be scripted and added to your infrastructure repository. To install KEDA, add its chart repository, hosted on GitHub, to Helm. Once you’ve updated your Helm repositories to pick up the KEDA charts, run the appropriate install command for the version of Helm you’re using. Helm 3 users will need to use kubectl to create a namespace in their Kubernetes install before running the chart.
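
If you want a concrete starting point, the steps look roughly like this. It’s a minimal sketch based on the KEDA project’s published Helm chart; the repository URL and the keda namespace are the project’s defaults, so adjust them for your environment and Helm version.

    # Add the KEDA chart repository and refresh the local chart index
    helm repo add kedacore https://kedacore.github.io/charts
    helm repo update

    # Helm 3: create a namespace for KEDA first, then install the chart into it
    kubectl create namespace keda
    helm install keda kedacore/keda --namespace keda

    # Helm 2: pass a release name instead of creating the namespace by hand
    # helm install kedacore/keda --namespace keda --name keda

Once the chart has deployed, the KEDA components run in the keda namespace alongside the rest of your cluster workloads.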

Once installed, KEDA needs to be configured to work with your chosen scaler. In KEDA terms, a scaler is a tool for monitoring an event source that’s used to scale your Kubernetes application. Although you can create your own, the KEDA community has put together a set that covers most common scenarios, from Apache Kafka to Azure Event Hubs via AWS CloudWatch and GCP’s Pub/Sub tooling.

Inside KEDA autoscaling

KEDA implements two key Kubernetes roles, deployed as separate containers in your cluster. The first, keda-operator, is a controller that manages your application containers, scaling them up and down as needed. The second, keda-operator-metrics-apiserver, exposes the scaling metrics, such as queue length, that are used to scale out the cluster. KEDA isn’t a tool for managing events; your pod code still needs to process them. All KEDA does is monitor queues to ensure that your cluster always has the optimum number of instances, avoiding lost data and significant lag.
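
A quick sanity check after installation is to list the deployments in the KEDA namespace and confirm both components are present (assuming the Helm chart’s default keda namespace):

    kubectl get deployments --namespace keda
    # Expect to see keda-operator and keda-operator-metrics-apiserver listed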

Implementing event-driven scaling isn’t hard. With a scaler attached to an event source, KEDA is linked to a Kubernetes deployment in the same namespace. Like much of Kubernetes, KEDA’s behavior is described in YAML, using a custom resource definition. You start by defining a spec for the scaling behavior, listing the target, the polling interval, and the cooldown period, as well as minimum and maximum replica counts. There are defaults for all of these, but you’ll want to tune them.

The default polling interval is 30 seconds. That might be suitable for a low frequency of events, but if you’re expecting to process a lot of data you’ll want to shorten it to avoid queue backlogs. Similarly, you’re likely to need to change the cooldown period to one that fits better with your cloud provider’s container billing policies.
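
As an illustrative sketch, here’s what a ScaledObject might look like for a deployment fed from an Azure Storage queue. It uses the 1.0-era keda.k8s.io/v1alpha1 API, and the deployment, queue, and connection names are hypothetical; the trigger metadata differs for other scalers, so treat this as a shape rather than a recipe.

    apiVersion: keda.k8s.io/v1alpha1
    kind: ScaledObject
    metadata:
      name: orders-scaledobject          # hypothetical name
      namespace: default
    spec:
      scaleTargetRef:
        deploymentName: orders-consumer  # the deployment KEDA should scale
      pollingInterval: 15                # seconds between queue checks (default 30)
      cooldownPeriod: 300                # seconds to wait before scaling back down (default 300)
      minReplicaCount: 0                 # allow scale to zero when the queue is empty
      maxReplicaCount: 20
      triggers:
        - type: azure-queue
          metadata:
            queueName: orders                # hypothetical queue name
            queueLength: "5"                 # target queue length per replica
            connection: AzureWebJobsStorage  # env var holding the storage connection string

Shortening pollingInterval makes scaling react faster at the cost of more frequent checks against the event source; lengthening cooldownPeriod keeps replicas alive longer after the queue drains.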

Getting your KEDA configuration right

It’s important to understand the underlying pattern of your event-driven applications when configuring KEDA. For example, choosing an appropriate cooldown period can keep costs down by keeping the number of active containers to a minimum. However, there’s an associated delay when starting up new containers. This can add lag to a system, and, in many cases, it might be best to keep a minimum number of replicas running to ensure the quickest response times. It’s up to you to determine whether that fits into your application’s economic model.

Using KEDA to scale deployments can be an issue if you have long-running events. Since containers are managed by Kubernetes’ own controllers, it’s possible for resources to be scavenged even when an event is still being handled. The alternative is to use KEDA to work with Kubernetes jobs, with each event being treated as a single job that’s created, run to process a single message, and then terminated. It’s easy to see this approach being used to host Azure Functions containers, allowing you to move your serverless code between clouds and onto your own infrastructure.
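
Sketching that job-based pattern against the same 1.0-era API, a configuration might look something like the following. The scaleType and jobTargetRef fields here are assumptions about the 1.0-era job API, and all the names are hypothetical, so check the KEDA documentation for your release before relying on them.

    apiVersion: keda.k8s.io/v1alpha1
    kind: ScaledObject
    metadata:
      name: orders-jobs                  # hypothetical name
    spec:
      scaleType: job                     # run each event as a job rather than scaling a deployment
      pollingInterval: 15
      maxReplicaCount: 50
      jobTargetRef:                      # a standard Kubernetes Job spec
        parallelism: 1
        completions: 1
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: order-processor
                image: example.azurecr.io/order-processor:latest  # hypothetical image that processes one message and exits
      triggers:
        - type: azure-queue
          metadata:
            queueName: orders
            queueLength: "1"
            connection: AzureWebJobsStorage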

It’s interesting to see Microsoft collaborating with Red Hat on a tool like KEDA. Cloud applications shouldn’t be tied to just one public cloud, which makes platform-agnostic Kubernetes services an important tool for breaking out of the public clouds’ walled gardens. Using KEDA means you can have the same serverless applications running on any certified Kubernetes distribution, from centralized hyperscale services all the way to edge implementations running on clusters of Raspberry Pis.

Simon Bisson
Contributor

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of "career" as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next-generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets.
