How Does Autoscaling Work in Kubernetes? Why Should You Start Using KEDA?

Kubernetes is a vast topic, and one of the most common questions asked when building cloud-native apps is about scaling ⬆️.

What is scaling? How do we implement it efficiently? And how does Kubernetes help us in this regard?

Well well, no need to worry, because I will be sharing some insights on application autoscaling and touching on some real-life challenges you may face when using the K8s autoscaler (the last point of this article). So tighten your seatbelts and get ready for the ride!!

Scaling is the process of provisioning your application in such a way that it can handle changes in incoming load like a charm. There are two levels of scaling: the cluster level and the application level. This article focuses on the application level.

I’m sure you’ve heard of the Horizontal Pod Autoscaler, or HPA. It’s the first thing that shows up once you dive into the realm of autoscaling in Kubernetes. HPA performs autoscaling based on CPU, memory, or any external metric source.
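To make this concrete, a classic HPA definition targeting CPU utilization looks roughly like the following minimal sketch (the Deployment name http-api and the 70% target are placeholder values):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: http-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-api               # hypothetical Deployment to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # add replicas when average CPU crosses 70%
```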

While on the surface HPA seems perfect, there are a few challenges that limit its use for modern applications.

Let’s understand where the K8s HPA falls short.

HPA offers only CPU and memory as metric sources out of the box!

This can be a huge problem for most modern applications. It is very likely that your microservice communicates in one of the following ways:

  1. Directly via RESTful or gRPC APIs.
  2. Indirectly via a messaging broker like RabbitMQ.

To maintain good QoS and prevent your service from being swamped when load spikes, you implement some kind of rate-limiting functionality. For HTTP-based APIs we use API rate limiters. For messaging brokers, we limit the number of events our service can process simultaneously.

In either case, these mechanisms keep CPU and memory consumption from skyrocketing, which makes those metrics unreliable signals for scaling.

Adding new metrics to HPA is hard.

The solution seems simple, right? Just add a new metric source, like the pending message count in the queue, to the HPA?

Not really 😔!

Adding new metric sources is hard. To feed the HPA anything beyond CPU and memory, you have to deploy and maintain a custom or external metrics adapter (such as prometheus-adapter) that exposes those metrics through the Kubernetes metrics APIs. And honestly, it just seems like too much effort for such a small problem.
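To make the point concrete, here is a hedged sketch of only the HPA side of that setup, scaling a hypothetical worker Deployment on queue depth. The metric name rabbitmq_queue_messages_ready and the target value are illustrative, and none of it does anything until a metrics adapter actually serves that metric:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                 # hypothetical consumer Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready   # must be exposed via the external metrics API
        target:
          type: AverageValue
          averageValue: "30"                    # aim for ~30 pending messages per replica
```

The YAML is the easy part; the real effort is deploying and maintaining the adapter that translates your broker’s or Prometheus’s numbers into the external metrics API.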

HPA cannot scale down to zero!

Most of you probably don’t need this. But I am a heavy user of event-driven architectures, and quite a few of my pipelines are asynchronous. This means that when the load on my system is zero, I can scale my background workers down to zero to save on costs.

Do you find this feature necessary? Let me know in the comments below!

Due to the way HPA’s scaling algorithm works, it isn’t possible to scale your applications back up from zero.

HPA scaling algorithm
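For reference, the formula the HPA controller uses to compute the target replica count (as documented in the Kubernetes docs) is:

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )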

If your currentReplicas becomes zero, which it would when you scale to zero, the multiplier becomes zero as well. This means that no matter how high your load is, your desiredReplicas will always be zero.

Okay okay! Enough bashing HPA for now. Don’t worry, we have got an amazing open-source project to help us out!

We have got KEDA!

What is KEDA? How will it make our life easy 😊 ?

  1. KEDA is a Kubernetes-based Event-Driven Autoscaler.
  2. It provides 30+ built-in scalers for various event sources, so we don’t have to worry about writing custom adapters for the metric sources we need.
  3. KEDA gives you the awesome feature of scaling your resources to zero!! Yes, I am not kidding. KEDA can scale your resources from 0 to 1 and from 1 to 0. Scaling from 1 to n and back is taken care of by the HPA.

Let’s talk about KEDA’s architecture!

The architecture of KEDA is simple and pretty easy to understand. KEDA has three important components: the Metrics Adapter, the Controller (Operator), and the Scaler.

KEDA’s Architecture
  • Scaler: Connects to an external component (e.g., RabbitMQ) and fetches metrics (e.g., Pending Message Queue Size).
  • Operator (Agent): Responsible for “activating” a Deployment and creating a Horizontal Pod Autoscaler object.
  • Metrics Adapter: Presents metrics from external sources to the Horizontal Pod Autoscaler.

How does KEDA achieve “0 to 1” and “1 to 0” scaling 🧐?

  • Whenever KEDA’s scalers report that no load exists, the KEDA operator scales the Deployment down to zero.
  • When the Deployment is sitting at zero replicas and a scaler detects new load, the operator scales it from 0 to 1; from there, the HPA (fed by KEDA’s Metrics Adapter) takes over.

How can I use KEDA? Do I have to write a lot of configuration for it to work?

The answer is: not really.
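Getting KEDA into the cluster is a one-time step. For example, with Helm (chart and namespace names as in the official KEDA install docs; your setup may differ):

```
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```

After that, kubectl get pods -n keda should show something like the keda-operator and keda-operator-metrics-apiserver pods running, and you are ready to define your scaling rules.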

KEDA provides a single CRD, the ScaledObject, for mapping a scaler metric to your resource’s autoscaling logic. Let’s take the example of using Prometheus as a metric source for scaling our Deployment.

Prometheus has become a standard place for storing Kubernetes metrics, so in the ScaledObject we simply add a PromQL query to drive the autoscaling.
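Here is a minimal sketch of such a ScaledObject. The Deployment name http-api, the Prometheus address, the query, and the threshold are placeholder values; the field names follow the KEDA v2 API:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-api-scaledobject
spec:
  scaleTargetRef:
    name: http-api                    # hypothetical Deployment to scale
  pollingInterval: 30                 # check the trigger every 30 seconds
  cooldownPeriod: 300                 # wait 5 minutes of no load before scaling to zero
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # placeholder address
        query: sum(rate(http_requests_total{app="http-api"}[2m]))
        threshold: "100"              # target value per replica
```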

Just by adding a single line of PromQL query, we can now autoscale our resources based on the value the query returns.

Let’s now dive into some of the configuration fields of the ScaledObject:

  • pollingInterval: The interval (in seconds) at which KEDA checks each trigger source on the ScaledObject and scales the Deployment up or down accordingly. The default is 30 seconds.
  • cooldownPeriod: The period (in seconds) to wait after the last trigger reports activity before scaling the resource down to zero. The default is 300 seconds.
  • minReplicaCount: The minimum number of replicas KEDA will scale your application down to. By default, it is 0.

Well, the Prometheus scaler was really cool! But things become even more interesting when we see the next scaler in action.

During big sale events, Amazon and other e-commerce platforms face a huge surge in traffic. With the help of KEDA’s cron scaler, we can autoscale on a schedule and preemptively raise the replica count at fixed times, as shown in the config below.
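A minimal sketch of that ScaledObject, assuming a hypothetical storefront Deployment and an evening sale window (the timezone, cron expressions, and replica counts are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: storefront-cron-scaledobject
spec:
  scaleTargetRef:
    name: storefront             # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Kolkata   # IANA timezone name
        start: 0 18 * * *        # scale up at 18:00 every day
        end: 0 23 * * *          # scale back down at 23:00
        desiredReplicas: "20"    # replicas to hold during the window
```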

In the above config, we have to specify the timezone and time span during which KEDA will scale your resource to the desired replicas.

With the help of the cron scaler, you can prepare to manage traffic in advance.

Let’s now talk about some REAL-LIFE CHALLENGES one might face while using KEDA, and how to overcome them.

Let’s say your application runs video transcoding jobs: each worker transcodes a single video, which takes around 8 hours, and your queue contains thousands of such events.

The problem starts when the queue begins to drain: as the number of pending events drops, HPA starts scaling your resource down.

The problem is that HPA doesn’t know how far along each transcoding job is, so it will just snap its fingers like Thanos and kill a random worker, even one that is seven hours into an eight-hour job.

To avoid this problem, you have two options:

  1. Use graceful termination: handle SIGTERM in your worker and pair it with Kubernetes lifecycle settings (a preStop hook and a generous terminationGracePeriodSeconds) to delay termination until the in-flight work finishes.
  2. Instead of scaling a Deployment for these events, create Kubernetes Jobs and scale them with KEDA’s ScaledJob object. This way KEDA controls the parallelism, and each job runs to completion (see the sketch after this list).
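Here is a minimal sketch of a ScaledJob driven by a RabbitMQ queue. The image name, queue name, connection string, and limits are placeholders, and in a real setup you would supply credentials through a TriggerAuthentication instead of inlining them:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-transcoder
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: transcoder
            image: registry.example.com/transcoder:latest  # hypothetical worker image
        restartPolicy: Never
  pollingInterval: 30                  # check the queue every 30 seconds
  maxReplicaCount: 50                  # upper bound on jobs running in parallel
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
    - type: rabbitmq
      metadata:
        queueName: transcode-requests
        host: amqp://guest:guest@rabbitmq.default.svc:5672/  # placeholder; prefer TriggerAuthentication
        mode: QueueLength
        value: "1"                     # roughly one job per pending message
```

Because each pending message now gets its own Job, a shrinking queue no longer kills in-progress work: existing jobs simply run until their videos finish.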

Conclusion:

  • KEDA is a lightweight component that can be added to any Kubernetes cluster to extend its functionality. It solves modern application autoscaling problems that the K8s HPA alone cannot.
  • It provides extensibility by giving the ability to write custom event sources.
  • KEDA can scale from “0 to 1” and back depending on workloads, optimizing the cost of infrastructure.
