FinOps for Kubernetes: How to Reduce Costs in K8s Environments

Omri Tsabari Head of FinOps @ GlobalDots
12 Min read

Kubernetes, also known as K8s, is a container orchestration platform built on open-source principles. By offering a hard-hitting combination of container management and load balancing across multiple hosts, it provides software developers with a major boon: intricate deployment tasks can now be automated, accelerating development and release cycles.

As a result, Kubernetes is one of the fastest-growing projects in the history of open-source software. Between 2020 and 2021, the number of engineers using K8s swelled to 3.9 million, now encompassing almost a third of all backend developers. It's not just speed, either: K8s architecture grants organizations a path toward high-efficiency resource allocation.

How One AI-Driven Media Platform Cut EBS Costs for AWS ASGs by 48%

While K8s’ remarkable flexibility promises the world, many organizations have excitedly spun up new clusters — only to quickly find themselves mired in the depths of such hyper-complex, deeply customizable software. This mountainous learning curve is compounded by a chronic lack of skilled staff: 61% of companies have made the decision to go ahead with K8s adoption regardless of staff shortages. As a result, Kubernetes mismanagement and inefficiencies are widespread. Without a granular understanding of the platform, Kubernetes cost optimization can feel like a complete shot in the dark. And the weight of external factors is only growing heavier — in the midst of the Eurozone’s shrinking GDP, and increasing wariness around a global recession, today’s tectonic economic shifts have placed even greater pressure on organizations to crack the K8s code.

What Is FinOps for Kubernetes?

As organizations begin to grow accustomed to their cloud strategies, the shadow of such expenditure often begins looming. FinOps is an advanced form of cloud optimization that looks beyond isolated prices: instead, it identifies and streamlines the operational practices that help keep cloud infrastructure lean and cost-efficient. 

While many find the complexity of Kubernetes to be a thorn in the side of their cost optimization efforts, K8s architecture innately lends itself to FinOps efficiency — for instance, not only is Kubernetes flexible enough for container management across a broad range of infrastructure types, but it also supports virtually any type of program that can run containers. This is a vast change from most other orchestrators, which often keep customers tied to specific cloud infrastructures. Kubernetes’ open source ethos allows for lean development that extends beyond its own platform, and rids a lot of required re-architectural work.

Fundamentally, the difference between Cloud FinOps and its Kubernetes counterpart stems from abstraction. Cloud services segment their invoices in a more understandable way, detailing the prices of instances or volumes that are tagged within the cloud environment. Kubernetes, on the other hand, works in a more short-lived manner. Nodes may only exist for a day at a time, and the transience of its resources makes it incredibly difficult to identify which costs are attributed to which system, particularly in the context of shared resources. That makes it hard to allocate spend per customer and across different environments. The abstraction between the cloud API and what's being deployed increases the challenge of K8s so much that even those deeply familiar with cloud FinOps may need deeper visibility into their Kubernetes costs.

The Challenges of Controlling Kubernetes Costs

Kubernetes can be a minefield of unexpected costs. This is largely due to the sheer abstraction facing any attempt at expenditure wrangling. While AWS instances can be individually chopped and changed, access to underlying K8s nodes is provided only via the Control Plane — meaning you can’t really manage any nodes or workloads directly. The nodes are best viewed as individual units of computational power; clusters of these are controlled via continuous back-and-forth communication between the nodes and the Control Plane. In the midst of constant moving parts and intricate architecture, it’s vital to keep an eye on the specific challenges facing your attempts at Kubernetes cost optimization.

Fluctuating Resource Demands

One of the core challenges faced by DevOps is efficiently configuring Kubernetes around CPU and memory. When configuring, a developer must decide what settings to put in place. If CPU and memory resources are set too low, the application is crippled from the inside. On the other hand, if set too high, the organization is left writing checks for unnecessary resources.

Applications rely on different types of resources, spanning from common ones like CPU and memory to more specialized GPU accelerators and high-speed storage. The typical Kubernetes ecosystem encompasses a hugely diverse range of workloads, each with its own resource demands. The Kubernetes scheduler is responsible for assigning the smallest computing units (pods) to your organization's various nodes, determining which nodes are most suitable for each pod placement. By scheduling pods in a queue according to pre-set requirements and available resources, each node is kept fully provisioned.

Understandably, as a result of the ever-changing requirements, organizations are often fearful of simply cutting costs — the risk of under-provisioning is simply too high. And while reducing costs must not come at the expense of user experience, this is easier said than done. Manual rightsizing demands an intricate, on-the-ground knowledge of the appropriate resources for every request and workload. Not only is this a steep burden for an understaffed field, but the remoteness of each node further heightens the difficulty.

Parameter Problems

As a result of the large fluctuations in resource requirements, one of the major challenges presented by Kubernetes is the sheer flexibility of the request and limit parameters that can be put in place. These parameters help to effectively allocate resources to workloads within a node. Placing multiple pods onto a node needs to be done with great care, however: if one of them lacks defined request and limit specifications, that pod can begin consuming all available resources associated with the node. When this occurs, other pods are starved of their own resources, and requests begin to fail as one workload suddenly demands far greater resources. By appropriately defining the request and limit parameters for each pod, resource allocation can be optimized, ensuring fair distribution and preventing accidental resource monopolization.
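In practice, these parameters are declared per container in the pod specification. A minimal sketch (the name, image, and values here are purely illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server              # illustrative name
spec:
  containers:
    - name: app
      image: example.com/api:1.0   # illustrative image
      resources:
        requests:               # what the scheduler reserves on a node
          cpu: "250m"
          memory: "256Mi"
        limits:                 # hard ceiling; usage beyond this is throttled or OOM-killed
          cpu: "500m"
          memory: "512Mi"
```

A pod whose containers omit these fields is exactly the one that can monopolize its node.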

Traffic Scaling

Many applications experience uneven traffic patterns. For example, end-users may engage with an application primarily during the day, while throughout the night the data center servers for that application sit idle. On the surface, this is a relatively easy challenge to fix: well-understood traffic spikes can be manually scaled for, allowing back-end devs to anticipate increased users from ad campaigns and weekend traffic.

However, the primary challenge facing scaling is far more corrosive than short-term traffic jumps. Over time, as active clusters undergo new deployments and periodic scaling back, over and over again, inefficiencies begin to accrue. As pods and nodes are added and removed from the cluster, resource fragmentation builds up. Inconsistencies in the scheduling of each pod can easily create a faux resource crunch: even if a cluster technically has the required capacity across all nodes, no single node may be able to fulfill all of the resources demanded by a request. As a result, the pod can't be scheduled for that request. This chronic issue is vastly underdiagnosed, because even medium-sized Kubernetes stacks are simply too complex for such fragmentation to be identified manually, never mind consolidated into one usable pool of resources.

Cost Collaboration

FinOps demands cross-collaboration, doing away with siloed teams that heavily segment the purchasing process. The idea of service ownership, where DevOps provide developers the tools and guidance they need to build, deploy, and own an application from first build through to maintenance, is a cornerstone of FinOps culture. Collaborative service ownership is one area where K8s provides a unique opportunity for real FinOps development: Kubernetes clusters natively include core services that different teams mutually benefit from, such as a central control plane and log service. And while technical responsibilities are clear-cut, many organizations today are battling with the question of how to divide up and communicate on shared costs.

The challenge, therefore, is getting everyone on the same page. It is crucial for organizations to establish clear guidelines on how to allocate these costs among teams. They need to decide whether the costs should be distributed evenly, proportionate to usage, or managed through a separate central cost center. Developers must have a comprehensive understanding of the overall cost of their applications and ensure that these costs align with their own pricing KPIs. In the past, before the introduction of Kubernetes, businesses could rely on cloud cost tools to gain visibility into any underlying cloud infrastructure. However, with the adoption of Kubernetes, a new layer of complexity emerges in cloud cost management. Unexplainable via traditional cloud cost monitoring tools, Kubernetes quickly descends into a black hole of cloud cost.
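Kubernetes does offer native primitives that can back up whichever allocation policy an organization chooses. For instance, a per-namespace ResourceQuota caps what each team's namespace may request, turning an agreed cost boundary into an enforced one. A minimal sketch (the namespace name and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a          # illustrative team namespace
spec:
  hard:
    requests.cpu: "20"       # total CPU the team's pods may request
    requests.memory: 64Gi    # total memory the team's pods may request
    limits.cpu: "40"         # ceiling on the team's aggregate CPU limits
```

Quotas don't allocate spend by themselves, but they make per-team consumption bounded and therefore attributable.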

One underlying reason for the sheer scale of this issue is the fact that K8s operates on a wholly new cost model. Instead of purchasing a large number of servers, waiting for their installation, and deploying three months later, the approach shifts toward on-demand purchases. From a financial perspective, this is a significant change: resources are now rented by the hour, minute, or even second. This shift requires a deep understanding of cost management, escalation, and handling by both the development and operations teams. This is the essence of FinOps, a novel practice that emerges at the intersection of comprehending cost dynamics, managing them effectively, and aligning infrastructure requirements with development and operations tasks.

On the observability side of things, you need to tick off two key boxes: clear FinOps observability, and accessible dashboarding. K8s observability is not effective if the right people can’t view the real-world outcome of their cost optimization attempts; and even the most relevant dashboards are useless if they don’t provide enough real insight. Once this challenge is overcome, it finally becomes possible to begin optimizing costs. 

Finally, organizations must understand that K8s costs are still closely linked to the cloud vendors they run on. In the private cloud, organizations are responsible for providing their own insights into the costs of running a Kubernetes stack, including hardware, software, and labor. Major cloud providers offer different resource purchasing options, including discounted pricing for modified service contract terms. These options apply to Kubernetes just as they would to non-containerized infrastructure. There's a reason why Kubernetes is often dubbed the most complex form of FinOps.

Despite the challenges, GlobalDots already has a track record in slashing Kubernetes cost (see below for a case study). With a sharp eye for innovative tools, we’re proud to provide cutting-edge FinOps guidance for those seeking leaner, more accessible cloud cost.

How to Track and Manage Kubernetes Costs 

Managing Kubernetes costs starts at the ground level, and visibility is king here. Whether this is made accessible to teams via a gamified ranking of how optimized each team is, or through more traditional forms of education on optimization opportunities, the following data points should be clearly and easily accessible for every relevant team member:

  • Memory, CPU, and disk usage
  • What jobs are running — and where
  • How traffic is moving through the system
  • The costs of everything outside of compute resources — including storage, data transfer and networking
  • A map of how things run on the cluster
  • How much an application costs to run over time
  • A complete picture of your K8s costs hour-by-hour

Many Kubernetes cost trackers do not provide a deep enough level of granularity. As such, you're left unable to break costs down into individual impacts. This results in lackluster FinOps attempts for Kubernetes, as the big picture for your bottom line remains heavily blurred. Instead, organizations need to take a cohesive, multidisciplinary approach that cultivates both technical efficiency and cultural responsibility.

Here are four major paths that GlobalDots have identified as the most promising: 

Autoscaling

An early discovery made by many hopeful FinOps explorers is Karpenter, an open-source Kubernetes cluster autoscaler built by Amazon Web Services (AWS). By monitoring the changing loads placed on AWS-based clusters, it allows organizations to rapidly launch right-sized compute resources. Karpenter helps remove one of the first barriers to FinOps for Kubernetes: the threat of resource throttling. Before Karpenter, organizations were forced to adjust the capacity of their clusters with Amazon EC2 Auto Scaling groups, which demanded heavy configuration and hundreds of extra node groups. Now, however, organizations are spared from the threat of under-provisioning.

Karpenter goes further than removing that blocker, however. It also offers a number of key ways to consolidate computational power. By continuously identifying oversized nodes, workloads can be consolidated onto fewer, better-fitting instances. Identifying and deleting empty nodes that are still running allows organizations to finely tune their architecture in a way that far surpasses manual processes.
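This behaviour is driven by a NodePool definition. The sketch below loosely follows Karpenter's v1 API; field names have changed across Karpenter versions, and all names and limits here are illustrative rather than prescriptive:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # let Karpenter prefer cheaper capacity
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # illustrative EC2NodeClass
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # drain and remove wasteful nodes
    consolidateAfter: 1m
  limits:
    cpu: "100"                            # cap total provisioned CPU for this pool
```

The disruption block is where the cost savings live: it authorizes Karpenter to retire empty or underutilized nodes automatically.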

Horizontal Scaling 

While Karpenter helps eradicate excess nodes, horizontal scaling can help automatically scale workloads to match real-time demand. The Horizontal Pod Autoscaler (HPA for short) automatically adjusts the number of pod replicas to closely match demand.

Put simply, horizontal scaling is like adding more identical items to a collection. Imagine a collector that wants to make his set of toy cars bigger. He could simply buy more toy cars of the same kind and add them to the box. With horizontal scaling, you increase the number of instances (or replicas) of a particular application or service. Each instance can handle a portion of the workload, and by adding more instances, you distribute the work among them.

HPA works by continuously polling the metrics server. This lends insight into actual resource usage, which is used to calculate the desired number of replicas. From there, HPA handles the scale-up (or scale-down) process. HPA is only responsible for horizontal scaling, however: the only response it can call upon is to increase or decrease the number of pods. This is different from vertical scaling, which requires its own tooling.
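Declaratively, this loop is configured through a HorizontalPodAutoscaler object. A minimal sketch using the stable autoscaling/v2 API (the target Deployment name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa                # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server           # illustrative Deployment to scale
  minReplicas: 2               # floor: never scale below this
  maxReplicas: 10              # ceiling: bounds cost during spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas above ~70% average CPU
```

Note that minReplicas and maxReplicas are themselves cost levers: a generous ceiling absorbs spikes, while a tight floor avoids paying for idle replicas overnight.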

Vertical Scaling

Vertical scaling is like making a single toy bigger and more powerful. Picture a remote-controlled car that you can upgrade. You can install a more powerful battery, upgrade the motor, or add extra features to make it faster and more capable. Similarly, with vertical scaling, you enhance the capabilities of a single instance by increasing its resources. You might give it more CPU power, memory, or storage to handle a greater workload.

Many organizations rely solely on CPU-based horizontal scaling far beyond its use-by date. For instance, in cases where many workers are idling with low CPU usage, HPA autoscaling often fails to kick in. After all, if the CPU usage of every node is the only thing being monitored, then HPA considers everything perfectly fine. In reality, however, there could be a growing backlog of messages, introducing delays and allowing a traffic jam to form. By relying singularly on CPU-based scaling, companies often artificially inflate their own resource requests. True vertical scaling is the domain of the Vertical Pod Autoscaler (VPA), which adjusts a pod's CPU and memory requests in place; the backlog scenario above, meanwhile, is best addressed by event-driven autoscalers such as KEDA, which allow for more intelligent scaling decisions. When an event is detected, KEDA determines whether it should trigger a scaling action, evaluating rules and conditions configured by the user. In our example, if there are more than a certain number of messages in a queue, it can trigger scaling even while CPU sits low.
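With KEDA, such a rule is expressed as a ScaledObject that points at the workload and names the event source. A sketch assuming a RabbitMQ queue; every name here is illustrative, and KEDA's trigger metadata fields differ per event source:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler          # illustrative name
spec:
  scaleTargetRef:
    name: queue-worker         # illustrative Deployment consuming the queue
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq           # event source; other types exist (SQS, Kafka, etc.)
      metadata:
        queueName: jobs        # illustrative queue
        mode: QueueLength
        value: "50"            # target backlog per replica before scaling out
      authenticationRef:
        name: rabbitmq-auth    # illustrative TriggerAuthentication with broker credentials
```

Because the trigger watches queue depth rather than CPU, the backlog itself drives replica count, closing exactly the gap that CPU-only HPA leaves open.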

While just a few tools can make an incredible difference to your Kubernetes efficiency, there needs to be a high degree of cohesion between each one. And while there is no silver bullet for K8s cost, GlobalDots’ innovation hunting has uncovered the next best thing.

Automated Rightsizing and Node Optimization

Each step of K8s FinOps maturation is defined by its ability to zoom out — from pods to nodes to clusters, every area needs to be individually accounted for. 

GlobalDots has uncovered a tool that evolves this automated rightsizing. With it, pods can be automatically consolidated onto more appropriate nodes via a deep, contextual understanding of compute efficiency. The same tool enhances Kubernetes efficiency beyond individual elastic pod resizing: by optimizing HPA triggers, for instance, the best number of replicas of every workload can always be maintained. This drives your infrastructure's adherence to SLAs.

This tool also matches your own priorities. Dedicated to complete workload protection, it performs a rapid simulation before initiating any scaling, evaluating the impact of removing or adding nodes to the cluster in question. All of this granular knowledge is granted to every relevant team member via an accessible dashboard. Unlock crystal-clear Kubernetes visibility and jumpstart your FinOps maturation with real-time alerts and recommendations.

How GlobalDots Cut 91% Of Kubernetes Cost

A major FinTech firm was battling with their unruly and opaque Kubernetes setup. Spending $816,000 per annum, the company was relying on anywhere between 70 and 120 worker nodes to serve their customers.

Turning to GlobalDots for help, they experienced an immediate and impressive consolidation in their Kubernetes architecture. With our choice of easy-to-implement solutions, the firm's Ops team was granted real-time insights into their wider K8s cost structure. This made the financial impact of every design decision deeply accessible.

Establishing a real-time view of what resources were genuinely required, the solution's autonomous pod rightsizing could begin crunching down their K8s cost. These in-place updates ensured that every request, and the limits the firm paid for, were fully optimized and tied tightly to real usage. Resource consumption anomalies and alerts were introduced, cementing this new lean layout. Post-optimization, their annual costs are now $72,000.

This is one example of how GlobalDots' multi-pronged approach to FinOps has allowed organizations to take back control over their cloud costs. Other leaders such as SentinelOne, Gong, and Playtika have already made major strides toward deep visibility and lean resource provisioning. To discuss how your Kubernetes efficiency can be accelerated at the pace of autonomous optimization, get in touch today.

