
Kubernetes Overprovisioning: 82% of Workloads Waste Resources

The Komodor 2025 Enterprise Kubernetes Report, published in September, revealed a systemic crisis: 82% of Kubernetes workloads are overprovisioned, 65% use less than half of their requested CPU and memory, and only 7% hit accurate resource requests and limits. This isn’t a technical failure—it’s a cultural breakdown. Fear-driven “play it safe” resource allocation wastes billions of dollars while distorting scheduling, hiding performance hotspots, and complicating scaling decisions.

The Numbers Are Worse Than Expected

Komodor’s benchmark of 2,100+ production clusters exposes the scale of the problem. Average CPU utilization sits at 10% and memory at 23%, meaning more than three-quarters of allocated capacity, and the cloud spend behind it, sits idle. The 82% overprovisioning rate is staggering, but the truly damning statistic is that only 7% of workloads achieve accurate resource requests and limits. That means 93% of Kubernetes deployments are either wasting money (82%) or risking failures from underprovisioning (11%).

The waste varies by workload type. Jobs and CronJobs are the worst offenders at 60-80% resource waste, followed by StatefulSets at 40-60%; even relatively well-managed Deployments and DaemonSets waste 40-50% of allocated resources. CAST AI’s 2024 Kubernetes Cost Benchmark Report found that 99.94% of clusters are overprovisioned, with CPU overprovisioning averaging 40% and memory hitting 57%. One industry expert captured the shock: “I was expecting something not great, but I was not expecting something that bad… On average, you have an 8x overprovisioning.”

This directly fuels the $44.5 billion in infrastructure cloud waste projected for 2025 by the FinOps Foundation. Komodor found that 90% of organizations are overspending on cloud resources, with 37% of IT teams needing to rightsize 50% or more of their workloads. These aren’t rounding errors—they’re systematic organizational failures affecting virtually every company running Kubernetes in production.

Root Cause: Fear Culture, Not Technology Gaps

Developers overprovision because they prioritize uptime over cost and lack feedback loops connecting production usage to resource settings. The tools to fix this exist; the problem is organizational. When developers set resource requests, they choose convenient round numbers: 500 millicores, 1GB of memory. These guesses feel safe, but they rarely match actual application needs. Fairwinds found that one in every two Kubernetes containers uses less than a third of its requested CPU and memory.
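
To make the pattern concrete, here is a minimal sketch of the kind of manifest this produces; the Deployment name, image, and numbers are hypothetical placeholders, not recommendations.

```yaml
# Hypothetical example of "play it safe" sizing: round-number requests
# chosen by feel rather than from production measurements.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api            # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      containers:
        - name: app
          image: example.com/checkout-api:1.0   # placeholder image
          resources:
            requests:
              cpu: 500m       # "feels safe"; actual usage is often far lower
              memory: 1Gi     # the scheduler reserves this whether it is used or not
```

Nothing in this spec is tied to measured usage; the scheduler reserves the full 500m and 1Gi per replica regardless of what the application actually consumes.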

The organizational structure perpetuates the problem. Developers who set resource requests don’t see the financial impact of their choices. Operations teams see the waste but can’t change the settings without risking application stability. Platform teams lose 34 workdays per year resolving issues caused by resource misconfigurations, according to Komodor’s report. As one Hacker News developer put it: “Scheduling a pod without request/limit is like giving a blank check.” Yet manual sizing without production data is exactly how most organizations operate.

Komodor CTO Itiel Shwartz framed the challenge correctly: “Organizations have made Kubernetes their standard, but our report shows the real challenge is operational, not architectural.” You can’t fix cultural problems with technical solutions alone. VPA, Goldilocks, and Kubecost exist but adoption remains low because organizations haven’t addressed the fear-driven behavior and siloed responsibilities causing overprovisioning in the first place.

Technical Consequences Beyond Wasted Dollars

Overprovisioning creates systematic technical problems that degrade performance and complicate operations. The Kubernetes scheduler makes placement decisions based on inflated resource requests, not actual usage. This distorts scheduling, prevents efficient node packing, and hides real resource pressure behind false capacity constraints. Cluster Autoscaler can’t scale down underutilized nodes because the phantom reservations created by inflated requests make them appear full.

Performance suffers in counterintuitive ways. Kubernetes enforces CPU limits through the Linux Completely Fair Scheduler’s bandwidth control, which throttles containers when they exhaust their quota, even if the node has idle capacity. Standard CPU utilization graphs can look fine while applications struggle with throttling-induced latency. A Hacker News case study illustrated the danger: a GraphQL implementation grew in complexity over two years while its resource requests were never adjusted. When a different service consumed the burst CPU headroom the GraphQL service depended on, it stopped processing requests and entered CrashLoopBackOff with failing liveness probes before autoscaling could react.
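
One way to make that hidden throttling visible, assuming cAdvisor metrics are scraped into Prometheus (metric availability and labels vary by setup), is a recording rule over the CFS throttling counters; treat this as a sketch, not a drop-in rule.

```yaml
# Hypothetical Prometheus recording rule (rule-file format) for spotting
# CFS throttling that average CPU graphs hide. Assumes cAdvisor counters
# (container_cpu_cfs_*_total) are available in Prometheus.
groups:
  - name: cpu-throttling        # arbitrary group name
    rules:
      - record: container:cpu_throttled_ratio:rate5m
        expr: |
          rate(container_cpu_cfs_throttled_periods_total[5m])
            /
          rate(container_cpu_cfs_periods_total[5m])
```

A ratio approaching 1 means the container is throttled in nearly every CFS period even if its average utilization looks modest.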

Memory limits create hard failures. Unlike CPU throttling, which degrades performance, exceeding a memory limit triggers an immediate OOM kill with no graceful degradation. Set limits too low and you get unexpected pod restarts. Set them too high and you waste money while masking real memory pressure. So the vicious cycle continues: developers pad resources to avoid failures, which masks actual resource needs, which prevents data-driven rightsizing, which perpetuates the padding.
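
For reference, those semantics show up in a container’s resources block like this; the values below are placeholders meant only to annotate how each field behaves.

```yaml
# Illustrative resources block; the numbers are placeholders, not guidance.
resources:
  requests:
    cpu: 250m        # used by the scheduler for placement and node packing
    memory: 512Mi    # reserved on the node even when the app uses less
  limits:
    cpu: "1"         # exceeding this causes CFS throttling (latency, not a kill)
    memory: 1Gi      # exceeding this triggers an immediate OOM kill of the container
```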

Who’s Responsible? The Developer vs. Platform Debate

The Kubernetes community is divided on whether developers should manually tune resources or platform teams should provide automated tooling and enforcement. The manual camp argues that developers build the applications and should understand resource needs. Best practices include profiling workloads for 72 hours, using monitoring data from Prometheus and Grafana, and adjusting iteratively—reducing requests by 20% if average CPU consistently stays below 50%.
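
That kind of feedback could be captured with a recording rule like the sketch below, which assumes both cAdvisor and kube-state-metrics series are available in Prometheus and that label names match the common defaults.

```yaml
# Hypothetical recording rule comparing observed CPU usage to requested CPU.
# Assumes cAdvisor (container_cpu_usage_seconds_total) and kube-state-metrics
# (kube_pod_container_resource_requests) are both scraped; label schemes vary.
groups:
  - name: rightsizing
    rules:
      - record: container:cpu_usage_vs_request:ratio_rate5m
        expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_usage_seconds_total[5m])
          )
            /
          sum by (namespace, pod, container) (
            kube_pod_container_resource_requests{resource="cpu"}
          )
```

A ratio that stays well below 0.5 for days is the signal the manual camp uses to justify the 20% request reductions mentioned above.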

The automation camp counters that developers lack time, expertise, or incentive to tune resources properly. Manual recommendations create endless to-do lists that never get prioritized. Tools should handle this automatically: Vertical Pod Autoscaler adjusts CPU and memory requests based on historical usage, admission controllers like OPA Gatekeeper and Kyverno enforce policies preventing pods from deploying without proper limits, and ML-driven platforms like PerfectScale and StormForge continuously optimize resources without human intervention.

Kubernetes 1.33 promoted in-place pod resizing to beta, and the Vertical Pod Autoscaler builds on it with a new InPlaceOrRecreate update mode that is a genuine game changer. It allows the kubelet to resize the resources allocated to running containers without restarting them. This eliminates the biggest objection to VPA, pod restarts causing downtime, and makes continuous rightsizing practical for production workloads. The feature treats resource requests as “fluid, living values rather than static guesses,” enabling teams to pack nodes tighter and stop paying for wasted capacity.
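
In manifest form, adopting it might look roughly like the sketch below, assuming a VPA release that exposes the InPlaceOrRecreate mode described above and a cluster with in-place resize enabled; the workload name and bounds are placeholders.

```yaml
# Hypothetical VerticalPodAutoscaler using the in-place update mode discussed
# above. Requires a VPA release that supports InPlaceOrRecreate and a cluster
# with in-place pod resize enabled (beta in Kubernetes 1.33).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api                # placeholder workload
  updatePolicy:
    updateMode: "InPlaceOrRecreate"   # resize in place; fall back to recreate
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:                   # guardrails so VPA can't shrink below a floor
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```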

The reality is that static recommendations don’t scale. Workloads change, traffic patterns shift, and manual adjustments lag behind reality. Most successful organizations implement both cultural change—making developers accountable for costs through FinOps attribution—and automation with guardrails that allow platform teams to enforce policies while giving developers flexibility within bounds.

The Path Forward Requires Both Tools and Culture Change

The tools to fix Kubernetes overprovisioning exist today. Short-term actions include auditing current state with Kubecost or Goldilocks, enabling VPA in “Off” mode to gather data for a week without applying changes, profiling workloads for a 72-hour minimum to understand actual patterns, and reducing requests incrementally—cutting by 20% if average CPU utilization is below 50%.
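
The data-gathering step can reuse the VPA sketch shown earlier with only the update policy swapped, so recommendations are computed but nothing is applied; again, a sketch rather than a prescribed rollout.

```yaml
# Recommendation-only variant of the earlier VPA sketch: the recommender
# fills in status.recommendation while no pod is resized or evicted.
updatePolicy:
  updateMode: "Off"
```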

Mid-term strategies involve deploying admission controllers like OPA Gatekeeper or Kyverno to enforce resource policies, implementing namespace ResourceQuotas and LimitRanges to prevent runaway requests, enabling VPA for select workloads starting with non-critical services using in-place updates in Kubernetes 1.33+, and establishing developer accountability through cost attribution per team and service via FinOps tooling.
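
The quota piece might take a shape like the following namespace-level LimitRange and ResourceQuota; every number here is a placeholder to be tuned per team, and the namespace name is hypothetical.

```yaml
# Hypothetical namespace guardrails; all values are placeholders.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-sizing
  namespace: team-a               # placeholder namespace
spec:
  limits:
    - type: Container
      defaultRequest:             # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                    # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"            # cap on the sum of requests in the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```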

Long-term solutions require investing in automation over manual processes. ML-driven rightsizing from StormForge or PerfectScale handles the continuous tuning burden. Platform engineering teams provide golden paths with pre-configured resource policies embedded in Internal Developer Platforms. Policy-as-code approaches use version-controlled admission policies with GitOps-based enforcement. The cultural shift makes developers accountable for cloud costs from the development stage, closing the feedback loop that currently allows waste to persist.
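
A version-controlled admission policy of that kind could look roughly like this Kyverno ClusterPolicy, modeled on Kyverno’s published require-requests-limits sample; field names shift between Kyverno versions, so treat it as a sketch to validate against the version in use.

```yaml
# Hypothetical policy-as-code guardrail: reject Pods whose containers omit
# CPU/memory requests or a memory limit. Modeled on Kyverno's sample policy;
# verify field names against the Kyverno version in use.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    memory: "?*"
```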

The 34 workdays per year that Komodor found platform teams lose to resource issues represent opportunity cost that could go toward building features. Organizations that automate rightsizing and enforce policies stop fighting uphill battles and redirect engineering effort toward value creation. This isn’t about achieving perfection; even reducing overprovisioning from 82% to 50% would save billions industry-wide. The path forward requires both technical implementation through VPA and admission controllers, and cultural change through developer accountability and trust in automation.
