
Kubernetes v1.36 Haru: What Changed and What to Do Now


Kubernetes v1.36 “Haru” shipped April 22 with 70 enhancements. The headline isn’t any of the new features — it’s what got killed. Ingress NGINX, the ingress controller that anchored a decade of Kubernetes tutorials and production configs, is dead. No more security patches. No more bug fixes. If you’re running it without a migration plan, it’s time to start making one.

The rest of the release is genuinely good. DRA graduates to GA, HPA scale-to-zero ships enabled by default after seven years in alpha, and User Namespaces reaches stable. But the Ingress NGINX situation is the thing that matters most to the most people.

Ingress NGINX Is Gone. Now What?

As of March 24, 2026, Ingress NGINX is officially retired. The maintainer team couldn’t keep pace with the volume of CVEs at the scale the project reached — which, to be clear, is a testament to how widely it was deployed, not a failing of the maintainers. But the result is the same: if a new vulnerability surfaces in Ingress NGINX today, it doesn’t get patched.

Existing deployments keep running. Nothing breaks immediately. But you’re now accumulating security debt at an unknown rate.

The Kubernetes project’s answer is Gateway API — a more expressive, standardized successor to Ingress that handles multi-tenant routing, traffic splitting, and backend configuration through structured resources instead of annotations. The core APIs (Gateway, GatewayClass, HTTPRoute) reached GA in 2023. Implementations worth evaluating include kgateway, Envoy Gateway, Traefik, and Istio.
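
To make the shape concrete, here is a minimal sketch of an HTTPRoute bound to a Gateway. The names, hostname, and gateway class below are illustrative, not from the release:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
spec:
  gatewayClassName: example-class   # supplied by whichever implementation you pick
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
spec:
  parentRefs:
  - name: web-gateway          # attaches the route to the Gateway above
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: my-app             # the Service receiving traffic
      port: 80

The things that required annotations under Ingress NGINX, such as canary traffic splitting, become typed fields here (a weight on each backendRef, for instance).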

For teams that need a migration path, Ingress2Gateway 1.0 — released March 20, 2026 — automates the translation of over 30 common Ingress annotations into Gateway API resources. It audits your cluster, previews the output, and handles TLS secret migration. It won’t cover every edge case, but it handles the vast majority of standard configurations.

DRA Goes GA: GPU Scheduling Finally Works

Dynamic Resource Allocation (DRA) reached General Availability in v1.36, and this one matters for any team running AI or ML workloads on Kubernetes.

DRA replaces the extended resources model — the approach that required declaring GPU counts as opaque integers and offered no mechanism for requesting specific hardware configurations, fallbacks, or partitioned devices. With DRA at GA, cluster administrators can define device classes for GPUs and custom accelerators, and workloads can request devices with specific configurations and fallback alternatives. Kubernetes handles scheduling, placement, and assignment automatically.
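
As a minimal sketch of the GA API, assuming a DRA driver that publishes a DeviceClass named gpu.example.com (the class, pod name, and image are illustrative):

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com   # defined by the admin or the driver
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: train
    image: registry.example.com/trainer:latest   # illustrative image
    resources:
      claims:
      - name: gpu              # binds this container to the claim above

Where drivers and versions support it, a firstAvailable list in place of exactly is how a claim expresses the fallback alternatives mentioned above.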

Several capabilities ship enabled by default in v1.36: partitionable devices, consumable capacity, and device taints and tolerations. A new alpha feature, Workload-Aware Preemption, treats a PodGroup as a single preemption unit — which matters for distributed training jobs that fail expensively when only some of their pods get preempted.

NVIDIA donated its DRA driver for GPUs to the CNCF alongside this release. Cloud provider DRA driver support across GKE, EKS, and AKS is accelerating. If your team has been working around Kubernetes GPU scheduling limitations, this is the release to revisit.

HPA Scale-to-Zero: Seven Years in the Making

The HPAScaleToZero feature gate first appeared in Kubernetes v1.16 in 2019. It is now enabled by default in v1.36. Seven years is a long alpha.

The Horizontal Pod Autoscaler can now scale workloads to exactly zero replicas when idle and back up when demand returns. The caveat: with zero pods running there are no resource metrics to react to, so scaling back up requires an external metric source — KEDA is the standard choice. A minimal manifest looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 0      # scale-to-zero, allowed by default in v1.36
  maxReplicas: 10
  metrics:
  - type: External    # resource metrics can't wake a zero-replica workload
    external:
      metric:
        name: queue_depth
      target:
        type: AverageValue
        averageValue: "5"
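
In practice you usually don’t write that HPA by hand when using KEDA: a ScaledObject creates and manages the underlying HPA and feeds it the external metric. A hedged sketch, with the Prometheus address and query as assumptions:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app          # the Deployment to scale
  minReplicaCount: 0      # allow scale-to-zero
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090   # illustrative
      query: sum(queue_depth{app="my-app"})                  # illustrative metric
      threshold: "5"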

The practical wins are in staging environments, test clusters, and batch workloads with clear idle windows. For services that genuinely go quiet overnight or between jobs, scale-to-zero eliminates the baseline compute cost without any manual intervention.

User Namespaces Is Now Stable

User Namespaces graduated to General Availability in v1.36. The feature isolates the user running inside a container from the host: a process that runs as root inside the container maps to a non-privileged user on the host. A container escape no longer hands an attacker root on the host.

Opting in is a single field:

spec:
  hostUsers: false
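
In a complete manifest (the pod name and image are illustrative), the field sits at the top level of the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: isolated-app
spec:
  hostUsers: false          # container UIDs map to unprivileged host UIDs
  containers:
  - name: app
    image: registry.example.com/app:latest
    securityContext:
      runAsUser: 0          # root inside the container, not on the host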

It works with standard runc runtimes and requires no kernel modifications. The security benefit is real and the operational overhead is minimal. There’s no longer a good reason not to use it.

Also in This Release

  • OCI VolumeSource (Stable): Reference any OCI image as a volume (sketched after this list). For AI teams, this means packaging model weights and datasets as standalone OCI artifacts — clean separation of model versions from deployment artifacts.
  • IPVS Removed: Fully removed after deprecation in v1.35. Check your kube-proxy configuration if you ever enabled IPVS mode.
  • Mutating Admission Policies (Stable): The webhook-based admission model gets a standardized, more maintainable replacement.
  • Resource Health Status for Pods (Beta): kubectl describe pod now reports the health of allocated devices — useful when GPU assignment fails silently.
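
For the OCI VolumeSource item, a minimal sketch; the image references are made up:

apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  containers:
  - name: server
    image: registry.example.com/inference:latest   # illustrative
    volumeMounts:
    - name: weights
      mountPath: /models
      readOnly: true
  volumes:
  - name: weights
    image:                                          # the OCI image volume source
      reference: registry.example.com/llama-weights:v3   # illustrative model artifact
      pullPolicy: IfNotPresent

The model artifact ships and versions independently of the serving image, which is the clean separation the bullet above describes.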

What You Should Do Now

If you’re running Ingress NGINX: audit your cluster with Ingress2Gateway, pick a Gateway API implementation, and set a migration timeline. You’re not in a fire drill yet, but you’re running unpatched infrastructure from a project that has formally ended.

If you’re running AI workloads on Kubernetes: review the DRA GA documentation and check whether your cloud provider has published a DRA driver. The old extended resources workarounds are now legacy.

If you have idle services burning compute overnight: set minReplicas: 0 on your HPA and wire up KEDA. Seven years of waiting is over.
