Kubernetes in 2026 is a more capable platform than it was even a year ago, but scale still punishes hand-wavy decisions. The control plane is better, the resource model is getting richer, and workload identity keeps improving. None of that rescues a cluster with bad defaults, weak observability, or unclear ownership.
As of March 19, 2026, the latest upstream Kubernetes release is 1.35.3. That matters because some operational advice has changed: in-place Pod resize is now stable in 1.35, the project only maintains the latest three minor releases, and newer resource-management features are becoming practical for real production use.
Resource Requests Are Not Optional
The single most important baseline is still accurate requests. Without them, the scheduler is guessing, autoscaling signals are noisy, and your node utilization story is fiction. At scale, bad requests turn every incident into a scheduling problem.
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
A few rules of thumb still hold:
- Requests should follow observed usage, not hope. Measure p90 and p95 behavior before standardizing defaults.
- Memory limits need intent. Use them when you need hard isolation, not because every YAML template had one.
- CPU limits deserve scrutiny. They still create throttling pathologies that are hard to detect until latency rises.
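One low-risk way to ground requests in observed usage is to run the Vertical Pod Autoscaler in recommendation-only mode and compare its suggestions against your own p90/p95 measurements. A minimal sketch, assuming the VPA components are installed in the cluster and a Deployment named `api` exists (both are assumptions, not givens):

```yaml
# VPA in "Off" mode: it records recommendations but never evicts or
# mutates pods, so it is safe to run against production workloads.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api             # hypothetical workload name
  updatePolicy:
    updateMode: "Off"     # recommendations only, no automatic changes
```

Read the recommendations with `kubectl describe vpa api-recommender` and fold them into your manifests deliberately, rather than letting anything auto-apply.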
What is new in 2026 is that in-place Pod resize is now stable in Kubernetes 1.35. That makes vertical tuning less disruptive for some workloads, but it does not remove the need to size things sensibly up front.
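For workloads that want to opt into in-place tuning, the per-container resize policy controls whether a resource change restarts the container. A hedged sketch; the pod name and image are placeholders:

```yaml
# Resize CPU in place, but restart the container when memory changes
# (a common choice for runtimes that size heaps at startup).
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo                         # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired        # apply CPU changes in place
        - resourceName: memory
          restartPolicy: RestartContainer
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
```

Resizes are then applied through the pod's `resize` subresource (for example with `kubectl patch --subresource resize` in recent kubectl versions), not by editing the pod spec directly.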
Node Pools Are Your Friend
Heterogeneous clusters are normal now. General-purpose nodes, memory-heavy nodes, burst pools, and accelerator-backed pools all have different failure modes and cost profiles. Treating them as one undifferentiated fleet is wasteful.
- System pool: small, protected nodes for cluster services
- Application pool: steady-state nodes for most stateless workloads
- Memory pool: higher-memory nodes for caches, queues, and data-heavy services
- Interruptible pool: spot or preemptible capacity for batch and elastic workers
- Accelerator pool: GPU or specialty hardware with explicit scheduling and quota controls
Use taints, tolerations, and affinity rules aggressively. The newer Dynamic Resource Allocation work in Kubernetes 1.34 and 1.35 is especially relevant if you are scheduling specialized hardware, because extended resources alone are no longer the whole story.
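As a concrete sketch of that placement pattern, a memory pool can be tainted at the node level and opted into per workload. The pool label, taint key, and image below are illustrative, not standard names:

```yaml
# Node side (via your node-pool tooling, or manually):
#   kubectl taint nodes <node> pool=memory:NoSchedule
# Workload side: tolerate the taint and pin to the pool's label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache                 # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      nodeSelector:
        pool: memory          # illustrative pool label
      tolerations:
        - key: "pool"
          operator: "Equal"
          value: "memory"
          effect: "NoSchedule"
      containers:
        - name: cache
          image: registry.example.com/cache:1.0   # placeholder
          resources:
            requests:
              memory: "4Gi"
              cpu: "500m"
```

The taint keeps general workloads off the expensive nodes; the toleration plus `nodeSelector` is what actively opts a workload in.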
Identity and Network Boundaries Need To Be There From Day One
Default-open east-west networking is still one of the easiest ways to build a fragile cluster. Start with network policy early, and treat workload identity with the same seriousness. Kubernetes 1.35 also introduced pod certificates in beta, which is a strong signal that workload identity is becoming a more native concern.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```
Start with deny-by-default and explicit allow rules. Retrofitting boundaries after teams have already built invisible dependencies is always more painful.
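With the default-deny in place, each real dependency becomes an explicit allow rule. A minimal sketch, assuming (hypothetically) that frontend pods carry an `app: frontend` label and the API listens on port 8080:

```yaml
# Allow only the frontend to reach the API, and only on its service port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: api              # illustrative labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```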
The HPA Isn't Magic
Horizontal Pod Autoscaler still works best when the metric maps directly to user pain. CPU is sometimes fine, but queue depth, in-flight requests, or request latency are often closer to what you actually care about.
Custom metrics through prometheus-adapter or equivalent integrations remain the difference between cosmetic autoscaling and useful autoscaling. We usually scale on:
- queue depth for APIs and async workers
- consumer lag for streaming and messaging systems
- latency or saturation signals as guardrails, not just CPU
The best autoscaling metric is the one that directly represents the user-facing symptom you're trying to prevent.
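As a sketch of what queue-based scaling looks like, an `autoscaling/v2` HPA can target an external metric exposed through prometheus-adapter. The metric name here is whatever your adapter configuration exports, not a built-in, and the deployment name is hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                # hypothetical deployment
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth     # assumed name from prometheus-adapter config
        target:
          type: AverageValue
          averageValue: "30"    # aim for ~30 queued items per replica
```

With `AverageValue`, the controller divides the total queue depth by the current replica count, so adding replicas directly drains the signal you scaled on.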
Observability Is Non-Negotiable
You still cannot operate Kubernetes seriously without metrics, logs, traces, and clear ownership for alerts. The difference now is that richer control-plane telemetry is becoming more available upstream, including API server tracing in recent releases. That makes it easier to debug behavior that used to feel opaque.
- Metrics: Prometheus + Grafana for cluster and application metrics
- Logs: Structured logging piped to a central system (we use Loki)
- Tracing: Distributed tracing with OpenTelemetry for request flows across services
- Alerts: Symptom-based alerting, not cause-based. Alert on high error rates, not on "pod restarted."
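A symptom-based rule in Prometheus terms looks roughly like this; the metric names follow common client-library conventions and may differ in your stack:

```yaml
# Prometheus alerting rule: fire on the user-visible error rate,
# not on causes like "pod restarted".
groups:
  - name: api-symptoms
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{job="api",code=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="api"}[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "API 5xx rate above 5% for 10 minutes"
```

The `for: 10m` clause is the guardrail against paging on a transient blip; tune it to how long your users actually tolerate degradation.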
The investment pays back the first time you can explain saturation, eviction pressure, or a noisy rollout without guessing.
Wrapping Up
Kubernetes at scale still rewards boring discipline. Accurate requests, explicit placement, real identity boundaries, sensible autoscaling, and first-class observability remain the difference between a platform and a science experiment. The 2026 features are useful, but they amplify good operations more than they replace them.