Kubernetes capacity platform

CPU scheduling, enforced inside the kernel.

Temper is a Kubernetes capacity platform that replaces best-effort CFS arbitration with kernel-enforced QoS (Linux sched_ext). Latency-critical services keep their p99 while batch work soaks up the idle cycles — so you pack nodes tight without losing tail latency.

Get started → · Read the deep dives

One helm install. Kernel-native rollback — kill the agent and the node reverts to the stock scheduler instantly.

Chart: memcached p99 vs. node load, CFS vs. Temper (Temper flat · CFS 3.1×). Multi-node GKE, batch-filler ladder; nodes at 0.81–0.92 utilization.

Measured live, single run per arm; the full write-up with methodology and caveats is the sideloading deep dive. source: docs/training-artifacts/binpack/REPORT.md

−88% · memcached p99 at the knee vs. CFS, heavy operating point · deep dive: sideloading →

9.4× → ~1.5× · end-to-end p99 growth under load, 19-service DeathStarBench · deep dive: service chains →

+72% · pods placed at equal SLO on the same nodes with overcommit · deep dive: sideloading →

<1% · CPU overhead for always-on observation (measured 0.13% of one core) · docs: observability →

Built on Linux sched_ext · Runs on GKE & EKS today · Zero external calls, fully in-cluster · Kubernetes-native inputs, no CRDs

01 The problem

Every Kubernetes cost tool operates above the kernel.

Rightsizers predict usage. Placement engines move pods. Autoscalers resize fleets. All of it is a prediction — and the moment two pods share a node, the prediction is over and CFS arbitration begins.

The Linux Completely Fair Scheduler has no notion of which pod’s p99 matters. Pack a node tight and CFS decides — microsecond by microsecond — who eats the latency. A batch task that becomes runnable can take a timeslice from your revenue service at exactly the wrong moment, and that one delayed request is the tail spike your SLO dashboard shows.

This is why every cost-optimization product keeps its savings engine conservative: the utilization it can safely reach is capped by a scheduler it does not control. Temper occupies the layer none of them do — CPU arbitration inside the kernel’s scheduling path — and makes dense packing safe by enforcement instead of prediction.

SaaS · Rightsizers · FinOps platforms: Predict usage, trim requests & limits. Container-granular.

k8s · Placement · autoscalers · bin-packing: Move pods, resize fleets. Pack by requests, blind to on-node contention.

kernel boundary: every other vendor stops here

L0 · Temper (scx_layered, sched_ext): CPU arbitration inside the kernel scheduling path. Fence, loan, preempt on wake.

CFS · Stock Linux scheduler: What every co-located pod inherits without Temper, and the automatic fallback if our agent dies.

02 The platform

Five capabilities, installed from one helm chart.

The node engine is the core; everything else is optional and individually switchable. Install any depth — from a single protected node to the full multi-cluster management plane.

a · node engine (L0)

Enforce QoS at the kernel scheduler

A node agent maps every pod to one of five QoS tiers — derived from standard Kubernetes PriorityClasses and resource requests, no CRDs and no app changes — and drives scx_layered, a Linux sched_ext scheduler, with a config computed from what your pods actually request. Critical tiers get fenced CPU; batch tiers get whatever is idle, and get preempted the instant a protected workload wakes.

Five tiers (Critical → Background) from pod.spec.priority — Kubernetes-native inputs only
Layer weights, CPU ranges, and utilization bands computed from real resource requests
Fail-safe by construction: agent death = instant kernel revert to the stock scheduler
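
The priority-to-tier mapping can be pictured with a small sketch. It is an illustration only: the text names just Critical and Background as the two ends of the range, so the three middle tier names and every threshold below are assumptions made for the example, and the agent's actual rules may differ.

```python
# Hypothetical sketch: map pod.spec.priority to one of five QoS tiers.
# Only "critical" and "background" come from the text; the middle tier
# names and all thresholds are assumptions made for illustration.
from typing import Optional

# (minimum priority, tier), checked from highest to lowest.
ASSUMED_TIERS = [
    (1_000_000, "critical"),
    (10_000, "high"),
    (0, "standard"),
    (-1_000, "batch"),
]


def tier_for_priority(priority: Optional[int]) -> str:
    """Return the tier for a pod's integer priority (None means unset)."""
    value = 0 if priority is None else priority
    for minimum, tier in ASSUMED_TIERS:
        if value >= minimum:
            return tier
    return "background"  # lowest tier: only gets CPU that is otherwise idle
```

In this sketch a pod with no PriorityClass (priority unset or 0) falls into a middle tier, which is one plausible reading of "default tiers"; the real default is not stated here.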

Learn more →

b · node engine (L0)

Tune scheduling per thread group, inside one pod

A pod is not one uniform workload — a database has connection threads, I/O threads, and background purge threads with completely different needs. Workload Profiles give each thread group its own scheduling treatment inside a single pod: exclusive cores for the hot path, latency treatment for wake chains, yield for housekeeping. No product operating above the kernel can see thread structure, let alone schedule on it.

Builtin profiles for common shapes (e.g. PyTorch dataloaders, MySQL/InnoDB), plus file-based custom profiles
Auto-detection by container image or a single pod annotation
Training mode measures your workload and synthesizes a profile automatically

Learn more →

c · cluster intelligence (L1)

Pack more pods without breaking SLOs

Kubernetes bin-packs by declared requests — which are usually padded, because nobody trusts CFS with a tight node. With enforcement underneath, Temper’s optional placement layer packs by what protection capacity actually exists: a density-aware scheduler plugin reads per-tier load from node annotations, and an opt-in admission webhook scales down the CPU requests of non-critical pods so the bin-packer fits more of them. It never touches limits or the Critical tier, and every change is reversible.

Opt-in per namespace via a label; every mutation annotated with the original value
Packing and consolidation results measured live — deep dive: sideloading
Complement mode: one helm flag stands this layer down and Karpenter / Cast AI keep placement
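
The request-scaling rule described above (scale CPU requests of non-critical pods, never touch limits or the Critical tier, record the original value) could look roughly like the following. This is a sketch under stated assumptions, not Temper's webhook: the annotation key, the scaling factor, and the function names are all hypothetical.

```python
# Hypothetical sketch of the request-scaling rule; not Temper's webhook code.
import copy

# Assumed annotation key, invented for this example.
ORIGINAL_CPU_KEY = "example.invalid/original-cpu-request"


def parse_millicores(cpu: str) -> int:
    """Convert a Kubernetes CPU quantity such as '500m' or '2' to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(round(float(cpu) * 1000))


def scale_cpu_requests(pod: dict, tier: str, factor: float = 0.5) -> dict:
    """Return pod unchanged for the Critical tier, else a mutated copy.

    Only CPU requests are scaled; limits are never modified.
    """
    if tier == "critical":
        return pod
    mutated = copy.deepcopy(pod)
    originals = []
    for container in mutated["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" not in requests:
            continue
        # Remember the original value so the change can be reversed later.
        originals.append(f'{container["name"]}={requests["cpu"]}')
        scaled = max(1, int(parse_millicores(requests["cpu"]) * factor))
        requests["cpu"] = f"{scaled}m"
    if originals:
        annotations = mutated.setdefault("metadata", {}).setdefault("annotations", {})
        annotations[ORIGINAL_CPU_KEY] = ",".join(originals)
    return mutated
```

Working on a deep copy keeps the incoming object intact, which makes it easy to diff the two versions when building an admission patch.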

Learn more →

d · management plane (L2)

Run the fleet from one in-cluster plane

A hierarchy explorer walks cluster → node → pod → container with live updates: logs, manifests with revision diffs, per-pod performance panels, and scheduling detail down to the layer a pod landed in. Operator actions — cordon, drain, rollout restart, safe mode, trace download — are role-gated and audit-logged. A savings view splits realized reclaim from identified opportunity, priced per machine type.

Viewer / operator / admin roles, named tokens, audit export
Multi-cluster hub via a peer registry — per-cluster data planes stay self-contained, air-gap friendly
Versioned REST API (/api/v1, OpenAPI); the built-in UI uses the same public API

Learn more →

e · observability

See threads, not container averages

An always-on observation layer samples per-pod, per-thread placement and runqueue telemetry at under 1% CPU overhead — measured, not promised. A placement linter continuously checks scheduling invariants, and on-demand kernel trace capture gives you bounded perfetto traces without installing anything extra. The same thread-level data feeds a rightsizer that sees what container averages hide: one hot thread that needs an exclusive core.

Prometheus /metrics, a JSON /observe snapshot, and generated Grafana dashboards
Placement linter with invariant checks exported as metrics
Thread-aware rightsizing recommendations — a class above container-average rightsizers

Learn more →

03 The engine

Three layers, each consuming only the one below.

L0 is a helm-installed DaemonSet and the whole story works with it alone. L1 and L2 are optional and individually switchable — which is why running under Karpenter or Cast AI and running standalone are the same codebase.

Inside the scheduler, not above it. Other capacity products predict contention from above the kernel. Temper stands in the kernel’s CPU scheduling path and arbitrates it — the difference between suggesting who should run and deciding who runs.

Thread-group granularity. Container-level tools see one number per pod. Temper schedules the threads inside a pod differently — exclusive cores for a hot loop, latency treatment for a wake chain — because at the kernel layer, threads are what actually exist.

Fail-safe is the kernel’s contract, not ours. When the BPF scheduler detaches for any reason — crash, kill, upgrade — the kernel atomically reverts to the stock scheduler. The worst case is the scheduler you already run today. Read the architecture →
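
One way to see the revert for yourself is to read the sched_ext state the kernel exposes in sysfs. The sketch below uses the `/sys/kernel/sched_ext/state` and `/sys/kernel/sched_ext/root/ops` files described in the kernel's sched_ext documentation; the function names and the wording of the messages are illustrative.

```python
# Sketch: report whether a sched_ext scheduler is currently attached on this node.
from pathlib import Path
from typing import Optional

SCHED_EXT_SYSFS = Path("/sys/kernel/sched_ext")


def read_or_none(path: Path) -> Optional[str]:
    """Read a small sysfs file, returning None when it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def active_scheduler(sysfs: Path = SCHED_EXT_SYSFS) -> str:
    """Describe which CPU scheduler is in charge, based on sysfs."""
    state = read_or_none(sysfs / "state")
    if state is None:
        return "stock scheduler (kernel has no sched_ext support)"
    if state != "enabled":
        return f"stock scheduler (sched_ext state: {state})"
    ops = read_or_none(sysfs / "root" / "ops") or "unknown"
    return f"sched_ext scheduler: {ops}"


if __name__ == "__main__":
    print(active_scheduler())
```

Run on a node before and after stopping the BPF scheduler, the reported state should change from an attached sched_ext scheduler back to the stock one.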

04 How it connects

Observe first, then enforce.

Installation is deliberately boring: a DaemonSet, a ConfigMap, and Kubernetes-native inputs. Nothing changes for your workloads until you assign a PriorityClass.

STEP 01

helm install

One chart deploys the node agent DaemonSet and, optionally, the dashboard. The agent verifies each node’s kernel and attaches the scheduler; nodes that can’t run sched_ext simply stay on the stock scheduler.

helm install temper deploy/helm/temper -n temper --create-namespace
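
For a sense of what a node preflight might involve, here is a minimal sketch based on the requirements listed in the FAQ (kernel ≥6.12 plus BTF). It is hypothetical and not the agent's code; it checks only the kernel version and the presence of BTF at `/sys/kernel/btf/vmlinux`, and leaves out the `CONFIG_SCHED_CLASS_EXT=y` check.

```python
# Hypothetical preflight sketch; the agent's real checks are not shown in the text.
import os
import platform
import re

MIN_KERNEL = (6, 12)  # minimum kernel version stated in the FAQ
BTF_PATH = "/sys/kernel/btf/vmlinux"  # where the kernel exposes its BTF


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '6.12.8-gke'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def node_can_run_sched_ext(release: str, btf_present: bool) -> bool:
    """True when the kernel is new enough and BTF is available."""
    return kernel_version(release) >= MIN_KERNEL and btf_present


if __name__ == "__main__":
    ok = node_can_run_sched_ext(platform.release(), os.path.exists(BTF_PATH))
    print("sched_ext-capable" if ok else "stays on the stock scheduler")
```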

STEP 02

Observe — zero enforcement

With no PriorityClasses assigned, workloads land in default tiers and behave as before. Meanwhile the observation layer streams per-pod, per-thread placement telemetry to /metrics and /observe — you see the contention before you act on it.

kubectl get node NODE -o jsonpath='{.metadata.annotations}'
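
The annotation dump from that command mixes Temper's keys with everything else on the node. A small filter can narrow it down. This sketch assumes two things: that the agent publishes its annotations under the `temper.codes/` prefix (inferred from the `temper.codes/safe-mode-requested` annotation, the only key named on this page), and that your kubectl prints the annotation map as JSON.

```python
# Sketch: pick Temper-related annotations out of a node's annotation map.
# Assumption: the agent uses the "temper.codes/" prefix, inferred from the
# temper.codes/safe-mode-requested annotation named in the text.
import json
import sys

ASSUMED_PREFIX = "temper.codes/"


def temper_annotations(raw_json: str, prefix: str = ASSUMED_PREFIX) -> dict:
    """Return only the annotations whose key starts with the given prefix."""
    annotations = json.loads(raw_json) if raw_json.strip() else {}
    return {
        key: value
        for key, value in sorted(annotations.items())
        if key.startswith(prefix)
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: save the kubectl output to a file and pass its path as argv[1].
    with open(sys.argv[1]) as handle:
        for key, value in temper_annotations(handle.read()).items():
            print(f"{key} = {value}")
```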

STEP 03

Assign PriorityClasses

Add priorityClassName: temper-critical to the services whose p99 matters. The agent recomputes the layer config from their real resource requests and the kernel starts enforcing. That is the entire integration.

priorityClassName: temper-critical

Safety first. One annotation — temper.codes/safe-mode-requested — is a fleet-wide kill switch that stands the scheduler down and returns every node to stock CFS. We benchmark the failure paths too — agent kills under load, an 8-hour soak, churn storms. Deep dive: failure modes & rollback behavior →

05 Deep dives

Measured results, written up as technical reports.

Per-scenario articles from committed benchmark records, each stating experimental setup, results, and limitations. All reports →

Sideloading · memcached, redis, packing · p99 flat at 0.92 util

Density ladders on memcached and redis: batch work fills protected nodes while the protected workload’s p99 remains flat. CFS proportional weights do not bound wakeup latency under contention.

Read the deep dive →

CPU limits · PostgreSQL, MySQL, Cassandra · 33.6 → 4.3 ms p99

Quota-throttle latency cliffs verified with kernel throttle counters, the cpu.max enforcement semantics under sched_ext, and a direct quota-parity consumption measurement.

Read the deep dive →

Accelerators · PyTorch on NVIDIA L4 · +67% samples/s

CPU-side descheduling of data-loading threads idles the accelerator: measured PyTorch/L4 degradation under density, a comparison against standard Kubernetes remedies, and a negative result on GPU-bound serving.

Read the deep dive →

06 FAQ

Common questions.

Is it safe to put something in the kernel’s scheduling path?
sched_ext was designed for exactly this: the kernel’s contract is that a misbehaving or detached BPF scheduler causes an instant, atomic revert to the stock scheduler, so the worst case is the scheduler you run today. We benchmark the failure paths, not just the happy path. Deep dive: failure modes & rollback behavior →

What happens if the agent dies?
The scheduler keeps running without it, the replacement agent (it is a DaemonSet) re-attaches when Ready, and degradation, if any, is per-node, never per-cluster. We force-killed it under load and published the numbers. Deep dive: failure modes & rollback behavior →

Does it work with Karpenter or Cast AI?
Yes. That is complement mode, and it is the default: your existing tool decides where pods go, and Temper decides who gets the CPU when the node is busy. The measured Karpenter-in-both-arms result is in the capacity write-up. Deep dive: sideloading →

What kernels and platforms does it need?
A node kernel ≥6.12 with CONFIG_SCHED_CLASS_EXT=y plus BTF, and permission to run a privileged DaemonSet. Verified live on GKE Standard ≥1.36 and EKS. AKS and Autopilot-style managed modes are not supported, and the matrix explains why. Full matrix →

Try it on a real cluster.