SLO-safe density & overcommit
Kernel enforcement (L0) makes tight packing safe; the optional L1 layer converts that safety into capacity. Both L1 components, the density-aware scheduler plugin and the overcommit webhook, are off by default (complement mode), and each is one Helm flag away.
infera); the commands below are what works today. A rename migration is planned.
Complement mode (the default)
Out of the box, Temper does not touch placement. Karpenter, Cast AI, Cluster Autoscaler, or the stock scheduler decide where pods go; Temper decides who gets the CPU once they share a node. This is deliberate: the node engine needs only Kubernetes-native inputs, so it works under any placer, and your existing tool’s aggressive mode stops being scary — the blast radius of a wrong prediction becomes batch throughput, not the p99 of a revenue service.
Measured with Karpenter as the autoscaler in both arms: same load, same SLO, −40% provisioned vCPU (12 vCPU vs 20 vCPU). Temper is what let the tighter packing hold the SLO.
```bash
# Complement mode is simply the default (scheduler.enabled=false, webhook.enabled=false):
helm install infera deploy/helm/infera -n infera --create-namespace
```
The density-aware scheduler plugin
When you do want Temper placing pods, the scheduler plugin extends kube-scheduler with awareness of L0's enforcement state. It reads the per-tier load annotations that every agent publishes (pod count and CPU millicores per tier) and scores nodes by where protection capacity actually exists: contention-aware for latency-critical pods, packing-aware for batch.
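The two scoring modes can be pictured with a minimal Python sketch. It is not the plugin's code: the parsed annotation shape (`TierLoad`, `NodeLoad`), the tier name `critical`, and both formulas are assumptions made for illustration. Only the 0 to 100 scale is borrowed from kube-scheduler, which scores nodes in that range.

```python
from dataclasses import dataclass, field

MAX_NODE_SCORE = 100  # kube-scheduler scores nodes on a 0..100 scale


@dataclass
class TierLoad:
    pods: int        # pods currently running in this tier
    cpu_millis: int  # CPU millicores those pods hold


@dataclass
class NodeLoad:
    name: str
    allocatable_millis: int  # assumed > 0
    # Hypothetical parsed form of the per-tier load annotation an agent publishes.
    tiers: dict[str, TierLoad] = field(default_factory=dict)

    def millis(self, *tiers: str) -> int:
        """CPU millicores held by the named tiers (all tiers if none are named)."""
        names = tiers or tuple(self.tiers)
        return sum(self.tiers[t].cpu_millis for t in names if t in self.tiers)


def contention_score(node: NodeLoad) -> int:
    """Latency-critical pods: favour nodes with the most CPU not held by the critical tier."""
    free = max(node.allocatable_millis - node.millis("critical"), 0)
    return MAX_NODE_SCORE * free // node.allocatable_millis


def packing_score(node: NodeLoad, request_millis: int) -> int:
    """Batch pods: favour the fullest node that still fits the request (bin-packing)."""
    used = node.millis() + request_millis
    if used > node.allocatable_millis:
        return 0
    return MAX_NODE_SCORE * used // node.allocatable_millis


def score(node: NodeLoad, tier: str, request_millis: int) -> int:
    """Pick the scoring mode by the incoming pod's tier (hypothetical tier names)."""
    if tier == "critical":
        return contention_score(node)
    return packing_score(node, request_millis)


if __name__ == "__main__":
    a = NodeLoad("a", 4000, {"critical": TierLoad(2, 3000), "batch": TierLoad(1, 500)})
    b = NodeLoad("b", 4000, {"critical": TierLoad(1, 1000), "batch": TierLoad(3, 2000)})
    for n in (a, b):
        print(n.name, score(n, "critical", 500), score(n, "batch", 500))
```

With these two 4-CPU nodes, a latency-critical pod prefers b (75 vs 25, more protection headroom) while a 500m batch pod prefers a (100 vs 87, tighter packing). The contention score deliberately ignores batch load, on the reading that under L0 batch pressure is absorbed by the Open layers; that is this sketch's interpretation, not a documented rule.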
```bash
helm upgrade infera deploy/helm/infera -n infera --reuse-values \
  --set scheduler.enabled=true
```
The overcommit webhook
Kubernetes bin-packs by declared CPU requests, which are usually padded. The overcommit webhook shrinks that padding where it is safe:
- Opt-in per namespace: the webhook acts only in namespaces you label infera.io/overcommit=enabled. Everything else is untouched.
- Requests only, never limits: the pod's ceiling is unchanged; only the scheduler's packing input shrinks by the configured factor.
- Never the Critical tier: latency-critical pods keep their full requests.
- Reversible and audited: every mutation records the original value in an annotation on the pod, so reverting is mechanical and the change history is inspectable.
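These four rules map naturally onto a mutating admission webhook that answers with a JSON Patch. The Python sketch below is an illustration, not Temper's webhook: the namespace label infera.io/overcommit=enabled comes from this page, but the tier label `infera.io/tier`, the annotation key `infera.io/original-cpu-requests`, reading "factor" as the fraction of the request kept, and the rounding rule are all hypothetical.

```python
import json

NS_LABEL, NS_VALUE = "infera.io/overcommit", "enabled"   # opt-in label from the docs
TIER_LABEL = "infera.io/tier"                            # hypothetical tier label
ORIGINAL_ANNOTATION = "infera.io/original-cpu-requests"  # hypothetical audit key


def cpu_to_millis(quantity: str) -> int:
    """Parse a Kubernetes CPU quantity such as '500m', '2' or '0.5' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def escape_pointer(token: str) -> str:
    """Escape one JSON Pointer token (RFC 6901): '~' becomes '~0', '/' becomes '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def overcommit_patch(pod: dict, namespace_labels: dict, factor: float) -> list[dict]:
    """Return JSON Patch ops that scale CPU requests by `factor`, or [] to admit unchanged."""
    if namespace_labels.get(NS_LABEL) != NS_VALUE:
        return []  # opt-in per namespace
    meta = pod.get("metadata", {})
    if (meta.get("labels") or {}).get(TIER_LABEL) == "critical":
        return []  # never the Critical tier
    ops, originals = [], {}
    for i, container in enumerate(pod["spec"]["containers"]):
        requests = (container.get("resources") or {}).get("requests") or {}
        request = requests.get("cpu")
        if request is None:
            continue
        originals[container["name"]] = request
        shrunk = max(1, round(cpu_to_millis(request) * factor))
        # Requests only: the limits block is never touched.
        ops.append({"op": "replace",
                    "path": f"/spec/containers/{i}/resources/requests/cpu",
                    "value": f"{shrunk}m"})
    if not ops:
        return []
    record = json.dumps(originals, sort_keys=True)  # original values, for audit and revert
    if meta.get("annotations"):
        ops.append({"op": "add",
                    "path": "/metadata/annotations/" + escape_pointer(ORIGINAL_ANNOTATION),
                    "value": record})
    else:
        # JSON Patch cannot add a child to a missing object, so create the map itself.
        ops.append({"op": "add", "path": "/metadata/annotations",
                    "value": {ORIGINAL_ANNOTATION: record}})
    return ops
```

For a batch pod whose container "app" requests 500m, a factor of 0.6 yields a replace op setting the request to 300m plus an add op recording {"app": "500m"}; the pod's limits are left alone.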
```bash
helm upgrade infera deploy/helm/infera -n infera --reuse-values \
  --set webhook.enabled=true
kubectl label namespace batch infera.io/overcommit=enabled
```
Why this is safe with Temper underneath and reckless without it: overcommitting requests means more runnable threads per node, which under CFS (the Linux Completely Fair Scheduler) translates directly into tail-latency damage for latency-critical pods. With the kernel fence in place, the extra pressure lands on the Open layers: batch absorbs it, protected tiers do not.
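Where the squeeze lands can be shown with a deliberately crude model. It is not Temper's kernel mechanism, which this page does not describe. Assumptions: every runnable thread is CPU-bound, all threads carry equal weight in the unfenced case, and the fence is modelled as strict priority by tier; the tier names are hypothetical. Real CFS weights are per cgroup, and tail latency comes from wakeup and queueing delay, which this model ignores; it only shows where CPU share goes.

```python
def cfs_like(cpus: float, threads: dict[str, int]) -> dict[str, float]:
    """Unfenced: every CPU-bound runnable thread gets an equal slice, capped at one CPU."""
    total = sum(threads.values())
    per_thread = min(1.0, cpus / total) if total else 0.0
    return {tier: n * per_thread for tier, n in threads.items()}


def fenced(cpus: float, threads: dict[str, int], order: list[str]) -> dict[str, float]:
    """Fenced: tiers are served in `order` (protected first); the Open tier gets what is left."""
    remaining = float(cpus)
    grants = {}
    for tier in order:
        grant = min(float(threads.get(tier, 0)), remaining)  # at most one CPU per thread
        grants[tier] = grant
        remaining -= grant
    return grants


before = {"critical": 2, "batch": 2}
after = {"critical": 2, "batch": 6}  # overcommit packs more batch threads onto the node
for name, load in (("before", before), ("after", after)):
    print(name, cfs_like(4, load), fenced(4, load, ["critical", "batch"]))
```

On a 4-CPU node, tripling the batch threads halves the critical tier's CPU in the unfenced model (2.0 to 1.0) and leaves it at 2.0 in the fenced one, where batch alone absorbs the squeeze.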
Measured results
| Run | Result | Source |
|---|---|---|
| Overcommit on 3 nodes, equal SLO (p99 ≤ 1.87 ms) | 31 vs 18 pods placed (+72%) | docs/training-artifacts/binpack/REPORT.md |
| 16-pod fleet consolidation | 3 → 2 nodes (33% shrink), p99 spot-check 1.56 ms | docs/training-artifacts/binpack/SAVINGS-REPORT.md |
| Karpenter + Temper vs Karpenter alone | −40% provisioned vCPU at equal load and SLO | docs/training-artifacts/karpenter/REPORT.md |
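The percentages in the table follow directly from the raw counts, which a quick check confirms:

```python
def pct_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100


checks = {
    "pods placed, overcommit (31 vs 18)": pct_change(31, 18),
    "nodes, consolidation (2 vs 3)": pct_change(2, 3),
    "provisioned vCPU, Karpenter (12 vs 20)": pct_change(12, 20),
}
for label, change in checks.items():
    print(f"{label}: {change:+.1f}%")
```

This prints +72.2%, -33.3% and -40.0%, matching the +72%, 33% shrink and −40% reported above.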
Caveats and methodology travel with each number on the benchmarks page.
Undoing it
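Remove the namespace label (kubectl label namespace batch infera.io/overcommit-) to stop new mutations. Pods already admitted keep their reduced requests, with the original values recorded in an annotation on each pod; they return to their full requests on the next rollout, when the replacement pods are admitted unmutated. The whole layer can be stood down with webhook.enabled=false / scheduler.enabled=false without touching L0 enforcement.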
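Because each mutated pod carries its original values, auditing what the webhook changed is a lookup. A minimal sketch, assuming the hypothetical annotation key `infera.io/original-cpu-requests` holds a JSON map of container name to original CPU request (the page does not specify the key or its format):

```python
import json

ORIGINAL_ANNOTATION = "infera.io/original-cpu-requests"  # hypothetical key


def audit(pods: list[dict]) -> list[tuple]:
    """Return (pod, container, current_cpu, original_cpu) for every container the webhook shrank."""
    rows = []
    for pod in pods:
        meta = pod.get("metadata", {})
        record = (meta.get("annotations") or {}).get(ORIGINAL_ANNOTATION)
        if record is None:
            continue  # admitted without mutation
        originals = json.loads(record)
        for container in pod["spec"]["containers"]:
            name = container["name"]
            if name in originals:
                requests = (container.get("resources") or {}).get("requests") or {}
                rows.append((meta.get("name", ""), name, requests.get("cpu"), originals[name]))
    return rows
```

Feed it the "items" list from a file saved with kubectl get pods -n batch -o json > pods.json. Once the label is removed and the workloads have rolled, an empty result means no mutated pods remain.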