SLO-safe density & overcommit

Kernel enforcement makes tight packing safe; the optional L1 layer converts that safety into capacity. Both L1 components, the scheduler plugin and the overcommit webhook, are off by default (complement mode), and each is one helm flag away.

Naming note. Binaries, the helm chart, and annotation keys currently ship under the project’s former name (infera); the commands below are what works today. A rename migration is planned.

Complement mode (the default)

Out of the box, Temper does not touch placement. Karpenter, Cast AI, Cluster Autoscaler, or the stock scheduler decide where pods go; Temper decides who gets the CPU once they share a node. This is deliberate: the node engine needs only Kubernetes-native inputs, so it works under any placer, and your existing tool’s aggressive mode stops being scary — the blast radius of a wrong prediction becomes batch throughput, not the p99 of a revenue service.

Measured with Karpenter as the autoscaler in both arms: same load, same SLO, −40% provisioned vCPU (12 vs 20). Temper made the tighter packing hold.

# Complement mode is the default, so no extra flags are needed:
helm install infera deploy/helm/infera -n infera --create-namespace
# Chart defaults: scheduler.enabled=false, webhook.enabled=false

The density-aware scheduler plugin

When you do want Temper placing pods, the scheduler plugin extends kube-scheduler with awareness of L0’s enforcement state. It reads the per-tier load annotations that every agent publishes (pod counts and CPU millicores per tier) and scores nodes by where protection capacity actually exists: contention-aware scoring for latency-critical pods, packing-aware scoring for batch.

# -n infera matches the namespace the release was installed into
helm upgrade infera deploy/helm/infera -n infera --reuse-values \
  --set scheduler.enabled=true

The overcommit webhook

Kubernetes bin-packs by declared CPU requests, which are usually padded. The overcommit webhook shrinks that padding where it is safe. Enable the webhook, then opt a namespace in by labeling it:

# -n infera matches the namespace the release was installed into
helm upgrade infera deploy/helm/infera -n infera --reuse-values \
  --set webhook.enabled=true

kubectl label namespace batch infera.io/overcommit=enabled

Why this is safe with Temper underneath and reckless without it: overcommitting requests means more runnable threads per node, which under CFS translates directly into tail-latency damage for latency-critical pods. With the kernel fence in place, the extra pressure lands on the Open layers: batch absorbs it, protected tiers do not.
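To make the request-based packing arithmetic concrete, here is a minimal shell sketch. The node size and request values are hypothetical and chosen only for illustration; they are not taken from the Temper benchmarks, and `pods_per_node` is a helper name introduced here. It considers CPU requests only and ignores memory and every other scheduling constraint.

```shell
# Hypothetical numbers for illustration only; none come from the benchmarks.
# Pods that fit on one node when the scheduler packs by declared CPU requests.
pods_per_node() {
  # $1 = allocatable CPU in millicores, $2 = per-pod CPU request in millicores
  echo $(( $1 / $2 ))   # integer division: a partial pod does not fit
}

pods_per_node 8000 1000   # padded request              -> 8
pods_per_node 8000 600    # request after a shrink      -> 13
```

The extra pods fit only on paper. Whether they run without hurting anyone depends on what happens under contention, which is the part the kernel fence covers.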

Measured results

| Run | Result | Source |
| --- | --- | --- |
| Overcommit on 3 nodes, equal SLO (p99 ≤ 1.87 ms) | 31 vs 18 pods placed (+72%) | docs/training-artifacts/binpack/REPORT.md |
| 16-pod fleet consolidation | 3 → 2 nodes (33% shrink), p99 spot-check 1.56 ms | docs/training-artifacts/binpack/SAVINGS-REPORT.md |
| Karpenter + Temper vs Karpenter alone | −40% provisioned vCPU at equal load and SLO | docs/training-artifacts/karpenter/REPORT.md |

Caveats and methodology travel with each number on the benchmarks page.
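The percentages quoted on this page can be rechecked from the raw counts. A minimal shell sketch follows; `pct_change` is a helper name introduced here for illustration, and the 20 and 12 vCPU inputs are the figures quoted in the complement-mode section.

```shell
# Relative change from a baseline to a new value, in percent (one decimal).
pct_change() {
  # $1 = baseline, $2 = new value
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f\n", (b - a) / a * 100 }'
}

pct_change 18 31   # pods placed, overcommit run       -> +72.2
pct_change 3 2     # node count, consolidation run     -> -33.3
pct_change 20 12   # provisioned vCPU, Karpenter run   -> -40.0
```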

Undoing it

Remove the namespace label to stop new mutations; already-admitted pods carry their original request values in annotations and pick them back up on the next rollout. The whole layer can be stood down with webhook.enabled=false / scheduler.enabled=false without touching L0 enforcement.
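The steps above might look like the following sketch. It assumes the `batch` namespace and the `infera` release and namespace used earlier on this page. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers added here so the commands are printed rather than executed by default. The trailing `-` on the label key is standard kubectl syntax for removing a label.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them against the current cluster context.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Stop new mutations in the opted-in namespace (trailing "-" removes the label).
run kubectl label namespace batch infera.io/overcommit-

# Stand down the optional layer; these values do not touch L0 enforcement.
run helm upgrade infera deploy/helm/infera -n infera --reuse-values \
  --set webhook.enabled=false --set scheduler.enabled=false
```

Pods admitted before the label was removed keep their mutated requests until their next rollout, so expect the change to take effect gradually rather than all at once.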