Safety & rollback: the kernel takes back over

The question that decides whether kernel-level enforcement is adoptable is not the upside — it is the failure modes. This page is the complete honest list, with measurements.

Naming note. Binaries, the Helm chart, and annotation keys currently ship under the project’s former name (infera); the commands below are what works today. A rename migration is planned.

The fail-safe is the kernel’s contract

sched_ext was designed so that a BPF scheduler cannot take the system down with it: if the scheduler misbehaves, stalls, crashes, or detaches for any reason, the kernel ejects it and atomically resumes scheduling with the stock scheduler. This is not Temper code — it is the kernel feature Temper is built on. The consequence is structural: the worst case is the scheduler you already run today, per node, never per cluster.
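The kernel exposes this state in sysfs, so which scheduler a node is actually running can be checked directly on the node. The following is a minimal sketch. It assumes a sched_ext-capable kernel (6.12 or later) and a shell on the node. `scx_status` and the `SCX_SYSFS` override are hypothetical names for illustration, not part of Temper.

```shell
# Hypothetical helper: report whether a sched_ext (BPF) scheduler is attached.
# Reads the kernel's sched_ext sysfs files; SCX_SYSFS is an illustrative override
# for non-standard mount points (or a fake tree when testing).
scx_status() {
  scx_root="${SCX_SYSFS:-/sys/kernel/sched_ext}"

  # No state file: this kernel has no sched_ext, so only the stock scheduler exists.
  if [ ! -r "$scx_root/state" ]; then
    echo "sched_ext unavailable: stock scheduler in use"
    return 0
  fi

  scx_state="$(cat "$scx_root/state")"
  case "$scx_state" in
    enabled)
      # root/ops holds the name of the attached BPF scheduler.
      scx_ops="$(cat "$scx_root/root/ops" 2>/dev/null || echo unknown)"
      echo "BPF scheduler attached: $scx_ops" ;;
    disabled)
      echo "no BPF scheduler: stock scheduler in use" ;;
    *)
      # enabling / disabling: the kernel is mid-transition.
      echo "sched_ext transitioning: $scx_state" ;;
  esac
}
```

Run it during a failover or rollback. While Temper is attached, it should report an attached BPF scheduler. Once the kernel has ejected it, it should report the stock scheduler.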

Measured failover

We force-killed the node agent mid-benchmark, under load:

Moment                                    memcached p99
Before the kill (Temper attached)         0.607 ms
During the kill (kernel reverts to CFS)   0.639 ms
After recovery (agent re-attached)        back to baseline in seconds

The kill cost a 0.03 ms rise in p99 with no blackout. The replacement agent (it is a DaemonSet) was Ready in 15 s. An 8-hour soak test ran clean. Full data is on the benchmarks page.
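The size of the blip follows directly from the table. It is the difference between the two measured p99 values. Relative to the attached baseline, that difference is roughly five percent:

```latex
\Delta p_{99} = 0.639\,\text{ms} - 0.607\,\text{ms} = 0.032\,\text{ms},
\qquad
\frac{\Delta p_{99}}{p_{99}^{\text{attached}}} = \frac{0.032}{0.607} \approx 5.3\%
```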

The kill switch

Fleet-wide rollback is a single annotation, honored directly by each node’s agent. It needs no Helm operation and does not depend on the optional Temper controller:

# stand the scheduler down everywhere; pods run stock CFS
# (--overwrite lets the command succeed on nodes that already carry the annotation)
kubectl annotate node --all infera.io/safe-mode-requested=true --overwrite

# re-engage
kubectl annotate node --all infera.io/safe-mode-requested-

Safe mode can also be targeted at single nodes, toggled from the dashboard (audit-logged), or driven by the optional controller via an InferaPolicy resource. Entering safe mode always succeeds, because all it has to do is kill the scheduler. Exiting regenerates the configuration and re-attaches the scheduler.
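Targeting a single node uses the same annotation key as the fleet-wide command, applied to one node name instead of `--all`. The sketch below wraps this in hypothetical helper functions (`safe_mode_on`, `safe_mode_off`, `safe_mode_status`). The `KUBECTL` variable is also an illustrative addition, so the commands can be dry-run by pointing it at `echo`. Only standard `kubectl annotate` and `kubectl get -o jsonpath` are used.

```shell
# Hypothetical per-node safe-mode helpers built on the annotation the agent honors.
KUBECTL="${KUBECTL:-kubectl}"
SAFE_MODE_KEY="infera.io/safe-mode-requested"

# Stand the scheduler down on one node; --overwrite makes repeat calls idempotent.
safe_mode_on() {
  "$KUBECTL" annotate node "$1" "$SAFE_MODE_KEY=true" --overwrite
}

# Re-engage on one node: the trailing "-" removes the annotation.
safe_mode_off() {
  "$KUBECTL" annotate node "$1" "$SAFE_MODE_KEY-"
}

# Print the annotation value: "true" while safe mode is requested, empty otherwise.
# Dots inside the key must be escaped in the jsonpath expression.
safe_mode_status() {
  "$KUBECTL" get node "$1" \
    -o jsonpath='{.metadata.annotations.infera\.io/safe-mode-requested}'
}
```

Usage: `safe_mode_on <node-name>`, then `safe_mode_status <node-name>` to confirm the request was recorded.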

Reconfiguration churn cost

When QoS assignments change on a node (a pod joins or leaves a tier), the agent regenerates the scheduler configuration and restarts its sched_ext scheduler. Each reconfiguration costs a measured ~52 ms window during which the node runs stock CFS. That window is node-local and bounded, and it fails in the same safe direction as everything else on this page: absence of benefit, not harm. Pod churn is debounced and batched so that a busy node does not thrash.
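To put the ~52 ms figure in proportion, suppose debouncing caps reconfigurations at one every $T$ seconds. Here $T$ is a hypothetical parameter, not a documented Temper setting. The worst-case share of wall time a node spends on stock CFS is then the window divided by the interval:

```latex
f_{\text{CFS}} \le \frac{t_{\text{reconf}}}{T} = \frac{0.052\,\text{s}}{T},
\qquad \text{e.g. } T = 10\,\text{s} \;\Rightarrow\; f_{\text{CFS}} \le 0.52\%
```

Batching works in the same direction. If $N$ pod events are folded into one reconfiguration, the 52 ms window is paid once rather than $N$ times.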

The cpu.max disclosure

Stated plainly, because it is the one behavioral difference you must know: while Temper’s scheduler is attached, cgroup cpu.max CPU quotas are not enforced by the kernel. This is a property of sched_ext scheduling, not a Temper choice. Containment of greedy workloads comes instead from Temper’s layer ceilings, which are what the benchmarks exercise. Quota-derived layer ceilings are on the roadmap to close the semantic gap.

Two mitigations are unconditional. First, memory limits are unaffected; only CPU quota semantics change. Second, the kill switch restores CFS, and with it quota enforcement, instantly. If strict CPU quota enforcement is a compliance requirement for a node, do not attach Temper to that node; mixed fleets are fully supported.
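For reference, cpu.max is the cgroup v2 file holding `$QUOTA $PERIOD` in microseconds, or `max $PERIOD` when unlimited. With Kubernetes on cgroup v2 and the default 100 ms period, a container CPU limit of 500m is written as `50000 100000`: 50 ms of CPU per 100 ms period. The hypothetical helper below translates a cpu.max value into the CPU cap it encodes. That is the cap CFS would enforce and that goes unenforced while a sched_ext scheduler is attached.

```shell
# Hypothetical helper: translate a cgroup v2 cpu.max value into the CPU limit it
# encodes. Under CFS the kernel enforces this limit; under sched_ext it does not.
cpu_max_to_cpus() {
  # $1 is the file content, e.g. "50000 100000" or "max 100000" (microseconds).
  printf '%s\n' "$1" | awk '{
    if ($1 == "max") {
      print "unlimited"
    } else {
      printf "%.2f CPUs\n", $1 / $2
    }
  }'
}
```

On a node this would be called as `cpu_max_to_cpus "$(cat /sys/fs/cgroup/<pod-cgroup-path>/cpu.max)"`. The pod cgroup path depends on the cgroup driver, so it is left as a placeholder.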

Privileged DaemonSet posture

Loading a kernel scheduler requires a privileged container, hostPID, and /sys access, the standard posture of node agents such as Falco or Datadog. What bounds it: the agent serves only in-cluster endpoints, executes no remote code, makes zero external calls, and modifies nothing but its own scheduler process and node annotations. Every permission is justified line by line in the security whitepaper. For the full posture, supply chain, and disclosure policy, see security & trust.
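The posture can also be read back from a live cluster rather than taken on trust. The following is a minimal sketch using standard `kubectl get -o jsonpath`. The namespace and DaemonSet name are placeholders that depend on how the Helm chart was installed. The `audit_posture` function and the `KUBECTL` override are hypothetical.

```shell
# Hypothetical audit helper: print the privilege-related fields of a DaemonSet's pod spec.
KUBECTL="${KUBECTL:-kubectl}"

# $1 = namespace, $2 = DaemonSet name (both placeholders for your install).
audit_posture() {
  ns="$1"
  ds="$2"
  # hostPID: whether the pod shares the node's PID namespace.
  host_pid="$("$KUBECTL" get daemonset -n "$ns" "$ds" -o jsonpath='{.spec.template.spec.hostPID}')"
  # privileged flag of every container in the pod template.
  priv="$("$KUBECTL" get daemonset -n "$ns" "$ds" -o jsonpath='{.spec.template.spec.containers[*].securityContext.privileged}')"
  # hostPath volumes, which is where /sys access shows up.
  paths="$("$KUBECTL" get daemonset -n "$ns" "$ds" -o jsonpath='{.spec.template.spec.volumes[*].hostPath.path}')"
  echo "hostPID=$host_pid"
  echo "privileged=$priv"
  echo "hostPath=$paths"
}
```

Anything beyond hostPID, the privileged flag, and the /sys-related host paths described above is worth checking against the security whitepaper.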