Architecture: three layers, each consuming only the one below

Temper is deliberately layered so you can install to any depth: the node engine alone, the engine plus placement intelligence, or the full platform with the management plane. Each layer consumes only the public interface of the layer below it.

The layer map

L2 · MANAGEMENT PLANE: explorer · actions · savings · thread-aware rightsizer
L1 · CLUSTER INTELLIGENCE: density-aware scheduler plugin · overcommit webhook
KERNEL BOUNDARY
L0 · NODE ENFORCEMENT: scx_layered (sched_ext), a BPF scheduler in the kernel; fence critical layers · loan idle cycles · preempt-kick on wake
CFS · FAIL-SAFE: kill the agent → node reverts to the stock scheduler (measured, no blackout)

L0 — node enforcement engine

A privileged DaemonSet runs one node agent per node. The agent watches the pods scheduled locally, derives a QoS tier for each from its PriorityClass and resource requests, discovers each pod’s cgroup, and generates a resource-aware configuration for scx_layered — a Linux sched_ext scheduler that runs inside the kernel’s CPU scheduling path. When QoS assignments change (a pod arrives, leaves, or changes tier), the agent regenerates the configuration and reloads the scheduler.
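The derive-then-reload loop described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual logic: the tier names (`critical`, `standard`, `best-effort`), the priority threshold, and the `Agent` class are all hypothetical, and the reload is represented by a counter rather than a real scx_layered configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pod:
    name: str
    priority: int                  # pod.spec.priority
    cpu_request_m: Optional[int]   # summed container CPU requests, millicores
    cpu_limit_m: Optional[int]     # summed container CPU limits, millicores


def derive_tier(pod: Pod, critical_priority: int = 1_000_000) -> str:
    """Map a pod to a tier. Tier names and the threshold are assumptions."""
    if pod.cpu_request_m is None:
        return "best-effort"       # no CPU request at all
    pinned = pod.cpu_limit_m is not None and pod.cpu_limit_m == pod.cpu_request_m
    if pod.priority >= critical_priority and pinned:
        return "critical"
    return "standard"


class Agent:
    """Hypothetical agent core: reload only when tier assignments change."""

    def __init__(self) -> None:
        self.current: dict = {}
        self.reloads = 0

    def reconcile(self, pods) -> bool:
        new = {p.name: derive_tier(p) for p in pods}
        if new == self.current:
            return False           # nothing changed, keep the running scheduler
        self.current = new
        self.reloads += 1          # stand-in for: regenerate config, reload scheduler
        return True
```

Comparing the full assignment map before reloading keeps the sketch idempotent: repeated watch events for an unchanged pod set do not trigger a reload.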

L0 also carries the always-on observation layer: per-pod and per-thread placement telemetry, runqueue statistics, PSI pressure monitoring, a placement linter, and on-demand kernel trace capture — all at under 1% CPU overhead (measured 0.13% of one core).
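For readers unfamiliar with PSI, the sketch below parses the kernel's pressure-stall format as exposed in /proc/pressure/cpu (cgroup v2 exposes the same format per cgroup in cpu.pressure). It only illustrates the data source; the helper names and the 10% threshold are assumptions, not how the agent is implemented.

```python
from pathlib import Path


def parse_psi(text: str) -> dict:
    """Parse PSI lines such as 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def read_cpu_psi(path: str = "/proc/pressure/cpu") -> dict:
    """Read system-wide CPU pressure; requires a kernel with PSI enabled."""
    return parse_psi(Path(path).read_text())


def cpu_under_pressure(psi: dict, threshold_pct: float = 10.0) -> bool:
    """Hypothetical check: flag when the 10-second 'some' average crosses a threshold."""
    return psi.get("some", {}).get("avg10", 0.0) >= threshold_pct
```

The avg10, avg60, and avg300 fields are percentages of wall time stalled over 10, 60, and 300 seconds; total is cumulative stall time in microseconds.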

Inputs are pure Kubernetes: pod.spec.priority, container resource specs, and infera.io/* annotations. Outputs are node annotations (per-tier pod counts and CPU totals), Prometheus metrics, and a local JSON endpoint. No CRDs are required for enforcement.
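As an illustration of the output side, the sketch below rolls per-pod tier assignments up into per-tier pod counts and CPU totals, shaped as node annotations. The annotation key names are hypothetical (the text above only states that such annotations exist); Kubernetes annotation values are always strings, which is why the numbers are stringified.

```python
from collections import defaultdict


def tier_annotations(pods, prefix: str = "infera.io") -> dict:
    """Aggregate (tier, cpu_millicores) pairs into annotation key/value strings.

    The '<prefix>/<tier>-pods' and '<prefix>/<tier>-cpu-millicores' keys are
    illustrative assumptions, not the product's documented keys.
    """
    counts = defaultdict(int)
    cpu = defaultdict(int)
    for tier, cpu_m in pods:
        counts[tier] += 1
        cpu[tier] += cpu_m
    out = {}
    for tier in sorted(counts):
        out[f"{prefix}/{tier}-pods"] = str(counts[tier])
        out[f"{prefix}/{tier}-cpu-millicores"] = str(cpu[tier])
    return out
```

A consumer such as a scheduler plugin would only need to read these strings from the Node object, with no custom resource involved.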

L1 — cluster intelligence (optional, off by default)

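Two components, individually helm-gated:

Density-aware scheduler plugin: a kube-scheduler extension that reads the node annotations published by L0, together with pod priorities, to make placement decisions.

Overcommit webhook: a mutating admission webhook that scales CPU requests for pods in opted-in namespaces and records reversal annotations so the change can be undone.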

Details and measured results: density & overcommit.
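The overcommit idea (scale the CPU request, remember the original so it can be reversed) might look roughly like this. It is a sketch under stated assumptions: the opt-in label, the reversal annotation prefix, and the 0.5 ratio are all hypothetical, and a real mutating webhook would return the change as a patch in an AdmissionReview response rather than a mutated dict.

```python
import copy

OPT_IN_LABEL = "infera.io/overcommit"                # hypothetical namespace label
REVERSAL_PREFIX = "infera.io/original-cpu-request."  # hypothetical annotation prefix


def parse_cpu_millis(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ('500m', '2', '0.5') to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def scale_pod(pod: dict, ns_labels: dict, ratio: float = 0.5) -> dict:
    """Return a copy of the pod with scaled CPU requests, if the namespace opted in."""
    if ns_labels.get(OPT_IN_LABEL) != "enabled":
        return pod                                   # not opted in: leave untouched
    out = copy.deepcopy(pod)
    annotations = out.setdefault("metadata", {}).setdefault("annotations", {})
    for container in out["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" not in requests:
            continue
        original = requests["cpu"]
        scaled = max(1, int(parse_cpu_millis(original) * ratio))
        annotations[REVERSAL_PREFIX + container["name"]] = original  # for reversal
        requests["cpu"] = f"{scaled}m"
    return out
```

Recording the original value per container means the mutation can be undone from the pod object alone, without any external state.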

L2 — management plane (optional)

The dashboard: hierarchy explorer, live logs and manifests, perf panels, audit-logged actions, RBAC, the realized-vs-identified savings view, the thread-aware rightsizer, and a multi-cluster hub. It consumes the same public interfaces the CLI does — agent gRPC, node annotations, and metrics — and exposes a versioned REST API (/api/v1) that the built-in UI itself uses. Tour: management plane.

What talks to what

| Component | Runs as | Reads | Writes |
|---|---|---|---|
| Node agent (L0) | Privileged DaemonSet | Local pod state (kube API), cgroup v2 stats, PSI, sched_ext sysfs | Kernel scheduler config, node annotations, /metrics, /observe |
| Scheduler plugin (L1) | Deployment (kube-scheduler extension) | Node annotations from L0, pod priorities | Placement decisions |
| Overcommit webhook (L1) | Deployment (mutating admission) | Namespace opt-in labels, pod specs | Scaled CPU requests + reversal annotations |
| Controller (L1) | Deployment (optional) | InferaPolicy resource (safe mode) | infera.io/safe-mode-requested node annotations |
| Dashboard (L2) | Deployment | Kube API, agent gRPC, metrics | Audit-logged actions via kube API |

Safe mode illustrates the philosophy: the controller signals it with a node annotation, not an RPC — the agent honors it without any control-plane round trip, so the kill switch works even when everything above L0 is down.
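The annotation-driven kill switch can be pictured as a pure function of the Node object the agent already watches. In this sketch the annotation key is the one named in this section, but the "true" value convention and the `load`/`unload` action names are assumptions made for illustration.

```python
SAFE_MODE_ANNOTATION = "infera.io/safe-mode-requested"


def safe_mode_requested(node: dict) -> bool:
    """True when the node carries the safe-mode annotation (value convention assumed)."""
    annotations = node.get("metadata", {}).get("annotations") or {}
    return annotations.get(SAFE_MODE_ANNOTATION, "").lower() == "true"


def next_action(enforcing: bool, node: dict) -> str:
    """Decide what the agent should do given its state and the node's annotations."""
    wanted = not safe_mode_requested(node)
    if enforcing and not wanted:
        return "unload"   # hypothetical: detach the scheduler, fall back to stock
    if not enforcing and wanted:
        return "load"     # hypothetical: safe mode cleared, resume enforcement
    return "none"
```

Because the decision depends only on node metadata, nothing above L0 has to be reachable at the moment the agent acts on it.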

Deployment model: everything in-cluster

All components install from one helm chart into your cluster. There is no SaaS control plane, no telemetry beacon, no account — the product makes zero external calls and runs air-gapped because there is nothing to gap. The failure direction is equally structural: any component you kill degrades to stock Kubernetes behavior, per node, never per cluster. Posture details: security.