Architecture: three layers, each consuming only the one below
Temper is deliberately layered so you can install it to whatever depth you need: the node engine alone, the engine plus placement intelligence, or the full platform with the management plane. Each layer consumes only the public interface of the layer below it.
The layer map
L0 — node enforcement engine
A privileged DaemonSet runs one node agent per node. The agent watches the pods
scheduled locally, derives a QoS tier for each from its
PriorityClass and resource requests, discovers each pod’s cgroup, and generates a
resource-aware configuration for scx_layered — a Linux
sched_ext scheduler that runs inside the kernel’s CPU scheduling
path. When QoS assignments change (a pod arrives, leaves, or changes tier), the agent
regenerates the configuration and reloads the scheduler.
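The derive-and-reload loop described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual logic: the tier names, the `CRITICAL_PRIORITY` threshold, and the `PodInfo` type are all hypothetical, since this page does not specify the real mapping from PriorityClass and requests to tiers.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real priority-to-tier mapping is not given here.
CRITICAL_PRIORITY = 1_000_000


@dataclass(frozen=True)
class PodInfo:
    name: str
    priority: int       # value of pod.spec.priority
    cpu_request_m: int  # sum of container CPU requests, in millicores


def derive_tier(pod: PodInfo) -> str:
    """Map a pod to an illustrative QoS tier (tier names are assumptions)."""
    if pod.priority >= CRITICAL_PRIORITY:
        return "critical"
    if pod.cpu_request_m > 0:
        return "standard"
    return "best-effort"


def tier_assignments(pods: list[PodInfo]) -> dict[str, str]:
    """Compute the pod-name -> tier map for the pods on this node."""
    return {p.name: derive_tier(p) for p in pods}


def needs_reload(previous: dict[str, str], current: dict[str, str]) -> bool:
    """True when a pod arrived, left, or changed tier since the last pass."""
    return previous != current
```

Comparing whole assignment maps keeps the reload trigger simple: any arrival, departure, or tier change makes the two maps differ, and nothing else does.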
L0 also carries the always-on observation layer: per-pod and per-thread placement telemetry, runqueue statistics, PSI pressure monitoring, a placement linter, and on-demand kernel trace capture — all at under 1% CPU overhead (measured 0.13% of one core).
Inputs are pure Kubernetes: pod.spec.priority, container resource specs, and
infera.io/* annotations. Outputs are node annotations (per-tier pod counts and
CPU totals), Prometheus metrics, and a local JSON endpoint. No CRDs are required for
enforcement.
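As an illustration of the annotation output, the sketch below aggregates per-tier pod counts and CPU totals into a flat string map, which is the shape Kubernetes annotations require. Only the `infera.io/` prefix comes from the text above; the key suffixes (`-pods`, `-cpu-millicores`) and the tier names are assumptions made for this example.

```python
from collections import defaultdict

# Prefix taken from the text; the key suffixes below are hypothetical.
ANNOTATION_PREFIX = "infera.io/"


def node_annotations(pods: list[tuple[str, int]]) -> dict[str, str]:
    """Aggregate (tier, cpu_request_millicores) pairs into node annotations."""
    counts: dict[str, int] = defaultdict(int)
    cpu_m: dict[str, int] = defaultdict(int)
    for tier, millicores in pods:
        counts[tier] += 1
        cpu_m[tier] += millicores

    annotations: dict[str, str] = {}
    for tier in sorted(counts):
        # Annotation values must be strings, so numbers are serialized.
        annotations[f"{ANNOTATION_PREFIX}{tier}-pods"] = str(counts[tier])
        annotations[f"{ANNOTATION_PREFIX}{tier}-cpu-millicores"] = str(cpu_m[tier])
    return annotations
```

A consumer such as a scheduler plugin would only need to read these strings back from the node object, which is what keeps the layers decoupled.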
L1 — cluster intelligence (optional, off by default)
Two components, each individually gated in the Helm chart:
- A density-aware kube-scheduler plugin that reads L0’s node annotations to place pods where protection capacity exists. Disable it and keep Karpenter, Cast AI, or the stock scheduler in charge of placement — L0 works under any placer.
- An overcommit admission webhook that, in namespaces you have opted in, scales down the CPU requests of non-critical pods so the bin-packer fits more of them. It never touches limits, never touches the Critical tier, and always records the original value in an annotation.
Details and measured results: density & overcommit.
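The webhook's mutation rules (scale requests only, skip the Critical tier, record the original value) can be sketched as below. This is a simplified model under stated assumptions: the annotation key `infera.io/original-cpu-request`, the per-container suffix, the `factor` parameter, and the tier string are hypothetical, and the namespace opt-in check is left out.

```python
import copy

# Hypothetical annotation key; the real reversal annotation is not named here.
ORIGINAL_ANNOTATION = "infera.io/original-cpu-request"


def parse_cpu_millicores(value: str) -> int:
    """Parse a Kubernetes CPU quantity such as '500m' or '2' into millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(round(float(value) * 1000))


def scale_requests(pod: dict, tier: str, factor: float) -> dict:
    """Return a copy of the pod with CPU requests scaled; limits are untouched."""
    if tier == "critical" or not 0 < factor < 1:
        return pod  # never mutate the Critical tier or apply a bad factor
    mutated = copy.deepcopy(pod)
    annotations = mutated.setdefault("metadata", {}).setdefault("annotations", {})
    for container in mutated["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" not in requests:
            continue
        original = requests["cpu"]
        scaled = max(1, int(parse_cpu_millicores(original) * factor))
        requests["cpu"] = f"{scaled}m"
        # Record the original value so the change can be reversed.
        annotations[f"{ORIGINAL_ANNOTATION}.{container['name']}"] = original
    return mutated
```

Working on a deep copy keeps the admitted object and the incoming object separate, so a real webhook could diff the two to build its patch.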
L2 — management plane (optional)
The dashboard: hierarchy explorer, live logs and manifests, perf panels, audit-logged
actions, RBAC, the realized-vs-identified savings view, the thread-aware rightsizer, and a
multi-cluster hub. It consumes the same public interfaces the CLI does — agent gRPC, node
annotations, and metrics — and exposes a versioned REST API (/api/v1) that
the built-in UI itself uses. Tour: management plane.
What talks to what
| Component | Runs as | Reads | Writes |
|---|---|---|---|
| Node agent (L0) | Privileged DaemonSet | Local pod state (kube API), cgroup v2 stats, PSI, sched_ext sysfs | Kernel scheduler config, node annotations, /metrics, /observe |
| Scheduler plugin (L1) | Deployment (kube-scheduler extension) | Node annotations from L0, pod priorities | Placement decisions |
| Overcommit webhook (L1) | Deployment (mutating admission) | Namespace opt-in labels, pod specs | Scaled CPU requests + reversal annotations |
| Controller (L1) | Deployment (optional) | InferaPolicy resource (safe mode) | infera.io/safe-mode-requested node annotations |
| Dashboard (L2) | Deployment | Kube API, agent gRPC, metrics | Audit-logged actions via kube API |
Safe mode illustrates the philosophy: the controller signals it with a node annotation, not an RPC — the agent honors it without any control-plane round trip, so the kill switch works even when everything above L0 is down.
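On the agent side, honoring the annotation needs nothing more than a look at the node's own annotations. The sketch below assumes that the value `"true"` requests safe mode and that safe mode means unloading the custom scheduler; neither detail is stated on this page, and the function names are hypothetical.

```python
# Annotation key as listed in the table above.
SAFE_MODE_ANNOTATION = "infera.io/safe-mode-requested"


def safe_mode_requested(node_annotations: dict[str, str]) -> bool:
    """Assumption: the value 'true' (any case) requests safe mode."""
    value = node_annotations.get(SAFE_MODE_ANNOTATION, "")
    return value.strip().lower() == "true"


def next_action(node_annotations: dict[str, str], scheduler_loaded: bool) -> str:
    """Decide what the agent should do on its next reconcile pass."""
    if safe_mode_requested(node_annotations):
        # Assumed behavior: fall back to the stock kernel scheduler.
        return "unload" if scheduler_loaded else "none"
    return "none" if scheduler_loaded else "load"
```

Because the decision depends only on node state the agent already sees, it involves no call to any component above L0.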
Deployment model: everything in-cluster
All components install from one Helm chart into your cluster. There is no SaaS control plane, no telemetry beacon, and no account: the product makes zero external calls and runs air-gapped because there is nothing to gap. The failure direction is equally structural: any component you kill degrades to stock Kubernetes behavior, and the degradation is scoped to a node, never to the whole cluster. Posture details: security.