Architecture: three layers, each consuming only the one below

Temper is deliberately layered so you can install to any depth: the node engine alone, the engine plus placement intelligence, or the full platform with the management plane. Each layer consumes only the public interface of the layer below it.

The layer map

L0 — node enforcement engine

A privileged DaemonSet runs one node agent per node. The agent watches the pods scheduled locally, derives a QoS tier for each from its PriorityClass and resource requests, discovers each pod’s cgroup, and generates a resource-aware configuration for scx_layered — a Linux sched_ext scheduler that runs inside the kernel’s CPU scheduling path. When QoS assignments change (a pod arrives, leaves, or changes tier), the agent regenerates the configuration and reloads the scheduler.
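The tier derivation and the reload trigger can be sketched roughly as follows. This is a minimal illustration, not Temper's actual logic: only the Critical tier is named in this document, so the other tier names, the priority cut-off, and the `PodInfo`, `derive_tier`, `assign_tiers`, and `needs_reload` names are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical tier names; only "Critical" is named in the documentation.
CRITICAL, STANDARD, BEST_EFFORT = "critical", "standard", "best-effort"
# Assumed cut-off for illustration, not a value taken from Temper.
CRITICAL_PRIORITY_FLOOR = 1_000_000


@dataclass(frozen=True)
class PodInfo:
    name: str
    priority: int            # pod.spec.priority, resolved from the PriorityClass
    cpu_request_millis: int  # sum of container CPU requests, in millicores


def derive_tier(pod: PodInfo) -> str:
    """Map one pod to a tier from its priority and CPU requests."""
    if pod.priority >= CRITICAL_PRIORITY_FLOOR:
        return CRITICAL
    if pod.cpu_request_millis > 0:
        return STANDARD
    return BEST_EFFORT


def assign_tiers(pods: List[PodInfo]) -> Dict[str, str]:
    """Build the pod-name -> tier map for the pods on this node."""
    return {pod.name: derive_tier(pod) for pod in pods}


def needs_reload(previous: Dict[str, str], current: Dict[str, str]) -> bool:
    """True when a pod arrived, left, or changed tier since the last pass."""
    return previous != current
```

Comparing whole assignment maps covers all three triggers (arrival, departure, tier change) with one check, which is why the sketch avoids tracking individual events.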

L0 also carries the always-on observation layer: per-pod and per-thread placement telemetry, runqueue statistics, PSI pressure monitoring, a placement linter, and on-demand kernel trace capture — all at under 1% CPU overhead (measured 0.13% of one core).

Inputs are pure Kubernetes: pod.spec.priority, container resource specs, and infera.io/* annotations. Outputs are node annotations (per-tier pod counts and CPU totals), Prometheus metrics, and a local JSON endpoint. No CRDs are required for enforcement.
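To make the annotation output concrete, here is a small sketch of how per-tier pod counts and CPU totals could be folded into node annotations. The `infera.io` prefix comes from the text; the key suffixes (`-pods`, `-cpu-millis`), the tier labels, and the function names are invented for illustration and are not Temper's real annotation schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ANNOTATION_PREFIX = "infera.io"  # prefix from the docs; key suffixes below are assumed


def parse_cpu_millis(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def tier_annotations(pods: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Aggregate (tier, cpu_request) pairs into per-tier annotation values.

    Kubernetes annotation values are strings, so the numbers are stringified.
    """
    counts: Dict[str, int] = defaultdict(int)
    millis: Dict[str, int] = defaultdict(int)
    for tier, cpu in pods:
        counts[tier] += 1
        millis[tier] += parse_cpu_millis(cpu)

    annotations: Dict[str, str] = {}
    for tier in sorted(counts):
        annotations[f"{ANNOTATION_PREFIX}/{tier}-pods"] = str(counts[tier])
        annotations[f"{ANNOTATION_PREFIX}/{tier}-cpu-millis"] = str(millis[tier])
    return annotations
```

For example, two critical pods requesting `500m` and `1` CPU plus one standard pod requesting `250m` would yield a critical count of `"2"` with `"1500"` millicores and a standard count of `"1"` with `"250"` millicores.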

L1 — cluster intelligence (optional, off by default)

Two components, each gated individually in the Helm chart:

A density-aware kube-scheduler plugin that reads L0’s node annotations to place pods where protection capacity exists. Disable it and keep Karpenter, Cast AI, or the stock scheduler in charge of placement — L0 works under any placer.
An overcommit admission webhook that, in namespaces you opt in, scales down the CPU requests of non-critical pods so the bin-packer fits more of them. It never changes limits, never touches the Critical tier, and always annotates the pod with the original value.
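The webhook's three guarantees (requests only, Critical untouched, original value recorded) can be illustrated with a short mutation sketch over a pod manifest held as a plain dictionary. The annotation key `infera.io/original-cpu-requests`, its JSON encoding, the `"critical"` tier label, and the scaling factor are assumptions made for this example, not the webhook's documented behavior.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical annotation key; the docs only say the original value is annotated.
ORIGINAL_CPU_ANNOTATION = "infera.io/original-cpu-requests"


def parse_cpu_millis(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def scale_cpu_requests(pod: Dict[str, Any], tier: str, factor: float) -> Dict[str, Any]:
    """Return a copy of the pod with CPU requests scaled by `factor`.

    Limits are never modified, Critical pods are returned unchanged, and the
    original requests are recorded in an annotation so the change is reversible.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    if tier == "critical":
        return pod

    mutated = copy.deepcopy(pod)
    originals: Dict[str, str] = {}
    for container in mutated["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" not in requests:
            continue  # nothing to scale for this container
        originals[container["name"]] = requests["cpu"]
        scaled = max(1, int(parse_cpu_millis(requests["cpu"]) * factor))
        requests["cpu"] = f"{scaled}m"

    if originals:
        annotations = mutated.setdefault("metadata", {}).setdefault("annotations", {})
        annotations[ORIGINAL_CPU_ANNOTATION] = json.dumps(originals, sort_keys=True)
    return mutated
```

Working on a deep copy keeps the incoming object intact, which mirrors how a mutating admission webhook returns a patch rather than editing the request in place.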

Details and measured results: density & overcommit.

L2 — management plane (optional)

The dashboard: hierarchy explorer, live logs and manifests, perf panels, audit-logged actions, RBAC, the realized-vs-identified savings view, the thread-aware rightsizer, and a multi-cluster hub. It consumes the same public interfaces the CLI does — agent gRPC, node annotations, and metrics — and exposes a versioned REST API (/api/v1) that the built-in UI itself uses. Tour: management plane.

What talks to what

Component	Runs as	Reads	Writes
Node agent (L0)	Privileged DaemonSet	Local pod state (kube API), cgroup v2 stats, PSI, sched_ext sysfs	Kernel scheduler config, node annotations, /metrics, /observe
Scheduler plugin (L1)	Deployment (kube-scheduler extension)	Node annotations from L0, pod priorities	Placement decisions
Overcommit webhook (L1)	Deployment (mutating admission)	Namespace opt-in labels, pod specs	Scaled CPU requests + reversal annotations
Controller (L1)	Deployment (optional)	`InferaPolicy` resource (safe mode)	`infera.io/safe-mode-requested` node annotations
Dashboard (L2)	Deployment	Kube API, agent gRPC, metrics	Audit-logged actions via kube API

Safe mode illustrates the philosophy: the controller signals it with a node annotation, not an RPC — the agent honors it without any control-plane round trip, so the kill switch works even when everything above L0 is down.
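The annotation-driven kill switch might look like this on the agent side. The annotation key is the one listed in the table above; treating the value `"true"` as the request, and the action names returned by `next_action`, are assumptions for this sketch.

```python
from typing import Dict

SAFE_MODE_ANNOTATION = "infera.io/safe-mode-requested"  # key from the component table


def safe_mode_requested(node_annotations: Dict[str, str]) -> bool:
    """Assumed convention: the annotation value "true" requests safe mode."""
    value = node_annotations.get(SAFE_MODE_ANNOTATION, "")
    return value.strip().lower() == "true"


def next_action(node_annotations: Dict[str, str], enforcing: bool) -> str:
    """Decide what the agent should do from locally visible state only.

    The return values are hypothetical action labels for this sketch.
    """
    want_enforcement = not safe_mode_requested(node_annotations)
    if want_enforcement and not enforcing:
        return "load-scheduler"
    if not want_enforcement and enforcing:
        return "unload-scheduler"
    return "no-op"
```

The decision depends only on the node's own annotations and the agent's current state, so it needs no call to the controller, the scheduler plugin, or the dashboard.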

Deployment model: everything in-cluster

All components install from one Helm chart into your cluster. There is no SaaS control plane, no telemetry beacon, and no account: the product makes zero external calls and runs air-gapped because there is nothing to gap. The failure direction is equally structural: killing any component degrades the affected nodes to stock Kubernetes behavior, so failures are per node, never per cluster. Posture details: security.