Deep observability & thread-aware rightsizing

The node agent watches scheduling at the same depth it enforces it: per-pod, per-thread placement and runqueue telemetry, always on, at a measured 0.13% of one core.

Naming note. Binaries, the helm chart, and annotation keys currently ship under the project’s former name (infera); the commands below are what works today. A rename migration is planned.

/metrics — Prometheus

Every agent serves a Prometheus endpoint on its node (port 9100 by default) with QoS and scheduler metrics: per-tier assignment counts, config generation, safe-mode state, PSI-derived contention signals, and placement-linter verdicts. For Prometheus Operator setups, the chart includes a ServiceMonitor that is gated behind a helm value:

helm upgrade infera deploy/helm/infera --reuse-values \
  --set serviceMonitor.enabled=true

Generated Grafana dashboards ship with the chart, so the fleet view lands in your existing metrics stack without hand-building panels.

/observe — the thread-level snapshot

Alongside metrics, the agent publishes a JSON observation snapshot at GET /observe on the same port: the machine shape, per-layer scheduler statistics, the top busy threads with their placement, PSI readings, linter verdicts, and the observation layer’s own overhead. It is the raw material for profile training and the quickest way to answer “what is the scheduler actually doing on this node?”

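# Terminal 1: port-forward stays in the foreground until interrupted.
# kubectl picks one agent pod from the DaemonSet, so this shows one node.
kubectl -n infera port-forward ds/infera-agent 9100:9100

# Terminal 2: query the snapshot through the forwarded port.
curl -s localhost:9100/observe | jq '.layers[] | {name, cpus, util}'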

The placement linter

An always-on invariant checker audits actual thread placement against what the configuration promises, and exports violations as metrics (infera_lint_violation{invariant=...}) with rate-limited log warnings:

Invariant	Catches
`smt_collision`	Protected threads sharing a physical core with noisy siblings
`protected_fallback`	Protected-tier threads running outside their fenced layer
`open_reserve`	Open-layer work violating the reserved headroom
`layer_mismatch`	Threads attributed to a different layer than their pod’s tier implies

Zero violations is the steady state; a persistent violation is an alertable signal that configuration and reality have drifted.
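As a quick spot check on a single node, the linter series can be filtered out of the /metrics output. This is a minimal sketch, assuming the agent emits the standard Prometheus text format without trailing timestamps, so that the sample value is the last field on each line. The helper name `lint_violations` is hypothetical and not part of the CLI.

```shell
# Hypothetical helper: print infera_lint_violation samples with a non-zero value.
# Reads Prometheus text exposition format on stdin.
# Assumes no trailing timestamp, so the sample value is the last field.
lint_violations() {
  awk '/^infera_lint_violation[{ ]/ && ($NF + 0) != 0 { print }'
}

# Usage, with the port-forward from the /observe section running:
#   curl -s localhost:9100/metrics | lint_violations
```

Empty output corresponds to the zero-violation steady state. If the series turns out to be a cumulative counter rather than a gauge, a non-zero value only shows that a violation occurred at some point, and the rate over time is the better thing to alert on.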

Kernel trace capture

For the questions metrics cannot answer, the agent captures bounded perfetto kernel traces on demand — scheduling events straight from the kernel, downloadable and openable in the perfetto UI. Captures are duration- and size-capped, one at a time per node (concurrent requests are refused, not queued), and triggered via the CLI, the agent RPC (CaptureTrace), or the dashboard. Nothing extra is installed on nodes — the tracer ships in the agent image.

Thread-aware rightsizing

Container-average rightsizers recommend requests from a pod-level CPU mean, which says nothing about how that load is spread across threads. A pod averaging 1.2 cores might be four lazy threads (fine at 1.5 cores, shared) or one hot thread pinned at 100% plus overhead (it needs an exclusive core, and throttling it is a latency incident). Temper’s rightsizer reads the thread-level usage the observation layer already collects, so its recommendations distinguish those cases. The identified-savings number in the dashboard is computed from the same data, comparing declared requests against usage measured over a 6-hour window.
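The two pods in that example can be told apart with very little arithmetic. The sketch below illustrates the idea only and is not Temper’s actual algorithm: it takes per-thread CPU usage in cores, one value per line, and reports the pod total alongside the busiest thread. The helper name `thread_shape` and the 0.9-core "hot" cutoff are assumptions made for the example.

```shell
# Hypothetical helper: summarize a pod from per-thread CPU usage.
# stdin: one thread per line, usage in cores (0.3 means 30% of one core).
# $1: cutoff at or above which a thread counts as hot (assumed default: 0.9).
thread_shape() {
  awk -v hot="${1:-0.9}" '
    NF { total += $1; if ($1 > max) max = $1 }
    END {
      shape = (max >= hot) ? "hot-thread" : "spread"
      printf "total=%.2f max=%.2f shape=%s\n", total, max, shape
    }'
}

# Same 1.2-core total, different structure:
printf "0.3\n0.3\n0.3\n0.3\n" | thread_shape   # four lazy threads
printf "1.0\n0.1\n0.1\n" | thread_shape        # one pinned thread plus overhead
```

Both inputs total 1.20 cores, so a mean-based rightsizer would treat them identically. Only the per-thread maximum shows that the second pod has a thread that needs a core to itself.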

Overhead — measured, not promised

The entire always-on observation layer — placement sampling, schedstat deltas, scheduler stats, the linter — costs under 1% CPU, measured at 0.13% of one core in production configuration. Perfetto trace bursts are bounded and only run when you ask for them or during training cycles.