The fleet management plane

One in-cluster dashboard for seeing what the platform is doing and acting on it: explorer, logs, manifests, performance, savings, and audit-logged operations, with a public API underneath everything the UI shows.

Naming note. Binaries, the Helm chart, and annotation keys currently ship under the project’s former name (infera); the commands below are the ones that work today. A rename migration is planned.

Enabling it

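# Enable the dashboard on an existing release, keeping its current values.
helm upgrade infera deploy/helm/infera --reuse-values \
  --set dashboard.enabled=true \
  --set dashboard.image.repository="$REGISTRY/infera-dashboard" \
  --set-string dashboard.image.tag="$TAG"

# Read the dashboard API token from its Secret.
TOKEN=$(kubectl -n infera get secret infera-dashboard-token -o jsonpath='{.data.token}' | base64 -d)

# Expose the dashboard on http://localhost:8090 (runs in the foreground).
kubectl -n infera port-forward svc/infera-dashboard 8090:8090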
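Once the port-forward is up, any tooling can talk to the same API the UI uses. The sketch below is a minimal client using only the standard library. Two things in it are assumptions, not documented behavior: that the token is sent as an HTTP Bearer credential, and that the whoami endpoint lives at the hypothetical route /api/v1/whoami.

```python
import json
import urllib.request

# Local end of the `kubectl port-forward ... 8090:8090` tunnel above.
BASE_URL = "http://localhost:8090"


def dashboard_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request to the dashboard API with the token attached.

    ASSUMPTION: the token is sent as an HTTP Bearer credential; confirm the
    real scheme in the dashboard's OpenAPI description.
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def whoami(token: str) -> dict:
    """Ask the dashboard which identity the token maps to.

    HYPOTHETICAL route: a whoami endpoint exists under /api/v1, but its
    exact path here is a guess.
    """
    req = dashboard_request("/api/v1/whoami", token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Note that `TOKEN=$(...)` sets a shell variable only; run `export TOKEN` before reading it from Python via `os.environ["TOKEN"]`.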

Hierarchy explorer

The spine of the UI: cluster → node → pod → container, covering every Kubernetes workload kind (Deployments, StatefulSets, DaemonSets, Jobs, and the rest), with live updates streamed over Server-Sent Events (SSE). Each pod page brings together what normally takes four terminal windows:

Live logs — tail and follow, per container.
Manifests — current YAML plus revision diffs, so you can see what changed between rollouts.
Perf panels — CPU, runqueue, PSI pressure, throttling, memory, and network timeseries.
Scheduling detail — the pod’s QoS tier, which kernel layer it landed in, and current placement.
Events — the Kubernetes event stream scoped to the object.
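Live updates arrive as an SSE stream, which is plain text: `field: value` lines, with a blank line ending each event. The sketch below parses that standard text/event-stream framing. The event name `pod-update` and the JSON payload in the usage note are hypothetical; the dashboard's actual event types are not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str] = None


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from a text/event-stream, fed one decoded line at a time."""
    event_type, data_lines, last_id = "", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event if it carried data.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            last_id = value  # the last event ID persists across events
        # "retry" and unknown fields are ignored in this sketch.
```

With a response from `urllib.request.urlopen`, feed it `(line.decode("utf-8") for line in resp)`. Multi-line `data:` fields are joined with newlines, so a JSON payload split across lines (for example, a hypothetical `pod-update` event) reassembles correctly.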

Savings: realized vs. identified

The savings view enforces a two-number discipline instead of one inflatable figure:

Number	What it counts	Nature
Realized	Measured background reclaim actually delivered, plus CPU requests actually freed by the overcommit webhook	Measured fact
Identified	Rightsizing slack (declared requests vs. 6-hour measured usage) plus idle headroom	Estimated opportunity

Both numbers are priced per machine type from list-price tables, with per-type overrides for your negotiated rates; reconciliation against actual invoices is on the roadmap. The split keeps the instrument honest: realized is what happened, identified is what could. This is also why Temper is never priced as a percentage of savings: tying revenue to a number we compute would give us a reason to inflate it.
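The two-number rule can be shown as arithmetic. The sketch below prices per-node quantities in cores per hour, applies a negotiated override where one exists, and returns realized and identified as separate totals that are never summed. The machine types, prices, field names, and the choice of a single "6-hour usage" figure per node are all hypothetical; the real aggregation (mean, percentile, etc.) is not specified here.

```python
from dataclasses import dataclass

# HYPOTHETICAL per-vCPU-hour list prices, keyed by machine type.
LIST_PRICE = {"type-a": 0.040, "type-b": 0.030}
# HYPOTHETICAL negotiated-rate override for one machine type.
OVERRIDES = {"type-a": 0.032}


def price(machine_type: str) -> float:
    """Per-core-hour price: the override if present, else list price."""
    return OVERRIDES.get(machine_type, LIST_PRICE[machine_type])


@dataclass
class NodeSample:
    machine_type: str
    reclaimed_cores: float      # measured background reclaim delivered
    freed_request_cores: float  # CPU requests freed by the overcommit webhook
    requested_cores: float      # declared CPU requests
    usage_6h_cores: float       # measured usage over the 6-hour window
    idle_headroom_cores: float  # idle headroom on the node


def savings_per_hour(nodes):
    """Return (realized, identified) per hour, kept as separate totals."""
    realized = identified = 0.0
    for n in nodes:
        p = price(n.machine_type)
        # Realized: measured facts only.
        realized += (n.reclaimed_cores + n.freed_request_cores) * p
        # Identified: rightsizing slack never goes negative.
        slack = max(0.0, n.requested_cores - n.usage_6h_cores)
        identified += (slack + n.idle_headroom_cores) * p
    return realized, identified
```

A node whose measured usage exceeds its requests contributes no rightsizing slack, which keeps over-used nodes from offsetting slack elsewhere.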

Actions — audit-logged, role-gated

Operator actions run through the dashboard with an audit trail: cordon/uncordon, drain (PodDisruptionBudget-aware), rollout restart, scale, pod delete, safe-mode toggle, overcommit toggle, scheduler-config view with generation diffs, and kernel trace download. Every action records who, what, and when; the audit log exports as CSV or JSON.
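A minimal sketch of an audit record carrying who, what, and when, with the two export formats. The `target` field and all field names are hypothetical; the dashboard's actual export schema is not documented here.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    who: str     # named token owner, so entries identify a person
    what: str    # action, e.g. "drain"
    target: str  # HYPOTHETICAL: the object the action touched
    when: str    # ISO-8601 timestamp in UTC


def record(log, who, what, target, now=None):
    """Append an entry; entries are immutable once written."""
    now = now or datetime.now(timezone.utc)
    entry = AuditEntry(who, what, target, now.isoformat())
    log.append(entry)
    return entry


def export_json(log) -> str:
    return json.dumps([asdict(e) for e in log], indent=2)


def export_csv(log) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["who", "what", "target", "when"])
    writer.writeheader()
    for e in log:
        writer.writerow(asdict(e))
    return buf.getvalue()
```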

Multi-user auth & RBAC

Role	Can
viewer	Read everything: explorer, perf, savings, audit log
operator	Viewer + run actions (cordon, drain, restart, scale, safe mode…)
admin	Operator + manage tokens and configuration

Authentication uses named tokens (so the audit log names humans, not shared secrets), with a whoami endpoint to confirm which identity a token resolves to. An OIDC design is published on the roadmap.
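The roles are cumulative: each includes everything the role below it can do. An ordered enum captures that directly. The operation names below are illustrative labels, not the API's identifiers.

```python
from enum import IntEnum


class Role(IntEnum):
    # Higher values include every permission of lower ones.
    VIEWER = 1
    OPERATOR = 2
    ADMIN = 3


# Minimum role per operation. Names are HYPOTHETICAL labels.
REQUIRED = {
    "read": Role.VIEWER,
    "cordon": Role.OPERATOR,
    "drain": Role.OPERATOR,
    "rollout-restart": Role.OPERATOR,
    "scale": Role.OPERATOR,
    "safe-mode": Role.OPERATOR,
    "manage-tokens": Role.ADMIN,
    "edit-config": Role.ADMIN,
}


def allowed(role: Role, operation: str) -> bool:
    """Check an operation against a role; unknown operations are denied."""
    required = REQUIRED.get(operation)
    return required is not None and role >= required
```

Denying unknown operations by default means a newly added action stays locked until someone assigns it a minimum role.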

Multi-cluster hub

Any dashboard instance can front the fleet through a peer registry: register your other clusters’ dashboards as peers and a cluster switcher appears in the nav. Two deliberate properties:

Per-cluster data planes stay self-contained — each cluster keeps its own telemetry and storage, which keeps the model air-gap friendly.
Cross-cluster writes are blocked by design — actions execute only on the cluster whose dashboard you are on, so audit locality is preserved.
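One way to enforce both properties in a hub is to proxy only read methods to peers and handle writes locally. The sketch below assumes that design; the registry class, cluster names, and URLs are hypothetical, and the dashboard's real enforcement point may differ.

```python
from dataclasses import dataclass, field

READ_METHODS = ("GET", "HEAD", "OPTIONS")


@dataclass
class PeerRegistry:
    local: str
    peers: dict = field(default_factory=dict)  # cluster name -> dashboard URL

    def register(self, name: str, url: str) -> None:
        if name == self.local:
            raise ValueError("the local cluster is not its own peer")
        self.peers[name] = url

    def clusters(self) -> list:
        """Entries for a cluster switcher: local first, then peers."""
        return [self.local, *sorted(self.peers)]

    def route(self, cluster: str, method: str) -> str:
        """Return where a request goes, refusing cross-cluster writes."""
        if cluster == self.local:
            return "local"
        if cluster not in self.peers:
            raise KeyError(cluster)
        if method.upper() not in READ_METHODS:
            # Writes must be issued from that cluster's own dashboard,
            # so they land in that cluster's own audit log.
            raise PermissionError(f"write to {cluster!r} must go through its own dashboard")
        return self.peers[cluster]
```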

Integrations & API

A Prometheus /metrics exporter and generated Grafana dashboards cover the metrics-stack path (see Observability). Everything the UI shows comes from a versioned REST API (/api/v1, described in OpenAPI). The built-in UI is a first-class client of that same public API, so anything it can do, your tooling can do too.
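For quick checks without a full Prometheus server, the exporter's text format can be read directly. This is a minimal sketch of a parser for simple sample lines in the Prometheus text exposition format; it skips comments, does not unescape label values, and does not handle `}` inside label values. The `demo_` metric names in the test are placeholders, not the exporter's real metric names.

```python
import re

# name{labels} value [timestamp]
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str):
    """Yield (metric_name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, HELP and TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # skip anything this sketch does not understand
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        # float() also accepts "NaN", "+Inf", and "-Inf".
        yield m.group("name"), labels, float(m.group("value"))
```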