The fleet management plane
One in-cluster dashboard for seeing what the platform is doing and acting on it: explorer, logs, manifests, performance, savings, and audit-logged operations — with a public API underneath everything the UI shows.
The commands below still use the infera name and are what works today. A rename migration is planned.
Enabling it
helm upgrade infera deploy/helm/infera --reuse-values \
--set dashboard.enabled=true \
--set dashboard.image.repository=$REGISTRY/infera-dashboard \
--set-string dashboard.image.tag=$TAG
TOKEN=$(kubectl -n infera get secret infera-dashboard-token -o jsonpath='{.data.token}' | base64 -d)
kubectl -n infera port-forward svc/infera-dashboard 8090:8090
Hierarchy explorer
The spine of the UI: cluster → node → pod → container, covering every Kubernetes workload kind (Deployments, StatefulSets, DaemonSets, Jobs, and the rest), with live updates streamed over SSE. Each pod page brings together what normally takes four terminal windows:
- Live logs — tail and follow, per container.
- Manifests — current YAML plus revision diffs, so you can see what changed between rollouts.
- Perf panels — CPU, runqueue, PSI pressure, throttling, memory, and network timeseries.
- Scheduling detail — the pod’s QoS tier, which kernel layer it landed in, and current placement.
- Events — the Kubernetes event stream scoped to the object.
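The live updates mentioned above arrive over Server-Sent Events (SSE). As a rough sketch of how a client might consume such a stream, the snippet below parses the standard SSE wire format: `event:` and `data:` fields, with a blank line ending each event. The event name `pod.updated` and the JSON payload are invented for illustration and are not the dashboard's actual schema.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches whatever has been collected so far.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format allows one optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content; real event names and payloads may differ.
sample = [
    ": keep-alive\n",
    "event: pod.updated\n",
    'data: {"pod": "web-0", "phase": "Running"}\n',
    "\n",
]
events = list(parse_sse(sample))
payload = json.loads(events[0]["data"])
```

In practice the lines would come from a streaming HTTP response rather than a list, but the parsing step is the same.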
Savings: realized vs. identified
The savings view enforces a two-number discipline instead of one inflatable figure:
| Number | What it counts | Nature |
|---|---|---|
| Realized | Measured background reclaim actually delivered, plus CPU requests actually freed by the overcommit webhook | Measured fact |
| Identified | Rightsizing slack (declared requests vs. 6-hour measured usage) plus idle headroom | Estimated opportunity |
Both are priced per machine type from list-price tables, with per-type overrides for your negotiated rates. The split exists so the instrument stays honest: realized is what happened, identified is what could happen. (Invoice reconciliation is on the roadmap; today the pricing is list-price based.) This is also why Temper is never priced as a percentage of savings: tying revenue to a number we compute would corrupt it.
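To make the two-number split concrete, here is a small worked sketch. It assumes prices are expressed per vCPU-hour and that each node reports the four quantities from the table as CPU-hours; the machine type names, rates, field names, and figures are all invented for illustration and do not reflect the product's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical list prices (USD per vCPU-hour) and one negotiated override.
LIST_PRICES: Dict[str, float] = {"type-a": 0.04, "type-b": 0.05}
OVERRIDES: Dict[str, float] = {"type-b": 0.03}


@dataclass
class NodeUsage:
    machine_type: str
    reclaimed_cpu_hours: float       # measured background reclaim delivered
    freed_request_cpu_hours: float   # CPU requests freed by the overcommit webhook
    rightsizing_slack_cpu_hours: float  # declared requests minus measured usage
    idle_headroom_cpu_hours: float


def rate_for(machine_type: str) -> float:
    """Prefer the negotiated override, fall back to the list price."""
    return OVERRIDES.get(machine_type, LIST_PRICES[machine_type])


def savings(nodes: List[NodeUsage]) -> Tuple[float, float]:
    """Return (realized, identified) in USD, kept as two separate numbers."""
    realized = identified = 0.0
    for n in nodes:
        rate = rate_for(n.machine_type)
        realized += rate * (n.reclaimed_cpu_hours + n.freed_request_cpu_hours)
        identified += rate * (n.rightsizing_slack_cpu_hours + n.idle_headroom_cpu_hours)
    return round(realized, 2), round(identified, 2)


nodes = [
    NodeUsage("type-a", 10, 5, 20, 5),   # priced at the 0.04 list rate
    NodeUsage("type-b", 0, 10, 10, 10),  # priced at the 0.03 override
]
realized, identified = savings(nodes)
```

With these made-up inputs, realized comes to 0.90 and identified to 1.60. The two totals are never summed, which mirrors the measured-versus-estimated split described above.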
Actions — audit-logged, role-gated
Operator actions run through the dashboard with an audit trail: cordon / uncordon / drain (PDB-aware), rollout restart, scale, pod delete, safe-mode toggle, overcommit toggle, scheduler-config view with generation diffs, and kernel trace download. Every action records who, what, and when; the audit log exports as CSV or JSON.
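As an illustration of the who/what/when record and the two export formats, the following sketch serializes a list of audit entries to CSV and JSON. The field names and the sample entry are assumptions for the example; the dashboard's real export schema may differ.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AuditRecord:
    who: str    # named token holder, e.g. a person
    what: str   # the action performed
    when: str   # ISO 8601 timestamp


def to_json(records: List[AuditRecord]) -> str:
    """Serialize records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2)


def to_csv(records: List[AuditRecord]) -> str:
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["who", "what", "when"], lineterminator="\n")
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Hypothetical entry for illustration only.
records = [AuditRecord("alice", "cordon node-1", "2024-01-01T12:00:00Z")]
csv_text = to_csv(records)
json_text = to_json(records)
```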
Multi-user auth & RBAC
| Role | Can |
|---|---|
| viewer | Read everything: explorer, perf, savings, audit log |
| operator | Viewer + run actions (cordon, drain, restart, scale, safe mode…) |
| admin | Operator + manage tokens and configuration |
Authentication uses named tokens (so the audit log names humans, not shared secrets), with a whoami endpoint for verification. An OIDC design is published on the roadmap.
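The role table is cumulative: each role includes everything the one before it can do. One way to model that, shown here purely as a sketch, is an ordered enum with a minimum role per capability. The capability names (`read`, `action`, `manage`) are invented labels for the three rows of the table, not identifiers from the product.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    OPERATOR = 2
    ADMIN = 3


# Minimum role needed for each (hypothetical) capability label.
REQUIRED = {
    "read": Role.VIEWER,      # explorer, perf, savings, audit log
    "action": Role.OPERATOR,  # cordon, drain, restart, scale, safe mode
    "manage": Role.ADMIN,     # tokens and configuration
}


def allowed(role: Role, capability: str) -> bool:
    """A role may use a capability if it ranks at or above the minimum."""
    return role >= REQUIRED[capability]


viewer_can_act = allowed(Role.VIEWER, "action")
operator_can_act = allowed(Role.OPERATOR, "action")
admin_can_read = allowed(Role.ADMIN, "read")
```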
Multi-cluster hub
Any dashboard instance can front the fleet through a peer registry: register your other clusters’ dashboards as peers and a cluster switcher appears in the nav. Two deliberate properties:
- Per-cluster data planes stay self-contained — each cluster keeps its own telemetry and storage, which keeps the model air-gap friendly.
- Cross-cluster writes are blocked by design — actions execute only on the cluster whose dashboard you are on, so audit locality is preserved.
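The two properties above can be pictured with a toy model: a hub that knows its own cluster plus a registry of peers, and a guard that refuses any write aimed at a cluster other than the local one. Class names, method names, cluster names, and URLs here are all hypothetical; this is a sketch of the idea, not the dashboard's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class CrossClusterWriteError(Exception):
    """Raised when an action targets a cluster other than the local one."""


@dataclass
class Hub:
    local_cluster: str
    peers: Dict[str, str] = field(default_factory=dict)  # peer name -> dashboard URL

    def register_peer(self, name: str, url: str) -> None:
        self.peers[name] = url

    def clusters(self) -> List[str]:
        """Entries a cluster switcher could list: local first, then peers."""
        return [self.local_cluster, *sorted(self.peers)]

    def authorize_write(self, target_cluster: str) -> None:
        # Reads may span peers; writes are only allowed on the local cluster.
        if target_cluster != self.local_cluster:
            raise CrossClusterWriteError(
                f"write to {target_cluster!r} refused from {self.local_cluster!r}"
            )


hub = Hub("prod-eu")
hub.register_peer("prod-us", "https://dashboard.prod-us.example")

hub.authorize_write("prod-eu")  # local write: permitted, returns None
try:
    hub.authorize_write("prod-us")
    blocked = False
except CrossClusterWriteError:
    blocked = True
```

Keeping the guard on the write path only is what preserves audit locality: every recorded action belongs to the cluster whose dashboard logged it.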
Integrations & API
A Prometheus /metrics exporter and generated Grafana dashboards cover the metrics-stack path (observability). Everything the UI shows comes from a versioned REST API (/api/v1, OpenAPI-described). The built-in UI is a first-class client of the same public API, so anything it can do, your tooling can too.
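As a minimal sketch of what calling the API from your own tooling might look like, the helper below builds an authenticated request against the `/api/v1` prefix using only the standard library. The base URL matches the port-forward from the enabling step; the `whoami` path and the `Bearer` authorization scheme are assumptions, so check the OpenAPI description for the actual routes and auth header.

```python
import urllib.request


def api_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a path under /api/v1."""
    url = f"{base_url.rstrip('/')}/api/v1/{path.lstrip('/')}"
    req = urllib.request.Request(url)
    # Assumed auth scheme; the real header format may differ.
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


# "whoami" is a guessed path for the verification endpoint mentioned above.
req = api_request("http://localhost:8090", "whoami", "example-token")

# To actually send it (requires the port-forward to be running):
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```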