Security & Trust

Nothing phones home, because there is no home.

Temper is fully in-cluster: a helm-installed DaemonSet, a ConfigMap, node annotations, Prometheus metrics, and a local JSON endpoint. Zero external calls — no SaaS control plane, no telemetry beacon, no account to create. It runs air-gapped because there is nothing to gap. This page explains the posture, justifies the privilege it does need, and states plainly what is in progress.

01 Architecture posture

Everything stays inside your cluster.

This is the deployment model that regulated and sovereign environments (defense, banking, healthcare, government clouds) require, and the one that SaaS control planes structurally cannot offer.

P-1 No egress, by architecture

The agent talks to the local kubelet-managed pod state, the node’s cgroup and sysfs trees, and the Kubernetes API of your cluster. Outputs are node annotations, /metrics, and /observe — all served locally. There is no endpoint of ours for it to call, so “disable telemetry” is not a setting; it is the design.

P-2 Kubernetes-native inputs only

QoS tiers derive from pod.spec.priority (PriorityClasses) and container resource specs. No CRD is required for enforcement, and none of our own objects are admitted into your API server. The one optional exception is the L1 webhook, which is reversible and annotates every mutation it makes with the original value.
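As a rough illustration of deriving a tier from those two inputs, the sketch below maps a pod's priority and container resource specs to a tier name. The tier names and priority thresholds are hypothetical, chosen only for the example; this page does not document Temper's actual mapping. The Guaranteed check is a simplified version of the Kubernetes QoS rule (CPU and memory limits set on every container, with requests equal to limits).

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    requests: dict = field(default_factory=dict)
    limits: dict = field(default_factory=dict)


@dataclass
class Pod:
    priority: int          # value of pod.spec.priority
    containers: list       # list of Container


def is_guaranteed(pod):
    """Simplified Kubernetes rule: every container sets cpu and memory
    limits, and its requests (which default to the limits) match them."""
    if not pod.containers:
        return False
    for c in pod.containers:
        for res in ("cpu", "memory"):
            if res not in c.limits:
                return False
            if c.requests.get(res, c.limits[res]) != c.limits[res]:
                return False
    return True


def tier_for(pod):
    # Hypothetical thresholds and tier names, for illustration only.
    if pod.priority >= 1_000_000 and is_guaranteed(pod):
        return "latency-critical"
    if pod.priority >= 1_000:
        return "standard"
    return "batch"
```

Because both inputs are ordinary pod fields, a mapping of this kind needs nothing beyond what the API server already stores for each pod.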

P-3 Auditable surface

The node agent is a single binary driving scx_layered with a generated JSON config you can inspect at any time (GetLayeredConfig RPC, CLI, or dashboard). Management-plane actions are audit-logged. What the scheduler is doing is never a black box.
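Once the generated config has been exported to text, a few lines are enough to summarize it for review. The sketch below assumes the config is a JSON array of layers, each with a name, a list of match rules, and a single-key kind object. That shape is only loosely modelled on upstream scx_layered examples, and the sample document is invented for illustration; treat every field name as an assumption and check it against the config your agent actually emits.

```python
import json

# Invented sample; field names are assumptions, not a documented schema.
SAMPLE_CONFIG = """
[
  {
    "name": "batch",
    "matches": [[{"CgroupPrefix": "system.slice/"}], [{"NiceAbove": 0}]],
    "kind": {"Confined": {"util_range": [0.8, 0.9]}}
  },
  {
    "name": "normal",
    "matches": [[]],
    "kind": {"Open": {}}
  }
]
"""


def summarize_layers(config_text):
    """Return (layer name, layer kind, number of match rules) per layer."""
    summary = []
    for layer in json.loads(config_text):
        kind = next(iter(layer["kind"]))          # the single key names the kind
        rule_count = len(layer.get("matches", []))
        summary.append((layer["name"], kind, rule_count))
    return summary


if __name__ == "__main__":
    for name, kind, rules in summarize_layers(SAMPLE_CONFIG):
        print(f"{name}: kind={kind}, match rules={rules}")
```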

P-4 Per-node blast radius

Degradation is per-node, never per-cluster. A node without sched_ext, or one placed in safe mode, runs stock CFS exactly as it would without Temper — absence of benefit, not harm. Fleet-wide rollback is a single node annotation, infera.io/safe-mode-requested, honored by the agent without any control-plane round trip.
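A minimal sketch of what requesting safe mode on a set of nodes might look like, built around the standard kubectl patch command. Only the annotation key comes from this page; the value "true" and the node names are assumptions for the example, so check the platform matrix for the value the agent actually expects.

```python
import json
import shlex

SAFE_MODE_ANNOTATION = "infera.io/safe-mode-requested"


def safe_mode_patch(value="true"):
    """JSON merge patch that sets the annotation on a Node object.
    The value "true" is an assumption, not documented on this page."""
    return {"metadata": {"annotations": {SAFE_MODE_ANNOTATION: value}}}


def kubectl_patch_command(node, value="true"):
    """Build the argv for `kubectl patch node <node> --type=merge -p <patch>`."""
    patch = json.dumps(safe_mode_patch(value), separators=(",", ":"))
    return ["kubectl", "patch", "node", node, "--type=merge", "-p", patch]


if __name__ == "__main__":
    # Hypothetical node names; print the commands rather than running them.
    for node in ["node-a", "node-b"]:
        print(shlex.join(kubectl_patch_command(node)))
```

Printing the commands instead of executing them keeps the sketch side-effect free; the same patch body could be sent through any Kubernetes client.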

02 The privileged DaemonSet

Yes, it is privileged. Here is exactly why.

We would rather justify the privilege precisely than pretend it away. This is the standard posture of every node security and observability agent — Falco, Datadog, eBPF tooling.

Loading a BPF scheduler into the kernel’s scheduling path requires privileged + hostPID + /sys mounts. That is the entire point of the product: enforcement below the layer userspace can reach. The privilege is used for three things — loading scx_layered, reading cgroup v2 statistics and PSI pressure files, and (during opt-in training captures) reading the node’s tracefs.

What bounds it: the agent creates no listening surface beyond its gRPC and metrics ports inside the cluster, executes no remote code, pulls no dynamic content, and writes only to its own scheduler process and node annotations. Kill it, deliberately or by crash, and the kernel instantly reverts to CFS. We measured that failover under load: p99 was 0.61 ms before the kill, 0.64 ms during, and 0.61 ms after, and the replacement agent was Ready in 15.0 s.

Environments that forbid privileged DaemonSets outright (GKE Autopilot) cannot run Temper, and we say so in the platform matrix rather than working around the policy.

# what the privilege is actually for
privileged + hostPID     load/attach the scx_layered BPF scheduler (sched_ext)
/sys/kernel/sched_ext    attach state; instant CFS revert on detach
/sys/fs/cgroup (read)    per-pod CPU stats + PSI pressure for contention detection
/sys/kernel/tracing      bounded, opt-in trace captures (training/canary only)

# what it never does
no external network calls - no phone home - no remote code
no secrets access - no writes outside its own scheduler + annotations
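To make the read-only cgroup access concrete, here is a small sketch of parsing a cgroup v2 cpu.pressure file, which uses the kernel's PSI text format (one "some" line and one "full" line of key=value fields). The contention threshold and the idea of comparing avg10 against it are assumptions for illustration, not Temper's documented detection logic, and the cgroup directory passed in is whatever path applies on your node.

```python
import os


def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        kind, fields = parts[0], parts[1:]
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def read_cpu_pressure(cgroup_dir):
    """Read and parse cpu.pressure from a cgroup v2 directory (read-only)."""
    with open(os.path.join(cgroup_dir, "cpu.pressure")) as f:
        return parse_psi(f.read())


def is_contended(psi, threshold=10.0):
    """Hypothetical rule: the share of time some task stalled on CPU over
    the last 10 s meets or exceeds the threshold (in percent)."""
    return psi["some"]["avg10"] >= threshold


SAMPLE = (
    "some avg10=12.50 avg60=3.10 avg300=0.80 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print(psi["some"]["avg10"], is_contended(psi))
```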

03 Fail-safe story

The worst case is your current scheduler.

sched_ext’s kernel contract: if the BPF scheduler misbehaves, stalls, or detaches for any reason, the kernel ejects it and resumes CFS scheduling.
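One way to observe that contract from a node is to read the sched_ext sysfs files the kernel exposes: state reports whether a BPF scheduler is currently enabled, and root/ops names the loaded scheduler when there is one. The sketch below is a read-only check under those assumptions; the function name and return shape are made up for this example, and the root path is a parameter so it can be pointed at a test directory.

```python
from pathlib import Path


def sched_ext_status(sysfs_root="/sys/kernel/sched_ext"):
    """Report whether sched_ext is present and which scheduler is loaded."""
    root = Path(sysfs_root)
    state_file = root / "state"
    if not state_file.exists():
        # Kernel built without sched_ext: the node simply runs its stock scheduler.
        return {"supported": False, "state": None, "scheduler": None}
    state = state_file.read_text().strip()
    ops_file = root / "root" / "ops"
    scheduler = ops_file.read_text().strip() if ops_file.exists() else ""
    return {"supported": True, "state": state, "scheduler": scheduler or None}


if __name__ == "__main__":
    print(sched_ext_status())
```

Running this before and after stopping the agent should show the state leave "enabled" once the BPF scheduler has detached.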

0.61 / 0.64 / 0.61: p99 (ms) before / during / after force-killing the agent under load, with no blackout (docs/training-artifacts/binpack/SAVINGS-REPORT.md)

15.0 s: replacement agent Ready and scheduler re-attached after the kill (docs/training-artifacts/binpack/SAVINGS-REPORT.md)

1 annotation: fleet-wide rollback path; safe mode stands the scheduler down and pods run stock CFS (infera.io/safe-mode-requested · docs/platform-matrix.md)

Full fail-safe mechanics — the kernel revert contract, safe-mode semantics, churn cost, and the honest cpu.max disclosure — are documented in Safety & rollback.

04 Compliance & disclosure

In progress means in progress.

We publish status honestly rather than badge-collecting. If a claim on this page ever outruns reality, that is a bug — report it like one.

SOC 2 Type II: In progress. Audit engagement is underway; Type II reports will be available to customers and design partners under NDA once issued. Because the product ships no SaaS control plane and holds no customer data, the audit scope centers on our build and release pipeline rather than a data platform.
Signed images & SBOM: Release images are signed and ship with a Software Bill of Materials, including the pinned scx_layered version and the exact carry-patches applied at image build — the same patches are committed in the repository (docker/agent/patches/), so what runs on your nodes is diffable against source. Air-gap kits (Enterprise) bundle images, signatures, and SBOM for offline verification.
Vulnerability disclosure: Report suspected vulnerabilities to security@temper.codes. We commit to acknowledging reports within 2 business days and to coordinated disclosure. Please include reproduction steps; encrypted reporting details and a security.txt are published with the docs.
Data handling: The agent processes scheduling metadata only: pod identities, priorities, resource specs, cgroup statistics, and thread-level timing counters. It never reads workload payloads, environment variables, or secrets. All of it stays on the node or in your cluster’s API objects and metrics stack.
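The "diffable against source" claim under Signed images & SBOM can be spot-checked by hashing the carry-patches in a repository checkout and comparing them with the digests you expect. The sketch below assumes you already hold a mapping of patch file name to SHA-256 digest. This page does not specify the SBOM format, so extracting that mapping from the SBOM is left out, and the mapping, function names, and the ".patch" file suffix are all hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_patches(patch_dir, expected):
    """Compare *.patch files in patch_dir with an expected name -> digest map."""
    actual = {p.name: sha256_of(p) for p in sorted(Path(patch_dir).glob("*.patch"))}
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "mismatched": sorted(
            name for name in expected.keys() & actual.keys()
            if expected[name] != actual[name]
        ),
    }
```

Calling diff_patches("docker/agent/patches", expected) on a checkout returns three empty lists when the directory matches the expected digests exactly.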