Kernel-level QoS enforcement
Every pod on a Temper node belongs to one of five QoS tiers, derived from standard Kubernetes PriorityClasses and resource requests — no CRDs, no custom labels, no application changes. Tiers become kernel scheduler layers with real teeth.
infera); the commands below are what works today. A rename migration is planned.
The five tiers
| Tier | PriorityClass | Priority value | Layer kind | Protected | Preempt | Exclusive cores | Timeslice |
|---|---|---|---|---|---|---|---|
| Critical | infera-critical | ≥ 1,000,000 | Confined | yes | yes | yes | 5 ms |
| High | infera-high | ≥ 100,000 | Grouped | yes | yes | no | 10 ms |
| Normal | infera-normal | ≥ 0 | Grouped | no | no | no | 20 ms |
| Low | infera-low | ≥ −100 | Open | no | no | no | 30 ms |
| Background | infera-background | < −100 | Open | no | no | no | 5 ms |
- Confined layers get their own CPU allocation, fenced. Grouped layers get a preferred CPU set they share. Open layers run on whatever is idle.
- Protected + preempt means the tier’s threads kick lower tiers off a CPU the moment they wake, instead of waiting in a runqueue.
- Background’s short 5 ms slice is a deliberate latency bound: outside the protected fence, the longest a higher tier can wait behind a background thread is one short timeslice.
The thresholds are configurable in the agent’s [priority_mapping] config
section; the values above are the shipped defaults, matching the PriorityClasses the Helm
chart installs.
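To make the threshold column concrete, here is a minimal sketch of the priority-to-tier lookup using the shipped defaults from the table. The `PriorityMapping` class, its field names, and `tier_for_priority` are hypothetical names chosen for illustration; they are not the agent's actual config keys or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityMapping:
    # Hypothetical field names; the values are the shipped defaults from the table.
    critical_min: int = 1_000_000
    high_min: int = 100_000
    normal_min: int = 0
    low_min: int = -100


def tier_for_priority(priority: int, mapping: PriorityMapping = PriorityMapping()) -> str:
    """Return the tier whose threshold the priority value meets, checked highest first."""
    if priority >= mapping.critical_min:
        return "Critical"
    if priority >= mapping.high_min:
        return "High"
    if priority >= mapping.normal_min:
        return "Normal"
    if priority >= mapping.low_min:
        return "Low"
    # Anything below the Low threshold falls through to Background.
    return "Background"
```

Checking thresholds from the top down means a changed threshold only moves the boundary between two adjacent tiers.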
Assigning a tier
Assignment is one field in the pod spec:
spec:
  priorityClassName: infera-critical
Because the input is pod.spec.priority, existing PriorityClasses you already
use for eviction ordering participate automatically — the same priority signal now also
means something at the CPU scheduler.
Defaults for unlabeled pods
Pods with no PriorityClass (priority 0) are tiered by their Kubernetes QoS class:
| Kubernetes QoS class | Default tier |
|---|---|
| Guaranteed / Burstable | Normal |
| BestEffort | Background |
So an untouched cluster behaves sensibly on day one: nothing is fenced, best-effort work yields, and enforcement sharpens only as you hand out PriorityClasses.
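The default table can be sketched as a small lookup over a pod object as returned by the Kubernetes API, where `status.qosClass` holds the QoS class. The function name `default_tier` is hypothetical, and treating an unset `spec.priorityClassName` as "no PriorityClass" is an assumption about how the agent detects this case.

```python
from typing import Optional

# Default tier per Kubernetes QoS class, as listed in the table above.
DEFAULT_TIER_BY_QOS = {
    "Guaranteed": "Normal",
    "Burstable": "Normal",
    "BestEffort": "Background",
}


def default_tier(pod: dict) -> Optional[str]:
    """Return the default tier for a pod without a PriorityClass.

    Returns None when the pod names a PriorityClass, because such pods
    are tiered by priority value rather than by QoS class.
    """
    if pod.get("spec", {}).get("priorityClassName"):
        return None
    qos_class = pod.get("status", {}).get("qosClass")
    if qos_class not in DEFAULT_TIER_BY_QOS:
        raise ValueError(f"unknown or missing QoS class: {qos_class!r}")
    return DEFAULT_TIER_BY_QOS[qos_class]
```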
Resource-aware layer parameters
Tier membership decides the kind of layer; your pods’ actual resource specs decide its size. On every assignment change the agent recomputes, per tier:
- Weight — from the aggregate CPU requests of the tier’s pods, so a tier holding 6 requested cores outweighs one holding 2. Never hardcoded.
- CPU range — for Confined and Grouped layers, derived from aggregate requests and limits: how many CPUs the layer may occupy.
- Utilization band — from the mix of Kubernetes QoS classes in the tier: Guaranteed pods produce a tight band, Burstable a medium one, BestEffort none.
Two system layers are always appended: a small always-on layer that keeps the scheduler and agent threads serviced even under total saturation, and a catch-all Open layer for everything else (kubelet, node daemons). Empty layers are omitted. If the Critical tier requests more whole cores than the node can actually fence, Temper demotes that layer to Grouped instead of silently caging it: graceful degradation over false confinement. For sizing guidance, see operations.
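As an illustration of the recomputation described in this section, the sketch below derives weights from aggregate CPU requests and applies the Critical demotion rule. The proportional formula, the `scale` parameter, and the `fenceable_cores` input are assumptions made for the example; the text only states that weight follows aggregate requests and that an oversized Critical tier is demoted to Grouped.

```python
from __future__ import annotations

import math


def layer_weights(requests_by_tier: dict[str, list[int]], scale: int = 100) -> dict[str, int]:
    """Weight each non-empty tier in proportion to its aggregate CPU requests (millicores).

    The proportional split is an assumed formula, not the agent's documented one.
    """
    # Tiers with no pods are dropped, mirroring "empty layers are omitted".
    totals = {tier: sum(reqs) for tier, reqs in requests_by_tier.items() if reqs}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No pod requested CPU at all: fall back to equal weights.
        return {tier: 1 for tier in totals}
    # Floor at 1 so a tier with zero requests still gets scheduled.
    return {tier: max(1, round(scale * total / grand_total)) for tier, total in totals.items()}


def critical_layer_kind(requested_millis: int, fenceable_cores: int) -> str:
    """Return "Confined" if the requested whole cores fit the fence, else "Grouped"."""
    whole_cores = math.ceil(requested_millis / 1000)
    return "Grouped" if whole_cores > fenceable_cores else "Confined"
```

With 6 requested cores in High and 2 in Normal, this sketch yields weights of 75 and 25, matching the "6 outweighs 2" example in the list above.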
Inspecting the result
The generated scheduler configuration is never a black box:
kubectl get node <node> -o jsonpath='{.metadata.annotations.infera\.io/qos-distribution}'
# per-tier pod counts and CPU millis, as published for the scheduler plugin
The full generated layer config is available via the agent’s gRPC API
(GetLayeredConfig), the CLI, or the dashboard, which
also shows a diff across config generations.
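The node annotation shown above can also be collected across a cluster from a script. The following is a minimal sketch that shells out to `kubectl get nodes -o json` and pulls the `infera.io/qos-distribution` annotation from each node; the helper names are hypothetical, and the value is returned as a raw string because its exact format is not specified here.

```python
import json
import subprocess
from typing import Optional

ANNOTATION = "infera.io/qos-distribution"


def qos_distribution(node: dict) -> Optional[str]:
    """Return the raw annotation value from a node object, or None if it is absent."""
    annotations = node.get("metadata", {}).get("annotations") or {}
    return annotations.get(ANNOTATION)


def fetch_nodes() -> list:
    """Fetch all node objects using kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    # kubectl returns a List object; the nodes are under "items".
    return json.loads(result.stdout)["items"]
```

Calling `qos_distribution(node)` for each entry of `fetch_nodes()` gives one value per node; a `None` result means the annotation is not present on that node.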
Division of labor with profiles
Tiers arbitrate between workloads. To schedule the threads inside one workload differently — hot loops vs. I/O chains vs. housekeeping — use workload profiles on top of the tier system.