Deep dives

Scenario write-ups over committed benchmark records.

Six technical articles, one per scenario a buyer actually cares about. Each groups the logically-related experiments, explains the mechanism before the numbers, and keeps every report’s caveats attached to its results — single-run arms are labeled, anomalies are published, and the two negatives we found are written up alongside the wins. Every figure traces to a committed report path in the repository; the benchmark harness itself is part of the product, so the runs are reproducible on your own cluster.

Mechanism first, numbers second · CFS baseline in every comparison · Caveats travel with the numbers · Raw record paths in every footer

01 · capacity

Sideloading: harvesting idle capacity without paying for it in tail latency

Why clusters idle (requests ≠ usage), why CFS’s proportional weights cannot protect a wakeup, and the memcached/redis ladders where batch work filled protected nodes while the tail stayed flat.

flat @ 0.92 · p99 vs node util (CFS 3.1×)
−88% · p99, heavy point
+72% · pods at equal SLO

Read the deep dive →

02 · databases

CPU limits without the freeze-cliff

CFS quota enforcement freezes whole cgroups mid-transaction — proven with kernel throttle counters on PostgreSQL, MySQL, and Cassandra. Also the article that owns our cpu.max disclosure, in full, with the quota-parity measurement.

33–68→4–6 ms · pgbench p99
+199 · throttles/20s under CFS (0 under Temper*)
1.353 < 1.5 · cores consumed vs quota, measured
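To make the counter-based check concrete, here is a minimal sketch of how throttle counts and cores consumed can be read from a cgroup v2 directory. It assumes cgroup v2, where `cpu.stat` exposes `nr_throttled`, `throttled_usec`, and `usage_usec`, and `cpu.max` holds `<quota> <period>` or `max <period>`. The helper names and the snapshot values are hypothetical and are not taken from the committed reports. Per the footnote on this page, a zero throttle delta under Temper should not be read as pacing.

```python
from typing import Dict, Optional


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the quota in cores from a cpu.max line, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def throttle_report(before: Dict[str, int], after: Dict[str, int],
                    wall_seconds: float) -> Dict[str, float]:
    """Diff two cpu.stat snapshots taken wall_seconds apart."""
    used_usec = after["usage_usec"] - before["usage_usec"]
    return {
        "throttles": after["nr_throttled"] - before["nr_throttled"],
        "throttled_ms": (after["throttled_usec"] - before["throttled_usec"]) / 1000,
        "cores_used": used_usec / (wall_seconds * 1_000_000),
    }


# Synthetic snapshots for illustration only (not measured data).
BEFORE = parse_cpu_stat("usage_usec 2000000\nnr_periods 100\n"
                        "nr_throttled 10\nthrottled_usec 500000\n")
AFTER = parse_cpu_stat("usage_usec 22000000\nnr_periods 300\n"
                       "nr_throttled 25\nthrottled_usec 1250000\n")

report = throttle_report(BEFORE, AFTER, wall_seconds=20)
quota_cores = parse_cpu_max("150000 100000")
print(report, quota_cores)
```

On a real node the two snapshots would come from reading the pod's `cpu.stat` at the start and end of the window; the path to that cgroup directory depends on the container runtime and is left out here.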

Read the deep dive →

03 · service chains

Tail amplification in service chains

A 19-service DeathStarBench application under co-located load — end-to-end tail growth under CFS vs. attached Temper, plus the full honest story of the scheduler eject this run surfaced and the v14 fix that closed it.

9.4×→~1.5× · end-to-end p99 growth
−75/−83% · p99 at bg=2/4, attached
8→0 · watchdog ejects, v13→v14

Read the deep dive →

04 · accelerators

Keeping accelerators fed

CPU-side starvation of DataLoader threads idles the GPU you are billed for: the measured L4 wedge, the CPU-training result where Kubernetes’ own remedies were worse than nothing, and the honest negative on GPU-bound serving.

−25% vs flat · CFS vs Temper on L4 at density
+67% · samples/s at density 8
parity · small-model vLLM (published negative)

Read the deep dive →

05 · inside the pod

Scheduling inside the pod

Container averages hide thread structure — the SMT-stacking gap that pod-level metrics could not see, the exclusive-core profiles that closed it on ONNX and llama.cpp, and the MySQL profile validation including the bug it exposed.

44.5 = 44.5 · ONNX peak parity + dead flat
+3.7% vs +27% · llama drift under density
−78% · Cassandra p99 at idle, tier-only

Read the deep dive →

06 · operations

Failure and rollback engineering

The sched_ext kernel contract, the measured agent-kill failover, the 8-hour soak, reconfiguration churn cost, the watchdog eject as designed behavior, and the one-annotation kill switch — every fail-safe number in one place.

0.61/0.64/0.61 · p99 ms across an agent kill
8 h clean · soak: flat memory, no drift
~52 ms · CFS gap per reconfig

Read the deep dive →

Methodology, in one paragraph: each comparison runs the same workload, same nodes, same load generator in two arms: first stock Kubernetes on CFS (obtained by putting Temper’s nodes in safe mode, so hardware and noise are held constant), then Temper attached. Density tests step a background-workload ladder and record where the primary’s SLO breaks in each arm. Where the mechanism can be verified with kernel counters instead of inferred from latency, it is.

* Throttle-counter zeros under Temper are a frozen counter, not pacing; this is explained in full in the CPU-limits deep dive.
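The density-ladder comparison described above can be sketched as a small function that walks each arm's ladder and reports the first background level at which the primary's p99 exceeds its SLO. This is an illustrative sketch, not the benchmark harness: the function name, the `(level, p99_ms)` record shape, the SLO, and every number below are assumptions made up for the example.

```python
from typing import List, Optional, Tuple

Ladder = List[Tuple[int, float]]  # (background level, primary p99 in ms)


def slo_break_point(ladder: Ladder, slo_ms: float) -> Optional[int]:
    """Return the first background level whose p99 exceeds the SLO.

    Returns None when the SLO holds at every level that was run.
    """
    for level, p99_ms in sorted(ladder):
        if p99_ms > slo_ms:
            return level
    return None


# Synthetic ladders for illustration only (not measured data).
cfs_arm: Ladder = [(0, 1.0), (2, 1.4), (4, 2.6), (8, 5.0)]
attached_arm: Ladder = [(0, 1.0), (2, 1.1), (4, 1.2), (8, 1.3)]
SLO_MS = 2.0

cfs_break = slo_break_point(cfs_arm, SLO_MS)
attached_break = slo_break_point(attached_arm, SLO_MS)
print("CFS arm breaks at:", cfs_break)
print("Attached arm breaks at:", attached_break)
```

Returning None rather than a sentinel level keeps "never broke within the ladder" distinct from "broke at the last step", which matters when the two arms are compared side by side.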