What we shipped in LLMKube 0.7.9: a new mlx-server runtime for Apple Silicon, four bugs the autoscaling tutorial flushed out, and kubectl scale support
0.7.9 adds mlx-server as a first-class runtime on the metal-agent: an OpenAI-compatible MLX inference server you select with --runtime mlx-server. We dogfooded it serving Qwen3.6-35B-A3B-8bit to opencode on an M5 Max, fixed four real bugs that surfaced while building toward a metrics-driven autoscaling tutorial (a dead PodMonitor selector, the operator fighting the HPA, the Metal-path InferenceService never going Ready, and a skipped memory pre-flight), and landed a Kubernetes scale subresource so kubectl scale works on InferenceService. Here's what landed.