Allen Kuo (kwyshell) · 6d ago
Speculative Decoding for Local LLMs Was a Mixed Bag. Then DFlash Landed in vLLM
An audit of MTP, DFlash, PFlash, and CPU-MoE offload across vLLM, llama.cpp, and Ollama on a desktop Blackwell.
Allen Kuo (kwyshell) · May 3
Choosing a Real-Time Whisper Engine
Testing CT2, TheWhisper, and whisper.cpp for live speech-to-text and why the pipeline mattered as much as the engine
Allen Kuo (kwyshell) · Apr 29
Building My Own Spec-Driven Workflow
Spec-Driven Development · Agentic Engineering
Allen Kuo (kwyshell) · Apr 27
Bringing Up a Secure Embedded Linux Platform on Arm Cortex Chip
From BL1 to OP-TEE, U-Boot, Linux, NAND, recovery, and production delivery
Allen Kuo (kwyshell) · Apr 26
GPU-Resident Frame Interpolation on Android
RIFE, NCNN, Vulkan, and the Real Pipeline Problem
Allen Kuo (kwyshell) · Apr 26
Driverless USB Proxy over SCSI and CD-ROM Emulation
A historical note on turning a constrained CD-ROM path into a client/server redirect route
Allen Kuo (kwyshell) · Apr 18
Qwen3.6–35B-A3B on Desktop Blackwell: The First Time vLLM Beats Ollama on Decode
Third in a series. Previous articles: Gemma 4 on vLLM vs Ollama · NVFP4 Completion
Allen Kuo (kwyshell) · Apr 16
Finishing What We Started: Gemma 4 NVFP4 on vLLM, Desktop Blackwell, WSL2
A follow-up to Gemma 4 on vLLM vs Ollama: Benchmarks on a 96 GB Blackwell GPU.
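Allen Kuo (kwyshell) · Apr 12
When Visual Studio 2026 Stole My vcpkg
A deep investigation that began with VCPKG_ROOT being quietly swapped out and ended at the CodeView binary level. And the one thing I learned: headers are the ABI contract, not cl.exe.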