AI Inference at Scale: From KV Cache to TurboQuant — Making Your Models 10× Faster
You own the model (AI 11). Now make it fast and cheap. Covers GPU memory anatomy, KV Cache mechanics, model quantization (GPTQ/AWQ/GGUF), KV Cache compression (GQA, TurboQuant), speculative decoding, serving infrastructure (vLLM, TensorRT-LLM), and hardware selection — the complete inference optimization playbook.