Eduardo Alvarez
NVIDIA engineer. Has written for the NVIDIA Developer Blog about KV cache quantisation and inference optimisation for Blackwell GPUs.
Appearances in this wiki
- Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache — Author; introduces NVFP4 KV cache quantisation and its latency/throughput benefits on Blackwell.