Eduardo Alvarez

NVIDIA engineer. Has written for the NVIDIA Developer Blog about KV cache quantisation and inference optimisation for Blackwell GPUs.

Appearances in this wiki

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache — Author; introduces NVFP4 KV cache quantisation and its latency/throughput benefits on Blackwell.