Not Lain
Hugging Face community blogger. Author of introductory articles on transformer internals and inference optimisation.
Appearances in this wiki
- Mastering Tensor Dimensions in Transformers — Author; explains tensor shape propagation through a decoder-only transformer.
- KV Caching Explained — Author; explains how KV caching works and why it speeds up autoregressive inference.