Not Lain

Hugging Face community blogger. Author of introductory articles on transformer internals and inference optimisation.

Appearances in this wiki

Mastering Tensor Dimensions in Transformers — Author; explains tensor shape propagation through a decoder-only transformer.
KV Caching Explained — Author; explains how KV caching works and how it speeds up autoregressive inference.