AI Accelerator Architectures

Hardware accelerator architectures for ML inference workloads — GPU microarchitecture, custom ASICs, NPUs, interconnect topology, and the systems-level design choices that determine throughput, latency, and cost-per-token in production.