Microsoft has launched its second-generation custom silicon, the Maia 200, targeting the high costs of running large language models. Built on TSMC’s 3nm process with over 140 billion transistors, the chip is expressly designed to handle inference workloads rather than model training.
Key specs include 216GB of HBM3e memory delivering 7TB/s of bandwidth. Microsoft says the chip offers 30% better performance-per-dollar than its existing hardware and outperforms rivals such as Amazon's Trainium 3 and Google's TPU v7 on specific FP4 and FP8 inference workloads.
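Those numbers matter because large-model inference is typically memory-bound: each generated token requires streaming the model's weights out of HBM, so bandwidth caps decode speed and capacity caps how large a model fits on one chip. Lower-precision formats like FP8 and FP4 shrink both costs at once. The back-of-envelope sketch below illustrates this using the article's bandwidth and capacity figures; the model sizes are hypothetical examples, and the estimate deliberately ignores KV-cache traffic, batching, and compute limits.

```python
# Memory-bound decode estimate for a single inference chip.
# Bandwidth/capacity figures are from the article; model sizes are
# hypothetical, and this roofline ignores KV cache, batching, compute.

HBM_CAPACITY_GB = 216      # on-package HBM3e capacity
HBM_BANDWIDTH_TBS = 7.0    # peak memory bandwidth

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def decode_tokens_per_sec(params_billions: float, dtype: str) -> float:
    """Upper bound on single-stream decode speed: every token must
    stream all weights from HBM once, so throughput is capped at
    bandwidth / model-size-in-bytes."""
    model_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return HBM_BANDWIDTH_TBS * 1e12 / model_bytes

def fits_in_hbm(params_billions: float, dtype: str) -> bool:
    """Whether the weights alone fit in one chip's HBM."""
    return params_billions * BYTES_PER_PARAM[dtype] <= HBM_CAPACITY_GB

for size in (70, 200):
    for dtype in ("fp16", "fp8", "fp4"):
        print(f"{size}B {dtype}: fits={fits_in_hbm(size, dtype)}, "
              f"<= {decode_tokens_per_sec(size, dtype):.0f} tok/s")
```

On these assumptions, a hypothetical 200B-parameter model doesn't fit in 216GB at FP16 but does at FP8, and dropping from FP8 to FP4 doubles the bandwidth-limited decode ceiling, which is one reason an inference-focused chip would emphasize those formats.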

The chips are already live in US data centers, powering GPT-5.2 and Microsoft 365 Copilot.
