Google Ironwood TPU
Google has introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and the first designed specifically for inference workloads. Announced at Google Cloud Next '25, Ironwood marks a shift toward the “age of inference,” in which AI models proactively generate insights rather than simply respond with information.
Key Specifications
- Peak Compute: 4,614 TFLOPs per chip
- Memory: 192 GB HBM per chip (6x that of Trillium)
- Bandwidth: 7.37 TB/s HBM bandwidth (4.5x that of Trillium)
- Scale: Up to 9,216 chips per pod, for 42.5 Exaflops of total compute (see the arithmetic check after this list)
- Power Efficiency: 2x the performance per watt of Trillium, and nearly 30x that of the first Cloud TPU (2018)
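As a sanity check, the per-chip figures compose cleanly into the quoted pod-level numbers. The Python snippet below is pure arithmetic on the specifications above, not a benchmark:

```python
# Back-of-the-envelope check: scale per-chip specs to a full 9,216-chip pod.
CHIPS_PER_POD = 9_216
TFLOPS_PER_CHIP = 4_614        # peak TFLOPs per Ironwood chip
HBM_GB_PER_CHIP = 192          # HBM capacity per chip, in GB

pod_exaflops = CHIPS_PER_POD * TFLOPS_PER_CHIP / 1e6   # TFLOPs -> Exaflops
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6     # GB -> PB

print(f"Pod compute: {pod_exaflops:.1f} Exaflops")     # ~42.5
print(f"Pod HBM:     {pod_hbm_pb:.2f} PB")             # ~1.77
print(f"vs El Capitan (1.7 Exaflops): {pod_exaflops / 1.7:.0f}x")  # ~25x
```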
The Age of Inference
Ironwood is purpose-built for “thinking models” that require massive parallel processing and efficient memory access:
- Large Language Models (LLMs): Advanced reasoning and generation capabilities
- Mixture of Experts (MoEs): Efficient scaling through specialized model components (see the sketch after this list)
- Advanced Reasoning Tasks: Complex problem-solving and inference workloads
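To make the MoE idea concrete, here is a minimal top-k routed mixture-of-experts layer in JAX. The expert count, dimensions, and softmax gating are illustrative assumptions; this shows the routing pattern such models rely on, not Ironwood-specific code:

```python
import jax
import jax.numpy as jnp

def moe_layer(x, gate_w, expert_w, k=2):
    """Minimal top-k mixture of experts: each token is routed to its k
    highest-scoring experts, whose outputs are blended by renormalized
    gate probabilities."""
    # x: [tokens, d_model], gate_w: [d_model, n_experts],
    # expert_w: [n_experts, d_model, d_model]
    logits = x @ gate_w                               # [tokens, n_experts]
    probs = jax.nn.softmax(logits, axis=-1)
    topk_p, topk_i = jax.lax.top_k(probs, k)          # [tokens, k]
    topk_p = topk_p / topk_p.sum(-1, keepdims=True)   # renormalize over chosen
    # Run every expert densely, then gather the chosen outputs per token.
    all_out = jnp.einsum('td,edf->tef', x, expert_w)  # [tokens, n_experts, d_model]
    chosen = jnp.take_along_axis(all_out, topk_i[..., None], axis=1)
    return (topk_p[..., None] * chosen).sum(axis=1)   # [tokens, d_model]

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k1, (8, 16))          # 8 tokens, d_model = 16
gate_w = jax.random.normal(k2, (16, 4))     # 4 experts
expert_w = jax.random.normal(k3, (4, 16, 16))
print(moe_layer(x, gate_w, expert_w).shape)  # (8, 16)
```

For simplicity every expert is evaluated densely and the chosen outputs are gathered afterward; production MoE systems dispatch tokens sparsely so only the selected experts do work, which is what makes the approach efficient at scale.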
Key Innovations
Enhanced SparseCore
- Specialized accelerator for processing ultra-large embeddings (the access pattern is sketched below)
- Expanded support for ranking and recommendation workloads
- Extends beyond traditional AI to financial and scientific domains
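The workload SparseCore targets is dominated by gathers into very large embedding tables. Below is a minimal JAX sketch of that access pattern, with toy sizes and a hypothetical embed_bag helper; SparseCore offload itself is handled by the compiler and runtime, not by user code:

```python
import jax
import jax.numpy as jnp

# Illustrative embedding-bag lookup: the sparse, gather-dominated access
# pattern that SparseCore is designed to accelerate. Sizes are toy values.
VOCAB, DIM = 100_000, 64
table = jax.random.normal(jax.random.PRNGKey(0), (VOCAB, DIM))

def embed_bag(table, ids):
    """Gather rows for a batch of sparse feature ids and mean-pool each
    example's features into one dense vector."""
    vecs = jnp.take(table, ids, axis=0)   # [batch, ids_per_example, DIM]
    return vecs.mean(axis=1)              # [batch, DIM]

ids = jax.random.randint(jax.random.PRNGKey(1), (32, 8), 0, VOCAB)
print(jax.jit(embed_bag)(table, ids).shape)   # (32, 64)
```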
Inter-Chip Interconnect (ICI)
- Bandwidth: 1.2 TB/s bidirectional (1.5x improvement over Trillium)
- Low-latency, high-bandwidth networking for coordinated communication
- Supports synchronous communication at full TPU pod scale (illustrated by the collective sketch below)
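At the framework level, ICI traffic shows up as collective operations executed in lockstep across chips. A minimal JAX sketch of a synchronous all-reduce follows; on a TPU pod slice the psum would travel over ICI, while here it simply runs across whatever devices the host exposes:

```python
from functools import partial

import jax
import jax.numpy as jnp

# One program instance per device, run synchronously; jax.lax.psum is a
# collective all-reduce, and on TPU its traffic rides the inter-chip
# interconnect (ICI).
@partial(jax.pmap, axis_name='chips')
def allreduce_step(x):
    return jax.lax.psum(x, axis_name='chips')  # every device gets the global sum

n = jax.device_count()
shards = jnp.arange(n, dtype=jnp.float32)  # one scalar shard per device
print(allreduce_step(shards))              # identical sum on all devices
```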
Liquid Cooling
- Advanced cooling solutions for sustained performance
- Up to twice the performance of standard air cooling
- Reliable operation under continuous, heavy AI workloads
Scale and Performance
Pod Configurations
- 256-chip configuration: For standard AI workload demands
- 9,216-chip configuration: For the most demanding workloads
- 42.5 Exaflops total: More than 24x the compute power of the world’s largest supercomputer (El Capitan at 1.7 Exaflops)
Pathways Integration
- Google’s ML runtime, developed by Google DeepMind
- Enables efficient distributed computation across many TPU chips (see the sharding sketch after this list)
- Supports composition of hundreds of thousands of Ironwood chips
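From JAX, this composition looks like compiling one program against a device mesh and letting the runtime place data and insert cross-chip transfers. The mesh axis name and shapes below are illustrative, and the snippet adapts to however many devices are attached:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D mesh over all attached chips; the 'data' axis name is arbitrary.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=('data',))

# Shard the batch dimension across the mesh; keep the weights replicated.
x = jnp.ones((len(devices) * 4, 128))
x = jax.device_put(x, NamedSharding(mesh, P('data', None)))
w = jnp.ones((128, 256))

@jax.jit
def forward(x, w):
    # One logical matmul; the compiler and runtime handle per-chip
    # execution and any cross-chip communication.
    return x @ w

y = forward(x, w)
print(y.shape, y.sharding)  # (batch, 256), sharded along 'data'
```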
Impact on AI Development
Ironwood enables the transition:
- From responsive AI: real-time information provided for humans to interpret
- To proactive AI: models that generate insights and interpretation themselves
Leading models like Gemini 2.5 and AlphaFold run on TPUs today, and Ironwood will power the next generation of AI breakthroughs.
Availability
Ironwood will be available to Google Cloud customers later in 2025, representing a new era in AI infrastructure for inference workloads.
Reference: Google Cloud Blog - Ironwood TPU