Google Ironwood TPU

Google has introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and the first designed specifically for inference workloads. Announced at Google Cloud Next '25, Ironwood marks a shift toward the "age of inference," in which AI models proactively generate insights rather than merely respond with information.

Key Specifications

  • Peak Compute: 4,614 TFLOPS per chip
  • Memory: 192 GB HBM per chip (6x Trillium's capacity)
  • Bandwidth: 7.37 TB/s HBM bandwidth per chip (4.5x Trillium's)
  • Scale: Up to 9,216 chips per pod, for 42.5 Exaflops of total compute
  • Power Efficiency: 2x the performance per watt of Trillium, and nearly 30x that of the first Cloud TPU
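The pod-level totals follow directly from the per-chip figures above. A quick back-of-the-envelope check in Python (all numbers are taken from the spec list; unit conversions are decimal):

```python
# Back-of-the-envelope check of pod-scale aggregates from per-chip specs.
CHIPS_PER_POD = 9_216
TFLOPS_PER_CHIP = 4_614          # peak compute per chip
HBM_GB_PER_CHIP = 192            # HBM capacity per chip

pod_exaflops = CHIPS_PER_POD * TFLOPS_PER_CHIP / 1e6   # 1 exaflop = 1e6 TFLOPS
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6     # 1 PB = 1e6 GB (decimal)

print(f"Pod compute: {pod_exaflops:.1f} exaflops")  # -> 42.5
print(f"Pod HBM:     {pod_hbm_pb:.2f} PB")          # -> 1.77
```

The memory total also shows why a full pod can hold very large models entirely in HBM: roughly 1.77 PB of high-bandwidth memory is addressable across the pod.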

The Age of Inference

Ironwood is purpose-built for “thinking models” that require massive parallel processing and efficient memory access:

  • Large Language Models (LLMs): Advanced reasoning and generation capabilities
  • Mixture of Experts (MoEs): Efficient scaling through specialized model components
  • Advanced Reasoning Tasks: Complex problem-solving and inference workloads

Key Innovations

Enhanced SparseCore

  • Specialized accelerator for processing ultra-large embeddings
  • Expanded support for ranking and recommendation workloads
  • Extends beyond traditional AI to financial and scientific domains

Inter-Chip Interconnect (ICI)

  • Bandwidth: 1.2 TBps bidirectional (1.5x improvement over Trillium)
  • Low-latency, high-bandwidth networking for coordinated communication
  • Supports synchronous communication at full TPU pod scale

Liquid Cooling

  • Advanced cooling solutions for sustained performance
  • Up to twice the performance of standard air cooling
  • Reliable operation under continuous, heavy AI workloads

Scale and Performance

Pod Configurations

  • 256-chip configuration: For standard AI workload demands
  • 9,216-chip configuration: For the most demanding workloads
  • 42.5 Exaflops total: More than 24x the compute power of the world’s largest supercomputer (El Capitan at 1.7 Exaflops)
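The "more than 24x" claim is consistent with the figures quoted above:

```python
# Ratio of Ironwood pod compute to El Capitan's quoted figure.
ironwood_pod_exaflops = 42.5
el_capitan_exaflops = 1.7

ratio = ironwood_pod_exaflops / el_capitan_exaflops
print(ratio)  # -> 25.0, i.e. "more than 24x"
```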

Pathways Integration

  • Pathways, the ML runtime developed by Google DeepMind
  • Enables efficient distributed computing across multiple TPU chips
  • Supports composing hundreds of thousands of Ironwood chips into a single system
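Pathways itself is proprietary infrastructure, but the core idea it enables, treating many chips as one computation, can be sketched as a toy data-parallel split. Everything here is illustrative (function names are invented, not the Pathways API); real systems handle placement, communication, and failure recovery transparently:

```python
# Toy sketch of data parallelism: split one batch evenly across "chips",
# have each chip process its shard, then combine the partial results.
# Purely illustrative; not the Pathways API.

def shard(batch, num_chips):
    """Divide a batch into num_chips near-equal contiguous shards."""
    k, r = divmod(len(batch), num_chips)
    shards, start = [], 0
    for i in range(num_chips):
        end = start + k + (1 if i < r else 0)
        shards.append(batch[start:end])
        start = end
    return shards

def process_on_chip(shard_data):
    """Stand-in for per-chip work (here: summing the shard)."""
    return sum(shard_data)

batch = list(range(1_000))
partials = [process_on_chip(s) for s in shard(batch, num_chips=4)]
total = sum(partials)  # conceptually, the all-reduce step
print(total)  # -> 499500, same result as processing the batch on one chip
```

The key property a runtime like Pathways preserves is exactly the one this toy shows: the distributed result matches the single-device result, regardless of how many chips the work is spread across.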

Impact on AI Development

Ironwood enables the shift from responsive AI to proactive AI:

  • Responsive AI: delivers real-time information for humans to interpret
  • Proactive AI: generates insights and interpretation on its own

Leading models like Gemini 2.5 and AlphaFold run on TPUs today, and Ironwood will power the next generation of AI breakthroughs.

Availability

Ironwood will be available to Google Cloud customers later in 2025, representing a new era in AI infrastructure for inference workloads.

Reference: Google Cloud Blog - Ironwood TPU