Google Ironwood TPU

Google has introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and the first designed specifically for inference workloads. Announced at Google Cloud Next '25, Ironwood marks a shift toward the "age of inference," in which AI models proactively generate insights rather than merely respond with information.

Key Specifications

  • Peak Compute: 4,614 TFLOPS per chip
  • Memory: 192 GB HBM per chip (6x Trillium's capacity)
  • Bandwidth: 7.37 TB/s HBM bandwidth per chip (4.5x Trillium's)
  • Scale: Up to 9,216 chips per pod, for 42.5 Exaflops of total compute
  • Power Efficiency: 2x the performance per watt of Trillium, and nearly 30x that of the first Cloud TPU
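The pod-level totals follow directly from the per-chip figures above. A quick back-of-the-envelope check in Python (all numbers are taken from the spec list; unit conversions are decimal):

```python
# Back-of-the-envelope check of pod-scale aggregates from per-chip specs.
CHIPS_PER_POD = 9_216
TFLOPS_PER_CHIP = 4_614          # peak compute per chip
HBM_GB_PER_CHIP = 192            # HBM capacity per chip

pod_exaflops = CHIPS_PER_POD * TFLOPS_PER_CHIP / 1e6   # 1 exaflop = 1e6 TFLOPS
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6     # 1 PB = 1e6 GB (decimal)

print(f"Pod compute: {pod_exaflops:.1f} exaflops")  # -> 42.5
print(f"Pod HBM:     {pod_hbm_pb:.2f} PB")          # -> 1.77
```

The memory total also shows why a full pod can hold very large models entirely in HBM: roughly 1.77 PB of high-bandwidth memory is addressable across the pod.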

The Age of Inference

Ironwood is purpose-built for “thinking models” that require massive parallel processing and efficient memory access:

  • Large Language Models (LLMs): Advanced reasoning and generation capabilities
  • Mixture of Experts (MoEs): Efficient scaling through specialized model components
  • Advanced Reasoning Tasks: Complex problem-solving and inference workloads

Key Innovations

Enhanced SparseCore

  • Specialized accelerator for processing ultra-large embeddings
  • Expanded support for ranking and recommendation workloads
  • Extends beyond traditional AI to financial and scientific domains

Inter-Chip Interconnect (ICI)

  • Bandwidth: 1.2 TBps bidirectional (1.5x improvement over Trillium)
  • Low-latency, high-bandwidth networking for coordinated communication
  • Supports synchronous communication at full TPU pod scale

Liquid Cooling

  • Advanced cooling solutions for sustained performance
  • Up to twice the performance of standard air cooling
  • Reliable operation under continuous, heavy AI workloads

Scale and Performance

Pod Configurations

  • 256-chip configuration: For standard AI workload demands
  • 9,216-chip configuration: For the most demanding workloads
  • 42.5 Exaflops total: More than 24x the compute power of the world’s largest supercomputer (El Capitan at 1.7 Exaflops)
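The "more than 24x" claim is consistent with the figures quoted above:

```python
# Ratio of Ironwood pod compute to El Capitan's quoted figure.
ironwood_pod_exaflops = 42.5
el_capitan_exaflops = 1.7

ratio = ironwood_pod_exaflops / el_capitan_exaflops
print(ratio)  # -> 25.0, i.e. "more than 24x"
```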

Pathways Integration

  • Pathways, the ML runtime developed by Google DeepMind
  • Enables efficient distributed computing across multiple TPU chips
  • Supports composing hundreds of thousands of Ironwood chips into a single system
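Pathways itself is proprietary infrastructure, but the core idea it enables, treating many chips as one computation, can be sketched as a toy data-parallel split. Everything here is illustrative (function names are invented, not the Pathways API); real systems handle placement, communication, and failure recovery transparently:

```python
# Toy sketch of data parallelism: split one batch evenly across "chips",
# have each chip process its shard, then combine the partial results.
# Purely illustrative; not the Pathways API.

def shard(batch, num_chips):
    """Divide a batch into num_chips near-equal contiguous shards."""
    k, r = divmod(len(batch), num_chips)
    shards, start = [], 0
    for i in range(num_chips):
        end = start + k + (1 if i < r else 0)
        shards.append(batch[start:end])
        start = end
    return shards

def process_on_chip(shard_data):
    """Stand-in for per-chip work (here: summing the shard)."""
    return sum(shard_data)

batch = list(range(1_000))
partials = [process_on_chip(s) for s in shard(batch, num_chips=4)]
total = sum(partials)  # conceptually, the all-reduce step
print(total)  # -> 499500, same result as processing the batch on one chip
```

The key property a runtime like Pathways preserves is exactly the one this toy shows: the distributed result matches the single-device result, regardless of how many chips the work is spread across.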

Impact on AI Development

Ironwood enables the shift from responsive AI to proactive AI:

  • Responsive AI: delivers real-time information for humans to interpret
  • Proactive AI: generates insights and interpretation on its own

Leading models like Gemini 2.5 and AlphaFold run on TPUs today, and Ironwood will power the next generation of AI breakthroughs.

Availability

Ironwood will be available to Google Cloud customers later in 2025, representing a new era in AI infrastructure for inference workloads.

Reference: Google Cloud Blog - Ironwood TPU