Google unveiled Ironwood, its seventh-generation TPU, at Cloud Next 2026 on April 22, 2026. It is the first Google TPU architected specifically for inference rather than training.
Generational context: Ironwood (TPU v7) is GA today. One day later, on Cloud Next Day 2, Google revealed the 8th-gen TPU 8t (training) and TPU 8i (inference) as the follow-on. See TPU 8t/8i coverage. Ironwood remains the production inference chip; the 8th-gen ships progressively through 2026-2027.
The chip
- 10x peak performance of TPU v5p.
- 192 GB HBM3E per chip.
- 7.2 TB/s memory bandwidth per chip.
- 9,216 liquid-cooled chips per superpod.
- 42.5 exaflops of FP8 compute per superpod.
- Generally available to Google Cloud customers today.
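The superpod figures imply a per-chip number Google did not state directly. A quick sanity check, assuming the 42.5 exaflops is aggregate peak FP8 across all 9,216 chips:

```python
# Back-of-envelope per-chip peak from the published superpod figures.
# Assumption: 42.5 exaflops is the aggregate FP8 peak of the full superpod.
chips_per_superpod = 9_216
superpod_fp8_exaflops = 42.5

per_chip_pflops = superpod_fp8_exaflops * 1e18 / chips_per_superpod / 1e15
print(f"~{per_chip_pflops:.2f} PFLOPS FP8 per chip")  # ~4.61 PFLOPS
```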
Google says it plans to produce millions of Ironwood units in 2026. The chip is aimed at the “serving economy” that now dominates AI spend: frontier models are trained once but served billions of times.
Anthropic as anchor customer
Anthropic committed to up to 1 million Ironwood TPUs for Claude serving. That sits alongside the separate 5 GW AWS Trainium commitment announced two days earlier, giving Anthropic a genuinely multi-cloud serving footprint.
For Claude users this is straight capacity insurance. Rate limits and queue times on Opus 4.7, Sonnet 4.6, and the freshly launched Claude Design product should ease as Ironwood capacity comes online.
Why an inference-first TPU
Training-optimized silicon maximizes raw FLOPs. Inference silicon maximizes tokens per dollar and minimizes per-query latency. The design trade-offs differ:
- Memory bandwidth matters more than peak compute (attention layers are bandwidth-bound).
- Batch-1 latency matters more than peak batch throughput.
- Power efficiency under sustained inference matters more than peak draw under training bursts.
Ironwood’s 192 GB of HBM3E per chip and 7.2 TB/s bandwidth directly target the bandwidth bottleneck. The design is a structural response to where 2026 AI compute spend actually sits: inference, not training.
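The bandwidth bottleneck is easy to see with a decode roofline. A minimal sketch, using Ironwood's stated 7.2 TB/s per-chip bandwidth with a hypothetical 70B-parameter FP8 model (the model size and quantization are illustrative assumptions, not Google figures):

```python
# Rough batch-1 decode roofline on a single chip. At batch size 1, every
# weight must be streamed from HBM per generated token, so memory
# bandwidth, not peak compute, caps the token rate.
hbm_bandwidth_tb_s = 7.2   # Ironwood per-chip HBM3E bandwidth (stated)
model_params_b = 70        # hypothetical 70B-parameter model
bytes_per_param = 1        # FP8 weights (assumption)

weight_bytes = model_params_b * 1e9 * bytes_per_param
max_tokens_per_s = hbm_bandwidth_tb_s * 1e12 / weight_bytes
print(f"upper bound: ~{max_tokens_per_s:.0f} tokens/s")  # ~103 tokens/s
```

Real serving batches requests and caches KV state, so achieved throughput differs, but the shape of the bound is why inference silicon prioritizes bandwidth over peak FLOPs.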
Gemini impact
Gemini 3.1 Pro and downstream Gemini-powered products (the Gemini app, AI Overviews in Search, Gemini in Workspace) are expected to migrate serving onto Ironwood progressively through 2026. Google did not disclose migration timelines at the keynote.
The price-performance implication: Google can push Gemini API pricing lower (especially on long-context workloads) without margin damage. Expect downward pricing pressure on Gemini Flash and Pro tiers once Ironwood saturates.
Ecosystem context
Ironwood ships as the centerpiece of Google’s broader four-partner custom-silicon strategy involving Broadcom, MediaTek, Marvell, and TSMC. That coalition extends Google’s silicon roadmap to TPU v8 on TSMC 2nm in late 2027, with MediaTek’s “Zebrafish” targeting 20-30% lower inference cost than Ironwood.
Day 2 sidebar: 8th-gen already on the board
One day after the Ironwood GA announcement, Google’s Cloud Next 2026 Day 2 keynote unveiled the 8th-generation TPU family: TPU 8t for training and TPU 8i for inference. TPU 8t delivers 3x Ironwood’s processing power per superpod; TPU 8i triples on-chip SRAM for agent-serving workloads. Ironwood stays GA today and remains the production inference TPU. The 8th-gen is a roadmap reveal with staged availability through 2026-2027.
Related
- TPU 8t/8i unveiled on Cloud Next Day 2
- Anthropic commits multi-GW capacity on Alphabet-Broadcom TPUs starting 2027
- Amazon commits up to $25B more to Anthropic
- Google in talks with Marvell on custom AI chips
- Anthropic locks in 3.5GW Google TPU capacity