**The Pivot to “Inference Sovereignty”:** NVIDIA is shifting focus from raw training power to deterministic inference to solve the “Stochastic Wall,” the unpredictable latency jitter in current GPUs that hampers real-time AI agents.
**Feynman Architecture (1.6nm):** Utilizing TSMC’s A16 node with backside power delivery (Super Power Rail) to achieve a projected 100x efficiency gain over Blackwell.
**LPX Cores:** Integration of Groq-derived deterministic logic to provide guaranteed p95 latency for “Chain of Thought” reasoning.

**Storage Next:** Collaboration on 100M IOPS SSDs that function as a peer to GPU memory, eliminating the “Memory Wall” for million-token contexts.
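To see why a *guaranteed* p95 matters for agentic workloads, consider that a “Chain of Thought” request is a sequence of dependent steps, so per-step jitter compounds into a much worse end-to-end tail. The sketch below is a hypothetical illustration (not NVIDIA or Groq code, and the step counts and stall probabilities are invented for the example) comparing a deterministic pipeline against one with occasional stalls:

```python
# Hypothetical illustration: why per-step latency jitter compounds
# across sequential reasoning steps, degrading end-to-end p95.
import random
import statistics

random.seed(0)

STEPS = 20        # assumed sequential reasoning steps per request
TRIALS = 10_000   # simulated requests

def p95(samples):
    """95th-percentile latency of a sample set, in milliseconds."""
    return statistics.quantiles(samples, n=100)[94]

# Deterministic hardware: every step takes exactly 10 ms.
deterministic = [10.0 * STEPS for _ in range(TRIALS)]

# Jittery hardware: same 10 ms base, but each step may stall.
def jittery_request():
    total = 0.0
    for _ in range(STEPS):
        step = 10.0
        if random.random() < 0.05:  # assumed 5% chance of a 40 ms stall
            step += 40.0
        total += step
    return total

jittery = [jittery_request() for _ in range(TRIALS)]

print(f"deterministic p95: {p95(deterministic):.1f} ms")  # 200.0 ms
print(f"jittery mean:      {statistics.mean(jittery):.1f} ms")
print(f"jittery p95:       {p95(jittery):.1f} ms")
```

Even though the jittery pipeline’s *mean* is only modestly higher, its p95 blows out because the tail accumulates multiple stalls, which is exactly the failure mode deterministic scheduling is meant to eliminate.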
**Vertical Fusion:** 3D logic-on-logic stacking that places SRAM-rich chiplets directly over compute dies to minimize token-generation energy costs.
**Supply Chain:** Rumors of a strategic shift to Intel Foundry (18A) for I/O sourcing to diversify away from total TSMC reliance.
