A Google paper just vaporized $40 billion in chip value
On March 24, Google published a research paper about compressing AI memory. By March 26, Samsung had lost 5% and SK Hynix 6%; Micron shed roughly 20% over five days. Billions in market cap, gone because a team of researchers figured out how to squeeze the same AI performance into one-sixth the memory.
The technique is called TurboQuant. It compresses the key-value cache that large language models use during inference down to 3 bits per value, from the standard 16. No retraining required. No accuracy loss. Google plans to present it at ICLR in April, but the market didn't wait for peer review.
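To make the 16-to-3-bit claim concrete, here's a minimal sketch of what post-training quantization of cached values looks like in general. This is not TurboQuant's method (the paper's PolarQuant and QJL machinery is far more involved), and the helper names are mine; it's plain round-to-nearest quantization showing the basic move: each 16-bit value becomes a 3-bit code plus a shared scale, and nothing gets retrained.

```python
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Toy symmetric round-to-nearest quantization to 3 bits.

    Not TurboQuant -- just the generic idea of post-training quantization:
    each 16-bit value becomes a small integer code plus one shared scale,
    with no retraining involved.
    """
    qmax = 2 ** (3 - 1) - 1                       # signed 3-bit codes, use the range [-3, 3]
    scale = np.abs(x).max() / qmax                # one scale shared by the whole tensor
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale                           # codes would be bit-packed in a real kernel

def dequantize_3bit(codes: np.ndarray, scale) -> np.ndarray:
    return codes.astype(np.float16) * scale       # approximate values recovered at attention time

kv_slice = np.random.randn(8).astype(np.float16)  # stand-in for a slice of cached keys/values
codes, scale = quantize_3bit(kv_slice)
print(kv_slice)
print(dequantize_3bit(codes, scale))              # close to the original, at ~3/16 the storage
```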
I've been watching this space closely, and what happened the week of March 23 tells a story that most people are reading wrong.
The algorithm ate the hardware
TurboQuant uses two methods, PolarQuant and QJL, to compress the cache that models build up while processing your prompts. That cache is the main memory bottleneck when running large models. The more of it you can hold, the longer the context window and the more useful the model. Hardware companies have been selling the answer: more memory chips. Faster memory chips. HBM3E, HBM4, stacked higher and priced accordingly.
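A rough back-of-envelope shows why that cache, not the model weights, is what fills up the HBM at long context. The configuration below is hypothetical (Llama-style grouped-query-attention numbers, not any specific model), and real 3-bit schemes carry some metadata overhead on top of the raw codes:

```python
# Back-of-envelope KV cache size for a single long-context sequence.
# Hypothetical 70B-class config with grouped-query attention; not any specific model.
num_layers   = 80
num_kv_heads = 8
head_dim     = 128
seq_len      = 128_000                                          # long-context prompt

values  = 2 * num_layers * num_kv_heads * head_dim * seq_len    # 2 = keys + values
fp16_gb = values * 16 / 8 / 1e9                                 # 16 bits per value
q3_gb   = values * 3 / 8 / 1e9                                  # 3 bits per value, ignoring metadata

print(f"fp16 KV cache : {fp16_gb:5.1f} GB per sequence")        # ~42 GB
print(f"3-bit KV cache: {q3_gb:5.1f} GB per sequence")          # ~8 GB, a ~5.3x cut in HBM
```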
Then a paper showed you could get the same result with math.
This is the part that should make you uncomfortable if you're long on memory chip stocks. The fix isn't hardware. It's a compression algorithm. And compression algorithms don't have fab lead times or billion-dollar manufacturing plants. They replicate at the speed of git clone.
Meta's $135 billion reality check
That same week, we learned that Meta delayed its next flagship AI model, codenamed Avocado, from March to at least May. Internal benchmarks showed it lagging behind Gemini 3.0 and models from OpenAI and Anthropic in reasoning, coding, and agentic tasks.
Here's the part that got me: Meta's AI leadership reportedly discussed licensing Google's Gemini as a stopgap. The company planning $135 billion in AI capital expenditure for 2026 is talking about renting someone else's model because theirs doesn't work well enough.
Let that land for a second. $135 billion buys you a lot of GPUs. It buys you custom silicon, enormous data centers, and enough electricity to power a small country. What it apparently doesn't buy you is a model that can reason as well as the competition.
Across the industry, the big five cloud providers have committed somewhere between $660 billion and $690 billion to AI capex in 2026. That's a 70% increase from earlier projections. The assumption baked into those numbers is that more compute equals better models. TurboQuant and Avocado, in the same week, suggest that assumption has a shelf life.
The Jevons twist
Here's where it gets complicated. There's a 19th-century economist named William Stanley Jevons who observed that as steam engines became more efficient, total coal consumption went up, not down. The efficiency made coal cheaper per unit of work, so people found more uses for it.
The AI version of this is already playing out. When DeepSeek showed you could train competitive models for less, the response wasn't "great, we'll spend less." It was "great, we'll train more models." AMD's CEO said as much at Semicon China in March.
So will TurboQuant actually reduce demand for memory chips? Maybe not long-term. If running inference costs one-sixth the memory, companies will run six times more inference. Or they'll stuff six times more context into the same hardware. Or they'll deploy models to places that couldn't afford them before. The Jevons paradox suggests efficiency rarely reduces total consumption.
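The break-even arithmetic is worth writing down. The numbers below are illustrative, not a forecast: total memory demand only falls if usage grows by less than the per-unit savings.

```python
# Illustrative Jevons arithmetic -- not a forecast.
compression = 16 / 3                         # ~5.3x less memory per token of context

# Total HBM demand scales with (inference volume) x (memory per unit of inference).
for usage_growth in (1.0, 3.0, compression, 10.0):
    relative_demand = usage_growth / compression
    trend = "falls" if relative_demand < 1 else ("flat" if relative_demand == 1 else "rises")
    print(f"usage x{usage_growth:4.1f} -> total memory demand x{relative_demand:4.2f} ({trend})")
```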
But that's a long-term argument. Short-term, what just happened is that a research paper repositioned the value in the AI stack. The expensive thing isn't the silicon. It's the algorithm running on it. And algorithms improve on a different curve than hardware.
Where the value actually lives
Training costs have dropped 99% in 18 months. On the inference side, processing a billion tokens went from $36,000 in 2022 to $250 in late 2024. That's not Moore's Law. Hardware doesn't move that fast. That's algorithmic efficiency, compounding at roughly 5x per year, stacking on top of hardware improvements.
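A quick sanity check on that, using only the numbers above and the standard Moore's-law cadence as a stand-in for hardware gains:

```python
# Sanity check: can hardware alone explain a $36,000 -> $250 drop per billion tokens?
cost_2022, cost_2024 = 36_000, 250
total_improvement = cost_2022 / cost_2024       # ~144x cheaper over roughly two years

hardware_only = 2 ** (2 / 2)                    # Moore's-law pace: 2x every two years
algorithmic_residual = total_improvement / hardware_only

print(f"total improvement      : {total_improvement:.0f}x")
print(f"hardware (Moore's Law) : {hardware_only:.0f}x")
print(f"left to the algorithms : {algorithmic_residual:.0f}x")
```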
The companies that matter in this next phase aren't the ones spending the most. They're the ones whose researchers can publish a paper on Tuesday and restructure an entire sector's economics by Thursday. Anthropic won a federal court battle that same week, even as details of its most powerful model yet reportedly leaked. Its capex is a fraction of Meta's. Its models are beating Meta's.
Money can buy compute. It cannot buy the insight that turns 16 bits into 3 without losing accuracy.
What to watch
- Algorithmic efficiency is compounding faster than hardware. 5x per year versus Moore's Law's 2x every two years.
- Capex alone doesn't build moats. Meta's $135B is proof. If your model can't reason, more GPUs won't fix that.
- Jevons paradox will probably hold. Efficiency won't kill chip demand long-term, but it will shift who captures the value. Algorithm-heavy companies pull ahead of hardware-heavy ones.
- If TurboQuant-style compression gets baked into standard inference frameworks, the memory-per-parameter ratio changes permanently. That reprices every infrastructure investment made under the old assumptions.
The week of March 23 — when a research paper moved more market value than most quarterly earnings reports — is the week the AI industry's center of gravity shifted. From "who has the most GPUs" to "who has the best ideas for using them."