Blog

Field notes on cutting inference costs.

What I find under the hood of AI startups — vector search, embeddings, GPU vs CPU, caching, and model routing. Written for the engineers who have to fix it.

May 29, 2026 · 13 min read

The 6 most expensive RAG infrastructure mistakes I see in AI startups

Six recoverable inference-cost mistakes I see in nearly every AI startup's stack, what each looks like from the inside, and roughly what it costs.
