Blog

Field notes on cutting inference costs.

What I find under the hood of AI startups — vector search, embeddings, GPU vs CPU, caching, and model routing. Written for the engineers who have to fix it.

Send a message