Writing
Notes on agentic systems, retrieval, and shipping ML to production. Some plain, some deep.
May 2026
Anthropic's natural language autoencoders translate a model's internal activations into plain English by making language the bottleneck of an autoencoder. This post covers how the method trains with GRPO, how steering proves the explanations are causal, and what it reveals about what a model knows but does not say.