The future I imagine
Path to the full AI company
Last weekend Andrej Karpathy published autoresearch. 630 lines of Python. MIT license. 30,000+ stars in days.
The idea: give an AI agent a training script, a single GPU, and one rule — if the result improves, keep it; if not, revert. Loop forever.
Karpathy left it running for two days. ~700 experiments. ~20 real improvements. An 11% efficiency gain on a benchmark he thought was already well-tuned. The agent caught things he’d missed after twenty years of doing this work.
Then Shopify CEO Tobi Lütke applied the same pattern to Shopify’s Liquid template engine — a 20-year-old Ruby codebase. 120 automated experiments overnight. 53% faster, 61% fewer allocations. Zero test failures.
The pattern escaped the lab. It works on anything with a metric.
The pattern in four words
Loop. Measure. Keep. Repeat.
Define a metric the agent can’t game. Give it a constrained space. Let it run hundreds of iterations while you sleep. Improvements accumulate. Regressions get discarded. The baseline only moves forward.
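Here is a minimal sketch of that loop, to make the keep-or-revert rule concrete. It is not Karpathy's code: the names ratchet, toy_propose and toy_score are hypothetical, and the "experiment" is a random nudge to a single number rather than an edit to a training script.

```python
import random
from typing import Callable, List, Tuple


def ratchet(
    baseline: float,
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    iterations: int,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Keep a candidate only if it scores strictly better than the baseline."""
    rng = random.Random(seed)
    best = baseline
    best_score = score(best)
    history = [best_score]  # best score seen after each iteration
    for _ in range(iterations):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            # Improvement: the candidate becomes the new baseline.
            best, best_score = candidate, candidate_score
        # Otherwise the candidate is dropped, which is the "revert" step.
        history.append(best_score)
    return best, history


# Toy stand-ins (assumptions for illustration only).
def toy_score(x: float) -> float:
    # Higher is better; the optimum is at x = 3.
    return -(x - 3.0) ** 2


def toy_propose(x: float, rng: random.Random) -> float:
    # A small random change to the current best value.
    return x + rng.uniform(-0.5, 0.5)


best, history = ratchet(0.0, toy_propose, toy_score, iterations=500)
```

The history list never decreases. That is the whole trick: a bad experiment costs one iteration and nothing else.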
Karpathy calls this a “ratchet.” It’s also the blueprint for an entire company.
The fully autonomous company
I imagine a fully autonomous company running the ratchet on every function with a measurable objective. Humans do three things: set objectives and direction, define metrics, and exercise judgment. Everything else loops.
R&D has agents running hundreds of experiments per day. Engineering runs continuous optimization against benchmark suites — exactly what Lütke demonstrated. Growth tests hundreds of copy and funnel variants daily — winners survive, losers revert. Operations iterates on pricing, allocation, forecasting against held-out actuals. Finance becomes a ratchet on capital efficiency.
The org chart inverts. Very few executors. Many metric designers and judgment callers.
The path: from chatbot to self-improving machine
No company jumps straight to full autonomy. But the progression is clear:
Stage 1 — Copilot. AI assists humans. You use Claude to draft emails and ChatGPT to summarize docs. This is where 95% of companies are today. No loop. No ratchet. No compounding.
Stage 2 — Agent. AI executes defined tasks end-to-end. Your CI pipeline runs AI-powered code review. Your marketing tool auto-generates variants. Humans still trigger, approve, and monitor everything. Loop exists, but a human is in it.
Stage 3 — Autonomous loop. AI runs the full cycle: hypothesize, execute, evaluate, keep or discard. Humans define the metric and the boundaries, then step back. This is autoresearch. This is the Liquid PR. The ratchet turns without you.
Stage 4 — Self-improving organization. Multiple autonomous loops running in parallel across departments, feeding results into each other. Engineering improvements make the product faster, which changes growth metrics, which shifts operational priorities. The loops interconnect. The company optimizes itself as a system.
Most companies are stuck between Stages 1 and 2. Karpathy just open-sourced Stage 3. Stage 4 is where the exponential gap opens, because every loop accelerates every other loop.
Evaluation design
Autoresearch works because Karpathy defined a clean, frozen, ungameable metric. The agent doesn’t need taste. It needs a clear signal and infinite patience.
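One way to read "frozen" in practice: the agent may change the thing being optimized, but never the code that scores it. The sketch below is an assumption about how you might enforce that, not a description of how autoresearch does it. FrozenEvaluator and EVAL_SOURCE are hypothetical names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the evaluation code."""
    return hashlib.sha256(data).hexdigest()


class FrozenEvaluator:
    """Refuses to score a run if the evaluation code has changed."""

    def __init__(self, eval_source: bytes) -> None:
        # Record the fingerprint once, before the agent starts.
        self._expected = fingerprint(eval_source)

    def check(self, eval_source: bytes) -> None:
        # Call this before every scoring run.
        if fingerprint(eval_source) != self._expected:
            raise RuntimeError("evaluation code changed; refusing to score")


# Hypothetical evaluation source, standing in for a real scoring script.
EVAL_SOURCE = b"def score(model): return validation_loss(model)\n"
evaluator = FrozenEvaluator(EVAL_SOURCE)
evaluator.check(EVAL_SOURCE)  # unchanged, so this passes silently
```

If the agent edits the scoring script to flatter itself, check raises and the run does not count.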
Scale this to a company and the hard problem is obvious: can you define “better” for every function precisely enough for a machine to pursue it, and robustly enough that the machine can’t game it?
Conversion rate is easy to measure, hard to freeze — an agent optimizing for clicks will produce dark patterns. Engineering performance has clean benchmarks but “code quality” doesn’t.
The winners of the next decade won’t have the best models. They’ll have the best evaluation design — metrics precise enough for machines to optimize and robust enough that optimization doesn’t destroy what you actually care about.
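A sketch of what "robust" could look like for the conversion example: pair the objective with guardrail metrics that veto a change, however good the headline number is. The names Result and accept, the refund_rate guardrail, and every number here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Result:
    objective: float              # the metric to push up, e.g. conversion rate
    guardrails: Dict[str, float]  # metrics that must stay at or under a limit


def accept(candidate: Result, baseline: Result, limits: Dict[str, float]) -> bool:
    """Keep a change only if it improves the objective and breaks no guardrail."""
    for name, limit in limits.items():
        if candidate.guardrails[name] > limit:
            return False  # vetoed, regardless of the objective
    return candidate.objective > baseline.objective


# Illustrative numbers only.
limits = {"refund_rate": 0.03}
baseline = Result(objective=0.040, guardrails={"refund_rate": 0.02})
dark_pattern = Result(objective=0.055, guardrails={"refund_rate": 0.09})
honest_win = Result(objective=0.043, guardrails={"refund_rate": 0.02})

kept_dark = accept(dark_pattern, baseline, limits)
kept_honest = accept(honest_win, baseline, limits)
```

The dark-pattern variant converts better and still gets reverted, because refunds blow past the limit. The hard part is not this code. It is choosing which guardrails actually capture what you care about.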
This is Goodhart’s Law at machine speed: when a measure becomes a target, it stops being a good measure. It is the governance problem of the autonomous era.
Can you define “better” well enough to let a machine chase it while you sleep?
If yes, the ratchet turns. Indefinitely.
If no, you’re still a meat computer. And the gap is widening every night.
