In his cornerstone essay “The Use of Knowledge in Society,” Friedrich Hayek introduces the “man on the spot”:
If we can agree that the economic problem of society is mainly one of rapid adaptation to changes in the particular circumstances of time and place, it would seem to follow that the ultimate decisions must be left to the people who are familiar with these circumstances, who know directly of the relevant changes and of the resources immediately available to meet them.
But today, with powerful AI models and dominant centralized technology platforms, Hayek’s 1945 piece feels antiquated and unsatisfying.
Yet a simple and modern recasting of Hayek – The Use of Compute in Society – shows that the scarce resource is no longer dispersed knowledge but the joules, milliseconds, and marginal dollars needed to turn context into action.
So what is Hayek’s Revenge?
The “man on the spot” is now cybernetic. Enter “compute on the spot.”
Why Hayek feels unsatisfying today
To construct a rational economic order, we need to aggregate all context (preferences, means, knowledge) and allocate resources accordingly. Hayek argued this problem is hard because the knowledge necessary for this economic order is inherently dispersed among individuals and cannot be given to anyone in its totality.
Hayek saw prices as a critical abstraction, allowing economic actors to respond to changes in supply or demand without having to know the underlying particulars. But with incredible computational power and the modern concentration of data and technology among a few centralized platforms, Hayek’s position seems like cope.
Hayek believed a candidate central planner would principally operate in the realm of statistics and would struggle to reach our “man on the spot”:
Central planning based on statistical information by its nature cannot take direct account of these circumstances of time and place [so] that the central planner will have to find some way or other in which the decisions depending on them can be left to the “man on the spot.”
This clearly fails with language models – just add the relevant context to the prompt! Have you heard of context engineering?
So with AI appearing to eviscerate Hayek’s principal objections to central planning, is it over? Should we all get ready for our centralized benevolent AI dictator?
Not yet.
Even if all the context could be assembled, we now face a different problem – the efficient conversion of energy to intelligence. Moving data to centralized compute and back costs time and energy. Running oversized models for simple tasks wastes energy.
Optimizing for latency and economics opens the way for a new Hayekian motivation.
Optimizing the conversion of energy to intelligence
Intelligence – irrespective of AI – is itself contingent (as Putnam put it, “there is no God’s-eye view”). Contrary to common views in the Valley, intelligence is not a monolithic capability but rather a collection of context-dependent computational instances.
Context itself has context!
This brings us to Rich Sutton’s famous Bitter Lesson – the observation that in AI research, general methods leveraging scale consistently outperform attempts to build in human knowledge or specialized structure. Sutton explains:
The bitter lesson is based on the historical observations that AI researchers have often tried to build knowledge into their agents [but] breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.
So, does the Bitter Lesson irrevocably condemn Hayek?
On its face, yes: we shouldn’t try to make AI contingent, since specializing AI to particular applications will lose out to drawing on scaled, general models. And to achieve Sutton’s scale, labs have had to assemble huge sums of capital and compute. Centralization and Sutton’s scale are empirically intertwined!
But a candidate end-state of the Bitter Lesson with only a few large, expensive models is also contrary to the standard profit-maximization objectives of every enterprise. Businesses and consumers will not pay for large-model tokens for the vibes. You wouldn’t put a Xeon processor in your lawn mower. It’s in the interest of every business and consumer to acquire intelligence as cheaply (with as little energy!) as possible.
On closer inspection, the Bitter Lesson implies decentralized edge intelligence. Western venture-capital interests aside, why wouldn’t a scale-pilled, self-improving AI optimize edge intelligences for context-specific use cases to minimize energy use? Supposing otherwise seems less scale-pilled, not more!
China’s Moonshot AI poses this objective function directly: they are
seeking the optimal conversion from energy to intelligence
This reflects China’s view that value will accrue to the application layer, with intelligence becoming just another cheap commodity. It is also consistent with my Smart Squeeze thesis: (re-)selling cognition B2B at the app layer is a path to zero, and applications should instead sell things whose value is orthogonal to intelligence consumption.
Bitter Lesson, meet Physics
“Compute on the spot” follows naturally from physics and economics.
Consider a security camera identifying faces: streaming HD video to a cloud model might draw 50W continuously, while an edge chip purpose-built for facial recognition uses 2W. That’s a 25x difference in energy-to-intelligence conversion. Somebody eventually pays the bill.
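As a back-of-the-envelope sketch, those 50W and 2W figures compound into real money across a fleet. The uptime and electricity price below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope energy comparison for the security-camera example.
# The 50 W (cloud streaming) and 2 W (edge inference) draws come from the
# text above; hours of operation and electricity price are assumptions.

HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.15  # USD, illustrative

def annual_cost(watts: float, n_cameras: int = 1) -> float:
    """Annual electricity cost for n cameras drawing `watts` continuously."""
    kwh = watts * HOURS_PER_YEAR / 1000
    return kwh * PRICE_PER_KWH * n_cameras

cloud = annual_cost(50, n_cameras=1000)  # stream HD video to a cloud model
edge = annual_cost(2, n_cameras=1000)    # purpose-built edge chip

print(f"cloud: ${cloud:,.0f}/yr  edge: ${edge:,.0f}/yr  ratio: {cloud/edge:.0f}x")
# cloud: $65,700/yr  edge: $2,628/yr  ratio: 25x
```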
Or autonomous vehicles: using cloud intelligence to process 4TB of daily sensor data is likely physically impossible. At highway speeds, a car travels ~100 feet per second. Even with 5G’s theoretical 1ms latency, a real-world round trip to cloud inference takes 50-100ms minimum. By the time cloud intelligence says “brake now,” you’ve traveled 5-10 feet into oncoming traffic. The physics of latency makes edge compute mandatory, not optional. The context of the context demands “compute on the spot”.
Energy costs compound the latency problem: transmitting raw sensor streams would require dedicating significant battery capacity just for connectivity – imagine your car’s “data plan” eating 5-10% of your driving range.
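Working those numbers through: the speed, round-trip latency, and 4TB/day figures come from the paragraphs above, while the radio energy-per-bit and battery capacity are assumed, order-of-magnitude placeholders:

```python
# Latency and energy arithmetic for the autonomous-vehicle example.
# Speed, round-trip latency, and 4 TB/day come from the text; the radio
# energy-per-bit and battery capacity are illustrative assumptions.

FEET_PER_SECOND = 100  # ~highway speed (~68 mph)

def distance_during_latency(rtt_ms: float) -> float:
    """Feet traveled while waiting on a cloud round trip."""
    return FEET_PER_SECOND * rtt_ms / 1000

for rtt in (50, 100):
    print(f"{rtt} ms round trip -> {distance_during_latency(rtt):.0f} ft traveled")
# 50 ms round trip -> 5 ft traveled
# 100 ms round trip -> 10 ft traveled

# Energy to ship 4 TB/day over a cellular radio, assuming ~1 microjoule per
# bit (an assumed, order-of-magnitude figure for wireless transmission).
bits_per_day = 4e12 * 8
joules = bits_per_day * 1e-6
kwh = joules / 3.6e6
print(f"uplink energy: ~{kwh:.0f} kWh/day vs. an assumed ~75 kWh EV battery")
# uplink energy: ~9 kWh/day -- the same ballpark as the 5-10% range above
```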
“Compute on the spot” is about the context of the context: the fundamental physical and economic limits of energy, latency, and willingness to pay. Some computations must happen at the point of sensing, or they’re worthless or economically unviable.
Meta-Hayek: energy economics is the new price signal
Hayek writes:
If we can agree that the economic problem of society is mainly one of rapid adaptation to changes in the particular circumstances of time and place, it would seem to follow that the ultimate decisions must be left to [the man on the spot] who is familiar with these circumstances, who know[s] directly of the relevant changes and of the resources immediately available to meet them.
Of course, in this era it is not a “man on the spot” responding to and translating information into prices, it is “compute on the spot” specialized to the particular circumstances of time and place.
This is Meta-Hayek: it is not that information can’t be centralized (it can!), but that intelligence deployment can’t be efficiently centralized. Context flows that need intelligence create the same coordination problem Hayek identified, just one level up. The “man on the spot” may not need the information-aggregating price mechanism, but the compute in his place will need to be specialized to the physics and economics of its application. The problem framing is nearly identical: we will get decentralization through price signals, except now the price is energy per unit of useful intelligence.
Meta-Hayek continues
It is with respect to this that practically every [edge compute device] has some advantage over all others because [it] possesses unique information of which beneficial use might be made, but of which use can be made only if the decisions depending on it are left to [it].
The Bitter Lesson says that scale will win. But it does not account for the physics and economics of applications, which strictly constrain how intelligence is used. Applications that live within latency, energy, and cost ceilings will always push compute to the edge. Say goodbye to expensive, slow, and risky cloud-based language models.
However scale-pilled you are, you shouldn’t have doubted Hayek.
While the Bitter Lesson tells us to bet on scale – and scale has given us remarkable general-purpose models – the physics and economics of deployment create a counter-force. We’re heading not toward a world of one omniscient central AI, but toward an ecosystem of specialized intelligences – each optimized for the particular context of energy, latency, and cost constraints.
This suggests a different AI development path than pure model scaling: investing in efficient model compression, edge deployments, and application-specific optimizations. The companies that win won’t just have the biggest models, but the best energy-to-intelligence conversion for each use case.
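One way to make that metric concrete is to define energy per unit of useful intelligence as joules per successfully completed task. The deployments and numbers below are hypothetical placeholders, purely to show how the comparison would run:

```python
# Hypothetical comparison of "energy per unit of useful intelligence",
# defined here as joules per successfully completed task. Every number
# below is an illustrative placeholder, not a measurement.

from dataclasses import dataclass

@dataclass
class Deployment:
    name: str
    joules_per_query: float   # energy used per inference (incl. network)
    task_success_rate: float  # fraction of queries that actually solve the task

    @property
    def joules_per_useful_task(self) -> float:
        return self.joules_per_query / self.task_success_rate

candidates = [
    Deployment("large cloud model", joules_per_query=3000.0, task_success_rate=0.97),
    Deployment("small edge model",  joules_per_query=15.0,   task_success_rate=0.90),
]

best = min(candidates, key=lambda d: d.joules_per_useful_task)
for d in candidates:
    print(f"{d.name:18s} {d.joules_per_useful_task:8.1f} J per useful task")
print(f"-> route this workload to: {best.name}")
```

Under these made-up numbers the edge deployment wins by roughly two orders of magnitude on this metric, even at lower per-query accuracy; a harder, accuracy-critical workload could flip the decision back to the large model.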
Hayek’s Revenge is that while context may be aggregated, compute must be distributed to be economically and physically viable. The market will speak through energy prices, latency requirements, and deployment costs. And just as Hayek predicted for economic planning, these price signals will drive intelligence to where it’s needed most: to the “compute on the spot.”
The future is less “one model to rule them all” and more “the right model in the right place” – a vindication of Hayek’s core insight, elevated to the age of artificial intelligence.
Energy economics is the new price signal.
–
Thanks to Peter Wang for inspiring and energizing this entry.