Summary: Lab margins and value participation fall out of the distribution of task complexity, but these present conflicting incentives for the labs.
⚠️ Fair warning: this blog is more technical, so feel free to copy it as markdown into an LM.

Dario continues to predict high unemployment alongside high GDP growth:

The signature of this technology is it’s going to take us to a world where we have very high GDP growth, and potentially also very high unemployment and inequality.

Mustafa Suleyman of Microsoft expects something similar:

So white-collar work, where you’re sitting down at a computer, either being a lawyer or an accountant or a project manager or a marketing person — most of those tasks will be fully automated by an AI within the next 12 to 18 months.

These predictions come as estimates roll in that labs are making somewhere in the neighborhood of 80% margins on inference.

Indeed, American labs face a threat of open source. Businesses for whom inference constitutes a critical component of their production supply chains are turning to open source (see Cursor) to get a handle on their COGS.

With these doomer narratives directed at meatbrains, it seems only natural that we also examine the wage-collecting capacity of the American lab models themselves.

Will models earn a living wage either?

Why should we think so?

Price elasticity by task complexity

I spend the rest of this blog extending a framework introduced last month by Fausto Uribe.

The economic idea is very simple: American AI labs can continue to charge high prices only if there remain tasks for which customers will pay almost anything. Whether these tasks exist, or are a hallucination of a customer set desperate to escape the Permanent Underclass, is the open question. I’ve previously explored whether tokenmaxxers suffer from adverse selection: that the businesses worse positioned to deliver durable fruits from AI are also the most likely to spend on frontier AI, while the well-positioned just wait.

Either way, we can readily answer this question by characterizing the distribution of how much value AI creates across tasks. The labs need this distribution to have a heavy right tail: with one, American labs are better positioned to command high prices even as cheaper alternatives catch up. If the tail is thin, however, their pricing power may slip.

Returning to @fauxzus’ setup, let

  • $h$ denote hours saved on a task by pairing with AI versus working without it
  • $w$ denote a worker’s hourly wage in dollars
  • $T$ the tokens consumed by a given task (in millions, to match the units of $p$) and
  • $p$ the price per one million tokens

So, the dollar value of using the model on a given task is $h w$ and the dollar cost is $p T$. The user will then apply the model whenever $w h \ge p T$, which means a willingness to pay of $w h / T \ge p$. Demand at $p$, written $D(p)$ below, is the share of tasks satisfying this inequality, and the tasks using the model are those with $h \ge p T / w$.
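As a minimal sketch of this adoption rule, the snippet below counts the share of tasks that clear the cutoff $h \ge p T / w$. The function name, the task list, the $50 wage, and the 2M tokens per task are all hypothetical illustrations, not figures from @fauxzus or from any lab.

```python
def demand_share(hours_saved, wage, tokens_millions, price_per_million):
    """Share of tasks worth running: those with w * h >= p * T."""
    cutoff = price_per_million * tokens_millions / wage  # h* = p T / w
    served = sum(1 for h in hours_saved if h >= cutoff)
    return served / len(hours_saved)


# Hypothetical tasks: hours saved per task, $50/hour wage, 2M tokens per task.
tasks = [0.1, 0.5, 1.0, 2.0]
share = demand_share(tasks, wage=50.0, tokens_millions=2.0, price_per_million=6.0)
print(share)  # cutoff is 0.24 hours, so 3 of the 4 tasks are served
```

Raising `price_per_million` moves the cutoff up and shrinks the served share, which is all that "demand at $p$" means in this setup.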

So far we’ve considered $h$ for a single task, but really $h$ belongs to a distribution of tasks. @fauxzus supposed that $h$ was exponentially distributed, but that leaves the result of his blog––labs have little pricing power––feeling nearly tautological. That is, assume that very high-value tasks are exponentially rare and––no surprise––labs have no pricing power. This also seems counter to emerging narratives that AI can actually save businesses lots of time.

Let’s instead suppose that $h$ is Pareto distributed

$$h \sim \text{Pareto}(\alpha, x_m)$$

$$P(h > z) = \left(\frac{x_m}{z}\right)^\alpha, \quad z \ge x_m$$

where $x_m > 0$ is the minimum time savings any AI-assisted task delivers and $\alpha > 0$ is the shape parameter controlling how fat the right tail is. Smaller $\alpha$ means a fatter right tail and a larger gap between the 95th-percentile time savings and the median. @fauxzus’ exponential distribution implied that price elasticity rose linearly with price, but under Pareto it is constant at $\alpha$, which is more generous to the lab narrative: at all price levels there is a cohort of high-value tasks worth paying for.
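To see where the constant elasticity comes from, here is a short derivation sketch. It assumes, as the setup implicitly does, that $w$ and $T$ are held fixed across tasks and that the price is high enough for the cutoff to bind ($p T / w \ge x_m$). The elasticity symbol $\varepsilon$ and the exponential rate $\lambda$ are notation introduced here for the comparison.

$$D(p) = P\left(h \ge \frac{p T}{w}\right) = \left(\frac{x_m w}{p T}\right)^\alpha \quad \Rightarrow \quad \varepsilon(p) = -\frac{d \ln D}{d \ln p} = \alpha$$

$$\text{Exponential: } D(p) = e^{-\lambda p T / w} \quad \Rightarrow \quad \varepsilon(p) = \frac{\lambda p T}{w}$$

Under the exponential, elasticity grows in proportion to $p$, so every price increase is punished harder than the last. Under Pareto, it stays at $\alpha$ no matter how high the price goes.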

Applying the Lerner condition for a price-setting monopolist (the markup over marginal cost, as a share of price, should equal the inverse of the price elasticity), we get

$$\frac{p - \text{Marginal Cost}}{p} = \frac{1}{\alpha}$$ and $$p = \text{Marginal Cost } \cdot \frac{\alpha}{\alpha-1}$$

For Pareto distributed tasks, you can see how pricing power is a direct function of the shape of the distribution. The lab’s per-token margin (price minus marginal cost, as a share of price) is just $1/\alpha$, and a finite optimal price requires $\alpha > 1$. Decide the distribution of task value, and you can characterize lab future pricing power.

Of course, the inference market is not a monopoly. @fauxzus supposed an analytical bridge to oligopoly which I’ll ignore here. The monopoly position is the most favorable condition for an enterprising lab. Under other competitive scenarios––choose your favorite––price elasticity is certainly higher as users have more opportunities to defect.

Applying @fauxzus’ calibration

  • $p = \$6$ per million tokens (using blended Opus 4.6 pricing)
  • marginal cost $\approx \$2$ per million tokens

and we get $1/\alpha = (6 - 2)/6 = 2/3$, so $\alpha = 1.5$. For a Pareto distribution, any $\alpha \le 2$ means the variance is infinite: a small set of tasks dominates the distribution.
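The calibration is just the Lerner condition run backwards. A minimal sketch follows, with hypothetical function names and using the price and marginal-cost figures quoted above as inputs:

```python
def implied_alpha(price, marginal_cost):
    """Invert the Lerner condition (p - c) / p = 1 / alpha."""
    if price <= marginal_cost:
        raise ValueError("price must exceed marginal cost")
    return price / (price - marginal_cost)


def monopoly_price(marginal_cost, alpha):
    """Monopolist's price p = c * alpha / (alpha - 1); needs alpha > 1."""
    if alpha <= 1:
        raise ValueError("a finite monopoly price requires alpha > 1")
    return marginal_cost * alpha / (alpha - 1)


alpha = implied_alpha(price=6.0, marginal_cost=2.0)
print(alpha)                       # 1.5
print(monopoly_price(2.0, alpha))  # 6.0, recovering the observed price
```

The round trip only says the calibration is internally consistent. It does not validate the $2 marginal cost estimate, which carries most of the uncertainty.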

Of course, $\alpha$ is not fixed; it evolves.

Distillation thins the tail

So far we’ve written the distribution of task value as a function of frontier capabilities. That said, whenever an open source model matches capability that previously required an American lab, the corresponding tasks migrate out of the right tail into the body of the distribution where cheaper alternatives work fine, effectively shortening the right tail.

A proper model would then write $\alpha$ as a function of time. It could take functional forms such as

  • linear: $\alpha(t) = \alpha_0 + \rho t$
  • race: $\alpha(t) = \alpha_0 + t (\rho_D - \rho_F)$

where the linear form supposes that distillation or open source alternatives advance at a constant rate year over year. The race form models the distillation rate $\rho_D$ relative to the frontier capability expansion rate $\rho_F$, which re-fattens the tail. This explicitly sets frontier model capability expansion against open source alternatives.
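To make the two forms concrete, the sketch below traces the implied per-token margin $1/\alpha(t)$ over a few years. The starting value $\alpha_0 = 1.5$ is the calibration from earlier. The rates `rho_d = 0.3` and `rho_f = 0.1` are purely hypothetical placeholders, not estimates.

```python
def alpha_linear(t, alpha0, rho):
    """Linear form: alpha(t) = alpha0 + rho * t."""
    return alpha0 + rho * t


def alpha_race(t, alpha0, rho_d, rho_f):
    """Race form: distillation (rho_d) net of frontier expansion (rho_f)."""
    return alpha0 + t * (rho_d - rho_f)


def lerner_margin(alpha):
    """Per-token margin (p - c) / p implied by the Lerner condition."""
    return 1.0 / alpha


# Hypothetical rates: distillation outpaces frontier expansion.
for year in range(4):
    a = alpha_race(year, alpha0=1.5, rho_d=0.3, rho_f=0.1)
    print(year, round(a, 2), round(lerner_margin(a), 3))
```

If `rho_f` exceeded `rho_d`, the same code would show $\alpha$ falling and margins widening, so the sign of $\rho_D - \rho_F$ is what needs measuring.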

Looking at OpenRouter’s State of AI 2025, open-source model inference share grew from 25% to 33% of weekly tokens between November ’24 and November ’25. OpenRouter is likely a biased sample (e.g., is OpenRouter merely the easiest place to get open source inference?), but directionally it appears that $\rho_D$ may be outpacing $\rho_F$. This puts us in @fauxzus’ “grim future” where $\alpha$ rises over time and lab pricing power decays.

Hayek’s Revenge

So far we’ve considered lab margins as a function of the distribution of task complexity. But what about value capture? Even if $1/\alpha$ is high, do labs actually accrue very much value anyway?

This question reduces to the lab’s (and the B2B application layer’s) pricing problem. Ideally, labs could observe the value that accrues to customers through the application of their models. This, however, is private information.

Lab customers wish to shield this surplus from the labs, as revealing it would signal the areas where the labs should vertically integrate. This is essentially a Hayekian effect: The Man On The Spot has better signal on the value of the goods and services he sells than the labs do, and it’s only revealed to the labs indirectly through a price and demand signal.

Formalizing this, at the threshold where customers buy inference, $h^\ast = p T / w$, we have

$$\Pi(p) = \left(p - \text{Marginal Cost}\right) T D(p)$$

$$\text{Customer Surplus}(p) = \int_{h^\ast}^\infty (w h - p T) f(h) \, dh$$

where $D(p)$ is demand at price $p$ and $f(h)$ is the density function of $h$.

Lab profit per served task is a constant $\left(p - \text{Marginal Cost}\right) T$, independent of hours saved $h$. Consumer (or application layer) surplus per served task is wages saved minus token cost, $w h - p T$, which grows linearly in $h$. The labs’ pricing only touches the cutoff, while everything above accrues to users and businesses. Evaluating customer surplus:

$$\text{Customer Surplus}(p) = D(p) \left( w \mathbb{E}[h : h \ge h^\ast] - p T\right)$$

This matches intuition: total customer surplus is number of served tasks times the average surplus per task. Since $h$ is Pareto distributed, for $\alpha > 1$ this further reduces

$$\mathbb{E}[h : h \ge h^\ast] = h^\ast \frac{\alpha}{\alpha - 1}$$

saying the average served task has time savings $\alpha/(\alpha-1)$ times the cutoff $h^\ast$. Substituting, and using $w h^\ast = p T$, we get the clean expression

$$\text{Customer Surplus}(p) = \frac{p T D(p)}{\alpha - 1}$$

emitting two useful intuitions. First, customer surplus scales linearly with token revenue $p T D(p)$. Second, and interestingly, the multiplier $1/(\alpha -1)$ measures how much extra value the customer keeps relative to what the labs get paid.
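The algebra behind the surplus expression can be sketched as follows, under the text’s own assumptions ($w$ and $T$ fixed across tasks, $\alpha > 1$, and a cutoff $h^\ast \ge x_m$). A Pareto variable conditioned on exceeding $h^\ast$ is again Pareto with the same shape and minimum $h^\ast$, which gives the conditional mean:

$$\mathbb{E}[h : h \ge h^\ast] = \int_{h^\ast}^\infty h \, \frac{\alpha (h^\ast)^\alpha}{h^{\alpha + 1}} \, dh = \frac{\alpha}{\alpha - 1} h^\ast$$

Substituting $w h^\ast = p T$ then collapses the bracket:

$$\text{Customer Surplus}(p) = D(p) \left( \frac{\alpha}{\alpha - 1} w h^\ast - p T \right) = D(p) \, p T \left( \frac{\alpha}{\alpha - 1} - 1 \right) = \frac{p T D(p)}{\alpha - 1}$$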

To see the share of surplus that goes to the lab versus the customer, use the monopolist lab’s chosen price $p^\ast = \text{Marginal Cost} \cdot \alpha / (\alpha - 1)$:

$$\frac{\Pi(p^\ast)}{\Pi(p^\ast) + \text{Customer Surplus}(p^\ast)} = \frac{\alpha - 1}{2\alpha - 1}$$

So lab per-token markup and surplus participation are both entirely determined by the parameterization of task complexity. At @fauxzus’ calibration ($\alpha=1.5$) this evaluates to $0.5/2$, or 25%. That is, the lab keeps a quarter of the value its tokens create. The other 75% accrues to customers as customer surplus.
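A quick numerical check of the split, as a sketch: the helper below rebuilds lab profit and customer surplus from the two formulas and takes the ratio. The function name and the default values for marginal cost, tokens, and demand are hypothetical; the point is that they cancel, leaving only $\alpha$.

```python
def lab_share(alpha, marginal_cost=2.0, tokens=1.0, demand=1.0):
    """Lab profit as a fraction of total surplus at the monopoly price."""
    if alpha <= 1:
        raise ValueError("formulas require alpha > 1")
    price = marginal_cost * alpha / (alpha - 1)          # p* from Lerner
    profit = (price - marginal_cost) * tokens * demand   # Pi(p*)
    surplus = price * tokens * demand / (alpha - 1)      # customer surplus
    return profit / (profit + surplus)


print(lab_share(1.5))                       # 0.25
print(round(lab_share(1.2), 3))             # 0.143
print(lab_share(1.5, marginal_cost=5.0, tokens=3.0, demand=0.2))  # still 0.25
```

Changing marginal cost, tokens per task, or demand rescales profit and surplus together, so the share depends only on the tail parameter.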

The marginal return to vertically integrating is highest when $\alpha$ is smallest. The labs do best by vertically integrating (e.g., with DeepMind’s Isomorphic or Anthropic’s Coefficient) when the tails are heaviest. Lighter tails (larger $\alpha$) leave the labs with worse per-token margins $1/\alpha$ and fewer opportunities to charge higher prices, because any right tail that could justify high token prices vanishes.

You might worry that this result is an artifact of the Pareto assumption, but Pareto is not a controversial choice: it merely supposes that task complexity has a heavy tail, i.e., is power-law distributed. Choosing other distributions leads to the same qualitative observation.

Complexity Frenemies

Greater task complexity––lower $\alpha$––is both good and bad for American labs.

On one hand, greater task complexity leads to greater per-token margins. In this case, value from models is high for certain tasks: customers will pay a lot for the time savings.

On the other hand, greater task complexity also means that more surplus goes to lab customers. Lab pricing only touches the marginal buyer––value above the cutoff $h^\ast$ accrues to end customers.

We could evaluate our Pareto assumption by calculating the true distribution of $h$ within a company. Doing so could reveal whether we’re in an exponential or Pareto regime. Further, it could be helpful to try to estimate $\rho_D$, the rate at which distillation thins the right tail. This could be estimated in a number of ways e.g., by running competing open and closed models through the same evals.
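If a company did log hours saved per task, one standard way to fit the Pareto side of that comparison is the maximum-likelihood shape estimate $\hat{\alpha} = n / \sum_i \ln(h_i / x_m)$. The sketch below runs it on synthetic draws from Python’s `random.paretovariate`, which stand in for real measurements. The sample size, seed, and $x_m = 1$ are assumptions for illustration only.

```python
import math
import random


def pareto_alpha_mle(samples, x_m):
    """Maximum-likelihood shape estimate: n / sum(ln(h_i / x_m))."""
    tail = [h for h in samples if h >= x_m]
    log_sum = sum(math.log(h / x_m) for h in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample strictly above x_m")
    return len(tail) / log_sum


# Synthetic stand-in for measured hours saved (x_m = 1, true alpha = 1.5).
rng = random.Random(0)
hours = [rng.paretovariate(1.5) for _ in range(20_000)]
alpha_hat = pareto_alpha_mle(hours, x_m=1.0)
print(round(alpha_hat, 2))
```

This only estimates $\alpha$ conditional on the Pareto form. Deciding between the exponential and Pareto regimes would additionally need a goodness-of-fit comparison on the same data.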

Cybernetic Rollup Supremacy

AI labs are frenemies with complexity.

More complexity means high $1/\alpha$ margins and pricing power on tasks that absolutely require frontier intelligence. On the other hand, higher complexity also means more surplus goes to lab customers than to the labs themselves. At $\alpha=1.5$, the lab captures only 25% of the value its tokens create. The remaining 75% of the value created goes to lab customers. At even heavier tails, e.g., $\alpha=1.2$, the lab captures only $0.2/1.4 \approx 14$% of the value its models create, with the rest accruing to downstream operators. If you’re bullish on the tremendous value creation potential of AI, a Cybernetic Rollup is a superior vehicle to accrue value than selling inference.

A naive read might suppose this explains the holdcos, but it appears the labs are using them to buy token revenue, not to sell EBITDA. In a regime of heavy tails, value accrues to the vertically integrated assets, who can participate in the full surplus of intelligence delivered (and decide whether or not they care about high margin token revenue at all).

This is the Cybernetic Rollup thesis I posed in December: value accrues to the vertically integrated assets that own context origination at the edge. Information economics (not attended to here) also favors vertical integration, as distribution of intelligence yields negative externalities for both seller and buyer.

It seems people are starting to wake up to token economics and surplus accrual:

“We’re using $300M of @AnthropicAI this year … the vast majority of those tokens don’t need to go to Anthropic.”

explained @benioff on @theallinpod. @usvlibrian is feeling similarly.

It’s entirely unclear what labs should do: they’ve got a scaling token-sales business that either leaves a ton of money on the table (thick tail) or is about to get compressed (thin tail).

Neither is a recipe for high wages.