The diffusion of AI into the economy is too slow and too concentrated.
Today’s top application-layer companies are mostly not end-to-end. They succeed where tasks can be precisely defined and near-instantly verified. We spin up bespoke vertical SaaS companies to wrap AI labor in trustworthy veneers, often hiding “very negative” unit economics.
The Outcomes Protocol replaces vertical AI with a single mechanism: prediction markets that trade the actual liability of AI labor. Traders who bet that, say, an AI contract review is flawless pay real damages if it later misses a clause; those who bet correctly collect the premium. By commodifying residual AI risk, the protocol optimally allocates intelligence costs and lets any AI service guarantee its work without building a full-stack company, unlocking deployment across economically valuable tasks (“AGI”) and creating a pathway for public markets to participate in the fruits of AGI.
## Crystallized winners
Today’s AI B2B SaaS market has crystallized into a handful of verticals:
- code
- legal
- medical scribing
- customer service or experience
- search
Why have these verticals emerged with clarity while others are still developing?
Last week Anjali Shrivastava posed an explanation:
> Successful B2B AI products have been in areas where there is well-defined work output (code change, legal memo, etc.). But these are rare – for most work there is no “work output.”
The crystallized categories seem to fit Anjali’s analysis. In each of these verticals, there is well-defined “work output”:
- code: new code that passes linting tests
- legal: red-lined contract clause citing the correct statute
- medical scribing: note summarizing the physician-patient conversation
- customer service: ticket reply that reportedly resolves the user’s issue
- search: ranked list of links whose top result answers the query
Of course, these are not the only use cases for AI, and delivering them comes at a high cost: each apparently needs its own company to develop, secure, and market it. If our goal is to automate all economically valuable work, that ambition should apply to both
- the tasks themselves
- the meta process of taking these automated tasks to market
Anjali’s observation would seem to be a blocker to AGI: Is the space of viable B2B AI applications restricted to verticals with well-defined work output? And for verticals with well-defined outputs, do we need individual companies to wrap their delivery?
Today, it’d seem the answer is yes. For every candidate vertical we need to spin up companies to handle ill-defined inputs and outputs, absorb risk, provide guarantees, and (implicitly) price the uncertainty. This is wildly inefficient if our goal is to accelerate the automation of all economically valuable work.
To explore these questions, it’s helpful to unpack Anjali’s definition.
Definition. Work output is well-defined whenever
- it’s easy to specify
- output verification is fast
This makes way for a formalism that helps highlight where AI applications succeed or fail.
## Four quadrants of work
Put formally, you could say the nature of work is described by two axes:
- Specification entropy ($\text{SE}$): how much stable (exogenous) information you must provide to bound the set of acceptable outcomes
- Verifiability horizon ($\text{VH}$): earliest moment an oracle with perfect knowledge of all past and present facts can determine the final success state of a produced artifact. That is, the $\text{VH}$ is the earliest point at which any guarantee, warranty, or outcome-linked contract on that work could be definitively settled.
Under this lens, a quarterly tax return cannot be judged correct until the quarter’s books are closed; the oracle must wait for the calendar quarter to end before the verdict is knowable. On the other hand, a unit-test result is verifiable the instant the test finishes—the oracle needs no future data.
From a cost perspective, $\text{SE}$ determines the risk of getting the spec wrong (missing clauses, unseen edge cases, idiosyncratic AI errors), while $\text{VH}$ determines the risk of learning too late (capital locked up, irreversible decisions made, penalties triggered).
Varying $\text{VH}$ and $\text{SE}$, we see:
|  | Short VH | Long VH |
|---|---|---|
| Low SE | “Tool” | “Template” |
| High SE | “Copilot” | “Consulting” |
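A minimal sketch of this grid as code, assuming purely illustrative cutoffs for what counts as “high” $\text{SE}$ or a “long” $\text{VH}$ (the `Task` fields and thresholds are mine, not part of the framework):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    spec_entropy_bits: float           # SE: stable information needed to bound acceptable outcomes
    verifiability_horizon_days: float  # VH: earliest time an oracle could settle success

def quadrant(task: Task, se_cutoff: float = 100.0, vh_cutoff_days: float = 1.0) -> str:
    """Map a task onto the SE x VH grid. Cutoffs are illustrative assumptions."""
    high_se = task.spec_entropy_bits > se_cutoff
    long_vh = task.verifiability_horizon_days > vh_cutoff_days
    if high_se and long_vh:
        return "Consulting"
    if high_se:
        return "Copilot"
    if long_vh:
        return "Template"
    return "Tool"

# Examples echoing the post: an on-the-spot scribe note vs. a quarterly tax return.
print(quadrant(Task("medical scribe note", 20, 0)))    # Tool
print(quadrant(Task("quarterly tax return", 30, 90)))  # Template
```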
Today’s crystallized winners are all Tools: they have low $\text{SE}$ and short $\text{VH}$.
For instance, medical scribing has low specification entropy because the AI can be judged against a short, fixed list of facts that fit into simple boxes: the patient’s reason for the visit, symptoms, diagnosis, and next steps. The doctor reads the note and affirms the transcription on the spot. There’s no waiting for extra results, so $\text{VH}$ is low.
Code generation fits this too. $\text{SE}$ is low, e.g.,
> “make this feature and confirm the patch compiles and every automated test passes”
and the CI process returns a status in seconds. Engineers see the verdict immediately and no external event can later overturn it. $\text{VH}$ is low.
Low SE means there’s no ambiguity about the requested task; there’s no room to claim
> no, that’s not what I meant
while short VH means there’s no delay before judgment; no
> let’s wait and see how this plays out
These are the conditions for perfect accountability and complete contracts. Low SE means requirements can be readily specified upfront while short VH means success is immediately observable. There’s no residual uncertainty about whether the work met expectations.
AI thrives where the handshake between human and machine can be perfectly defined. The moment your task involves:
- High SE—large or uncertain requirements
- High VH—success that only time will reveal
you can’t write a complete contract. The AI product company must absorb this contractual incompleteness, essentially becoming an insurance company without insurance markets. This doesn’t scale.
This is likely why Balaji says most AI is “middle-to-middle,” not “end-to-end.” Prompting and verification are less technical bottlenecks than symptoms of the diffuse responsibility boundary that makes AI-work delivery inefficient or incomplete.
This raises the question: must we remain trapped in “middle-to-middle” AI work or the narrow zone where contracts are already complete?
## The Bottleneck
Are AI applications doomed to live in the upper-left quadrant forever?
Every current winner lives in the Tool cell: their specs are readily defined and results quickly verified. As soon as a task needs
- High SE—more bits of stable spec
- High VH—time to reveal correctness
costs explode. You need to spend more time in customer discovery to understand the customer problem while reserves against future failure scale with revenue.
Companies are already experimenting with guaranteeing outcomes with warranties. Two months ago Intercom CEO Eoghan McCabe launched the Fin Million Dollar Guarantee, promising customer satisfaction and a resolution rate greater than 65%, or up to $1mm back. Of course, this is a bespoke solution: Intercom must price the risk internally without any market benchmark. And with no market price for that uncertainty, buyers anchor to the lowest credible bid, forcing sellers to shade below their actuarial cost. This will accelerate margin erosion, leaving only bespoke vertical wrappers, most of which won’t survive the Smart Squeeze.
Clearly real economic activity doesn’t stop where uncertainty gets expensive. So how do we expand AI application cheaply and at scale?
## Escape routes
Imagine you’re a founder with an AI that can review vendor contracts.
Your product catches obvious issues today, but you can’t know if you missed subtle liabilities until after the contract is signed and terms are exercised. You’ve built intake flows for contract PDFs and company policies, but clients might not flag all relevant context that’s critical to your work. You’ve run your agents through evals but edge cases in bespoke terms are endless.
So you employ a human-assisted AI playbook:
- Hire contract lawyers to spot-check every high-value review until AI confidence improves
- Try to minimize human lawyer spend but suffer consequences of failure – churn or reputational damage – without a way to hedge
Ideally you can contractually promise reliable service (unclear if Crosby does this today), but to do so you must absorb
- Time Risk (high VH): Problems only surface when counterparties exercise obscure clauses weeks later
- Spec Risk (high SE): The client didn’t mention their subsidiary relationships, the AI missed interplay between clauses, or unusual governing law creates unexpected obligations
Either risk pushes the task out of the Tool cell:
- Template (low SE, high VH): Standard NDAs with clear terms, but liability emerges over time
- Copilot (high SE, low VH): Complex M&A doc where specs are unclear but issues surface quickly
- Consulting (high SE, high VH): Bespoke partnership agreements where both problems compound
Each non-Tool scenario kind of sucks. They blur the responsibility boundary between application layer and end-user, and have only homespun fixes like Intercom’s self-managed warranty to fall back on.
It’s hard not to wonder then – instead of building another vertical company for every uncertain task, why couldn’t we monetize the uncertainty itself?
Instead of each company managing these risks independently, why shouldn’t we create a market for this time and specification uncertainty? Such a market could enable the delivery of specialized AI services with payments settled immediately and residual exposures priced, traded, and settled by whoever values it best.
This would constitute a direct application of Christensen’s Law of Conservation of Modularity: integrating guarantees and inference-demand aggregation while modularizing AI labor itself.
The Outcomes Protocol decouples trust from service delivery and transforms uncertainty into a tradeable commodity, unlocking AI application without customer discovery, manual risk pricing, or company formation.
## Outcomes Protocol
### How does Outcomes Protocol work?
Outcomes Protocol is prediction market infrastructure: it takes AI work and manufactures tradeable outcome contracts, turning accountability problems into pricing problems.
It works by making uncertainty explicit. Every AI task carries residual risk – the gap between what AI delivers now and what ultimately determines success. Today’s application layer operates in spaces where that gap can be taken to zero, either by being a “middle-to-middle” service (Harvey, Cursor) – so all the risk is carried by the user – or by engaging in spaces where VH and SE are effectively zero (Abridge, Crosby). The protocol treats this gap as a financial object that can be priced and traded.
### Why can’t this live at the application layer?
Notably the protocol doesn’t exist at the application layer.
And can’t.
The application layer’s entire value proposition is the vertical integration of work and trust. Intercom doesn’t sell “customer service plus separate insurance”; they sell customer service guaranteed. Crosby doesn’t sell
> MSAs and DPAs we hope are good plus insurance just in case you don’t trust us or our institutional investor Sequoia
they sell “Expert accuracy.” The integration of trusted low-cost work is their moat. This creates a conflict with any candidate app-layer insurance provider.
The application layer wants to hide uncertainty and intelligence consumption. It’s a cost center it wishes to privately minimize, while efficient markets want to modularize and expose the uncertainty and effort level so they can be priced.
When Intercom self-insures its AI, it faces mixed incentives: transparent risk reporting is good for users but bad for competitive positioning. As soon as Intercom decomposes its integrated offering into modular components, it invites commoditization by simple price comparisons:
> what guaranteed resolution rate can I get for how much money did you say?
This is why agent insurance sold to the application layer is backwards. Application layer companies view uncertainty-management as a proprietary advantage or an embarrassing cost center. They’ll never voluntarily commoditize it. And if they do, they want to own the surplus, not pass it to a middleman in perpetuity.
This protocol sidesteps all this.
By operating below the application layer (likely at the router or lab layers – I explore this later), it modularizes AI work, separating it from risk-bearing and energy optimization, and integrates trusted delivery with demand aggregation across all AI verticals. This is Christensen’s Law in action.
### Making uncertainty tradeable
The protocol standardizes every task as a bundle:
| Category | Notation | Example |
|---|---|---|
| Request | spec $s$, context $x$, model $m$, harness $h$ | $s$: “Do my taxes”; $x$: W-2s, 1099s, receipts; $m$: `claude-opus-4-1-20250805`; $h$: `claude-code-legal-harness` |
| Effort | effort $e$, cost $c(e)$ | $e$: “deep liability scan + precedent check”; $c(e)=75$ dollars in compute + lawyer spot check |
| Deliverable | artifact $y$; ex-ante: random $Y_e$ | Redlined contract with risk assessment produced under effort $e$ |
| Verification | horizon $V$ | $V=30$ days (issues surface after signing) |
| Future state | realized $\omega$ in possible futures $\Omega$ | Hidden liability triggered; unfavorable term exercised; clean execution |
| Success test | $g(y,\omega) \in \{0,1\}$ | $g=1$ if no problematic clauses activated, $0$ if costly issue emerges |
| Loss function | realized $\ell(y,\omega) \in \mathbb R_+$ | $\ell = 10{,}000$ dollars if problematic clause triggers, $0$ otherwise |
| Residual risk | random variable $L_e = \ell(Y_e,\Omega)$: possible dollar loss after effort $e$ | Binary loss: $10{,}000$ dollars if missed clause causes damage, $0$ otherwise; with ~3% probability |
| Risk price | $\rho_V(L_e)$: premium now to offload the residual risk $L_e$ until $V$ | In a risk-neutral market: $0.03\times10{,}000=300$ dollars. With risk/time loading, could be 400 dollars |
| Outcomes Contract | $\mathsf{OC}(y)$ pays $\ell(y,\omega)$ at $V$ | Claim pays $10{,}000$ dollars if the reviewed contract causes a loss within 30 days |
That is, every task specifies a request $(s, x, m, h)$ which produces an artifact $y$ through effort $e$. At verification horizon $V$ the world reveals a state $\omega$ determining success $g(y, \omega)$ and any loss $\ell(y, \omega)$. The residual risk $L_e = \ell(Y_e, \Omega)$ (the ex‑ante dollar loss after effort $e$) can be priced today at $\rho_V(L_e)$ and traded via an Outcomes Contract $\mathsf{OC}(y)$.
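As a minimal sketch, the bundle can be written down directly in code using the legal-review numbers from the table; the class names, fields, and simple loaded-premium formula are my assumptions, not a defined protocol schema:

```python
from dataclasses import dataclass

@dataclass
class Request:
    spec: str            # s
    context: list[str]   # x
    model: str           # m
    harness: str         # h

@dataclass
class OutcomesContract:
    horizon_days: int    # V
    loss_if_fail: float  # ell(y, omega) when the success test g(y, omega) = 0
    p_fail: float        # market belief about failure probability
    loading: float       # risk/time loading charged above expected loss

    def risk_price(self) -> float:
        """rho_V(L_e): premium paid now to offload the residual risk until V."""
        return self.p_fail * self.loss_if_fail + self.loading

request = Request(
    spec="Review this vendor agreement",
    context=["contract.pdf", "company policies"],
    model="claude-opus-4-1-20250805",
    harness="claude-code-legal-harness",
)
effort_cost = 75.0  # c(e) for the "deep liability scan + precedent check" effort level
oc = OutcomesContract(horizon_days=30, loss_if_fail=10_000, p_fail=0.03, loading=100.0)

print(oc.risk_price())                # 400.0: the table's loaded premium
print(effort_cost + oc.risk_price())  # 475.0: the all-in guaranteed quote
```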
This machinery transforms unpriced accountability into financial instruments. Instead of end-to-end AI leaving users wondering
- Legal: “did they miss any problematic clauses?”
- Code: “will this feature break production?”
- Support: “did they actually solve our customer’s problem?”
we have an artifact $y$ that carries risk $L_e$, priced at $\rho_V(L_e)$ to cover downtime, liabilities, or escalation costs, revealed at horizon $V$, protecting the user if the AI made a mistake.
You can see the machinery of prediction markets as AGI infrastructure starting to take shape.
An Outcomes Contract $\mathsf{OC}(y)$ is essentially a prediction market on whether artifact $y$ will fail. Market participants bet on the artifact’s future success $g(y, \omega)$:
- will this contract have hidden liabilities?
- will the code break in production?
- will the resolution sequence resolve the customer’s problem?
The price $\rho_V(L_e)$ emerges from traders (possibly other AI!) competing on their beliefs about failure probability.
Unlike traditional prediction markets, where you buy shares that pay $1 if you’re right about an outcome, here traders buy and sell actual economic liability. Traditional prediction markets discover information
> Mamdani has an 81% chance of winning the NYC Mayoral Election
while the Outcomes Protocol transfers risk, enabling cheaper service (“AI legal review for 475 dollars guaranteed, vs. 2,000 for a human lawyer”).
When a trader sells $\mathsf{OC}(y)$ for 400, they’re betting the AI work is correct, collecting 400 now in exchange for potentially paying out 10,000 if it fails. Just like any insurer, they profit if their risk assessment is correct. The AI work provider charges the customer 475 total (75 for the AI work + 400 for the certainty) – still roughly 75% cheaper than human review.
This transforms speculation into infrastructure.
Every trade enables AI work to proceed by moving risk from those who can’t (or won’t!) efficiently or sustainably bear it (👀 vertical AI companies) to those who can price it (specialized traders; likely other AI). In the Outcomes Protocol, traders get paid to be right through insurance premiums, and AI work gets delivered at prices that undercut all but the most efficient AI labor: as per Christensen, the AI work provider aggregates demand and trust while modularizing the work itself.
This breaks open our quadrants: We escape the Tool quadrant not by making AI perfect but by making imperfection perfectly priced.
With the Outcomes Protocol we can deploy AI anywhere markets are willing to price the uncertainty.
Clean!!
### The core mechanism
The AI work provider faces a fundamental tradeoff.
More effort $e$ reduces residual risk but costs more to produce:
$$\text{TotalCost}(e) = c(e) + \rho_V(L_e)$$
where $c(e)$ is the artifact-production cost and $\rho_V(L_e)$ is what the market charges today to absorb the risk $L_e$ that resolves at horizon $V$.
The optimal effort satisfies the simple condition: stop adding certainty (whether by more compute, human review, or earlier $V$) when the marginal cost equals the marginal risk reduction. That is
$$c^\prime(e^\ast) = -\frac{\partial \rho_V(L_e)}{\partial e}$$
In short, stop working when certainty gets too expensive.
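In practice the effort menu is discrete, so the condition reduces to a simple search: keep buying certainty while the next increment of effort costs less than the premium it removes. A minimal sketch, using the illustrative effort/price pairs that appear in the walkthrough in the next section:

```python
def optimal_effort(levels):
    """levels: list of (name, c_e, rho_V) tuples. Return the level minimizing total cost."""
    return min(levels, key=lambda lvl: lvl[1] + lvl[2])

# Illustrative curve: more effort costs more to produce, but the market charges less to insure it.
curve = [
    ("e1: single pass", 0.05, 800.0),
    ("e2: triple scan", 75.00, 400.0),  # moving e1 -> e2 costs ~75 extra but removes 400 of premium
]

name, c_e, rho = optimal_effort(curve)
print(name, c_e + rho)  # e2: triple scan 475.0
```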
### Protocol in action
Let’s make this concrete with the legal example from the table.
- Quote Request: A client provides a spec (“review this vendor agreement”) and context (the contract PDF).
- Market Pricing & Optimization: The protocol queries the risk market: “What is the price of risk for this task at various effort levels?” The market provides a curve of live, tradable prices. The provider’s optimization engine finds the lowest total cost.
  - Effort $e_1$ (single pass): Compute cost 0.05 + Risk price 800 = 800.05 total
  - Effort $e_2$ (triple scan): Compute cost 75 + Risk price 400 = 475 total (Optimal)
- Final Quote: The provider quotes the client the final, all-in price for a guaranteed outcome: 475.
- Settle & Mint: The client accepts and pays the 475. Upon payment, the protocol mints the Outcomes Contract $\mathsf{OC}(y)$ and instantly allocates the funds: 400 to the traders who bought the risk, and 75 to the AI work provider.
- Deliver: The now-paid provider executes the work at the agreed-upon optimal effort ($e^*$) and delivers the redlined contract $y$ to the client.
- Resolve: At day 30, if no issues surface, traders keep their premium. If a problematic clause triggers a loss, the traders pay out the $10,000 to the client as stipulated by the contract.
The economics are clean:
- End-users: Pay 475 for guaranteed AI legal review (roughly 75% savings vs. a 2,000-dollar human lawyer). If something goes wrong, they receive the $10,000 payout.
- AI work providers: Earn 75 for the work and pass through the 400 risk premium. No capital reserves needed.
- Public market: Collect the 400 premium for accepting a 3% chance of a 10,000 payout. The 300 covers the expected loss, and the extra 100 is the market’s profit for bearing the risk. Expected value: $400 - (0.03 × $10,000) = $100 profit per contract.
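A minimal sketch of these cash flows, assuming the binary loss from the table (the function and role names are mine):

```python
def settle(client_price=475.0, provider_fee=75.0, premium=400.0,
           loss_if_fail=10_000.0, p_fail=0.03, failed=False):
    """Mint-time allocation plus resolution-time payout for one Outcomes Contract."""
    return {
        "provider": provider_fee,                                    # paid at mint, no reserves held
        "trader": premium - (loss_if_fail if failed else 0.0),       # +400 clean, -9,600 if the clause triggers
        "client": -client_price + (loss_if_fail if failed else 0.0), # -475 either way, +10,000 if made whole
        "trader_expected_profit": premium - p_fail * loss_if_fail,   # 400 - 0.03 * 10,000 = 100
    }

print(settle(failed=False))
print(settle(failed=True))
```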
By explicitly pricing uncertainty, the protocol eliminates:
- No vertical SaaS needed: No company building or vertical integration of trust needed.
- No bespoke warranties: No Intercom-style guarantees requiring custom pricing per customer.
- No customer discovery: All AI labor is specified by the request tuple.
- No capital reserves: The protocol decentralizes the securitization of risk.
And creates:
- Optimal effort by default: AI automatically chooses $e^*$ where marginal effort cost equals marginal risk reduction, e.g., no hidden service degradation
- Risk finds its natural owner: a proper division of labor for the AI age.
- Continuous improvement: Every resolved contract improves the full stack of AI labor efficiency and pricing.
- Open markets for models: New AI providers can compete as soon as they make themselves legible to risk traders (non-trivial).
Let’s see how this can work in practice.
### Toy example: legal contract review
For a standard 12-page vendor agreement ($50,000 annual value, 30-day verification horizon, $10,000 potential loss):
| Provider | Model | Effort $e$ | AI Cost $c(e)$ | Risk Premium $\rho_{30}(L_e)$ | Total Cost | Outcome |
|---|---|---|---|---|---|---|
| Human Lawyer | Human | 2-hour review | $2,000 | $0* | $2,000 | Baseline |
| Raw AI | Claude 4.1 | Minimal | $0.40 | $\infty$ | $\infty$ | No recourse |
| Protocol | GPT-5 | $e_1$: Single pass | $0.05 | $800 (8% fail) | $800.05 | Too risky |
| Protocol | Claude 4.1 | $e_2$: Triple scan | $1.20 | $150 (1.5% fail) | $151.20 | Optimal ✅ |
| Protocol | Claude 4.1 | $e_3$: AI + 15 min human | $251.20 | $120 (1.2% fail) | $371.20 | Unnecessary |
The protocol selects $e^* = e_2$ (pure AI triple scan) at $151.20. (*We assume the human lawyer is perfect but provides no contractual guarantee or recourse if errors occur.)
Here, adding even 15 minutes of human review at a $1,000/hour rate more than doubles the total cost while barely reducing risk (it saves only $30 in premium). In this toy example the human is priced out of the market.
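The same stopping rule from the core mechanism, applied to the $e_2$ and $e_3$ rows above (numbers copied from the table; the $1,000/hour rate is the post’s assumption):

```python
# Is 15 minutes of human review worth it? Compare marginal cost to marginal risk reduction.
e2_cost, e2_premium = 1.20, 150.0    # triple scan
e3_cost, e3_premium = 251.20, 120.0  # triple scan + 15 min human

marginal_cost = e3_cost - e2_cost                  # 250.00 of extra review
marginal_risk_reduction = e2_premium - e3_premium  # only 30.00 of premium saved

print(marginal_cost > marginal_risk_reduction)  # True: stop at e2; the human is priced out
```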
Of course, as models improve, this gap widens:
| Timeline | Optimal Approach | AI Cost $c(e^*)$ | Risk Premium $\rho_{30}(L_{e^*})$ | Total Cost | vs Human |
|---|---|---|---|---|---|
| Today (2025) | Claude 4.1 triple scan | $1.20 | $150 (1.5% fail) | $151.20 | -92% |
| Near-term (2026) | Claude 5 deep analysis | $2 | $20 (0.2% fail) | $22 | -99% |
| Medium-term (2027) | Claude 6 single pass | $0.40 | $2 (0.02% fail) | $2.40 | -99.9% |
With model improvements, guaranteed pure AI becomes the economically optimal choice—cheaper and faster (and possibly more accountable – here malpractice is explicitly priced!) than any alternative.
## Implications and qualifications
### What about adverse selection?
Won’t customers only buy guaranteed AI for the tasks they think will fail?
If they do, adverse selection is priced in. The protocol makes hard tasks automatically cost more while sloppy customers pay reputation surcharges. High-SE tasks with ambiguous specs generate higher risk prices $\rho_V(L_e)$; a bespoke partnership agreement costs more than a standard NDA. This will likely segment the market:
- Easy tasks (low SE, short VH): near-zero premium, AI dominates
- Hard tasks (high SE, long VH): High premium, might be cheaper to hire a human
- Middle ground: Protocol enables AI application where it’s economically efficient
Customer reputation also becomes market-enforced. The protocol tracks which customers consistently submit underspecified work or withhold critical context. Because the protocol is actively optimizing effort and risk, it counteracts this adverse selection by rewarding customers who provide complete specifications with lower AI work prices.
### Why won’t the protocol eventually commodify itself?
If you have large flows of similar tasks, do you need to pay rents to the protocol?
Consider $N$ similar tasks processing tax returns through the same model $m$, spec $s$, and harness $h$, with efforts $e_1, \ldots, e_N$ chosen based on the complexity of each context $x$. You can assemble these losses into a portfolio, writing their total loss as $S_N = \sum_{i=1}^N L_{e_i}^{(i)}$.
If these losses were independent, diversification would work: as $N \to \infty$, the per-task risk premium would vanish. You’d only pay expected losses, not uncertainty. The protocol becomes unnecessary and you self-insure.
But task correlation is persistent and valuable work resists this commodification:
- Correlation persists: When tasks share the same model $m$ and harness $h$, they inherit common failure modes. A model hallucination pattern, a retrieval bug, or an IRS interpretation change cascades across all returns. Even small correlation ($r = 0.1$) with large $N$ keeps the effective sample size bounded: a thousand tasks might only provide the diversification of ten (see the sketch below).
- Per-instance stakes and reputation: Guarantees are non-fungible: each job must be right on its own. Telling a client “we’re right 99% of the time” after their return fails doesn’t restore their penalties, and a visible miss can cost future business. This “reputational spillover” makes even large portfolios of “similar” work sensitive to single failures.
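A minimal sketch of the effective-sample-size point, assuming equicorrelated losses (the standard design-effect formula; the protocol itself doesn’t prescribe this model):

```python
def effective_sample_size(n: int, r: float) -> float:
    """Variance of the portfolio mean shrinks as if you held only n_eff independent tasks."""
    return n / (1 + (n - 1) * r)

print(effective_sample_size(1000, 0.0))             # 1000.0: independent losses diversify fully
print(round(effective_sample_size(1000, 0.1), 1))   # ~9.9: a thousand correlated returns diversify like ten
```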
The protocol seems to have durable value for all but the most scaled cases of commodity work where statistical self-insurance produces savings.
### This protocol prices the marginal value of AI capability
Today new models are released with fanfare. We get benchmarks and anecdotes but little about the economic value a new capability level unlocks. The Outcomes Protocol changes this: it implicitly prices AI capability.
Consider a task specification $(s, x, h)$ and two candidate AI models $m_1$ and $m_2$ with optimal efforts $e_1^\ast, e_2^\ast$.
Each model-effort pair produces a random variable representing potential loss: $$L_{e_i^\ast}^{(m_i)} = \ell(Y_{e_i^\ast}^{(m_i)}, \Omega), \quad i = 1,2$$
The market prices each model’s service at: $$\text{Price}(m_i) = c(e_i^\ast) + \rho_V(L_{e_i^\ast}^{(m_i)}), \quad i = 1,2$$
The marginal value of using $m_2$ over $m_1$ is simply: $$\Delta = \text{Price}(m_1) - \text{Price}(m_2) = [c(e_1^\ast) - c(e_2^\ast)] + [\rho_V(L_{e_1^\ast}^{(m_1)}) - \rho_V(L_{e_2^\ast}^{(m_2)})]$$
This decomposes into:
- Efficiency gain: How much less effort $m_2$ needs to achieve similar outcomes
- Risk reduction: How much more reliable $m_2$ is at its optimal effort
If a new model reduces tax return error rates from 5% to 2% on a 500 dollar loss, the risk premium drops from 25 to 10. That $15 per return is the exact marginal value of the model improvement.
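A minimal sketch of that $\Delta$ computation under risk-neutral pricing, with the compute costs assumed equal so only the risk term moves:

```python
def marginal_value(c1, p1, c2, p2, loss):
    """Delta = [c(e1*) - c(e2*)] + [rho_V(L_e1*) - rho_V(L_e2*)], with risk-neutral premiums."""
    return (c1 - c2) + (p1 * loss - p2 * loss)

# Error rate falls from 5% to 2% on a 500-dollar loss; compute cost held fixed for both models.
print(marginal_value(c1=0.40, p1=0.05, c2=0.40, p2=0.02, loss=500))  # 15.0 dollars per return
```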
This extends to any capability axis:
- Longer context: Reduces $L_e$ by catching more edge cases → lower $\rho_V(L_e)$
- Better reasoning: Allows lower effort $e$ for same accuracy → lower $c(e)$
The protocol essentially creates a liquid market for intelligence itself where every capability improvement corresponds to energy or costs saved in the delivery of priced guaranteed work. This explicit optimization drags scaled cognitive work into open markets, leaving vertical startups to keep selling artisanal
> we got you bro we’re backed by the Best VCs 💕
warranties like Etsy candle-makers at a Renaissance fair.
### Oracle Mechanism: Self-reporting as optimal strategy
Who reports if the work succeeded and why should we trust them?
One path is to rely on the customer to report failures, using their incentives to encourage honesty.
This choice follows ad-tech, where, e.g., advertisers using Facebook Advantage+ and Google’s Performance Max report conversions. They do this reliably because it helps the ads platform improve targeting and lowers customer acquisition costs. Misreporting conversions causes the algorithm to optimize for the wrong outcome, burning your budget. The Outcomes Protocol pursues identical incentives: honest failure reporting improves AI work quality and reduces costs.
This is a sharp contrast to vertical AI where market participants face the opposite incentives. Harvey claims to tie its renewals to hours saved but its customers are incentivized to lie. Why honestly report hours saved if it means your vendor will charge you more? Similarly, AI labs are blocking competitors from viewing outcomes streams – because outcomes data are their moat.
The protocol inverts this dynamic by making outcome data a public good that emerges from self-interest. Customers honestly report outcomes to minimize their AI labor costs, not out of altruism. That these reports improve pricing for everyone is a fortunate byproduct.
That said, for large policies $\rho_V(L_e)$ the protocol will likely need more sophisticated oracle adjudication.
### How do you bootstrap liquidity?
One way to start could be building a vertical AI startup with a hidden Outcomes Protocol backend – a Crosby 2.0. Run the protocol initially off your own balance sheet, proving the mechanism works within your vertical and harvesting loss histories. Then begin to open your books to more investors, ideally learning how to expand your risk pricing across requests $(s, x, m, h)$ from other verticals.
While each tuple $(s, x, m, h)$ could imply isolated micro-markets that make pricing the protocol expensive, it seems fair to imagine that
- $s$ will collapse to a best-practice spec for each vertical application
- $m, h$ will be the de-facto frontier model and harness to start, so that the request context $x$ is the only free variable, making requests more likely to be fungible.
Alternatively, you could go to existing prediction markets, but there may not be enough liquidity there, nor might participants have the tools needed to actually price. In this way, the vertically integrated approach appears more directed and grounded.
Clearly, expanding to all economically important activity will involve managing tail risk that may require more institutionalized insurance.
### Should this live at the model or router layers?
If the protocol lives at the model layer, the lab could use market information to directly update its model and harness, and guide users in writing effective specifications. On the other hand, its protocol would likely be constrained to its own model universe, which may not provide a complete menu of AI labor options suitable for all AI labor customers.
If the protocol lives at the router layer, the router may not be positioned to update weights, but it can communicate cognitive-labor market information to model builders and focus entirely on AI work demand aggregation. The router may provide superior value to business customers, who can then use a single API for all AI labor without worrying about which lab’s protocol has the best prices.
## Conclusion
Returns from AGI should benefit everyone and be delivered at the optimal conversion of energy to intelligence.
America’s strongest candidate to diffuse AI into the economy – the B2B application layer –
- is too slow
- faces perverse incentives
- has unsustainable unit economics
- and lacks efficient tools to manage AI risk
The application layer’s ‘do things that don’t scale’ approach to AI deployment is contrary to the Bitter Lesson. Economists have wondered how the application layer’s 5–9 year exit horizons align with AGI timelines. If AGI arrives in this window, why are we building thousands of artisanal wrappers?
The Outcomes Protocol offers a different and superior path: make uncertainty liquid. By explicitly pricing the conversion of energy or effort (compute, human review, better specs or models) and risk (what remains uncertain at the moment of deferral to AI), the protocol transforms AI deployment from a company-building problem into a market pricing problem. Every AI API gets dragged into a ruthless open market where only the optimal energy-to-certainty conversion survives.
There are a whole lot of details I’ve skipped or glossed over here:
- regulatory compliance
- tail risk management
- more detailed liquidity bootstrapping plan
but I remain excited about the core insight: AI deployment isn’t bottlenecked by capability but by the inability to price uncertainty. Today’s vertical AI companies either skip end-to-end applications, sell commodity low-risk deployments at massive margin, or take on risk without the infrastructure to bear it. Tomorrow’s AI infrastructure will separate risk-bearing from service delivery, letting markets take the wheel and allocate resources efficiently.
AI risk isn’t going anywhere. World context continues to change, and we won’t ever be completely certain about the correctness of a piece of AI labor. If successful, the Outcomes Protocol could allow public markets to participate in AGI’s durable AI-risk problem. And it could yield energy-optimized AI deployment wherever it’s economically efficient.
Why should AGI deployment wait for vertical application when it could be deployed at a protocol layer?