Independent Verification Goes Mainstream
A bipartisan federal discussion draft and proposals from Anthropic, OpenAI, and Google DeepMind are all converging on independent verification as a key element of AI governance.
For most of the last two years, AI governance has faced a core structural challenge: government can’t keep pace with the speed and complexity of technological change, and industry, even if well-intentioned, can’t be expected or trusted to grade its own homework. Meeting the moment—setting up a system of governance that ensures these systems are safe, secure, and behave as expected—requires a model that pairs public accountability with private technical rigor. In the span of a few weeks, the most consequential players in American AI policy have begun to converge on an answer: independent verification.
The Independent Verification Organization (IVO) model is straightforward: government sets outcome-based safety standards. A competitive marketplace of licensed, expert-led organizations verifies whether AI systems meet them. And AI companies opt in to the verification process to earn a trusted signal and the legal clarity that comes with it. The builder, tester, and standard-setter are three separate entities, rather than one. The result is a system of earned trust, one that benefits users, developers, deployers, and regulators alike.
There is now real convergence, across both the public and private sectors, that independent verification is a critical element of AI governance. But that convergence is only the first step. The harder question of what independent verification should actually look like is where the debate now turns, and the recent proposals are where that debate is taking shape. Fathom has spent two years developing and pressure-testing this model, which informs the read that follows.
The most consequential sign came from Capitol Hill. On June 4, Representatives Jay Obernolte (R-CA) and Lori Trahan (D-MA) released a bipartisan discussion draft of The Great American AI Act (GAAIA): a substantive federal framework to govern how the most powerful AI models are built, tested, and deployed. It names Independent Verification Organizations explicitly. Under the draft, IVOs licensed by the Center for AI Standards and Innovation (CAISI) would provide ongoing oversight of frontier models for catastrophic risk. If a system falls out of compliance or poses an imminent threat, the IVO notifies the U.S. Attorney General and the attorneys general of states that opt in, who, depending on the infraction, can then seek injunctive relief or pursue penalties in court. Many details in the draft still need to be ironed out, but this enforcement path is real accountability paired with due process. Releasing it as a discussion draft, rather than a finished bill, was the right move: it invites the whole community to pressure-test and strengthen the language in the open.
The labs have moved in a similar direction. The following week, Anthropic published its Advanced AI Framework, alongside an essay by its CEO Dario Amodei, Policy on the AI Exponential. The framework states plainly that “self-assessment is not enough”—that developers cannot grade their own homework. Instead, it calls for qualified independent evaluators, paired with a real effort to build that ecosystem through standards, licensing, and pooled funding, including safeguards against developers shopping for lenient evaluators.
OpenAI has delved into the technical foundation. Its May 29 playbook for trustworthy third-party evaluations emphasizes that what a model can do depends on the system around it, not the model alone. Performance turns on the “harness”: the tools, scaffolding, memory, and budget an evaluator wraps around the model. Test the model in isolation and you are crash-testing the engine, not the car; credible verification has to assess the whole vehicle. Days later, OpenAI released a blueprint for democratic governance of frontier AI, calling for policymakers to have ongoing visibility into matters like progress toward recursive self-improvement (RSI), high-capability internal deployments, frontier model security, internal monitoring, and whether safeguards actually work.
The signals reach beyond Anthropic and OpenAI. On Kevin Frazier’s Scaling Laws podcast on June 2, Owen Larter, Google DeepMind’s Head of Frontier Policy and Public Affairs, argued for “a fluid, dynamic, standards-based approach to governance.” The approach he describes—keeping pace with the technology by drawing on expertise beyond any single lab, and treating governance as an ongoing scientific challenge rather than a one-time rulebook—points in a similar direction.
These positions build on concrete law, not just argument. In April, Virginia became the first state to advance the IVO model through legislation, directing a formal study of the framework. Connecticut followed in May, authorizing a voluntary IVO pilot in which the state sets safety outcomes, independent experts verify whether products meet them, and participating companies earn a trusted market signal.
So the “what” is now widely shared. The “how” is where the work begins. The recent bills and proposals differ in their details, but the strongest versions converge on a few elements we consider essential to the model.
Real evaluation, not a transparency audit. IVOs are not there merely to confirm that developers are doing what they claim. They are required to examine the models themselves, using their own tools and methods, to verify that a system does not pose a threat of catastrophic harm. That is not an audit of a company’s paperwork; it is an independent crash-test of the product. The GAAIA discussion draft calls for exactly this.
Accountability: a real mechanism to stop a dangerous deployment. Here the proposals run the gamut. GAAIA empowers attorneys general to seek injunctive relief in court: accountability without handing government a unilateral veto. Anthropic’s framework goes further, arguing that government should be able to block or deter deployment of a model that third-party assessment finds to present unacceptable risk. OpenAI’s blueprint is the most limited, granting government an advisory rather than a blocking role. Where that line falls is one of the central design questions.
Separation of builder, tester, and standard-setter. All three proposals separate the builder from the tester, but not equally. OpenAI’s version is the weakest: it would have developers retain their own third-party auditor, rather than a government-licensed corps of evaluators with conflict-of-interest and financial-independence protections. The same gap shows up in standard-setting: GAAIA and Anthropic have CAISI set the safety outcomes, while OpenAI would have CAISI set only the audit methodology. Outcome standards are the stronger choice: they fix what “safe enough” means, rather than only how to measure it.
Other questions are genuinely open, and should be settled in public rather than asserted. Should there be a single IVO or a competitive marketplace? Should verification reach only catastrophic risk, or extend to tort-like harms to person and property? Reasonable people will disagree, and the market and the public should have a large say.
Our own position is this: where the stakes are catastrophic—cyber, CBRN, loss of control, recursive self-improvement—they are too high to leave participation optional. For harms to person and property, a voluntary system is the right tool, pulled forward by real incentives: market advantage, legal clarity, and access to insurance. We want a growing ecosystem of IVOs across the board, while recognizing that a competitive marketplace fits the voluntary tier more naturally than the mandatory one.
The IVO model has moved from proposal to live national debate, and these choices now have to be argued in the open. The architecture is no longer theoretical. The only question left is whether we build it well, and fast enough to meet the moment.

